# Bigram Markov Chain Model

Now you'll build a more complex Markov chain that uses the last _two_ words (a bigram) to predict the next word. Your dict `chain` should now map a _tuple_ of two words to a list of the words that appear after it. For example, one entry of this dict might be

```
chain = {
    ("it", "is"): ["the", "the", "not", "a", "a", "not", "the"],
    ...
}
```
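The repeated words in the list matter: choosing uniformly from the list is the same as choosing each word with probability proportional to how often it followed the bigram in the training data. A minimal sketch of that idea, using the illustrative entry above (the helper name `next_word_probabilities` is hypothetical and not part of the assignment):

```python
from collections import Counter

def next_word_probabilities(followers):
    """Return the empirical probability of each word in a follower list."""
    counts = Counter(followers)
    total = len(followers)
    return {word: count / total for word, count in counts.items()}

# The example follower list for the bigram ("it", "is").
followers = ["the", "the", "not", "a", "a", "not", "the"]
probs = next_word_probabilities(followers)
print(probs)
```

Storing the raw list instead of these probabilities keeps training simple, since `random.choice` on the list already samples with the right frequencies.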

As before, you should also include tags that indicate the beginning and end of a song, as well as line breaks. That is, a tuple might contain tags like `"<START>"`, `"<END>"`, and `"<N>"`, in addition to regular words. So if the song starts with the line "Is this the real life?" and ends with the line "Nothing really matters to me.", you would have a dictionary that looks like
```
chain = {
    (None, "<START>"): ["Is", ...],
    ("<START>", "Is"): ["this", ...],
    ("Is", "this"): ["the", ...],
    ("this", "the"): ["real", ...],
    ("the", "real"): ["life?", ...],
    ("real", "life?"): ["<N>", ...],
    ...
    ("<N>", "Nothing"): ["really", ...],
    ("Nothing", "really"): ["matters", ...],
    ("really", "matters"): ["to", ...],
    ("matters", "to"): ["me.", ...],
    ("to", "me."): ["<END>", ...],
    ...
}
```
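One way to see where these entries come from is to pad the word sequence with tags and slide a window of three tokens over it. The sketch below is only an illustration under some assumptions: the helper name `bigram_transitions` is hypothetical, words are split on single spaces, and the toy song consists of just the two quoted lines placed back to back (in the real song they are far apart).

```python
def bigram_transitions(song):
    """Return a list of ((first, second), next_word) pairs for one song."""
    words = []
    for token in song.split(" "):
        # A token may contain line breaks; each one becomes an "<N>" tag.
        parts = token.split("\n")
        words.append(parts[0])
        for part in parts[1:]:
            words.extend(["<N>", part])
    # Pad so that the first and last words also get a bigram context.
    seq = [None, "<START>"] + words + ["<END>"]
    return [((a, b), c) for a, b, c in zip(seq, seq[1:], seq[2:])]

song = "Is this the real life?\nNothing really matters to me."
transitions = bigram_transitions(song)
for bigram, next_word in transitions:
    print(bigram, "->", next_word)
```

Each printed pair corresponds to one append into `chain[bigram]`.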

In [1]:
def train_markov_chain(lyrics):
    """
    Args:
      - lyrics: a list of strings, where each string represents
                the lyrics of one song by an artist.
    
    Returns:
      A dict that maps a tuple of 2 words ("bigram") to a list of
      words that follow that bigram, representing the Markov
      chain trained on the lyrics.
    """
    # (None, "<START>") collects the first word of every song;
    # (None, "<N>") collects every word that begins a new line.
    chain = {(None, "<START>"): [],
             (None, "<N>"): []}
    for lyric in lyrics:
        # Split on spaces, turning every line break into an "<N>" tag.
        # A token can contain several line breaks, so handle all of them.
        words = []
        for token in lyric.split(" "):
            parts = token.split("\n")
            words.append(parts[0])
            for part in parts[1:]:
                words.append("<N>")
                words.append(part)

        # Skip songs that are empty or begin with a blank token.
        if words[0] == "":
            continue

        # Pad with tags so the first word follows (None, "<START>") and
        # "<END>" follows the final bigram (the last word is kept).
        seq = [None, "<START>"] + words + ["<END>"]
        for first, second, next_word in zip(seq, seq[1:], seq[2:]):
            chain.setdefault((first, second), []).append(next_word)
            # Also record line-initial words under the (None, "<N>") key.
            if second == "<N>":
                chain[(None, "<N>")].append(next_word)

    return chain

In [2]:
# Load the pickled lyrics object that you created in Part 1.
import pickle
with open("lyrics.pkl", "rb") as f:
    lyrics = pickle.load(f)

# Call the function you wrote above.
chain = train_markov_chain(lyrics)

# What words tend to start a song (i.e., what words follow the <START> tag?)
print(chain[(None, "<START>")])

# What words tend to begin a line (i.e., what words follow the line break tag?)
print(chain[(None, "<N>")][:20])

['A', 'Uh,', 'A', "I've", 'I', 'I', "That's", 'Set', 'Round', 'Too', 'And', 'My', 'Double', 'Hand', 'Taxi', 'Yeah,', 'Pool', 'A', 'My', 'I', 'Bad', 'These', 'Saturdays', 'I', 'I', 'I', 'My', 'When', 'Wiseman', 'The', 'She', 'Round', 'A', 'We', 'You', "It's", 'The', 'Turn', 'I', 'He', "River's", 'If', 'She', 'Oh,', 'I', 'Mosh', 'This', 'How', 'I', 'My', 'Fertilizer', 'That', 'Verse', 'i', 'I', 'Darker', 'Hand', 'Stop', '(I', 'When', 'Frank', "I'm", 'Look,', 'Could', 'Talk', 'Something', 'Golden', "I've", 'I', 'Lobster', 'I', 'When', 'I', 'Lost', 'Golden', 'Human', 'I', 'Could', 'And', 'Shout', 'Bitches', 'Bitches', 'Gold', 'When']
['Excuse', 'In', 'My', 'when', '(Ooh,', "I've", '(You', "I've", 'Do', 'Do', "'Cause", "'Cause", 'Enough', 'Got', 'Since', "That's", 'Got', "Thinkin'", '(Ooh,', "I've"]


Now, let's generate new lyrics using the Markov chain you constructed above. To do this, we'll begin at the `(None, "<START>")` state and randomly sample a word from the list of first words. Then, we'll randomly sample each next word from the list of words that followed the current bigram (the last two words generated) in the training data. We will continue this until we reach the `"<END>"` state. This will give us the complete lyrics of a randomly generated song!
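The walk described above can be sketched on a tiny hand-written chain. This is a hedged illustration rather than the required solution: `sample_song` and `toy_chain` are hypothetical names, and the toy chain is built so that every bigram that can be reached has an entry.

```python
import random

def sample_song(chain, rng):
    """Walk the chain from (None, "<START>") until "<END>" is drawn."""
    first, second = None, "<START>"
    words = []
    while True:
        next_word = rng.choice(chain[(first, second)])
        if next_word == "<END>":
            break
        words.append(next_word)
        # Slide the two-word window forward by one word.
        first, second = second, next_word
    return words

toy_chain = {
    (None, "<START>"): ["la"],
    ("<START>", "la"): ["la", "dee"],
    ("la", "la"): ["dee", "<END>"],
    ("la", "dee"): ["<END>"],
}
rng = random.Random(0)  # seeded only to make the sketch repeatable
words = sample_song(toy_chain, rng)
print(" ".join(words))
```

The state is always the last two tokens, so after each draw the window shifts by one word and the lookup key changes with it.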

In [3]:
import random

def generate_new_lyrics(chain):
    """
    Args:
      - chain: a dict representing the Markov chain,
               such as one generated by train_markov_chain()
    
    Returns:
      A string representing the randomly generated song.
    """
    
    # a list for storing the generated words
    words = []
    # generate the first word
    curr_word = random.choice(chain[(None, "<START>")])
    words.append(curr_word)

    # p1 and p2 hold the last two tokens, i.e. the current bigram state
    p1 = "<START>"
    p2 = curr_word
    while curr_word != "<END>":
        next_word = random.choice(chain[(p1, p2)])
        words.append(next_word)
        p1 = p2
        p2 = next_word
        curr_word = next_word

    # join the words together into a string with line breaks
    lyrics = " ".join(words[:-1])
    return "\n".join(lyrics.split("<N>"))

In [4]:
print(generate_new_lyrics(chain))

I thought it weak 
 But boy you need a cold shower 
 You cunt I just wanna talk, and conversate 
 Cause the hardest thing to say it, 
 Other than how I do 
 I ain't on no sales floor 
 Away turf, no Astro 
 Mesmerized how the strobes glow 
 Oh yeah, oh yeah Throw team on back like I never ask advice from him 'cause what could he know? 
 Never make him love me 
 Despite the life You've had a Pilot Jones Tonight she came stumblin' across my lawn again 
 We'd drive to Syd's, had the X6 back then 
 Fell asleep in the heat of it all 
 Girl you know I won't be going backwards 
 [Hook] my golden girl 
 who's this other guy that sending you these flowers 
 oh now you wanna get her involved 
 Madly involved Hittin' stones in glass homes 
 You're wet and you're warm just like you 
 Breath till I evaporated 
 My crew saved your crew like niggas came through the valley 
 Freeway Despite our history, yeah 
 Sleeve rips off, I slip, I fall short of what a life 
 Sweet Mother Mary, sweet Father Josep

### Grader's Comments

- 
- 

[This question is worth 20 points.]

In [5]:
# This cell should be modified only by a grader.
scores = [None]