# Unigram Markov Chain Model

You will build a Markov chain for the artist whose lyrics you scraped in Part 1. To do this, you have to go through the lyrics and learn the word transitions for that artist. You will store this information in a dict called `chain`, which maps each word to a list of words that appear after it in the training data. So for example, one entry of this dict might be

```
chain = {
    "it": ["is", "runs", "is", "is", "was", "is", "was"],
    ...
}
```

You should include a few additional states, besides words, in your Markov chain. You should have `"<START>"` and `"<END>"` states so that we can keep track of what words songs are likely to begin and end on. You should also include a state called `"<N>"` to denote line breaks so that you can keep track of where lines begin and end. 

So if the song starts with the line "Is this the real life?" and ends with the line "Nothing really matters to me.", you would have a dictionary that looks like
```
chain = {
    "<START>": ["Is", ...],
    "Is": ["this", ...],
    "this": ["the", ...],
    "the": ["real", ...],
    "real": ["life?", ...],
    "life?": ["<N>", ...],
    "<N>": ["Nothing", ...],
    "Nothing": ["really", ...],
    "really": ["matters", ...],
    "matters": ["to", ...],
    "to": ["me", ...],
    "me.": ["<END>", ...],
    ...
}
```


In [14]:
def train_markov_chain(lyrics):
    """
    Args:
      - lyrics: a list of strings, where each string represents
                the lyrics of one song by an artist.
    
    Returns:
      A dict that maps a single word ("unigram") to a list of
      words that follow that word, representing the Markov
      chain trained on the lyrics.
    """
    chain = {"<START>": [],
             "<N>": []}
    for lyric in lyrics:
        words_dirty = lyric.split(" ")
        words = []
  
        for word in words_dirty:
            if '\n' in word: 
                new_words = word.split('\n')
                words.append(new_words[0])
                words.append('<N>')
                words.append(new_words[1])
            else:
                words.append(word)
                
        chain["<START>"].append(words[0])

        
        for i in range(0,len(words)):
            word = words[i]
            if i == len(words) -1:
                next_word = "<END>"
            else:
                next_word = words[i+1]
            
     

            if word in chain:        
                chain[word].append(next_word)
            else:
                chain[word] = [next_word]
                
        
    return chain

In [18]:
# Load the pickled lyrics object that you created in Part 1.
import pickle
lyrics = pickle.load(open("lyrics.pkl", "rb"))

# Call the function you wrote above.
chain = train_markov_chain(lyrics)
print(chain)
# What words tend to start a song (i.e., what words follow the <START> tag?)
print(chain["<START>"][:20])

# What words tend to begin a line (i.e., what words follow the line break tag?)
print(chain["<N>"][:20])

{'': ['<END>', 'We', 'We', 'We', '<END>', '<END>', '<END>', '<END>'], "sippin'": ['on'], 'drastically': ['<N>'], 'done': ['latch', 'had', '<N>', 'saw'], 'naw': ['<N>'], "Thuggin',": ["hustlin'"], 'views': ['<N>'], 'Belly': ['behind'], 'Bushwick': ['Billy'], '(who?)': ['<N>'], 'ill': ['never'], 'mood': ['to', 'to'], 'rush': ['hour', '<N>'], 'in': ['Idaho', 'my', '<N>', 'cycles', 'cycles,', 'traffic,', 'the', 'my', 'the', 'my', 'the', 'my', 'the', 'the', '<N>', 'Idaho', 'my', 'my', 'the', 'the', 'my', 'my', 'school', 'the', 'the', 'you', 'it', 'the', 'the', 'the', 'her', 'the', 'my', 'my', 'my', 'places', 'and', 'my', 'my', 'just', 'the', 'the', "Houston's", "daddy's", "daddy's", "daddy's", 'awe', 'Shibuya', 'the', 'a', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'public', 'the', 'hell', 'the', 'hell', 'Colorado', 'hell', 'the', 'hell', 'my', 'my', 'this', 'the', 'this', 'your', 'your', '<N>', 'Idaho', 'my', 'my', 'some', 'the', 'line', "'em", 'your', 'me', 'the', 'the

Now, let's generate new lyrics using the Markov chain you constructed above. To do this, we'll begin at the `"<START>"` state and randomly sample a word from the list of first words. Then, we'll randomly sample each next word from the list of words that appeared after the current word in the training data. We will continue this until we reach the `"<END>"` state. This will give us the complete lyrics of a randomly generated song!

You may find the `random.choice()` function helpful for this question.

In [19]:
import random

def generate_new_lyrics(chain):
    """
    Args:
      - chain: a dict representing the Markov chain,
               such as one generated by generate_new_lyrics()
    
    Returns:
      A string representing the randomly generated song.
    """
    
    # a list for storing the generated words
    words = []
    # generate the first word
    curr_word = random.choice(chain["<START>"])
    words.append(curr_word)
    while(curr_word != "<END>"):
        next_word = random.choice(chain[curr_word])
        words.append(next_word)
        curr_word = next_word
    
    # YOUR CODE HERE
    
    
    # join the words together into a string with line breaks
    lyrics = " ".join(words[:-1])
    return "\n".join(lyrics.split("<N>"))

In [21]:
print(generate_new_lyrics(chain))

I got me 
 But if you and more vivid when I'm sorry when I feel more she wrote 
 When I'm underworld 
 Gorgeous, baby (oh oh myyy greedy loveee x2 
 You're smokin' stones in the lake 
 I was your with me 
 Pretty girls can huddle then 
 It's hell just say that boy they always I better than most of the truth now 
 I can watch my bank account 
 Fell asleep in love with my fingertips and my head, yeah, oh sweet Queen Betty 
 Working at home 
 Solo, solo I'm awake I'm still blinded of a train, lost and swim good, fuck you not think so familiar 
 I feel ashamed 
 I'd rather chip my room before I get a pistol 
 Kick off the park, or in LA 
 And when i want to a job since you switch the way, fuck three 
 All this time we have to live it


### Grader's Comments

- 
- 

[This question is worth 20 points.]

In [None]:
# This cell should only be modified only by a grader.
scores = [None]