# Song Lyrics Generator
## Due Tuesday, May 17 at 8 AM

In this lab, you will scrape the web to get lyrics from your favorite artist. Then, you will train a Markov chain model on these lyrics. Finally, you will use your Markov chain to generate new (random) lyrics.

## Question 1: Web Scraping (40 points)

Find a website that has lyrics for all the songs by your favorite artist. Then scrape the lyrics into a Python list called `lyrics`, where each element of the list represents the lyrics of one song.

Tips:
- Find a webpage that has links to all the songs, like [this one](http://www.azlyrics.com/n/nirvana.html). [NOTE: It seems like azlyrics.com does not allow you to scrape their webpages, so you'll have to find another source.] Then, you can write code to visit all the links and scrape each page one by one.
- Make sure you call `time.sleep(0.1)` between requests to stagger them, so that the website does not ban you for making too many requests.
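
The two tips above can be sketched as a single loop: collect the links from the index page, then fetch each linked page with a short pause in between. This is only a minimal sketch, not the required solution. The names `LinkCollector`, `scrape_all`, and the `fetch` argument are hypothetical; `fetch` is assumed to be any function that takes a URL and returns the page's HTML as a string. The sketch uses the standard library's HTML parser so that it runs without extra packages.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def scrape_all(index_url, fetch, delay=0.1):
    """Fetch the index page, then every page it links to, pausing before each request."""
    collector = LinkCollector()
    collector.feed(fetch(index_url))
    pages = []
    for href in collector.hrefs:
        time.sleep(delay)  # stagger the requests
        # Resolve relative links such as "/song1" against the index URL.
        pages.append(fetch(urljoin(index_url, href)))
    return pages
```

With the `requests` package, one plausible choice for `fetch` is `lambda url: requests.get(url).text`. On a real site you would still need to filter the collected links down to the ones that point to song pages.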

In [1]:
import re
import time

import requests
from bs4 import BeautifulSoup

lyrics = []

# Fetch the artist's index page and collect the links in its ninth <div>.
indexpage = requests.get("http://www.allthelyrics.com/lyrics/prince")
indexsource = BeautifulSoup(indexpage.text, "html.parser")
links = indexsource.find_all("div")[8].find_all("a")
time.sleep(0.1)

# Visit a fixed slice of 100 links (indices 42-141), one song page at a time.
for i in range(42, 142):
    link = "http://www.allthelyrics.com" + links[i]["href"]
    lyric_page = requests.get(link)
    lyric_source = BeautifulSoup(lyric_page.text, "html.parser")
    lyric_string = str(lyric_source.find_all("div", {"class": "content-text-inner"})[0])
    # Strip annotations and markup: [..], repeat markers like "x2", HTML tags,
    # (..), {..}, and the word "chorus".
    lyric_string = re.sub(r"\[(\s|\S)*?\]", "", lyric_string)
    lyric_string = re.sub(r"x[0-9]", "", lyric_string)
    lyric_string = re.sub(r"<(\s|\S)*?>", "", lyric_string)
    lyric_string = re.sub(r"\((\s|\S)*?\)", "", lyric_string)
    lyric_string = re.sub(r"\{(\s|\S)*?\}", "", lyric_string)
    lyric_string = re.sub(r"chorus|Chorus|CHORUS", "", lyric_string)
    # Collapse runs of spaces and tabs into a single space.
    lyric_string = re.sub(r"[\t ]+", " ", lyric_string)
    lyric_string = lyric_string.strip()
    lyrics.append(lyric_string)
    # Pause between requests so the site is not flooded.
    time.sleep(0.1)

In [2]:
print(lyrics[0])

If u ain't got no place 2 stay
Come on baby 'round this way
Stay with me baby
But let me tell u how it's gonna b
There's a theocratic order.
There's a theocratic order now
This is how it's gonna b
If u wanna b with me
Ain't no room 4 disagree
1+1+1 is 3
Take ur time and think it thru
If this is what u wanna do
I ain't really that hard 2 please
Cuz 1+1+1 is 3
Stroke ur hair a hundred times
Let me c what I can find
D u know about the order.
Do u know about the order, now?
The Banished Ones:
"We are the Banished Ones and we have come 2 dance
If u will not let us, we'll have 2 kick ur pants!"
Who's that knockin' on r door?
Didn't we throw u out b4?
I'm 'bout 2 get rowdy!
I'm 'bout 2 get rowdy, now!
Make me wanna do something.
We could b surrounded in the palace
"Everybody wants 2 get u!"
I don't care
How many y'all just came 2 dance?
Let me c u shake ur pants
We don't give a duck what u got on
U just need 2 work that sexy body all nite long
Come on
Where them Banished Ones at?
"Said they '

## Question 2: Training a Markov Chain (30 points)

Markov chains are mathematical systems that hop (a.k.a. "transition") randomly between various states. Please read [this visual explanation](http://setosa.io/ev/markov-chains/) for a high-level overview. The distinguishing feature of a Markov chain is that the next state only depends on which state the chain is in now; it doesn't depend on the past history of the chain.
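
To make the "depends only on the current state" property concrete, here is a minimal sketch of a two-state chain. The state names, the probabilities in `TRANSITION_PROBS`, and the helper `simulate` are all made up for illustration; they are not part of the lab.

```python
import random

# Hypothetical transition probabilities: each row depends only on the current state.
TRANSITION_PROBS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def simulate(start, n_steps, rng=random):
    """Return the list of states visited, starting from `start`."""
    state = start
    path = [state]
    for _ in range(n_steps):
        options = TRANSITION_PROBS[state]
        # The next state is drawn using only the current state's row.
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


print(simulate("sunny", 10))
```

Notice that `simulate` never looks at `path` when choosing the next state; the history is recorded but plays no role in the sampling.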

We can use Markov chains to model human language. Each word is a "state", and the next word in a sentence only depends on the current word, not any words that came before. This model makes sense because if we know that the current word is "it", the next word is very likely to be "is", less likely to be "runs", and never going to be "pineapple". On the other hand, if the current word is "the", then the next word might be "pineapple", but it can't be "is". The current word tells us a lot about what the next word might be.

We will build a Markov chain model for the artist whose lyrics you scraped in Question 1. To do this, we have to go through the lyrics and learn the word transitions for that artist. We will store this information in a dict called `transitions`, which maps each word to a list of words that appear after it in the training data. So for example, one entry of this dict might be

```
transitions = {
    "it": ["is", "runs", "is", "is", "was", "is", "was"],
    ...
}
```
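
Keeping repeated words in each list is what encodes how likely each transition is. As a small illustration using the example entry above, the following sketch converts a list of observed next words into estimated probabilities. The helper name `transition_probabilities` is an assumption made for this example.

```python
from collections import Counter
from fractions import Fraction

transitions = {
    "it": ["is", "runs", "is", "is", "was", "is", "was"],
}


def transition_probabilities(followers):
    """Turn a list of observed next words into estimated probabilities."""
    counts = Counter(followers)
    total = len(followers)
    return {word: Fraction(n, total) for word, n in counts.items()}


probs = transition_probabilities(transitions["it"])
for word, p in probs.items():
    print(word, p)
```

Here "is" follows "it" in 4 of the 7 observations, "was" in 2, and "runs" in 1, so the estimated probabilities are 4/7, 2/7, and 1/7.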

You should include a few additional states, besides words, in your Markov chain. You should have `"<START>"` and `"<END>"` states so that we can keep track of which words songs are likely to begin and end on. So if the song starts on the word "it" and ends with the word "me.", you would have
```
transitions = {
    "<START>": ["it", ...],
    "me.": ["<END>", ...],
    ...
}
```
You should also include a state called `"<N>"` to denote line breaks so that we know where lines begin and end.
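
One way to picture the special states is to flatten a song into a single sequence of states and then record every consecutive pair. The sketch below does this for a made-up two-line song. The names `song_to_states`, `add_transitions`, and `toy_song` are hypothetical, and this is just one possible way to organize the computation, not the required one.

```python
def song_to_states(lyric):
    """Flatten one song into a sequence of states, including the special ones."""
    states = ["<START>"]
    lines = [line for line in lyric.split("\n") if line.strip()]
    for i, line in enumerate(lines):
        if i > 0:
            states.append("<N>")  # line break between consecutive lines
        states.extend(line.split())
    states.append("<END>")
    return states


def add_transitions(transitions, states):
    """Record every (current state -> next state) pair in the dict."""
    for current, nxt in zip(states, states[1:]):
        transitions.setdefault(current, []).append(nxt)
    return transitions


toy_song = "it is\nfor me."
states = song_to_states(toy_song)
toy_transitions = add_transitions({}, states)
print(states)
print(toy_transitions)
```

For the toy song, the state sequence is `<START>`, `it`, `is`, `<N>`, `for`, `me.`, `<END>`, so `"<N>"` maps to the first word of the second line and `"me."` maps to `"<END>"`.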

In [3]:
def train_markov_chain(lyrics):
    # "<START>" maps to the first words of songs; "<N>" maps to the first words of lines.
    transitions = {"<START>": [], "<N>": []}
    for lyric in lyrics:
        # Split the song into lines and drop the empty ones.
        lines = [line for line in lyric.split("\n") if line != ""]
        for j, line in enumerate(lines):
            words = line.split(" ")
            is_last_line = (j == len(lines) - 1)
            for k, word in enumerate(words):
                # The first word of the first line follows "<START>".
                if j == 0 and k == 0:
                    transitions["<START>"].append(word)
                if k < len(words) - 1:
                    # Mid-line: the next state is simply the next word.
                    next_state = words[k + 1]
                elif not is_last_line:
                    # End of a line: go to "<N>", which leads to the next line's first word.
                    next_state = "<N>"
                    transitions["<N>"].append(lines[j + 1].split(" ")[0])
                else:
                    # Last word of the song.
                    next_state = "<END>"
                transitions.setdefault(word, []).append(next_state)
    return transitions

In [4]:
chain = train_markov_chain(lyrics)
print(chain["<START>"][:20])
print(chain["<N>"][:20])

['If', 'Called', '18', "Don't", 'I', 'Serve', 'Ha', 'Who', 'How', 'Yeah', 'Well,', 'If', 'One', "'Bout", 'Using', 'Long', 'Long', 'It', "It's", 'Performed']
['Come', 'Stay', 'But', "There's", "There's", 'This', 'If', "Ain't", '1+1+1', 'Take', 'If', 'I', 'Cuz', 'Stroke', 'Let', 'D', 'Do', 'The', '"We', 'If']


## Question 3: Generating New Lyrics (20 points)

Finally, let's generate new lyrics using the Markov chain you constructed in Question 2. To do this, we'll begin at the `"<START>"` state and randomly sample a word from the list of first words. Then, we'll randomly sample each next word from the list of words that appeared after the current word in the training data. We will continue this until we reach the `"<END>"` state. This will give us the complete lyrics of a randomly generated song!

You may find the `choice()` function in the `random` package helpful for this question.
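
Because `random.choice()` picks uniformly among the elements of a list, a word that appears several times in a transition list is sampled proportionally more often. The following sketch checks this empirically on the example list from Question 2. The seeded generator `rng` and the sample size are choices made for this illustration only.

```python
import random
from collections import Counter

followers = ["is", "runs", "is", "is", "was", "is", "was"]

rng = random.Random(0)  # seeded generator so the run is reproducible
samples = Counter(rng.choice(followers) for _ in range(10_000))

for word, count in samples.most_common():
    print(word, count / 10_000)
```

The observed frequencies should land close to 4/7 for "is", 2/7 for "was", and 1/7 for "runs".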

In [13]:
import random

def generate_new_lyrics(chain):
    # a list for storing the generated words
    words = []
    # generate the first word
    word = random.choice(chain["<START>"])
    # keep sampling the next state until the chain reaches "<END>"
    while word != "<END>":
        words.append(word)
        word = random.choice(chain[word])
    # join the words together into a string with line breaks
    lyrics = " ".join(words)
    return "\n".join(lyrics.split("<N>"))

In [14]:
print(generate_new_lyrics(chain))

Long hard as I like this where it's just bang on my life 
 But I'm the police  
 June brought an answer your lovely face 
 Put on girl 
 And I'm about 2 U? 
 Until I was finally rub your eyes 
 A hundred fifty dollar glasses though awakened from the end of 
 Dance, dance on a day 
 Everybody say, say no more than a million days 
 Thank you some of 
 Let the whores 
 Praise me,  
 I need another lover like the phone!" 
 Black night 
 Get your own ideas! 
 Do Me Baby, u could I knew U when U better dead, they  
 Get freaky, let that night long, deep blue sea 
 Take a word, we gonna do? 
 WORK! 
 Fly with a man 
 night./Damn! 
 Oh yeah! 
  
 Then cuddle up the herbs help me 
 but they only one thing I might be killin' one 
 Let the mass illusion, war drums beat you right 
 Like the Johnny's slippery, we'll unlock the tears 
 In your body  - Daddy Pop Daddy Pop Daddy - Go go 
 Damned if your face in God? Do I ain't funkin' just 4 that I'd dig U more time 
 U don't wanna show no me see your