# Song Lyric Generator

**Margot Murvihill**

In [1]:
import requests
import time
import pickle
import random
from bs4 import BeautifulSoup

The page requested below lists Anderson Paak's songs. This cell scrapes the link to each song from that artist page and collects them in a list called links.

In [2]:
links = []
# Change the URL to get lyrics from any artist
resp = requests.get("http://www.songlyrics.com/anderson-paak-lyrics/")
soup = BeautifulSoup(resp.content, "html.parser")
table = soup.find("table", {"class": "tracklist"})
for item in table.find_all("tr"):
    anchor = item.find("a")
    # Skip any row that does not contain a link
    if anchor is not None:
        links.append(anchor["href"])

Next, we visit each link, extract the lyrics of that song, and append them to a list called lyrics.

In [3]:
lyrics = []

for link in links:
    resp = requests.get(link)
    soup = BeautifulSoup(resp.content, "html.parser")
    lyrics.append(soup.find("p", {"id": "songLyricsDiv"}).text)
    # Pause between requests so we do not flood the server
    time.sleep(0.5)

In [4]:
# Cleaning: normalize case, mark line breaks, and strip punctuation
for i in range(len(lyrics)):
    song = lyrics[i]
    song = song.lower()
    song = song.replace("\n\n", "\n")    # collapse blank lines
    song = song.replace("\n", " <N> ")   # mark each line break with a tag
    song = song.replace("?", " ")
    song = song.replace(",", "")
    song = song.replace("-", " ")
    song = song.replace("'", "")
    song = "<START> " + song + " <END>"  # mark the song boundaries
    lyrics[i] = song

# Tokenizing: split each song on whitespace into a list of words
for i in range(len(lyrics)):
    lyrics[i] = lyrics[i].split()
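
To make the effect of the cleaning step concrete, here is a minimal sketch that applies the same replacements to a single made-up lyric. The helper name `clean_and_tokenize` and the sample string are assumptions for illustration and are not part of the notebook.

```python
def clean_and_tokenize(song):
    """Apply the same replacements as the cleaning cell to one raw string."""
    song = song.lower()
    song = song.replace("\n\n", "\n")      # collapse blank lines
    song = song.replace("\n", " <N> ")     # mark line breaks with a tag
    song = song.replace("?", " ")
    song = song.replace(",", "")
    song = song.replace("-", " ")
    song = song.replace("'", "")
    return ("<START> " + song + " <END>").split()

# Made-up two-line lyric, not taken from the scraped data
raw = "Don't stop, keep on\n\nWho's there?"
tokens = clean_and_tokenize(raw)
print(tokens)
# ['<START>', 'dont', 'stop', 'keep', 'on', '<N>', 'whos', 'there', '<END>']
```

Lowercasing happens before the tags are inserted, which is why `<N>`, `<START>`, and `<END>` stay uppercase and cannot collide with ordinary words.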

In [5]:
# Save the tokenized lyrics to disk so the scraping does not have to be repeated
with open("lyrics.pkl", "wb") as f:
    pickle.dump(lyrics, f)

Now we will build a Markov chain: a dictionary whose keys are the words in Anderson Paak's songs and whose values are lists of every word that immediately followed that word in one of his songs.
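
As a rough illustration of that structure, the sketch below builds the same kind of dictionary from two tiny made-up songs. The function `toy_chain`, the variable `chain_demo`, and the toy corpus are hypothetical; the version here uses `collections.defaultdict` for brevity rather than the explicit checks in the notebook cell.

```python
from collections import defaultdict

def toy_chain(songs):
    """Map each token to the list of tokens that follow it (duplicates kept)."""
    chain = defaultdict(list)
    for tokens in songs:
        # zip pairs every token with its successor; the last token has none
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return dict(chain)

# Two made-up "songs", already tokenized
songs = [
    "<START> la la land <N> la di da <END>".split(),
    "<START> la di la <END>".split(),
]
chain_demo = toy_chain(songs)
print(chain_demo["la"])       # ['la', 'land', 'di', 'di', '<END>']
print(chain_demo["<START>"])  # ['la', 'la']
```

Note that `'di'` appears twice in the list for `'la'`. Keeping duplicates is what lets a later random choice favor the more common successors.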

In [6]:
def train_markov_chain(lyrics):
    """
    Args:
      - lyrics: a list of songs, where each song is a list of
                tokens (words plus the <START>, <N>, and <END> tags).
    
    Returns:
      A dict that maps a single word ("unigram") to a list of
      words that follow that word, representing the Markov
      chain trained on the lyrics.
    """
    chain = {"<START>": []}
    for lyric in lyrics:
        # Pair every token with its successor; the final token has none
        for i in range(len(lyric) - 1):
            word = lyric[i]
            if chain.get(word) is None:
                chain[word] = [lyric[i+1]]
            else:
                chain[word].append(lyric[i+1])

    return chain

In [7]:
# Load the pickled lyrics object.
with open("lyrics.pkl", "rb") as f:
    lyrics = pickle.load(f)

# Creating Markov Chain
chain = train_markov_chain(lyrics)

# Words that tend to start a song (i.e., what words follow the <START> tag)
print(chain["<START>"])

# Words that tend to begin a line (i.e., what words follow the line break tag)
print(chain["<N>"][:20])

['we', 'my', 'if', 'im', 'we', 'she', 'ugh', 'all', 'i', 'dogtown', 'you', 'my', 'why', 'you', 'we', 'we', 'we', 'we', 'we', 'we', 'na', 'look', 'we', 'i', 'i', 'venice', 'produced', 'feat.', 'feat.', 'produced', '(feat.', '(feat.', '[verse', '(feat.', 'produced', 'produced', 'produced', 'produced', 'produced', 'na', 'i', '"such', 'my', 'my', 'you', 'feat.', 'feat.', '[instrumental:]', '(feat.', 'feat.']
['(until', 'strawberry', '(i', 'how', '(forever', 'if', '(fruit', 'say', 'spent', 'threw', 'with', 'im', 'went', '(oh', 'yeah', 'i', 'and', '(nothing', 'but', 'see']


Using the Markov chain we made above, we will essentially trace a random path through it to write a song. We randomly select a word that typically starts a song, then randomly select a word that follows that word, then a word that follows that one, and so on until we reach the <END> tag.
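
The random walk described above can be sketched on a hand-written toy chain. Everything in this block (`demo_chain`, `transition_probs`, `walk`, and the `max_words` cap) is an assumption for illustration; the notebook's own generator has no length cap and relies on every path eventually reaching `<END>`.

```python
import random
from collections import Counter

# Hand-written toy chain; the successor lists keep duplicates on purpose
demo_chain = {
    "<START>": ["we", "we", "i"],
    "we": ["go", "<END>"],
    "i": ["go"],
    "go": ["<END>"],
}

def transition_probs(chain, word):
    """Probability of each next word implied by uniform choice over the list."""
    counts = Counter(chain[word])
    total = sum(counts.values())
    return {nxt: n / total for nxt, n in counts.items()}

def walk(chain, rng, max_words=50):
    """Follow random successors from <START> until <END> or the length cap."""
    words = []
    word = rng.choice(chain["<START>"])
    while word != "<END>" and len(words) < max_words:
        words.append(word)
        word = rng.choice(chain[word])
    return words

probs = transition_probs(demo_chain, "<START>")
print(probs)  # 'we' gets 2/3 and 'i' gets 1/3
print(" ".join(walk(demo_chain, random.Random(0))))
```

Because `random.choice` picks uniformly from a list that contains repeats, a word that followed `<START>` twice is twice as likely to be chosen as one that followed it once.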

In [8]:
def generate_new_lyrics(chain):
    """
    Args:
      - chain: a dict representing the Markov chain,
               such as one generated by train_markov_chain()
    
    Returns:
      A string representing the randomly generated song.
    """
    
    # a list for storing the generated words
    words = []
    # generate the first word
    word = random.choice(chain["<START>"])
    words.append(word)
    while word != "<END>":
        word = random.choice(chain[word])
        words.append(word)
    
    
    # join the words together into a string with line breaks
    lyrics = " ".join(words[:-1])
    return "\n".join(lyrics.split("<N>"))

In [9]:
#Our generated Anderson Paak song
print(generate_new_lyrics(chain))

venice california 
 (momma can you got tropical haze go 
 
 (nothing short of this high baby 
 "come down 
 plenty of a cold you 
 wait 
 matte black 
 i wanna make it 
 its no no patience 
 said "get what you carry me get this whole world was walking down 
 met her so until we do 
 and mumbling bout to the feelin that *exhales* get deported 
 (until its paid for it carry me 
 but the bank statements 
 ima ride 
 
 i miss that shit change but you ever really cant beat it must live on the kissin attention the only message i cant ooh yeah) 
 half an important factor for it go and the element of chronic smoke if you 
 i wanted to show them taxes 
 to do 
 dance then its too floored to tell uncle sam barsh & anderson .paak] 
 break 
 i tried my patience 
 you carry me on and shit 
 you carry you and all the tuggin 
 knees cant pull it feel me ) 
 did you feel me i call you carry me what you jump in this bitchs stomach 
 yeah 
 i said "go and pour) 
 and we have my first into the fightin an

We will now train another Markov chain, this time using bigrams. The logic is essentially the same, except that we record the words that follow every pair of consecutive words instead of every single word.
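
A small sketch on two made-up songs shows what the bigram keys look like. The names `toy_bigram_chain` and `bigram_demo` are hypothetical, and this version pads a copy of each song with `None` instead of modifying the lists in place.

```python
from collections import defaultdict

def toy_bigram_chain(songs):
    """Map each pair of consecutive tokens to the tokens that follow the pair."""
    chain = defaultdict(list)
    for tokens in songs:
        # Pad with None so (None, "<START>") exists as the first key
        padded = [None] + tokens + [None]
        for first, second, following in zip(padded, padded[1:], padded[2:]):
            chain[(first, second)].append(following)
    return dict(chain)

# Two made-up "songs", already tokenized
songs = [
    "<START> we go up <END>".split(),
    "<START> we go down <END>".split(),
]
bigram_demo = toy_bigram_chain(songs)
print(bigram_demo[(None, "<START>")])  # ['we', 'we']
print(bigram_demo[("we", "go")])       # ['up', 'down']
```

Each key now carries one extra word of context, so there are usually fewer candidates per key than in the unigram chain, and generated text tends to follow the source lyrics more closely.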

In [10]:
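# Pad each song with None on both ends so that (None, "<START>") is a
# valid starting bigram and the final pair of words also has a successor
for song in lyrics:
    song.insert(0, None)
    song.append(None)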

In [11]:
def train_markov_chain(lyrics):
    """
    Args:
      - lyrics: a list of songs, where each song is a list of
                tokens padded with None at both ends.
    
    Returns:
      A dict that maps a tuple of 2 words ("bigram") to a list of
      words that follow that bigram, representing the Markov
      chain trained on the lyrics.
    """
    chain = {}
    for lyric in lyrics:
        # Stop two tokens early so that lyric[i+2] always exists
        for i in range(len(lyric) - 2):
            bigram = (lyric[i], lyric[i+1])
            if chain.get(bigram) is None:
                chain[bigram] = [lyric[i+2]]
            else:
                chain[bigram].append(lyric[i+2])

    return chain

In [12]:
# Creating Markov Chain with bigrams
chain = train_markov_chain(lyrics)

# Words that tend to start a song (i.e., what words follow the <START> tag)
print(chain[(None, "<START>")])

['we', 'my', 'if', 'im', 'we', 'she', 'ugh', 'all', 'i', 'dogtown', 'you', 'my', 'why', 'you', 'we', 'we', 'we', 'we', 'we', 'we', 'na', 'look', 'we', 'i', 'i', 'venice', 'produced', 'feat.', 'feat.', 'produced', '(feat.', '(feat.', '[verse', '(feat.', 'produced', 'produced', 'produced', 'produced', 'produced', 'na', 'i', '"such', 'my', 'my', 'you', 'feat.', 'feat.', '[instrumental:]', '(feat.', 'feat.']


In [13]:
def generate_new_lyrics(chain):
    """
    Args:
      - chain: a dict representing the bigram Markov chain,
               such as one generated by train_markov_chain()
    
    Returns:
      A string representing the randomly generated song.
    """
    
    # a list for storing the generated words
    words = []
    # start from the padded bigram that precedes every song
    bigram = (None, "<START>")
    # sample the next word, then slide the bigram forward, until <END>
    while bigram[1] != "<END>":
        bigram = (bigram[1], random.choice(chain[bigram]))
        words.append(bigram[1])
    
    # join the words together into a string with line breaks
    lyrics = " ".join(words[:-1])
    return "\n".join(lyrics.split("<N>"))

Here is a generated Anderson Paak song using bigrams. Do you think it's better?

In [14]:
print(generate_new_lyrics(chain))

(feat. we do 
 a bit more time 
 where all these animals all this shaman i got needs 
 and put it all but not in my body spinning around in the hand (look) 
 one in the 5th and pulled up in the fall i caught a glimpse of my fortune meeting 
 when i drive 
 i got that good conversation 
 im lifting it up 
 if you see what you put me through 
 gotta make me follow through 
 it was late in the basement 
 kills me with an open door 
 uh huh 
 and look at my tree i see is angels demons 
 soft lights a bitch sounds 
 from this connectivity 
 you know i might just roll out to the pre roll 
 now let me take these bitches off 
 energy carried on and on 
 (momma can you carry me ) 
 (momma can you carry me ) 
 psychologist: can you carry me ) 
 it must be what you put me through 
 
 [verse 2: schoolboy q] 
 smiles around the clock 
 dance around the town 
 patience thinner than her pantyhose 
 get over here and empty your pockets 
 i want the right one 
 car keys and the free lunch 
 living unde