# Markov Model Lyrics Generator

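To generate lyrics for an artist of your choosing, replace the http://www.lyricsfreak.com URL in the request below with the URL of that artist's page. You will likely also need to change the `'/k/'` filter in the link-extraction cell so that it matches the new artist's path.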

The scraping code below is written for lyricsfreak.com. To use another site, find a page that links to a large number of songs by your chosen artist, scrape it, extract the hyperlinks, and issue a new HTTP request to each one to fetch the song. Call `time.sleep(0.1)` between requests to stagger them so that the website does not ban you for making too many requests.
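For a different site, the link-extraction step might look like the sketch below. It uses only the standard library (`html.parser` and `urllib`) rather than BeautifulSoup so that it runs without extra installs. The names `extract_links` and `fetch_all`, the sample HTML, and the `example.com` URL are assumptions for illustration and are not part of the original notebook.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def extract_links(html, base_url, must_contain):
    """Return absolute URLs of links whose href contains `must_contain`."""
    collector = _LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, h) for h in collector.hrefs if must_contain in h]


def fetch_all(urls, delay=0.1):
    """Download each URL in turn, pausing `delay` seconds before each request."""
    pages = []
    for url in urls:
        time.sleep(delay)
        with urlopen(url) as resp:
            pages.append(resp.read().decode("utf-8", errors="replace"))
    return pages


# Hypothetical artist index page; a real site will need its own filter string.
sample_html = """
<a href="/k/kanye+west/song+one_1.html">Song One</a>
<a href="/about.html">About</a>
<a href="/k/kanye+west/song+two_2.html">Song Two</a>
<a>no href</a>
"""
song_links = extract_links(sample_html, "http://www.example.com", "/k/")
print(song_links)
```

`fetch_all` is defined but not called here, so the sketch makes no network requests; passing it `song_links` from a real index page would download one page per song.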

Once you have scraped the lyrics into a Python list called `lyrics`, where each element holds the lyrics of one song as a list of word tokens, skip to the **Bigram Markov Chain Model** section of this notebook. Make sure each song includes the `"<START>"` and `"<END>"` tags to mark where it begins and ends, and `"<N>"` tags to mark line breaks.
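As a concrete picture of the expected format, the sketch below turns one song's plain text into a token list. The helper name `tokenize_song` and the sample text are assumptions for illustration; the scraping cells below build the same kind of list from HTML instead.

```python
def tokenize_song(text):
    """Convert plain-text lyrics into a list of tokens with marker tags."""
    tokens = ["<START>"]
    # Ignore blank lines so that "<N>" only separates lines with words.
    lines = [line for line in text.lower().splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if i > 0:
            tokens.append("<N>")  # line break between consecutive lines
        # Keep only letters, digits, and spaces, as the cleaning cell does.
        kept = "".join(ch for ch in line if ch.isalnum() or ch == " ")
        tokens.extend(kept.split())
    tokens.append("<END>")
    return tokens


sample_song = "Hello, world!\nGoodbye now"
tokens = tokenize_song(sample_song)
print(tokens)
# ['<START>', 'hello', 'world', '<N>', 'goodbye', 'now', '<END>']
```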


# Scraping Song Lyrics

In [1]:
import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
resp = requests.get("http://www.lyricsfreak.com/k/kanye+west/")
soup = BeautifulSoup(resp.content, "html.parser")

In [3]:
links = []
for a in soup.find_all('a'):
    link = a.get('href')
    if link is not None and '/k/' in link:
        links.append("http://www.lyricsfreak.com" + link)

links = pd.DataFrame(links, columns=["Link"])

In [4]:
lyrics = []
for link in links.Link:
    time.sleep(0.1)
    resp = requests.get(link)
    soup = BeautifulSoup(resp.content, "html.parser")
    text = soup.find(id="content_h").prettify()
    lyrics.append(text)

In [6]:
cleaned_lyrics = []
for text in lyrics:
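    # Keep only letters, digits, and spaces; this reduces each <br/> tag to the token "br".
    cleaned = ''.join(e for e in text.lower() if e.isalnum() or e == ' ')
    # Drop the first three tokens and the last one (these appear to be remnants
    # of the enclosing tag), then turn standalone "br" tokens into line-break
    # markers. Matching whole tokens avoids splitting words such as "brother".
    cleaned = ["<N>" if word == "br" else word for word in cleaned.split()[3:-1]]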

    cleaned.insert(0, "<START>")
    cleaned.append("<END>")
    cleaned_lyrics.append(cleaned)

lyrics = cleaned_lyrics

# Bigram Markov Chain Model

This Markov chain uses the previous two words (a bigram) as its state and predicts the next word by sampling from the words that followed that pair in the training lyrics.
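To make the state and transition structure concrete, here is a small sketch on a toy token list. The song and the variable names are made up for illustration. The table maps each pair of consecutive words to the list of words that followed it, with `None` padding the front so that the first real word also has a two-word context.

```python
from collections import defaultdict

toy_song = ["<START>", "la", "la", "land", "<N>", "la", "la", "love", "<END>"]

# Pad with None so the pair (None, "<START>") predicts the first word.
padded = [None] + toy_song

toy_chain = defaultdict(list)
for word1, word2, next_word in zip(padded, padded[1:], padded[2:]):
    toy_chain[(word1, word2)].append(next_word)

print(toy_chain[(None, "<START>")])  # ['la']
print(toy_chain[("la", "la")])       # ['land', 'love']
```

Because `("la", "la")` was followed once by `land` and once by `love`, sampling uniformly from that list picks each with probability 1/2. A follower that occurs several times appears several times in the list and is proportionally more likely to be chosen.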


In [13]:
def train_markov_chain(lyrics):
    chain = {(None, "<START>"): []}
    for lyric in lyrics:
        # Pad a copy with None so the first word has a two-word context;
        # copying avoids mutating the caller's list if training is run again.
        padded = [None] + lyric
        for word1, word2, next_word in zip(padded, padded[1:], padded[2:]):
            chain.setdefault((word1, word2), []).append(next_word)
    return chain


In [14]:
chain = train_markov_chain(lyrics)

In [15]:
import random

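def generate_new_lyrics(chain):
    # Every song begins with "<START>"; sample the first real word from the
    # words that followed (None, "<START>") in the training data.
    song = ["<START>"]
    word = random.choice(chain[(None, "<START>")])
    song.append(word)

    # song[i] is the word before `word`, so (song[i], word) is the current state.
    i = 0
    while word != "<END>":
        word = random.choice(chain[(song[i], word)])
        song.append(word)
        i += 1

    # Drop the trailing "<END>" and turn "<N>" markers back into line breaks.
    lyrics = " ".join(song[:-1])
    lyrics = "\n".join(lyrics.split("<N>"))
    return lyrics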

# Generate Lyrics

In [16]:
print(generate_new_lyrics(chain))

<START> dogs barking 
 intro choir 
 higher 
 do anybody make real shit 
 whole buncha lot of flirting involved 
 but you know i know 
 till im beer on the phone 
 ohhh waaaa oow i want more 
 well you need a new space 
 still i feel its fadin 
 
 you got me sayin a sister who was uninvited who was hip hops 
other 
 on the freeway 
 then imma probe you 
 threw the mud 
 whos provoking you 
 need a news crews presence 
 speedboat swerve homie watch out for a rocawear 
 more populaire cause i just blame everything on you turnin me no 
 cuz im so goose 
 summer time no juice 
 big faced hundreds and whatever other synonyms 
 strippers named cinnamon 
 more specifically they can drop me some kicks 
 so 
 right now let me go let me see you in the lac truck 
 ill fly away 
 dont say nothin bout what ye said 
 dont look down its the are o c 
 ima get a shot of us 
 
 outro kanye west 
 album college dropout unreleased 
 song whole life to steven king 
 you do it 
 screams from the chi town sl