# Final Notebook - Rap Generator

Nowadays, mainstream music has become so formulaic that a computer could probably write it. I decided to put that to the test by creating a program to generate song lyrics for a given artist.

## Strategy
1. First, we will fetch collections of lyrics for a given artist.
2. Next, generate an n-gram model for that artist.
3. Define a stress and rhyme pattern for a song.
4. Repeatedly generate n-grams, accepting only those that match the above pattern, until song is complete.

In [42]:
# Import our API keys
from utility.config import *

# We will use this package to make API calls
import requests
import urllib
from lyricsgenius import Genius

genius = Genius(ACCESS_TOKEN)

HEADERS = {'User-Agent': 'XY', 'Content-type': 'application/json', 'Authorization': 'Bearer ' + ACCESS_TOKEN}
API_URL = 'https://api.genius.com/'

## Fetching songs
First, let's decide which artist we are going to use. The following cell searches for a given artist with Genius's API. Enter in the artist for whom you'd like to generate lyrics–it will work better if they have a large discography.

In [16]:
artist_found = False
artist_id = ''
artist_name = ''

while not artist_found:
    print('What artist would you like to search for?')
    query = input().strip()
    
    # encode spaces
    params = urllib.parse.urlencode({'q': query}, quote_via=urllib.parse.quote)
    
    resp = requests.get(API_URL + 'search', headers=HEADERS, params=params).json()
    
    if len(resp['response']['hits']) == 0:
        print('No hits found')
        continue
    
    for hit in resp['response']['hits']:
        if hit['type'] == 'song':
            if hit['result']['primary_artist']['name'].lower() == query.lower():
                print('Did ' + query + ' sing \'' + hit['result']['title'] + '\'? Enter y for yes.')
                confirm = input()
                if confirm.strip() == 'y':
                    # We have found the right artist! Let's get out of here.
                    artist_id = hit['result']['primary_artist']['id']
                    artist_name = hit['result']['primary_artist']['name']
                    artist_found = True
                    break
    
print(artist_name)
print(artist_id)

What artist would you like to search for?


 Gucci Mane


Did Gucci Mane sing 'I Get the Bag'? Enter y for yes.


 y


Gucci Mane
13


Now, we must fetch a reasonable amount of songs for the artist. We will repeatedly query the Genius API for songs for the artist. We might also want to eliminate songs with features on them, since that will muddy our n-gram grammar. Note: this step should take quite a bit, as it has to make many API calls.

In [32]:
from IPython.display import clear_output

artist_songs = []

more_to_get = True
cur_page = 1 # Used to page through all songs

while more_to_get:
    params = {'sort': 'popularity', 'page': cur_page}
    resp = requests.get(API_URL + 'artists/' + str(artist_id) + '/songs', headers=HEADERS, params=params).json()
        
    if len(resp['response']['songs']) == 0:
        more_to_get = False
        break
        
    for song in resp['response']['songs']:
        # Let's make sure the song matches a few basic conditions
        # First: is the main artist correct?
        if not song['primary_artist']['id'] == artist_id:
            break
        
        # Next: is there a feature?
        if 'feat.' in song['full_title'].lower() or 'ft.' in song['full_title'].lower():
            break
            
        # If neither, we are good to go.
        artist_songs.append(song)
        clear_output(wait=True)
        print(song['full_title'])
        
    
    cur_page += 1
    
    if len(artist_songs) > 500:
        more_to_get = False
        break

print('Done. Got ' + str(len(artist_songs)) + ' songs.')

Classical (Intro) [Instrumental]
Done. Got 64 songs.


To make it easier to work with, we will convert this to a pandas dataframe. Then, we will use `lyricsgenius` to download the song lyrics and create a list.

In [47]:
import pandas as pd

df = pd.DataFrame.from_dict(artist_songs)


for i, row in df.iterrows():
    print("Loading lyrics for " + row['title'])
    song_obj = genius.search_song(song_id=row['id'])
    if song_obj:
        df.at[i,'lyrics'] = song_obj.lyrics
    clear_output(wait=True)
    
df

Unnamed: 0,annotation_count,api_path,full_title,header_image_thumbnail_url,header_image_url,id,lyrics_owner_id,lyrics_state,path,pyongs_count,...,song_art_image_url,stats,title,title_with_featured,url,song_art_primary_color,song_art_secondary_color,song_art_text_color,primary_artist,lyrics
0,6,/songs/2824442,Gucci Please by Gucci Mane,https://images.genius.com/16151fb279d827b8c494...,https://images.genius.com/16151fb279d827b8c494...,2824442,1091826,complete,/Gucci-mane-gucci-please-lyrics,4.0,...,https://images.genius.com/16151fb279d827b8c494...,"{'unreviewed_annotations': 0, 'hot': False, 'p...",Gucci Please,Gucci Please,https://genius.com/Gucci-mane-gucci-please-lyrics,#c43015,#0b695e,#fff,"{'api_path': '/artists/13', 'header_image_url'...",[Hook]\nGucci please\nTell me that you love me...
1,7,/songs/2933313,Nonchalant by Gucci Mane,https://images.genius.com/0ef2012df27122dee69b...,https://images.genius.com/0ef2012df27122dee69b...,2933313,1091826,complete,/Gucci-mane-nonchalant-lyrics,2.0,...,https://images.genius.com/0ef2012df27122dee69b...,"{'unreviewed_annotations': 0, 'hot': False, 'p...",Nonchalant,Nonchalant,https://genius.com/Gucci-mane-nonchalant-lyrics,#f00e0e,#440404,#fff,"{'api_path': '/artists/13', 'header_image_url'...",[Verse 1]\nNow my watch so fuckin' bright\nIt ...
2,4,/songs/2873817,Addicted by Gucci Mane,https://images.genius.com/d11f455a37c86be21372...,https://images.genius.com/d11f455a37c86be21372...,2873817,1091826,complete,/Gucci-mane-addicted-lyrics,4.0,...,https://images.genius.com/d11f455a37c86be21372...,"{'unreviewed_annotations': 0, 'hot': False, 'p...",Addicted,Addicted,https://genius.com/Gucci-mane-addicted-lyrics,#486db6,#04060a,#fff,"{'api_path': '/artists/13', 'header_image_url'...","[Intro]\nWhen I was on drugs so bad, you know,..."
3,3,/songs/63868,Photoshoot by Gucci Mane,https://images.genius.com/73d7d63105a496a82021...,https://images.genius.com/73d7d63105a496a82021...,63868,65770,complete,/Gucci-mane-photoshoot-lyrics,12.0,...,https://images.genius.com/73d7d63105a496a82021...,"{'unreviewed_annotations': 0, 'hot': False, 'p...",Photoshoot,Photoshoot,https://genius.com/Gucci-mane-photoshoot-lyrics,#ee0606,#78160b,#fff,"{'api_path': '/artists/13', 'header_image_url'...","[Intro]\nYeah\nListen to this track, bitch\n\n..."
4,6,/songs/3099482,Bucket List by Gucci Mane,https://images.genius.com/176144ac2270009b2293...,https://images.genius.com/176144ac2270009b2293...,3099482,3915568,complete,/Gucci-mane-bucket-list-lyrics,,...,https://images.genius.com/176144ac2270009b2293...,"{'unreviewed_annotations': 1, 'hot': False, 'p...",Bucket List,Bucket List,https://genius.com/Gucci-mane-bucket-list-lyrics,#d73440,#a4303c,#fff,"{'api_path': '/artists/13', 'header_image_url'...",[Intro]\nBucket list\nCut it up\nHuh\nIt's Guc...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,0,/songs/6216403,Round 1 Is Everything (Exclusive Remix) by Guc...,https://images.genius.com/c909bdb7535f05d12b62...,https://images.genius.com/c909bdb7535f05d12b62...,6216403,10489257,unreleased,/Gucci-mane-round-1-is-everything-exclusive-re...,,...,https://images.genius.com/c909bdb7535f05d12b62...,"{'unreviewed_annotations': 0, 'hot': False}",Round 1 Is Everything (Exclusive Remix),Round 1 Is Everything (Exclusive Remix),https://genius.com/Gucci-mane-round-1-is-every...,#e3160f,#811916,#fff,"{'api_path': '/artists/13', 'header_image_url'...",
60,0,/songs/6186012,DJ Bobby Black (Outro) by Gucci Mane,https://images.genius.com/0feb54b476e630f43051...,https://images.genius.com/0feb54b476e630f43051...,6186012,8299558,complete,/Gucci-mane-dj-bobby-black-outro-lyrics,,...,https://images.genius.com/0feb54b476e630f43051...,"{'unreviewed_annotations': 0, 'hot': False}",DJ Bobby Black (Outro),DJ Bobby Black (Outro),https://genius.com/Gucci-mane-dj-bobby-black-o...,#ccbc8c,#dcccb1,#000,"{'api_path': '/artists/13', 'header_image_url'...",[DJ Bobby Black]\nYou guys know how we do arou...
61,0,/songs/5419091,Rick Ross Speaks / DJ Khaled Speaks by Gucci Mane,https://images.genius.com/c1e86882e49586297dcb...,https://images.genius.com/c1e86882e49586297dcb...,5419091,8299558,complete,/Gucci-mane-rick-ross-speaks-dj-khaled-speaks-...,,...,https://images.genius.com/c1e86882e49586297dcb...,"{'unreviewed_annotations': 0, 'hot': False}",Rick Ross Speaks / DJ Khaled Speaks,Rick Ross Speaks / DJ Khaled Speaks,https://genius.com/Gucci-mane-rick-ross-speaks...,#de240f,#8b1c15,#fff,"{'api_path': '/artists/13', 'header_image_url'...","[Rick Ross]\nYeah, what's up? It's the boss, R..."
62,0,/songs/6771178,Lemonade (Instrumental) by Gucci Mane,https://images.genius.com/2c515dbd3881143c9d6d...,https://images.genius.com/2c515dbd3881143c9d6d...,6771178,8299558,complete,/Gucci-mane-lemonade-instrumental-lyrics,,...,https://images.genius.com/2c515dbd3881143c9d6d...,"{'unreviewed_annotations': 0, 'hot': False}",Lemonade (Instrumental),Lemonade (Instrumental),https://genius.com/Gucci-mane-lemonade-instrum...,#e94126,#632a21,#fff,"{'api_path': '/artists/13', 'header_image_url'...",


Next, we will generate a large string by combining all songs, which we will then parse.

In [69]:
all_lyrics = ''
for lyric in df['lyrics']:
    if lyric and type(lyric) == str:
        all_lyrics = all_lyrics + lyric
        
print(all_lyrics[:1000])

[Hook]
Gucci please
Tell me that you love me, can I be your main squeeze?
Tell me that you love me, Gucci, Gucci, Gucci please
Tell me that you love me, can I be your main squeeze?
Tell me that you love me, tell me, tell me Gucci please
Gucci please, Gucci, Gucci, Gucci Please
Gucci please, Gucci, Gucci, Gucci Please
Tell me that you love me, can I be your main squeeze?
Tell me that you love me, tell me tell me Gucci please

[Verse 1]
Baby freeze, I know a nigga look like 900 keys
Baby say my ring look like a hundred Ps
I got college bitches fallin' out calm down breathe
If yall keep runnin' on the stage then I'mma have to leave
I'm bout to drop the top so I can feel the summer breeze
Its a cold blooded motherfuckin rollie on my sleeve
It feels like I'm the freshest nigga out or is it me
He getting a lil' money but that nigga isn’t me
G-U-C-C-Icy I'm the one girls wanna see
G-U-W-O-P I'm the one boys wanna be
Lil mama got her hair fixed is hers or it weave
Shit I don’t give a damn just

## Data Prep
For our parsing, we will do a few things.
- Remove anything in \[square brackets\]. These are used to indicate verses and choruses.
- Slice the string by lines. Each line will become a single list within the master list.
- Tokenize

Important note: many of these songs contain **adlibs** which are marked with parenthesis. I'd like to keep these as a single token, rather than splitting them up, since they can be inserted in the song freely.

In [230]:
import re
from nltk import tokenize
from nltk.tokenize import SExprTokenizer

# Remove annotations
parsed_lyrics = re.sub(r"\[.*?\]", "", all_lyrics)
# Remove double newlines
parsed_lyrics = re.sub(r"[\n]+", "\n", parsed_lyrics)
# Remove start/end newlines
parsed_lyrics = parsed_lyrics.strip()
# Remove apostrophes, there's lots of contractions used which can make it messy
parsed_lyrics = re.sub(r"'|’", "", parsed_lyrics)
# Remove empty parens
parsed_lyrics = re.sub(r"\(\)", "", parsed_lyrics)

# Tokenize
sent_tokens = parsed_lyrics.split("\n")

# Tokenize words. We are using SExprTokenizer to keep parenthesis as a single token
# There's a weird issue where it only works if parenthesis are in the sentence, so we'll add some extra parens and remove them right away.
text_tokens = []
tk = SExprTokenizer()
text_tokens = [tk.tokenize(s + " ()")[0:-1] for s in sent_tokens]

# One more issue: this tokenizer doesn't do parenthesis. We will iterate over every token and run word_tokenize on it if it doesn't start with parenthesis.
all_sentences = []
for sentence in text_tokens:
    new_sentence = []
    for word in sentence:
        if not re.match(r"\(.*?\)", word):
            # Not an adlib, run tokenize
            word_tokenized = tokenize.word_tokenize(word)
            new_sentence += word_tokenized
        else:
            new_sentence.append(word)
    all_sentences.append(new_sentence)

print(text_tokens[:100])

[['Gucci', 'please'], ['Tell', 'me', 'that', 'you', 'love', 'me,', 'can', 'I', 'be', 'your', 'main', 'squeeze?'], ['Tell', 'me', 'that', 'you', 'love', 'me,', 'Gucci,', 'Gucci,', 'Gucci', 'please'], ['Tell', 'me', 'that', 'you', 'love', 'me,', 'can', 'I', 'be', 'your', 'main', 'squeeze?'], ['Tell', 'me', 'that', 'you', 'love', 'me,', 'tell', 'me,', 'tell', 'me', 'Gucci', 'please'], ['Gucci', 'please,', 'Gucci,', 'Gucci,', 'Gucci', 'Please'], ['Gucci', 'please,', 'Gucci,', 'Gucci,', 'Gucci', 'Please'], ['Tell', 'me', 'that', 'you', 'love', 'me,', 'can', 'I', 'be', 'your', 'main', 'squeeze?'], ['Tell', 'me', 'that', 'you', 'love', 'me,', 'tell', 'me', 'tell', 'me', 'Gucci', 'please'], ['Baby', 'freeze,', 'I', 'know', 'a', 'nigga', 'look', 'like', '900', 'keys'], ['Baby', 'say', 'my', 'ring', 'look', 'like', 'a', 'hundred', 'Ps'], ['I', 'got', 'college', 'bitches', 'fallin', 'out', 'calm', 'down', 'breathe'], ['If', 'yall', 'keep', 'runnin', 'on', 'the', 'stage', 'then', 'Imma', 'have', '

## Creating the Model
Now we will make a bigram model for our artist. I have selected bigrams as song lyrics tend to be somewhat short, so we might run into issues with sparsity otherwise.

In [231]:
from nltk.lm.preprocessing import padded_everygram_pipeline
from nltk.lm import MLE

n = 3 # Choose your n-gram size here! Choose 2, 3, or 4

train, vocab = padded_everygram_pipeline(n, all_sentences)
lm = MLE(n)
lm.fit(train, vocab)
len(lm.vocab)

4103

In [260]:
lm.vocab.counts.most_common(20)

[('<s>', 5866),
 ('</s>', 5866),
 (',', 1826),
 ('I', 823),
 ('a', 690),
 ('the', 581),
 ('Im', 476),
 ('my', 419),
 ('me', 375),
 ('to', 331),
 ('you', 309),
 ('like', 293),
 ('it', 277),
 ('and', 276),
 ('on', 270),
 ('Gucci', 253),
 ('that', 249),
 ('got', 240),
 ('in', 232),
 ('nigga', 208)]

In [274]:
from random import randrange

def generate_sentence():
    rando = " ".join(list(lm.generate(200,text_seed=["<s>"])))
    rando = re.sub(r"(<s>|</s>)", "", rando)
    rando = re.sub(r"( ,)", ",", rando)
    return rando.strip()

print(generate_sentence())

I come through light up your funeral


## Using our Model

Now we have a reasonable model for an artist. Let's demonstrate a few ways we can use it to generate interesting lyrics.

First, we will generate simple rhyming couplets with no concern about meter or length. 