# Intro
---
In this project, I generate fake lyrics for the Red Hot Chili Peppers using Keras and an LSTM RNN. Here is a quick read on this technique: https://towardsdatascience.com/recurrent-neural-networks-and-lstm-4b601dd822a5. I used BeautifulSoup to scrape Red Hot Chili Peppers lyrics from https://www.lyrics.com/ and the textgenrnn library for natural language generation.

I also wrote a function at the bottom of this notebook that anyone can use to create a dataframe of songs, song durations, and lyrics for any artist on https://www.lyrics.com/

## Imports

In [1]:
import pandas as pd
import numpy as np

#for web scraping
from bs4 import BeautifulSoup
import requests

#for natural language generation
import sys
from keras.models import Sequential
from keras.layers import LSTM, Activation, Flatten, Dropout, Dense, \
                         Embedding, TimeDistributed, CuDNNLSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from textgenrnn import textgenrnn
import os

import warnings
warnings.filterwarnings("ignore")

Using TensorFlow backend.


## Part 1: Web Scraping
---

In [2]:
url = 'https://www.lyrics.com/artist/Red%20Hot%20Chili%20Peppers'
r = requests.get(url)
soup = BeautifulSoup(markup = r.text, features = 'html.parser')

My process for scraping lyrics.com was to explore the site's HTML with the browser's Inspect Element tool and find which class each feature belonged to. For example, I clicked on an album cover and saw that album titles were in h3 elements with the class "artist-album-label". I then scraped those elements using BeautifulSoup:

*step 1*

<img src='album_select.png' alt='album'>

*step 2*

<img src='example_html.png' alt='ex_html'>

In [3]:
albums = soup.find_all('h3', {'class': 'artist-album-label'})

In [4]:
for i in albums[:10]:
    print(i.text)

Uncensored
The Red Hot Chili Peppers [1984]
Freaky Styley [1985]
Freaky Styley [Bonus Tracks] [1985]
Freaky Styley [Japan Bonus Tracks] [1985]
Hollywood (Africa) [1985]
The Uplift Mofo Party Plan [1987]
The Uplift Mofo Party Plan [1987]
Uplift Mofo Party Plan [Japan Bonus Tracks] [1987]
Mother's Milk [1989]


In [5]:
songs = soup.find_all('td',{'class': 'tal qx'})

In [6]:
for i in songs[20:40]:
    print(i.text)

Baby Appeal
 
American Ghost Dance
3:44
Battle Ship
1:53
The Brothers Cup
3:27
Catholic School Girls Rule
1:55
Freaky Styley
3:39
Hollywood (Africa)
5:03
Jungle Man
4:09
Lovin' and Touchin'
:36
Sex Rap
1:54


The entries alternate between song title and song duration (some songs are missing a duration and show a blank instead), so I could separate the titles and durations using list slicing:

In [7]:
#songs
for i in songs[:20:2]:
    print(i.text)

Blackeyed Blonde
Buckle Down
Green Heaven
Mommy, Where's Daddy?
Out in L.A.
Police Helicopter
Sex Rap [Anthony's Rap]
Thirty Dirty Birds
Yertle the Turtle
You Always Sing the Same


In [8]:
#durations
for i in songs[21:41:2]:
    print(i.text)

 
3:44
1:53
3:27
1:55
3:39
5:03
4:09
:36
1:54
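
The alternating title/duration layout can also be paired up explicitly. Here is a minimal sketch, assuming the cell texts have already been pulled into a flat list. The `cells` list is hand-copied from the output above, and `to_seconds` is a hypothetical helper that is not part of this notebook. It also shows one way to handle durations such as `:36` and blank cells:

```python
# Cell texts copied from the output above: titles at even indices,
# durations at odd indices. A blank duration means the site had no data.
cells = ['Baby Appeal', ' ', 'American Ghost Dance', '3:44',
         "Lovin' and Touchin'", ':36']


def to_seconds(duration):
    """Convert 'm:ss' (or ':ss') to seconds; return None if blank."""
    duration = duration.strip()
    if not duration:
        return None
    minutes, seconds = duration.split(':')
    # ':36' has an empty minutes part, which counts as zero
    return int(minutes or 0) * 60 + int(seconds)


# zip pairs each title with the duration cell that follows it
pairs = [(title, to_seconds(time))
         for title, time in zip(cells[::2], cells[1::2])]
print(pairs)
```

Pairing with `zip` keeps each title next to its own duration, which makes a misaligned slice easier to spot than two separate lists.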


Each song also had an embedded hypertext reference to the lyrics page:

In [9]:
songs[0].a.attrs['href']

'/lyric/529874/Red+Hot+Chili+Peppers/Blackeyed+Blonde'
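
The href is relative to the site root, so it has to be combined with the domain before it can be requested. One option is `urllib.parse.urljoin` from the standard library, which avoids doubled or missing slashes at the seam. A small sketch using the href shown above:

```python
from urllib.parse import urljoin

base_url = 'https://www.lyrics.com/'
href = '/lyric/529874/Red+Hot+Chili+Peppers/Blackeyed+Blonde'

# urljoin resolves the root-relative href against the base URL,
# so the result has exactly one slash after the domain
lyr_url = urljoin(base_url, href)
print(lyr_url)
```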

I used all of this to create a dataframe with songs, time durations, and lyrics:

In [10]:
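sng = []
tim = []
lyr = []
#cells alternate: song title, then duration
for i in songs[::2]:
    sng.append(i.text)
for i in songs[1::2]:
    tim.append(i.text)

#hrefs already start with '/', so the base URL has no trailing slash
base_url = 'https://www.lyrics.com'
for cell in songs:
    #no link to lyrics (e.g. duration cells)
    if cell.a is None:
        continue
    lyr_url = base_url + cell.a.attrs['href']
    r = requests.get(lyr_url)
    #separate name so the artist-page soup is not overwritten
    lyric_soup = BeautifulSoup(r.text, 'html.parser')
    lyrics = lyric_soup.find('pre', {'id': 'lyric-body-text'}).text
    lyr.append(lyrics)

#all three lists must be the same length for this to succeed
df = pd.DataFrame({'song': sng, 'time': tim, 'lyrics': lyr})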

In [11]:
df.head()

| | song | time | lyrics |
|---|---|---|---|
| 0 | Blackeyed Blonde | | Pumpin' the blood through the heart of New Orl... |
| 1 | Buckle Down | | Hah!\r\nOn the ice\r\nNo holdin'\r\nMy soul\r\... |
| 2 | Green Heaven | | About this planet, there is something I know\r... |
| 3 | Mommy, Where's Daddy? | | Mommy, where's daddy?\r\nMommy, where's daddy?... |
| 4 | Out in L.A. | | We're all a bunch of brothers livin' in a cool... |


## Part 2: Data Cleaning
---

In [12]:
df['lyrics'][0]

"Pumpin' the blood through the heart of New Orleans\r\nShe's the mystic heat of the Bourbon street dream\r\nShe's just made out of flesh and bones\r\n\r\nBut let me tell you little boy\r\nYou better leave her alone\r\nLeroy Brown thought he was bad too\r\n'Till she left him floatin' in the old bayou\r\n\r\nShe's the kind of girl\r\nShe's built like a bomb\r\nShe's the blackeyed blackeyed\r\nBlackeyed blond, get down!\r\n\r\nThat blackeyed beauty with the golden crotch\r\nFrench electric sex a cock shocking swamp fox\r\nHeaten queen of sleeze she's hot to box\r\n\r\nBut let me tell you little boy\r\nShe'll clean your clock\r\nA slick and sly spy\r\nStuck in the muck of the moat\r\n\r\nBlew his mind to find a diamond in the boat\r\nDouble-o-dooms day for Mr. James Bond\r\nShe's the blackeyed blackeyed\r\nBlackeyed blond, Get down!"

Lines are separated by '\r\n' (carriage return plus line feed), which is the line ending used on Windows, so I replaced each one with a space:

In [13]:
def lyr_fixer(x):
    #replace each Windows line ending with a single space
    return x.replace('\r\n', ' ')

In [14]:
df['lyrics'] = df['lyrics'].apply(lyr_fixer)
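
Because verses are separated by blank lines ('\r\n\r\n'), this replacement leaves two consecutive spaces between verses. If that matters, one alternative (not what this notebook does) is to split on any run of whitespace and rejoin with single spaces, which also covers bare '\n' line endings. `collapse_whitespace` is a hypothetical name used only for this sketch:

```python
def collapse_whitespace(text):
    """Replace every run of whitespace with one space."""
    # str.split() with no arguments splits on any whitespace run
    # and drops leading and trailing whitespace
    return ' '.join(text.split())


sample = ("She's just made out of flesh and bones\r\n\r\n"
          "But let me tell you little boy\r\n")
print(collapse_whitespace(sample))
```

The trade-off is that verse boundaries disappear entirely, so the model can no longer learn where a verse break falls.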

In [15]:
df['lyrics'][0]

"Pumpin' the blood through the heart of New Orleans She's the mystic heat of the Bourbon street dream She's just made out of flesh and bones  But let me tell you little boy You better leave her alone Leroy Brown thought he was bad too 'Till she left him floatin' in the old bayou  She's the kind of girl She's built like a bomb She's the blackeyed blackeyed Blackeyed blond, get down!  That blackeyed beauty with the golden crotch French electric sex a cock shocking swamp fox Heaten queen of sleeze she's hot to box  But let me tell you little boy She'll clean your clock A slick and sly spy Stuck in the muck of the moat  Blew his mind to find a diamond in the boat Double-o-dooms day for Mr. James Bond She's the blackeyed blackeyed Blackeyed blond, Get down!"

All better. Now I save my lyrics in a text file so I can use it in the LSTM RNN model:

In [16]:
lyrics = list(df['lyrics'])

In [17]:
with open('lyrics_text.txt','w',encoding='utf-8') as filehandle:
    for item in lyrics:
        filehandle.write('%s\n' % item)

## Part 3: Model Building
---
I ran into computing-power limits here, since I wanted to build a more complex neural network. This model has 4 bidirectional LSTM layers with 50 cells each and was trained for 20 epochs. Training took around 4 hours on my laptop:

In [20]:
model_cfg = {
    'rnn_size': 50,
    'rnn_layers': 4,
    'rnn_bidirectional': True,
    'max_length': 15,
    'max_words': 10000,
    'dim_embeddings': 100,
    'word_level': False,
}
train_cfg = {
    'line_delimited': True,
    'num_epochs': 20,
    'gen_epochs': 25,
    'batch_size': 750,
    'train_size': 0.8,
    'dropout': 0.0,
    'max_gen_length': 300,
    'validation': True,
    'is_csv': False
}

In [21]:
model_name = '50nds_4Lrs_20epchs_Model'
textgen = textgenrnn(name=model_name)

train_function = textgen.train_from_file if train_cfg['line_delimited'] else textgen.train_from_largetext_file

train_function(
    file_path='lyrics_text.txt',
    new_model=True,
    num_epochs=train_cfg['num_epochs'],
    gen_epochs=train_cfg['gen_epochs'],
    batch_size=train_cfg['batch_size'],
    train_size=train_cfg['train_size'],
    dropout=train_cfg['dropout'],
    max_gen_length=train_cfg['max_gen_length'],
    validation=train_cfg['validation'],
    is_csv=train_cfg['is_csv'],
    rnn_layers=model_cfg['rnn_layers'],
    rnn_size=model_cfg['rnn_size'],
    rnn_bidirectional=model_cfg['rnn_bidirectional'],
    max_length=model_cfg['max_length'],
    dim_embeddings=model_cfg['dim_embeddings'],
    word_level=model_cfg['word_level'])

602 texts collected.
Training new model w/ 4-layer, 50-cell Bidirectional LSTMs
Training on 224,951 character sequences.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [22]:
textgen.model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input (InputLayer)              (None, 15)           0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 15, 100)      7800        input[0][0]                      
__________________________________________________________________________________________________
rnn_1 (Bidirectional)           (None, 15, 100)      60400       embedding[0][0]                  
__________________________________________________________________________________________________
rnn_2 (Bidirectional)           (None, 15, 100)      60400       rnn_1[0][0]                      
__________________________________________________________________________________________________
rnn_3 (Bid
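
The parameter counts in the summary can be checked by hand. The sketch below assumes standard Keras LSTM layers (four gates, each with input weights, recurrent weights, and one bias vector). The vocabulary size of 78 is inferred from the embedding layer's 7,800 parameters and is not stated anywhere in this notebook:

```python
def embedding_params(vocab_size, dim):
    # one dim-sized vector per vocabulary entry
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * units * (input_dim + units + 1)


def bidirectional_lstm_params(input_dim, units):
    # a forward and a backward LSTM, each with its own weights
    return 2 * lstm_params(input_dim, units)


print(embedding_params(78, 100))           # embedding layer
print(bidirectional_lstm_params(100, 50))  # rnn_1 and rnn_2
```

Both rnn_1 and rnn_2 receive 100-dimensional input: rnn_1 from the embedding, and rnn_2 from the concatenated forward and backward outputs of rnn_1 (2 x 50). That is why the two layers report the same count.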

## Part 4: See What Robot Anthony Kiedis Would Sing
---
As you scroll through the output you'll notice several headers in the format:

####################

Temperature: x.x
####################

Temperature is a hyperparameter that controls how random the predictions are. The first samples are generated at a temperature of 0.2, which sticks closely to the most likely next character, while the samples at the bottom use a temperature of 1.0, which is much more random.
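
To make the effect concrete, here is a minimal sketch of temperature scaling. It is a generic illustration rather than textgenrnn's internal code, and the three probabilities are made up for the example:

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a sampling temperature."""
    # divide the log-probabilities by the temperature, then renormalize
    scaled = [math.log(p) / temperature for p in probs]
    total = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / total for s in scaled]


probs = [0.6, 0.3, 0.1]  # hypothetical next-character probabilities
for t in (0.2, 0.5, 1.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

At a temperature of 1.0 the distribution is unchanged. At 0.2 nearly all of the probability moves to the most likely character, which is consistent with the low-temperature samples getting stuck repeating the same phrase.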

In [23]:
textgen = textgenrnn(weights_path='50nds_4Lrs_20epchs_Model_weights.hdf5',
                       vocab_path='50nds_4Lrs_20epchs_Model_vocab.json',
                       config_path='50nds_4Lrs_20epchs_Model_config.json')

textgen.generate_samples(50)

####################
Temperature: 0.2
####################
The one will be foot an it won't be too long, oh no   Let me say "hey"  I want to party on your pussy, baby I want to party on your pussy, baby I want to party on your pussy, baby I want to party on your pussy, baby I want to party on your pussy, baby I want to party on your pussy, baby I want to p

What I want to do  When you show me your soul  Sentimental get away The brothers cup) (We're the brothers cup) (We're the brothers cup) (We're the brothers cup) (We're the brothers cup) we like to me and it won't be long No it won't be too long, oh no   I can tell you  You've got to get it put it i

I don't believe it's bad Slit my throat it's all I ever  Please don't turn me to the place I want to go back  He star My love is like a rollercoaster baby shakin' my soul  I know you can  Like I said you want to do  We are the brothers cup) (We're the brothers cup) (We're the brothers cup) (We're t









I don't believe it's bad Slit 


I don't believe it's bad Slit my throat it's all I ever wanted was a colors of my friend   Come on, dea

Sometimes I have to the price get chare  Come again goes  The satry on my man  It's better burn skinny sweaty man in the sky Knock my backwoods now The things I know  And if I could be so long   Where I could do it in the sky (Testify, testify, kick a hole right into the two hear more than ever  Tr

Because the wiser From the bit and the living and I'm a guided But I know you can  Do you want to know I'm falling into guru muhk  I want to party on your pussy, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah  Standing in line 

What I'm talking in the sky when you share Hey hey Mother mind I'm such a life suck my mind  Love into get it up and jump Get up and jump, get up and jump, get up and jump, get up and jump, get up and jump, get up and jump Get up and jump Get up and jump, get up and jump, get up and jump, ge


The way I don't believe it's bad Slit my throat it's all I ever wanted was you need me your surw Chee standing on the other side  Come again some come is my aeroplane Something realing Tell me a come and the brightress in my aer so every day for the flowers are the banana  Everyone knows anything t

I think you did don't you love me like you used to do?  Who hold made is rock me down the flain of nature Because the sun me into the same Bust fast away  Let's up and jump Get up and jump, get up and jump, get up and jump, get up and jump Get up and jump, get up and jump, get up and jump Get up an

That we know our sing, sing mighty, if you want it all faller when she said now Good God in your teacup girl what I want to go  Deep inside the bright You could do it with a stragie me down the cold decorns and the way I tried to say I know you want to rock in my popped the mountains in the sky (Te

What you out of the banana What I've got you've got to be afraid  Fight like a brave Don't be a 


For it And she got to say for sure The belictome inside, well, I don't think it's new that it all alonceate and ed I've will for face Dirty but I'm a part of all fill ground   And I just don't know both never get who knee aw yeah  Well, I like dirt I never met I take it on the girling into get conn

I don't know with a mona

Pass ship it all The stars us I'm not before They do must she chill I know You get a little losing me is that it's about to walk have beneath to see the sky Ain't everybody head the cold danced assed Her, this is with my aeroplane To take a burn a little bit of a clouds crimes to Berated examonun  



Farthing Engle the give the underma'ns Count Me and what you want to keep it pumping down and over to your bust I'm night, hlymio out show me up and go What I saw your heart something 'to do our nostle   Slow from the Flack Give it away now

Don't you love me fob  Two helpshtrations My friend  My kow that we're gonna setting of the Man night Those go down for your pa

## Extra Bonus Code
---
I realized that this code is reproducible for any artist on lyrics.com. Enter any artist's name and, if their songs are listed on their artist page, the function returns a dataframe with all of the artist's songs, song durations, and lyrics. Just copy this code and call lookup(). Enjoy!

In [24]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

def lookup():
    artist = input('Enter an artist name: ')
    # lowercase the name and join its words with URL-encoded spaces
    url = 'https://www.lyrics.com/artist/' + '%20'.join(artist.lower().split())
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')

    son = []
    tim = []
    lyr = []
    # cells alternate: song title, then duration
    songs = soup.find_all('td', {'class': 'tal qx'})
    for i in songs[::2]:
        son.append(i.text)
    for i in songs[1::2]:
        tim.append(i.text)
    # hrefs already start with '/', so the base URL has no trailing slash
    base_url = 'https://www.lyrics.com'
    for cell in songs:
        # skip cells with no link to a lyrics page (e.g. durations)
        if cell.a is None:
            continue
        lyr_url = base_url + cell.a.attrs['href']
        r = requests.get(lyr_url)
        # separate name so the artist-page soup is not overwritten
        lyric_soup = BeautifulSoup(r.text, 'html.parser')
        lyr.append(lyric_soup.find('pre', {'id': 'lyric-body-text'}).text)
    # all three lists must be the same length for this to succeed
    df = pd.DataFrame({'song': son, 'time': tim, 'lyrics': lyr})
    return df
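
Artist names that contain characters other than letters and spaces (for example `AC/DC`) need percent-encoding before they go into a URL. Here is a sketch using `urllib.parse.quote` from the standard library. `artist_url` is a hypothetical helper, and whether lyrics.com resolves every encoded name is an assumption that has not been verified here:

```python
from urllib.parse import quote


def artist_url(artist):
    """Build a lyrics.com artist-page URL from a free-text artist name."""
    # collapse repeated spaces, then percent-encode everything unsafe,
    # including '/' (safe='' disables the default exemption for slashes)
    name = ' '.join(artist.split())
    return 'https://www.lyrics.com/artist/' + quote(name, safe='')


print(artist_url('Red Hot  Chili Peppers'))
```

For the band used in this notebook, this produces the same URL that was hard-coded in Part 1.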