<span style="font-size: 2em; font-weight:bold">AI 70's Country</span>

In [1]:
import warnings
warnings.filterwarnings('ignore')

import scipy
import numpy as np
import matplotlib
import pandas as pd
import statsmodels
import sklearn
import tensorflow
import keras

from keras.models import Sequential, load_model
from keras.layers import Dense,LSTM
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import History
import string
import pyarrow as pa


import plotly.graph_objs as go
import plotly
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected = True)
import plotly.offline as offline
from plotly import tools

import json

Using TensorFlow backend.


As always, data prep is the hardest part of the project. I am going to use a validation set when training the model, and Keras takes the last n% of the data as that validation set. I therefore shuffle the lyrics so that the validation set represents all of the data, not just the last song. I also want to get as much originality out of the model as I can, so I eliminate duplicate lyrics (some songs have refrains that repeat multiple times).
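To make the point about the validation tail concrete, here is a minimal sketch in plain Python. The two "songs" and the helper `validation_tail` are made up for illustration; the helper only mimics the idea of taking the last fraction of the rows and is not Keras code.

```python
import random

# Hypothetical toy data: (song, line number) pairs in their original song order.
lines = [('song_a', i) for i in range(5)] + [('song_b', i) for i in range(5)]

def validation_tail(rows, split=0.2):
    """Return the last `split` fraction of rows, the way a tail-based split would."""
    n_val = int(len(rows) * split)
    return rows[len(rows) - n_val:]

# Without shuffling, the validation rows all come from the last song.
ordered_val = validation_tail(lines)
print({song for song, _ in ordered_val})

# After shuffling (fixed seed so the sketch is repeatable), the tail is a random sample.
shuffled = lines[:]
random.Random(0).shuffle(shuffled)
shuffled_val = validation_tail(shuffled)
print(shuffled_val)
```

The first print shows only `song_b`, which is exactly the problem the shuffle is meant to avoid.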

In [2]:
lyricsCSV = pd.read_csv('lyrics.csv',encoding='ISO-8859-1')
lyricsCSV.sort_values('lyrics',inplace=True)
lyricsCSV.drop_duplicates(keep='first',inplace=True)
lyricsCSV = lyricsCSV.sample(frac=1)



lyricsCSV.to_csv('lyrics.txt',sep='\t',index=False)
with open('lyrics.txt','r') as f:
    lyrics = f.read()

Remove the line breaks and print

In [3]:
tokens = lyrics.split()
lyrics = ' '.join(tokens)
print(lyrics)

lyrics Even with someone they love From crying when he calls your name A raisin' me a family and workin' on a farm (La la la la la la la) (La la la la la) """If you wasn't wearin' that black robe I'd take out in back of this courthouse" And shakin' me up so that all I really know And standin' up for things they believe in I, I will always, always love you I needed you and you were there I walk away from trouble when I can Here you come again lookin' better than a body has a right to Rhinestone cowboy To take you to his mansion in the sky? Cussin' at a can that he was kickin Don't let 'em pick guitars or drive them old trucks That didn't hurt I'd be carrying the pots you made And thank God you're a country boy He'd never stood one single time to prove the county wrong And my soft shoes shining She's a good-hearted woman in love with a good-timin' man Like I aint got nothing on All day long in the field a hoin' corn I could sing you a tune and promise you the moon He shoveled coal to mak

One final thing: the lyrics above contain a LOT of quotation marks, so I am going to replace each one with a space.

In [4]:
lyrics = lyrics.replace('"',' ')

Now we can build sequences of characters. In each sequence, the first 100 characters are the input and the final character is the one the model learns to predict.

In [5]:
length = 100 # Length of the character sequences (because we have so much verbiage,
             # we can use a relatively large number)
sequences = list()
for i in range(length, len(lyrics)):
    seq = lyrics[i-length:i+1]
    sequences.append(seq)
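Here is the same sliding window on a tiny string, as a sketch to show what each sequence looks like. The text `"hello world"` and the window size of 3 are toy values chosen for illustration; the notebook itself uses `length = 100`.

```python
text = "hello world"
length = 3  # toy window size; the notebook uses 100

# Same slicing as above: `length` input characters plus 1 target character.
sequences = [text[i - length:i + 1] for i in range(length, len(text))]

print(sequences[:3])   # ['hell', 'ello', 'llo ']
print(len(sequences))  # 8
```

Every sequence has `length + 1` characters, and consecutive sequences overlap by all but one character.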

Create and save a .txt file of our sequences with line endings

In [6]:
data = '\n'.join(sequences)
with open('char_sequences.txt','w') as file:
    file.write(data)

Create a dictionary of character:number mappings 

In [7]:
with open('char_sequences.txt','r') as file:
    raw_text = file.read()

lines = raw_text.split('\n')

chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))

# Save the mapping as json
with open('mapping.json','w') as f:
    json.dump(mapping, f)

Use the dictionary to create sequences of numbers only (numbers that describe the characters)

In [8]:
sequences = list()
for line in lines:
    encoded_seq = [mapping[char] for char in line]
    sequences.append(encoded_seq)
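A small worked example of the mapping and the encoding step may help. The string `"abba cab"` is a made-up stand-in for the lyrics, and `inverse` is an extra dictionary I add here only to show that the encoding can be undone.

```python
raw_text = "abba cab"  # toy stand-in for the real lyrics

# Same construction as above: sorted unique characters, numbered from 0.
chars = sorted(set(raw_text))
mapping = {c: i for i, c in enumerate(chars)}
print(mapping)  # {' ': 0, 'a': 1, 'b': 2, 'c': 3}

# Encode a string as integers, then decode it again with the inverse mapping.
inverse = {i: c for c, i in mapping.items()}
encoded = [mapping[c] for c in "cab"]
decoded = ''.join(inverse[i] for i in encoded)
print(encoded, decoded)  # [3, 1, 2] cab
```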
    


Create input sets (100 characters each) and output sets (1 character each), and then one-hot encode both so we can use them to train the model.

In [9]:
vocab_size = len(mapping)
sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
sequences = [to_categorical(x, num_classes=vocab_size) for x in X] #one-hot code input
X = np.array(sequences)
y = to_categorical(y, num_classes=vocab_size) #one-hot code output
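To check the shapes this step produces, here is a sketch on toy data. It uses indexing into `np.eye` as a stand-in for `to_categorical` so that it runs without Keras; the vocabulary size of 4 and the two short sequences are assumptions for illustration.

```python
import numpy as np

vocab_size = 4
# Two toy sequences of 4 integers each: 3 input characters + 1 target character.
sequences = np.array([[3, 1, 2, 0],
                      [1, 2, 0, 3]])

# Same split as above: everything but the last column is input, the last column is output.
X, y = sequences[:, :-1], sequences[:, -1]

# Row i of the identity matrix is the one-hot vector for class i.
eye = np.eye(vocab_size, dtype='float32')
X_onehot = eye[X]
y_onehot = eye[y]

print(X_onehot.shape)  # (2, 3, 4): samples, characters per input, vocabulary size
print(y_onehot.shape)  # (2, 4): samples, vocabulary size
```

The last two dimensions of the input are what the LSTM's `input_shape` refers to.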

Fit the model

In [10]:
units = 512
epochs = 100

# Define early stopping callbacks
es = keras.callbacks.EarlyStopping(monitor = 'val_acc',min_delta = .01, patience = 2, mode = 'auto',verbose=1)


# define and fit model
model = Sequential()
model.add(LSTM(units, input_shape=(X.shape[1], X.shape[2])))  # input shape: (sequence length, vocab size)
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# verbose=1 prints progress; this takes a long time to run,
# so you will want verbosity at some level.
modelLyrics = model.fit(X, y, epochs = epochs, validation_split = 0.2, shuffle = True, verbose=1,callbacks=[es])


#save model and history
model.save('model.units512.es_val_acc')

with open('modelLyricsHistory.json','w') as f:
    json.dump(modelLyrics.history, f)



Train on 17874 samples, validate on 4469 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 00010: early stopping
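Why did training stop at epoch 10? The sketch below is a simplified re-implementation of the patience rule, not Keras's own code. It assumes an epoch counts as an improvement only when `val_acc` beats the best value so far by more than `min_delta`, and that training stops once `patience` epochs in a row fail to improve. The `val_acc` values are the ones saved in modelLyricsHistory.json (printed at the end of this notebook); the function name `early_stop_epoch` is my own.

```python
# Validation accuracy per epoch, copied from the saved training history.
val_acc = [0.337436, 0.374804, 0.405684, 0.437234, 0.464981,
           0.453793, 0.486462, 0.498769, 0.498098, 0.485343]

def early_stop_epoch(values, min_delta=0.01, patience=2):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float('-inf')
    wait = 0
    for epoch, current in enumerate(values, start=1):
        if current - min_delta > best:
            # Improvement large enough to count: remember it and reset the counter.
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

print(early_stop_epoch(val_acc))  # 10
```

Under these assumptions, epochs 9 and 10 both fail to beat the epoch 8 value of 0.498769 by 0.01, which uses up the patience of 2.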


In [11]:
history = modelLyrics.history

In [12]:
json2 = json.dumps(history)

In [13]:
# store history

First, plot the training loss and accuracy parsed from the saved verbose training log. Then create a function that encodes a kickoff text string and plugs it into our trained model.

In [14]:
with open('verboseUnits512.txt','r') as f:
    verbose = f.read()

verbose = verbose.replace('loss: ',"$")
verbose = verbose.replace('acc: ',"+")

loss = ' '.join([word for word in verbose.split() if word.startswith("$")])
loss = loss.split("$")[1:]
loss = list(map(float,loss))

acc = ' '.join([word for word in verbose.split() if word.startswith("+")])
acc = acc.split("+")[1:]
acc = list(map(float,acc))


df = pd.DataFrame({'loss':loss,'acc':acc})

loss = go.Scatter(y=df.loss,x=df.index,line={'color':'#072b61'},name='loss')
acc = go.Scatter(y=df.acc,x=df.index,line={'color':'#146320'},name='acc',yaxis='y2')

layout = go.Layout(title='loss and acc vs Epoch',xaxis={'title':'Epoch'},yaxis = {'title':'loss','showgrid':False},yaxis2 = {'title':'acc','overlaying':'y','side':'right','showgrid':False})

fig = go.Figure(data=[loss,acc],layout=layout)

offline.iplot(fig)
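The string-replacement trick above works because `val_loss: ` and `val_acc: ` turn into words that start with `val_` rather than with the marker. A regular expression can express the same filter more directly. This is only a sketch: the contents of verboseUnits512.txt are not shown in this notebook, so `sample_log` is a hypothetical two-line excerpt in the `loss: ... - acc: ... - val_loss: ... - val_acc: ...` style, with values rounded from the history table.

```python
import re

# Hypothetical excerpt; the real log file may contain more text per line.
sample_log = (
    "- loss: 2.8186 - acc: 0.2484 - val_loss: 2.4212 - val_acc: 0.3374\n"
    "- loss: 2.3182 - acc: 0.3536 - val_loss: 2.2692 - val_acc: 0.3748\n"
)

# The negative lookbehind (?<!val_) skips the validation metrics.
train_loss = [float(v) for v in re.findall(r'(?<!val_)loss: (\d+\.\d+)', sample_log)]
train_acc = [float(v) for v in re.findall(r'(?<!val_)acc: (\d+\.\d+)', sample_log)]

print(train_loss)  # [2.8186, 2.3182]
print(train_acc)   # [0.2484, 0.3536]
```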

In [15]:
# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_lyric, n_chars):
    lyrics = seed_lyric
    for __ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in lyrics]
        # keep only the last seq_length characters
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        # predict the most likely next character (a greedy choice);
        # predict_classes exists in the older Keras used here and was removed in later releases
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append the predicted character to the running text
        lyrics += out_char
    return lyrics
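The two bookkeeping steps in that loop, keeping only the last `seq_length` characters and turning a predicted integer back into a character, can be tried out without a trained model. In the sketch below, `keep_last` is a hypothetical plain-Python stand-in for what `pad_sequences(..., maxlen=..., truncating='pre')` does to a single sequence (it drops values from the front when the sequence is too long and pads with zeros at the front when it is too short). The tiny mapping and seed are made up.

```python
def keep_last(encoded, maxlen, pad_value=0):
    """Keep the last `maxlen` values; left-pad with `pad_value` if there are fewer."""
    tail = encoded[-maxlen:]
    return [pad_value] * (maxlen - len(tail)) + tail

mapping = {' ': 0, 'a': 1, 'b': 2, 'c': 3}      # toy character-to-integer mapping
inverse = {i: c for c, i in mapping.items()}    # integer-to-character lookup

seed = "abc cab"
encoded = [mapping[c] for c in seed]
print(encoded)                       # [1, 2, 3, 0, 3, 1, 2]
print(keep_last(encoded, maxlen=4))  # [0, 3, 1, 2]
print(keep_last([1, 2], maxlen=4))   # [0, 0, 1, 2]

# A dictionary lookup replaces the search loop over mapping.items().
print(inverse[3])                    # c
```

Building `inverse` once is a small design choice: it avoids scanning the whole mapping for every generated character.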

Is there a song other than Stairway to Heaven that I could have used for the kickoff sequence?

In [16]:
startLyrics = "Playing ethnicky jazz To parade your snazz On your five-grand stereo Braggin' that you know How the "

Run the model and print the lyrics

In [17]:
lyricsFinal = generate_seq(model,mapping,length,startLyrics,1000)
print(lyricsFinal)

Playing ethnicky jazz To parade your snazz On your five-grand stereo Braggin' that you know How the bading the song the whole U.S.A My days are all the good times the badin' on the sun hear I'd not in's back home again Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolene, Jolen

In [18]:
df = pd.read_json('modelLyricsHistory.json')

In [19]:
df

   val_loss  val_acc   loss      acc
0  2.421204  0.337436  2.818626  0.248406
1  2.269156  0.374804  2.31816   0.353586
2  2.052064  0.405684  2.170305  0.39191
3  1.927132  0.437234  1.957093  0.433814
4  1.871062  0.464981  1.810224  0.469621
5  1.877762  0.453793  1.66651   0.512141
6  1.800563  0.486462  1.51554   0.550856
7  1.777263  0.498769  1.348565  0.599362
8  1.828017  0.498098  1.149072  0.657268
9  1.906298  0.485343  0.957075  0.714166
