<span style="font-size: 2em; font-weight:bold">AI 70's Country</span>

In [1]:
import warnings
warnings.filterwarnings('ignore')

import scipy
import numpy as np
import matplotlib
import pandas as pd
import statsmodels
import sklearn
import tensorflow
import keras

from keras.models import Sequential, load_model
from keras.layers import Dense,LSTM
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import History
import string
import pyarrow as pa


import plotly.graph_objs as go
import plotly
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected = True)
import plotly.offline as offline
from plotly import tools

import json

Using TensorFlow backend.


As always, data prep is the hardest part of the project. Because I am going to use a validation set when training my model, and because Keras takes the last n% of the data as the validation set, I want to shuffle the lyrics so that the validation set is a better representation of all the data, not just the last song. I also want to get as much originality as I can out of the model, so I will eliminate duplicate lyrics (some songs have refrains that repeat multiple times).

In [2]:
lyricsCSV = pd.read_csv('lyricsTrain.csv',encoding='ISO-8859-1')
lyricsCSV.sort_values('lyrics',inplace=True)
lyricsCSV.drop_duplicates(keep='first',inplace=True)
lyricsCSV = lyricsCSV.sample(frac=1)



# Write the shuffled lyrics to a text file, then read them back as one string
lyricsCSV.to_csv('lyrics.txt',sep='\t',index=False)
with open('lyrics.txt','r') as l:
    lyrics = l.read()
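
To make the reason for shuffling concrete, here is a minimal sketch of how a tail-based validation split behaves on data that arrives grouped by song. It assumes the split point is computed as int(n * (1 - validation_split)), which is my understanding of what Keras does; `tail_split` and the toy `song_lines` list are hypothetical names used only for illustration.

```python
def tail_split(items, validation_split):
    """Split items into (train, validation), taking validation from the tail.

    Assumption: the split index is int(n * (1 - validation_split)).
    """
    split_at = int(len(items) * (1.0 - validation_split))
    return items[:split_at], items[split_at:]


# Toy data: ten lines that arrive grouped by song (made up for this sketch).
song_lines = ['song A'] * 4 + ['song B'] * 4 + ['song C'] * 2
train_part, val_part = tail_split(song_lines, 0.2)
print(set(val_part))  # only the last song ends up in the validation set
```

Under the same assumption, 22343 sequences split into 17874 training and 4469 validation samples, which matches the counts reported in the training log further down.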

Remove line breaks and print

In [3]:
tokens = lyrics.split()
lyrics = ' '.join(tokens)
print(lyrics)

lyrics That I should have been home yesterday, yesterday Got us feuding like the Hatfields and McCoys And combed my hair Hello sunshine They love our milk and honey I'd play Sally Goodin all day if I could You needed me, you needed me No one will ever know just how much I love you so And somewhere far away And Jerry Jeff's train songs and Blue Eyes Cryin' in the Rain Shine on me sunshine I turned around and there was a big old cop Well I wouldn't trade my life for diamonds and jewels """When you hot, you hot""" People see us everywhere they all think you really care I don't need my name in the marquee lights On the road to my horizon Gotta go, I love you But I lit my first and watched a small kid The disappearing dreams of yesterday Can forget I've ever known her And here I am'a walking down sixty-six Would you marry me anyway When ever I chance to meet, some old friends on the street Save your love through sorrow The judge was a fishin' buddy that I recognized But what would it matter

One final thing: when I look at the lyrics above, I see a LOT of quotation marks, so I am going to replace each one with a space.

In [4]:
lyrics = lyrics.replace('"',' ')

Now we can build sequences of characters that will be used to predict a final character. Each sequence holds 100 input characters plus the one character to be predicted.

In [5]:
length = 100 # Length of the character sequences (because we have so much verbiage,
             # we can use a relatively large number)
sequences = list()
for i in range(length, len(lyrics)):
    seq = lyrics[i-length:i+1]
    sequences.append(seq)
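
The windowing is easier to see on a short string. This is a small sketch of the same slicing with a window of 4 instead of 100; `build_sequences` and `toy_text` are hypothetical names, not part of the notebook.

```python
def build_sequences(text, length):
    """Return every window of length + 1 characters (length inputs, 1 target)."""
    return [text[i - length:i + 1] for i in range(length, len(text))]


toy_text = 'hello world'
toy_windows = build_sequences(toy_text, 4)
for seq in toy_windows:
    # The first 4 characters are the input, the last one is the target.
    print(repr(seq[:-1]), '->', repr(seq[-1]))
```

Each window slides forward by one character, so a text of n characters yields n - length windows.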

Create and save a .txt file of our sequences with line endings

In [6]:
data = '\n'.join(sequences)
file = open('char_sequences.txt','w')
file.write(data)
file.close()

Create a dictionary of character:number mappings 

In [7]:
file = open('char_sequences.txt','r')
raw_text = file.read()
file.close()

lines = raw_text.split('\n')

chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))

# Save the mapping as json 
json_map = json.dumps(mapping)
__ = open('mapping.json','w')
__.write(json_map)
__.close()
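
As a quick sanity check on the idea, here is a toy version of the mapping that goes through JSON and back and then encodes and decodes a string. The names `toy_text`, `toy_mapping`, and `restored` are made up for this sketch; it only shows that the character keys and integer values survive the round trip.

```python
import json

toy_text = 'hello world'
toy_chars = sorted(set(toy_text))
toy_mapping = {c: i for i, c in enumerate(toy_chars)}

# Round-trip through a JSON string, similar to saving and reloading mapping.json.
restored = json.loads(json.dumps(toy_mapping))

# Encode the text as integers, then decode it again with the inverted mapping.
encoded = [restored[c] for c in toy_text]
reverse = {i: c for c, i in restored.items()}
decoded = ''.join(reverse[i] for i in encoded)
print(encoded)
print(decoded)
```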

Use the dictionary to create sequences of numbers only (numbers that describe the characters)

In [8]:
sequences = list()
for line in lines:
    encoded_seq = [mapping[char] for char in line]
    sequences.append(encoded_seq)
    


Create input sets (with 100 characters) and output sets (1 character) and then one-hot encode the sets so we can use them to train the model.

In [9]:
vocab_size = len(mapping)
sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
sequences = [to_categorical(x, num_classes=vocab_size) for x in X] #one-hot code input
X = np.array(sequences)
y = to_categorical(y, num_classes=vocab_size) #one-hot code output
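
For anyone who wants to see what the one-hot step produces without Keras, here is a plain-Python sketch on two tiny encoded sequences. `one_hot`, `toy_sequences`, and `toy_vocab_size` are hypothetical names; the point is only the resulting shape (samples, timesteps, vocabulary size).

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position index."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec


# Two encoded windows: 3 input characters followed by 1 target character.
toy_sequences = [[3, 2, 4, 4], [2, 4, 4, 5]]
toy_vocab_size = 6

X_toy = [[one_hot(i, toy_vocab_size) for i in seq[:-1]] for seq in toy_sequences]
y_toy = [one_hot(seq[-1], toy_vocab_size) for seq in toy_sequences]

# samples, timesteps, vocabulary size
print(len(X_toy), len(X_toy[0]), len(X_toy[0][0]))
```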

Fit the model

In [10]:
#Define parameters
units = 512
epochs = 100
validationSplit = 0.2
shuffle = True

#Define callbacks
es = keras.callbacks.EarlyStopping(monitor = 'val_acc',min_delta = .01, patience = 10, mode = 'max',verbose=1)


# define and fit model
model = Sequential()
model.add(LSTM(units, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

modelLyrics = model.fit(X, y, epochs = epochs, validation_split = validationSplit, 
                        shuffle = shuffle, verbose=1,callbacks=[es])


#save model and history
model.save('model.units512.es_val_acc.2')

json_hist = json.dumps(modelLyrics.history)
__ = open('modelLyricsHistory2.json','w')
__.write(json_hist)
__.close()



Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Train on 17874 samples, validate on 4469 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 00017: early stopping
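
To reason about why training stopped where it did, here is a simplified sketch of a patience-based stopping rule in 'max' mode. It is my own approximation, not the Keras source: a value counts as an improvement only if it beats the best value so far by more than min_delta. `early_stop_epoch` and the toy accuracy list are hypothetical.

```python
def early_stop_epoch(values, min_delta, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Assumed rule: an epoch improves only if value - min_delta > best so far;
    training stops once `patience` epochs in a row fail to improve.
    """
    best = float('-inf')
    wait = 0
    for epoch, value in enumerate(values, start=1):
        if value - min_delta > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Toy validation accuracies (made up): progress stalls after epoch 3.
toy_val_acc = [0.30, 0.40, 0.45, 0.455, 0.452, 0.458]
print(early_stop_epoch(toy_val_acc, min_delta=0.01, patience=3))
```

Applied to the ten val_acc values in the history table below with min_delta=0.01 and patience=10, this rule finds its last qualifying improvement at epoch 7, so it would stop at epoch 17 provided none of the later epochs improve by more than 0.01. That is consistent with the log above.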


In [11]:
# store history

In [12]:
history = pd.DataFrame(modelLyrics.history)
history.to_csv('modelLyricsHistory.csv',index=False)

In [13]:
history

Unnamed: 0,val_loss,val_acc,loss,acc
0,2.462709,0.324681,2.794648,0.25663
1,2.239567,0.375699,2.302918,0.355041
2,2.068797,0.392034,2.16853,0.395547
3,1.949321,0.436787,1.957283,0.43387
4,1.885776,0.454464,1.820162,0.466935
5,1.846147,0.466995,1.673535,0.508112
6,1.822989,0.482211,1.509013,0.558241
7,1.825488,0.488029,1.335102,0.599698
8,1.872218,0.4887,1.136174,0.65917
9,1.936872,0.485791,0.923837,0.721271


Create a function that encodes a kickoff text string, plugs it into our trained model, and appends the predicted characters one at a time.

In [14]:
# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_lyric, n_chars):
    lyrics = seed_lyric
    for __ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in lyrics]
        # truncate sequences to a fixed length (keep the most recent characters)
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        # predict character
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append the predicted character to the running text
        lyrics += out_char
    return lyrics
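
The reverse lookup inside the loop scans the whole mapping for every generated character. A small sketch of an alternative is to invert the dictionary once and pick the most likely class with an argmax. The names `argmax`, `toy_mapping`, `index_to_char`, and `fake_probs` are hypothetical, and the probability row is a stand-in for one row of model output. As far as I know, newer Keras releases no longer provide `predict_classes`, so an argmax over the output of `model.predict` would be the closest equivalent there.

```python
def argmax(probabilities):
    """Return the index of the largest value in a list of probabilities."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


toy_mapping = {' ': 0, 'a': 1, 'b': 2}

# Invert the mapping once so each lookup is a single dictionary access.
index_to_char = {index: char for char, index in toy_mapping.items()}

fake_probs = [0.1, 0.2, 0.7]  # stand-in for one row of softmax output
next_char = index_to_char[argmax(fake_probs)]
print(next_char)
```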

Is there a song other than Stairway to Heaven that I could have used for the kickoff sequence?

In [19]:
startLyrics = "As I walk through the valley of the shadow of death I take a look at my life and realize there's not"

Run the model and print the lyrics

In [20]:
model = load_model('model.units512.es_val_acc.2')
lyricsFinal = generate_seq(model,mapping,length,startLyrics,1000)
print(lyricsFinal)

As I walk through the valley of the shadow of death I take a look at my life and realize there's nothing shinin' on me To face the world out on my own again In the summertime we'd all get a brand new pair I'll fix your lunch Cussin' at a can that he was kickin Honey, come back where you belong to only me. And standin' up for things they believe in Buy some soumthing old the blie of my shist in our live and lat I cried a tear, you wiped it dry You gave me strength to stand alone again But when they're runnin' down our country, man Your smile is like a breath of spring That's all I'm taking with me He like the night life, the bright lights are callin' ya, honey. All you gotta do is smile that smile Let's go to Luckenbach, Texas This successful life we're livin' So smile for a while and let's be jolly Dark and dusty, painted on the sky Then I fumbled through my closet There once was a time when I could not imagine And turned my hind and little fine to pay And I will always love you Delta 

In [17]:
df = pd.read_json('modelLyricsHistory.json')

In [18]:
df

Unnamed: 0,val_loss,val_acc,loss,acc
0,2.391513,0.338072,2.686099,0.280099
1,2.203031,0.387909,2.237684,0.37789
2,1.951543,0.440829,2.020886,0.428327
3,1.848852,0.462065,1.823551,0.470543
4,1.79937,0.475595,1.665167,0.512417
5,1.761656,0.49803,1.499977,0.557458
6,1.747415,0.506936,1.325721,0.605926
7,1.775016,0.511731,1.13038,0.66287
8,1.864464,0.500771,0.932956,0.720414
9,1.938054,0.506594,0.741048,0.781255
