<span style="font-size: 2em; font-weight:bold">AI 70's Country</span>

In [1]:
import warnings
warnings.filterwarnings('ignore')

import scipy
import numpy as np
import matplotlib
import pandas as pd
import statsmodels
import sklearn
import tensorflow
import keras

from keras.models import Sequential, load_model
from keras.layers import Dense,LSTM
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import History
import string
import pyarrow as pa


import plotly.graph_objs as go
import plotly
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected = True)
import plotly.offline as offline
from plotly import tools

import json

Using TensorFlow backend.


As always, data prep is the hardest part of the project. Because I am going to use a validation set in training my model, and because Keras takes the last n% of the data as the validation set, I want to shuffle the lyrics so that the validation set better represents all the data, not just the last song. I also want to get as much originality as I can out of the model, so I will eliminate duplicate lyrics (some songs have refrains that repeat multiple times).

In [2]:
lyricsCSV = pd.read_csv('lyricsTrain.csv',encoding='ISO-8859-1')
lyricsCSV.sort_values(lyricsCSV.columns[0],inplace=True)
lyricsCSV.drop_duplicates(keep='first',inplace=True)
lyricsCSV = lyricsCSV.sample(frac=1)



lyricsCSV.to_csv('lyrics.txt',sep='\t',index=False)
l = open('lyrics.txt','r')
lyrics = l.read()
l.close()
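
To make the reasoning above concrete, here is a minimal sketch on a made-up list of lines (the toy data and the fixed seed are assumptions, not part of the project). It uses only the standard library to mimic what `drop_duplicates(keep='first')` and `sample(frac=1)` do, and shows that an unshuffled 20% tail comes entirely from the last song.

```python
import random

# Toy stand-in for the lyric lines: two "songs" that share a repeated refrain.
toy_lines = ["a1", "refrain", "a2", "refrain",
             "b1", "b2", "b3", "b4", "b5", "b6", "b7"]

# Keep the first occurrence of each line, as drop_duplicates(keep='first') does.
deduped = list(dict.fromkeys(toy_lines))

# Shuffle a copy; the fixed seed is an assumption added for repeatability.
shuffled = deduped[:]
random.Random(0).shuffle(shuffled)

n_val = int(len(deduped) * 0.2)  # size of a 20% validation tail
print("unshuffled tail:", deduped[-n_val:])   # only lines from the last song
print("shuffled tail:  ", shuffled[-n_val:])  # lines drawn from anywhere
```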

Collapse line breaks and extra whitespace into single spaces, then print

In [3]:
tokens = lyrics.split()
lyrics = ' '.join(tokens)
print(lyrics)

I beg your pardon Or let go oh-whoa-whoa-whoa Now you be careful And here I am'a walking down sixty-six I'll just cry all night long Except I can't sleep Do these old shoes look funny Walk away from trouble if you can Where you belong to only me. "I said ""Hey, judge, old buddy, old pal""" When ever I chance to meet, some old friends on the street And held me up and gave me dignity All you gotta do is smile that smile Of someone fryin chicken Yeah, city folk drivin' in a black limousine Honey, I know I've said it too many times before We were poor but we had love Lay it soft upon my skin And them that do sometimes won't know how to take him Please don't take him just because you can But when he lo-oves me, he really lo-oves me Thinking over things I wish I'd said (ooh-ooh) I'd rather wonder a little and have his lovin' They been walking thirty miles Give me your tomorrow But if you ever want somebody to just love ya, and some day you I don't care what's right or wrong I can't believe i

One final thing: when I look at the above lyrics, I see a LOT of quotation marks, so I am going to replace each of them with a space.

In [4]:
lyrics = lyrics.replace('"',' ')

Now we can build sequences of characters. In each sequence, the first 100 characters are the input and the final character is the one to predict.

In [5]:
length = 100 # Length of the character sequences (because we have so much verbiage,
             # we can use a relatively large number)
sequences = list()
for i in range(length, len(lyrics)):
    seq = lyrics[i-length:i+1]
    sequences.append(seq)
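
As a small illustration of the sliding window above, the sketch below applies the same slicing to a short made-up string with a window of 4 (the helper name `build_sequences` and the toy text are assumptions for this example). Each sequence holds `length + 1` characters: `length` inputs followed by one target.

```python
def build_sequences(text, length):
    """Return every window of length + 1 characters: `length` inputs plus 1 target."""
    return [text[i - length:i + 1] for i in range(length, len(text))]

toy_text = "hello world"
toy_sequences = build_sequences(toy_text, length=4)

# Show how the first few windows split into input and target.
for seq in toy_sequences[:3]:
    print(repr(seq[:-1]), "->", repr(seq[-1]))
```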

Create and save a .txt file of our sequences with line endings

In [6]:
data = '\n'.join(sequences)
file = open('char_sequences.txt','w')
file.write(data)
file.close()

Create a dictionary of character:number mappings 

In [7]:
file = open('char_sequences.txt','r')
raw_text = file.read()
file.close()

lines = raw_text.split('\n')

chars = sorted(list(set(raw_text)))
mapping = dict((c, i) for i, c in enumerate(chars))

# Save the mapping as json 
json_map = json.dumps(mapping)
__ = open('mapping.json','w')
__.write(json_map)
__.close()
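
The sketch below shows the same character-to-number idea on a tiny made-up string, including a round trip through JSON like the one used for `mapping.json` (the toy text and variable names are assumptions). JSON keeps the single-character keys as strings and the indices as integers, so the reloaded dictionary can encode text directly.

```python
import json

toy_text = "abba cab"
chars = sorted(set(toy_text))                      # [' ', 'a', 'b', 'c']
toy_mapping = {c: i for i, c in enumerate(chars)}  # character -> integer index

# Round trip through JSON, as when the mapping is saved and reloaded later.
restored = json.loads(json.dumps(toy_mapping))

# Encode a new string with the restored mapping.
encoded = [restored[c] for c in "cab"]
print(encoded)
```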

Use the dictionary to create sequences of numbers only (numbers that describe the characters)

In [8]:
sequences = list()
for line in lines:
    encoded_seq = [mapping[char] for char in line]
    sequences.append(encoded_seq)
    


Create input sets (100 characters each) and output sets (1 character each), then one-hot encode them so we can use them to train the model.

In [9]:
vocab_size = len(mapping)
sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
sequences = [to_categorical(x, num_classes=vocab_size) for x in X] #one-hot code input
X = np.array(sequences)
y = to_categorical(y, num_classes=vocab_size) #one-hot code output
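
To see what shapes this step produces, here is a minimal NumPy-only sketch on two made-up encoded sequences. Indexing into `np.eye` stands in for `to_categorical` here; that substitution, the toy values, and the variable names are assumptions for illustration.

```python
import numpy as np

vocab_size = 4
# Two toy encoded sequences: 3 input characters followed by 1 target character.
toy_sequences = np.array([[3, 1, 2, 0],
                          [1, 2, 0, 3]])

X_idx, y_idx = toy_sequences[:, :-1], toy_sequences[:, -1]

# Row i of the identity matrix is the one-hot vector for class i.
eye = np.eye(vocab_size, dtype="float32")
X_toy = eye[X_idx]  # shape: (samples, timesteps, vocab_size)
y_toy = eye[y_idx]  # shape: (samples, vocab_size)

print(X_toy.shape, y_toy.shape)
```

The three-dimensional input shape is what the LSTM layer expects: one one-hot vector per character position per sample.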

Fit the model

In [10]:
units = 512
epochs = 100

# Define early stopping callbacks
es = keras.callbacks.EarlyStopping(monitor = 'val_acc',min_delta = .01, patience = 10, mode = 'max',verbose=1)


# define and fit model
model = Sequential()
model.add(LSTM(units, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# verbose=1: this takes a long time to run, so you will want verbosity at some level
modelLyrics = model.fit(X, y, epochs = epochs, validation_split = 0.2, shuffle = True, verbose=1,callbacks=[es])


#save model and history
model.save('model.units512.es_val_acc.2')

json_hist = json.dumps(modelLyrics.history)
__ = open('modelLyricsHistory.json','w')
__.write(json_hist)
__.close()



Train on 23356 samples, validate on 5839 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 00018: early stopping
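
For intuition about why training stops before epoch 100, the sketch below is a simplified reading of the patience rule with `mode='max'`: an epoch only counts as an improvement if it beats the best value so far by more than `min_delta`, and training stops once `patience` epochs pass without one. It is not Keras's actual implementation, and the function name and accuracy values are made up for the example.

```python
def early_stop_epoch(values, min_delta, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, current in enumerate(values):
        if current - min_delta > best:
            # Improvement larger than min_delta: record it and reset the counter.
            best = current
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies that plateau after the second epoch.
toy_val_acc = [0.30, 0.40, 0.402, 0.405, 0.404]
stopped_at = early_stop_epoch(toy_val_acc, min_delta=0.01, patience=3)
never_stops = early_stop_epoch([0.1, 0.2, 0.3], min_delta=0.01, patience=3)
print(stopped_at, never_stops)
```

With `min_delta = .01`, small gains such as 0.400 to 0.405 do not reset the counter, which is why a slowly improving `val_acc` can still trigger the stop.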


In [11]:
history = modelLyrics.history

In [12]:
json2 = json.dumps(history)

In [13]:
# store history

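Before generating text, plot loss and accuracy against epoch from the saved verbose training log. The function that encodes a kickoff text string and then plugs it into our trained model follows in the cell after that.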

In [14]:
__ = open('verboseUnits512.txt','r')
verbose = __.read()
__.close()

verbose = verbose.replace('loss: ',"$")
verbose = verbose.replace('acc: ',"+")

loss = ' '.join([word for word in verbose.split() if word.startswith("$")])
loss = loss.split("$")[1:]
loss = list(map(float,loss))

acc = ' '.join([word for word in verbose.split() if word.startswith("+")])
acc = acc.split("+")[1:]
acc = list(map(float,acc))


df = pd.DataFrame({'loss':loss,'acc':acc})

loss = go.Scatter(y=df.loss,x=df.index,line={'color':'#072b61'},name='loss')
acc = go.Scatter(y=df.acc,x=df.index,line={'color':'#146320'},name='acc',yaxis='y2')

layout = go.Layout(title='loss and acc vs Epoch',xaxis={'title':'Epoch'},yaxis = {'title':'loss','showgrid':False},yaxis2 = {'title':'acc','overlaying':'y','side':'right','showgrid':False})

fig = go.Figure(data=[loss,acc],layout=layout)

offline.iplot(fig)

FileNotFoundError: [Errno 2] No such file or directory: 'verboseUnits512.txt'

In [15]:
# generate a sequence of characters with a language model
def generate_seq(model, mapping, seq_length, seed_lyric, n_chars):
    lyrics = seed_lyric
    for __ in range(n_chars):
        # encode the characters as integers
        encoded = [mapping[char] for char in lyrics]
        # truncate sequences to a fixed length (keep the most recent characters)
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # one hot encode
        encoded = to_categorical(encoded, num_classes=len(mapping))
        # predict character
        yhat = model.predict_classes(encoded, verbose=0)
        # reverse map integer to character
        out_char = ''
        for char, index in mapping.items():
            if index == yhat:
                out_char = char
                break
        # append the predicted character to the input
        lyrics += out_char
    return lyrics
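
The generation loop can also be sketched without a trained model, which makes the decoding step easier to inspect. In the sketch below, `predict_proba` is a hypothetical stand-in for the model: it takes the encoded window and returns one probability per character. The toy mapping and predictor are made up, and unlike `pad_sequences` the sketch does not pad seeds shorter than the window. Taking `np.argmax` over the probabilities plays the role of `predict_classes`, which was removed from later TensorFlow releases.

```python
import numpy as np

def generate_greedy(predict_proba, mapping, seq_length, seed, n_chars):
    """Repeatedly append the most probable next character to the seed."""
    index_to_char = {i: c for c, i in mapping.items()}  # inverse lookup
    text = seed
    for _ in range(n_chars):
        # Encode only the most recent seq_length characters.
        window = [mapping[c] for c in text[-seq_length:]]
        probs = predict_proba(window)
        text += index_to_char[int(np.argmax(probs))]
    return text

toy_mapping = {"a": 0, "b": 1, "c": 2}

def cyclic_predictor(window):
    """Hypothetical model: always predicts the character after the last one."""
    probs = np.zeros(len(toy_mapping))
    probs[(window[-1] + 1) % len(toy_mapping)] = 1.0
    return probs

result = generate_greedy(cyclic_predictor, toy_mapping, seq_length=2, seed="ab", n_chars=4)
print(result)
```

Building the inverse dictionary once avoids scanning `mapping.items()` for every generated character.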

Is there a song other than Stairway to Heaven that I could have used for the kickoff sequence?

In [16]:
startLyrics = "Playing ethnicky jazz To parade your snazz On your five-grand stereo Braggin' that you know How the "

Run the model and print the lyrics

In [19]:
model = load_model('model.units512.es_val_acc')
lyricsFinal = generate_seq(model,mapping,length,startLyrics,1000)
print(lyricsFinal)

Playing ethnicky jazz To parade your snazz On your five-grand stereo Braggin' that you know How the live through sorrow You look into mine becter Lise ain't nobody feelin' no pain So high that I could almost leve you And some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my and some day you clockes my

In [20]:
df = pd.read_json('modelLyricsHistory.json')

In [21]:
df

Unnamed: 0,val_loss,val_acc,loss,acc
0,2.391513,0.338072,2.686099,0.280099
1,2.203031,0.387909,2.237684,0.37789
2,1.951543,0.440829,2.020886,0.428327
3,1.848852,0.462065,1.823551,0.470543
4,1.79937,0.475595,1.665167,0.512417
5,1.761656,0.49803,1.499977,0.557458
6,1.747415,0.506936,1.325721,0.605926
7,1.775016,0.511731,1.13038,0.66287
8,1.864464,0.500771,0.932956,0.720414
9,1.938054,0.506594,0.741048,0.781255
