# DL. Task 3. LSTM

<h5 style = "text-align : right;">Louis Salomé</h5>

<br>Here are the main parts of this notebook :
1. Pick poems
3. Train LSTM as language model on your data (with 1 or 2 layers)
4. Train GRU (with 1 or 2 layers)
5. Compare metrics of GRU and LSTM. Compare poetries, generated by your models.


In [1]:
### import part

# Process data
import numpy as np

# Process strings
import re

# Measure time
import time
from time import time

#Simulate randomness
import random as rd

# Load Data
import keras

# Neural Networks
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Input,  Embedding, LSTM, GRU
from keras import backend as K
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.callbacks import EarlyStopping
import keras.utils as ku 

# sklearn
import sklearn
from sklearn.metrics import accuracy_score

# tensorflow
import tensorflow as tf

# Use only one of the 4 GPUs
# Using the command nvidia-sim we can chose the freest GPU
import sys
import os 
os.environ['CUDA_VISIBLE_DEVICES'] = '1' # Here is the NUMBER_OF_GPU
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

Using TensorFlow backend.


# 1. Take a poem

In [2]:
data = """Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
I have been one acquainted with the night.
I have walked out in rain—and back in rain.
I have outwalked the furthest city light.
I have looked down the saddest city lane.
I have passed by the watchman on his beat
And dropped my eyes, unwilling to explain.
I have stood still and stopped the sound of feet
When far away an interrupted cry
Came over houses from another street,
But not to call me back or say good-bye;
And further still at an unearthly height,
One luminary clock against the sky
Proclaimed the time was neither wrong nor right. 
I have been one acquainted with the night.
My long two-pointed ladder's sticking through a tree 
Toward heaven still, 
And there's a barrel that I didn't fill 
Beside it, and there may be two or three 
Apples I didn't pick upon some bough. 
But I am done with apple-picking now. 
Essence of winter sleep is on the night, 
The scent of apples: I am drowsing off. 
I cannot rub the strangeness from my sight 
I got from looking through a pane of glass 
I skimmed this morning from the drinking trough 
And held against the world of hoary grass. 
It melted, and I let it fall and break. 
But I was well 
Upon my way to sleep before it fell, 
And I could tell 
What form my dreaming was about to take. 
Magnified apples appear and disappear, 
Stem end and blossom end, 
And every fleck of russet showing clear. 
My instep arch not only keeps the ache, 
It keeps the pressure of a ladder-round. 
I feel the ladder sway as the boughs bend. 
And I keep hearing from the cellar bin 
The rumbling sound 
Of load on load of apples coming in. 
For I have had too much 
Of apple-picking: I am overtired 
Of the great harvest I myself desired. 
There were ten thousand thousand fruit to touch, 
Cherish in hand, lift down, and not let fall. 
For all 
That struck the earth, 
No matter if not bruised or spiked with stubble, 
Went surely to the cider-apple heap 
As of no worth. 
One can see what will trouble 
This sleep of mine, whatever sleep it is. 
Were he not gone, 
The woodchuck could say whether it's like his 
Long sleep, as I describe its coming on, 
Or just some human sleep.
The way a crow 
Shook down on me 
The dust of snow 
From a hemlock tree 
Has given my heart 
A change of mood 
And saved some part 
Of a day I had rued.
This saying good-bye on the edge of the dark
And cold to an orchard so young in the bark
Reminds me of all that can happen to harm
An orchard away at the end of the farm
All winter, cut off by a hill from the house.
I don't want it girdled by rabbit and mouse,
I don't want it dreamily nibbled for browse
By deer, and I don't want it budded by grouse.
(If certain it wouldn't be idle to call
I'd summon grouse, rabbit, and deer to the wall
And warn them away with a stick for a gun.)
I don't want it stirred by the heat of the sun.
(We made it secure against being, I hope,
By setting it out on a northerly slope.)
No orchard's the worse for the wintriest storm;
But one thing about it, it mustn't get warm.
"How often already you've had to be told,
Keep cold, young orchard. Good-bye and keep cold.
Dread fifty above more than fifty below."
I have to be gone for a season or so.
My business awhile is with different trees,
Less carefully nourished, less fruitful than these,
And such as is done to their wood with an axe—
Maples and birches and tamaracks.
I wish I could promise to lie in the night
And think of an orchard's arboreal plight
When slowly (and nobody comes with a light)
Its heart sinks lower under the sod.
But something has to be left to God.
The land was ours before we were the land’s.
She was our land more than a hundred years
Before we were her people. She was ours
In Massachusetts, in Virginia,
But we were England’s, still colonials,
Possessing what we still were unpossessed by,
Possessed by what we now no more possessed.
Something we were withholding made us weak
Until we found out that it was ourselves
We were withholding from our land of living,
And forthwith found salvation in surrender.
Such as we were we gave ourselves outright
(The deed of gift was many deeds of war)
To the land vaguely realizing westward,
But still unstoried, artless, unenhanced,
Such as she was, such as she would become.
Others taunt me with having knelt at well-curbs
Always wrong to the light, so never seeing
Deeper down in the well than where the water
Gives me back in a shining surface picture
Me myself in the summer heaven godlike
Looking out of a wreath of fern and cloud puffs.
Once, when trying with chin against a well-curb,
I discerned, as I thought, beyond the picture,
Through the picture, a something white, uncertain,
Something more of the depths—and then I lost it.
Water came to rebuke the too clear water.
One drop fell from a fern, and lo, a ripple
Shook whatever it was lay there at bottom,
Blurred it, blotted it out. What was that whiteness?
Truth? A pebble of quartz? For once, then, something.
"""
data = re.sub(r'[^\w\s]','',data)

# 2. LSTM

Code from : https://medium.com/@shivambansal36/language-modelling-text-generation-using-lstms-deep-learning-for-nlp-ed36b224b275

In [3]:
tokenizer = Tokenizer()

def dataset_preparation(data):
    # get tokens
    corpus = data.lower().split("\n")    
    tokenizer.fit_on_texts(corpus)
    total_words = len(tokenizer.word_index) + 1
    print("Total number of words : ",total_words)
    
    # convert corpus into a flat dataset
    input_sequences = []
    for line in corpus:
        token_list = tokenizer.texts_to_sequences([line])[0]
        for i in range(1, len(token_list)):
            n_gram_sequence = token_list[:i+1]
            input_sequences.append(n_gram_sequence)
    print("len(input_sequences)  : ",len(input_sequences))
            
    # uniformize data
    max_sequence_len = max([len(x) for x in input_sequences])
    input_sequences = np.array(pad_sequences(input_sequences,   
                          maxlen=max_sequence_len, padding='pre'))
    
    """
    Sentence: "they are learning data science"
    PREDICTORS             | LABEL
    they                   | are
    they are               | learning
    they are learning      | data
    they are learning data | science
    """
    
    # get labels
    predictors, label = input_sequences[:,:-1],input_sequences[:,-1]
    label = ku.to_categorical(label, num_classes=total_words)
    return predictors, label, max_sequence_len ,total_words

In [31]:
# build our recurrent neural network
def create_model(predictors, label, max_sequence_len, total_words, myLayer,optimizer="adam",lr=0.001,units=150, twoLayers=False):
    input_len = max_sequence_len - 1
    model = Sequential()
    model.add(Embedding(total_words, 10, input_length=input_len))
    if not(twoLayers):
        model.add(myLayer(units = units))
    else :
        model.add(myLayer(units = units, return_sequences = True))
        model.add(myLayer(units = units))
        
    # set optimizer
    if optimizer=="adam":
        # default lr = 0.001
        this_optimizer = keras.optimizers.Adam(lr=lr)
    if optimizer=="adadelta":
        # default lr = 1
        this_optimizer = keras.optimizers.Adadelta(lr=100*lr)
    if optimizer=="rmsprop":
        # default lr = 0.002
        this_optimizer = keras.optimizers.RMSprop(lr=2*lr)
        
    model.add(Dropout(0.1))
    model.add(Dense(total_words, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    model.fit(predictors, label, epochs=200,batch_size=32 ,verbose=0)
    return model

In [32]:
# use rnn to generate text
def generate_text(seed_text, next_words, max_sequence_len, model):
    for j in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], 
                                   maxlen= max_sequence_len-1, 
                                   padding='pre')
        predicted = model.predict_classes(token_list, verbose=0)
  
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    return seed_text

In [6]:
# load data
X, Y, max_len, total_words = dataset_preparation(data)

Total number of words :  468
len(input_sequences)  :  916


In [33]:
%%time
# create model
myLayer = LSTM
modelLSTM        = create_model(X, Y, max_len, total_words, myLayer)
modelLSTM2layers = create_model(X, Y, max_len, total_words, myLayer, twoLayers = True)

CPU times: user 6min 26s, sys: 36 s, total: 7min 2s
Wall time: 4min 1s


# 3. GRU

In [34]:
%%time
# create model
myLayer = GRU
modelGRU        = create_model(X, Y, max_len, total_words,myLayer)
modelGRU2layers = create_model(X, Y, max_len, total_words,myLayer,twoLayers=True)

CPU times: user 5min 36s, sys: 33.2 s, total: 6min 9s
Wall time: 3min 28s


# 4. Generate poetry

In [9]:
# create a dictionary
def create_dict():
    corpus = data.lower().split("\n")
    words = []
    for i in range(len(corpus)) :
        words += corpus[i].split(' ')
    return words

#create a new beginning of line
def new_beginning(words):
    l = len(words)
    ind = rd.randint(0,l-2)
    new = ''
    new += words[ind]
    new += ' '
    new += words[ind+1]
    new = new[0].upper() + new[1:].lower()
    return new

# generate a line
def lets_test(str,model):
    return generate_text(str, 4, max_len, model)

# generate a poem
def my_poems(model1, model2):
    words = create_dict()
    poem1, poem2 = '', ''
    for i in range(8):
        beginning = new_beginning(words)
        poem1 += lets_test(beginning, model1)
        poem2 += lets_test(beginning, model2)
        poem1 += '\n'
        poem2 += '\n'
    return poem1, poem2

The two following poems are generated line per line. We give the algorithm 2 words to begin each line, and it generates the end of the lines.

In [35]:
# Run this cell over and over to see new poems

# pick 2 models from
models = [modelLSTM, modelLSTM2layers, modelGRU, modelGRU2layers]

model1, model2 = models[1],  models[3]
poemLSTM, poemGRU = my_poems(model1, model2)

print("Poem generated with model1 :\n\n"+poemLSTM)
print("****************************\n")
print("Poem generated with model2 :\n\n"+poemGRU)

Poem generated with model1 :

To way it wouldnt be idle
By rabbit it out on a
Summer heaven appear and disappear comes
Done to blotted it out what
 and think of an russet
For the houses from another street
In the no from another street
Of winter no worth against being

****************************

Poem generated with model2 :

To way it i dont it
By rabbit it out on a
Summer heaven could snow whether plight
Done to the land vaguely realizing
 and i keep hearing from
For the strangeness from my sight
In the great harvest i myself
Of winter it is done through



# 5. Accuracy & Hyperparameters

In [36]:
Y_flat = np.argmax(Y,axis=1)

def prediction(model):
    Y_pred = model.predict(X)
    Y_pred_flat = np.argmax(Y_pred,axis=1)
    acc = accuracy_score(Y_pred_flat,Y_flat)
    return round(100*acc,2)

print("LSTM 1 layer  accuracy : ",prediction(modelLSTM))
print("LSTM 2 layers accuracy : ",prediction(modelLSTM2layers))
print("GRU  1 layer  accuracy : ",prediction(modelGRU))
print("GRU  2 layer  accuracy : ",prediction(modelGRU2layers))

LSTM 1 layer  accuracy :  91.48
LSTM 1 layers accuracy :  86.03
GRU  1 layer  accuracy :  92.36
GRU  2 layer  accuracy :  91.81


Let's tune some hyperparameters to find better accuracy. We will play with :
+ optimizer
+ learning rate
+ units


In [37]:
# our small searchspace
optimizer_list = ["adam","adadelta","rmsprop"]
lr_list = [0.005,0.002,0.001,0.0005]
units_list = [50,100,150,200,250]

In [47]:
%%time

N_iter = 10
for i in range(N_iter):
    a,b,c = rd.randint(0,len(optimizer_list)-1), rd.randint(0,len(lr_list)-1), rd.randint(0,len(units_list)-1)
    optimizer = optimizer_list[a]
    lr = lr_list[b]
    units = units_list[c]
    configuration = "optimizer     : "+optimizer+"\n learning rate : "+str(lr) +"\n units         : "+ str(units)
    if rd.random()<0.5:
        configuration +="\n RNNtype       : "+"LSTM"
        myLayer = LSTM
    else :
        configuration +="\n RNNtype       : "+"GRU"
        myLayer = GRU
    model = create_model(X, Y, max_len, total_words, myLayer,optimizer=optimizer,lr=lr,units=units, twoLayers=False)
    print("*********************\n",
          configuration,"\n ===> accuracy :",
          prediction(model),"\n")

*********************
 optimizer     : adadelta
 learning rate : 0.001
 units         : 100
 RNNtype       : LSTM 
 ===> accuracy :  9.17 

*********************
 optimizer     : rmsprop
 learning rate : 0.005
 units         : 50
 RNNtype       : GRU 
 ===> accuracy :  77.84 

*********************
 optimizer     : adam
 learning rate : 0.001
 units         : 250
 RNNtype       : LSTM 
 ===> accuracy :  92.36 

*********************
 optimizer     : rmsprop
 learning rate : 0.001
 units         : 100
 RNNtype       : LSTM 
 ===> accuracy :  85.7 

*********************
 optimizer     : adam
 learning rate : 0.0005
 units         : 200
 RNNtype       : LSTM 
 ===> accuracy :  92.25 

*********************
 optimizer     : rmsprop
 learning rate : 0.0005
 units         : 150
 RNNtype       : LSTM 
 ===> accuracy :  88.1 

*********************
 optimizer     : rmsprop
 learning rate : 0.001
 units         : 150
 RNNtype       : LSTM 
 ===> accuracy :  88.86 

*********************
 optim

Our conclusions on hyperparameters are :
+ We have a lot of words in our data so "units" which is the dimensionality of the output space, needs to be large. 250 is correct.
+ GRU is globally a bit more precise than LSM but it doesn't make the difference. We can chose GRU to have lower computation time.
+ Optimizer "adam" worked the best. But there is still a problem with learning_rate proportionality for the other optimizers...

Our best improvement overall was to set a big number of epochs (200) to train on this text. This is another advantage of GRU, it trains in fewer epochs. The batch size was set at 32 after some tests. It was making the loss decrease the fastest.