[View in Colaboratory](https://colab.research.google.com/github/mukul-rathi/workshop-deep-learning/blob/master/LSTM.ipynb)

# LSTM 

This notebook accompanies the LSTM workshop. It trains an LSTM on the IMDB movie review dataset to predict whether a review is positive or negative.




In [0]:
import numpy as np

#import the Keras functions
from keras.preprocessing import sequence
from keras.models import Model, Sequential
from keras.layers import Input, Embedding, Dense, LSTM
from keras.optimizers import Adam
from keras.utils import to_categorical as OneHotEncode

#import the IMDB dataset
from keras.datasets import imdb

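Next, load the IMDB data, keeping only the `max_features` most frequent words, and bring every review to a fixed length of `maxlen` tokens: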

In [0]:
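max_features = 20000 #vocabulary size: keep only the 20,000 most frequent words
maxlen = 200 #every review is padded or truncated to this many tokens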

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)


#Pad or truncate every review to maxlen tokens (by default zeros are added, and extra tokens dropped, at the front)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
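
To make the padding step concrete, here is a minimal NumPy sketch of what `pad_sequences` does with its default settings (padding and truncating at the front, with 0 as the fill value). The helper `pad_pre` is a hypothetical name used only for illustration and is not part of Keras.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Left-pad (and left-truncate) integer sequences to a fixed length."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for i, seq in enumerate(seqs):
        if len(seq) == 0:
            continue  # nothing to copy; the row stays all padding
        trunc = seq[-maxlen:]          # keep only the last maxlen tokens
        out[i, -len(trunc):] = trunc   # right-align; the fill value stays at the front
    return out

# one review shorter than maxlen, one longer
padded = pad_pre([[1, 14, 22], [1, 5, 6, 7, 8, 9]], maxlen=4)
print(padded)
```

The short review gains a leading zero and the long review loses its first two tokens, so every row ends up with the same length.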



In [0]:
def initLSTM():
  rnn = Sequential()
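  rnn.add(Embedding(max_features, 128)) #map each word index to a 128-dimensional embedding vector
  rnn.add(LSTM(128, recurrent_dropout=0.2)) #LSTM layer with 128 units and dropout on the recurrent connections
  rnn.add(Dense(1, activation='sigmoid')) #output: predicted probability that the review is positive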
  return rnn
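
As a rough sanity check on the size of this model, the trainable parameters of each layer can be counted by hand. The sketch below assumes the standard LSTM formulation with four gates, each with input weights, recurrent weights and one bias vector; the variable names are illustrative only.

```python
vocab_size = 20000   # max_features in the notebook
embed_dim = 128      # output size of the Embedding layer
units = 128          # number of LSTM units

# one embedding vector per word in the vocabulary
embedding_params = vocab_size * embed_dim

# 4 gates, each with input weights, recurrent weights and a bias vector
lstm_params = 4 * (units * (embed_dim + units) + units)

# one weight per LSTM unit plus a single bias
dense_params = units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

Almost all of the parameters sit in the embedding table. The numbers can be compared against the output of `rnn.summary()`.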


In [0]:
rnn = initLSTM()

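Next, compile and train the model. Note that the test set is also passed as the validation data, so the validation metrics shown during training are measured on the same reviews as the final test accuracy: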


In [23]:
# try using different optimizers and different optimizer configs
rnn.compile(loss='binary_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])

rnn.fit(x_train, y_train,
        batch_size=512,
        epochs=5,
        validation_data=(x_test, y_test))



Train on 25000 samples, validate on 25000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f859b8faef0>
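
The `binary_crossentropy` loss used above can be sketched in plain NumPy to show what is being minimised. This is an illustrative re-implementation, not the Keras code itself; the clipping constant `eps` is an assumption chosen to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype="float64")
    # clip so that log() never sees exactly 0 or 1
    y_pred = np.clip(np.asarray(y_pred, dtype="float64"), eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

# made-up predictions for one positive and one negative review
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

Confident correct predictions give a loss close to zero, while confident wrong predictions are penalised heavily.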

In [24]:
score, acc = rnn.evaluate(x_test, y_test,
                          batch_size=512)
print("Test accuracy: " + str(acc))

Test accuracy: 0.7975199996376038
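
The accuracy reported here comes from turning each sigmoid output into a 0/1 label and comparing it with the true label. A minimal sketch of that calculation follows, assuming a decision threshold of 0.5; the probabilities and labels are made-up values, not outputs of the model above.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

probs = np.array([0.99, 0.40, 0.70, 0.20])   # hypothetical sigmoid outputs
labels = np.array([1, 0, 0, 0])              # hypothetical true sentiments
acc_example = binary_accuracy(labels, probs)
print(acc_example)
```

Only the third example is misclassified, because its probability of 0.70 is thresholded to a positive prediction while its label is negative.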


## Inspecting specific predictions

In [0]:

#create a mapping from the index to the word (word indices are offset by 3 in the loaded data)
idx_to_word = {(v + 3): k for k, v in imdb.get_word_index().items()}
idx_to_word.update({0: "<PAD>", 1: "<START>", 2: "<UNK>", 3: "<UNUSED>"}) #indices 0-3 are reserved for special tokens

vocab_size = np.max(list(idx_to_word.keys())) #largest index in the mapping
#this is a helper function that decodes a review back into text - good to debug performance of model during training
def print_review(x):
    #if an index is not in the dictionary the word is unknown
    return " ".join(idx_to_word.get(idx, "<UNK>") for idx in x)
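
Going the other way, from raw text to indices, follows the same offset convention. The sketch below uses a tiny made-up word index in place of `imdb.get_word_index()`, and the helper `encode_review` is a hypothetical name for illustration; it assumes the default offset of 3 and the reserved indices 1 (start) and 2 (unknown).

```python
# Toy stand-in for imdb.get_word_index(): word -> frequency rank (1 = most frequent)
toy_word_index = {"the": 1, "film": 2, "great": 3}

INDEX_FROM = 3      # offset between a word's rank and its index in the data
START, UNK = 1, 2   # reserved indices for the start and unknown tokens

def encode_review(text, word_index, num_words=None):
    """Turn a whitespace-separated review into a list of word indices."""
    ids = [START]
    for word in text.lower().split():
        rank = word_index.get(word)
        idx = UNK if rank is None else rank + INDEX_FROM
        # words outside the kept vocabulary are also mapped to unknown
        if num_words is not None and idx >= num_words:
            idx = UNK
        ids.append(idx)
    return ids

encoded = encode_review("The film was great", toy_word_index)
print(encoded)
```

The word "was" is missing from the toy index, so it is encoded as the unknown token.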
  
  


In [30]:
#choose a random review
review_num = np.random.randint(0, x_test.shape[0])
review = x_test[review_num]
review = np.reshape(review, (1, x_test.shape[1])) #add a batch dimension: predict expects shape (batch, maxlen)

print("The predicted sentiment is: " + str(rnn.predict(review))) #probability that the review is positive
print(print_review(review[0]))
print("The actual sentiment is: " + str(y_test[review_num])) #1 = positive, 0 = negative

The predicted sentiment is: [[0.9916632]]
the spectators look different at the same scenes when they are told first from <UNK> point of view then from one the main actors do very good and especially the growing love between the two women is convincingly developed with a first culmination in a very tender love scene between the two and finally forgiving all the evil they were ready to do and did to each other because they still love each other br br for each of her books the author sarah waters has thoroughly investigated what life was like in british 19th century while in <UNK> the velvet it was the world of the vaudeville theaters and the beginning of social movements in affinity the dreadful reality of women <UNK> and the fashionable <UNK> of spirits in she depicts the public ceremony of hanging people in london and the inhuman treatment of persons supposed or declared disturbed in <UNK> based on the reading of sources and scientific research this is very well transferred to the film