[View in Colaboratory](https://colab.research.google.com/github/mukul-rathi/workshop-deep-learning/blob/master/LSTM.ipynb)

# Intro to Deep Learning 3

This is the notebook accompanying the third Hackers at Cambridge Deep Learning workshop.
In this workshop, you'll implement your own *LSTM neural network* using the **Keras** deep learning framework.


First let's import dependencies:

In [0]:
import numpy as np

#import the keras functions
from keras.preprocessing import sequence
from keras.models import Model, Sequential
from keras.layers import  Input, Embedding, Dense, LSTM
from keras.optimizers import Adam

#import the IMDB dataset
from keras.datasets import imdb

## Reading In the Data:

As in the first and second workshops, you'll want to load in the data using a handy **load_data()** function.

We'll be loading in the movie reviews from the IMDB dataset. `num_words` specifies how many of the most frequent words our lexicon should contain; rarer words are replaced by an "unknown" token.

`x_train / x_test` contain the reviews, and `y_train / y_test` contain the binary labels (0 = negative, 1 = positive).

Since some reviews are very long and therefore slow to train on, we truncate each review to a maximum length and pad the shorter ones, so that every review ends up the same length (here we have chosen 200 words, since most of the reviews are shorter than that).

**EXTENSION:** Look at the distribution of the lengths of the reviews and change `maxlen` accordingly to encompass more of the reviews.
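
As a starting point for the extension, here is a minimal sketch of how you might summarise review lengths. It assumes each review is a list of word ids (as returned by `imdb.load_data`, before any padding); `length_summary` and the toy reviews are made up for illustration.

```python
import numpy as np

def length_summary(reviews, maxlen):
    """Summarise review lengths and the fraction that fit within maxlen."""
    lengths = np.array([len(r) for r in reviews])
    return {
        "median": float(np.median(lengths)),
        "p90": float(np.percentile(lengths, 90)),
        "fraction_within_maxlen": float(np.mean(lengths <= maxlen)),
    }

# Made-up stand-in for x_train; each review is a list of word ids.
toy_reviews = [[1, 14, 22], [1, 5, 6, 7, 8, 9], [1, 2], [1] * 10]
summary = length_summary(toy_reviews, maxlen=6)
print(summary)
```

On the real data you would pass the unpadded `x_train` and try a few values of `maxlen` until the fraction covered looks acceptable.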

### Useful Functions:

    sequence.pad_sequences(sequences, maxlen=maxlen)
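
To see what padding and truncation do to the data, here is a NumPy-only sketch. It is not the Keras implementation; it imitates what we understand to be the default behaviour of `pad_sequences` (zeros added at the start of short reviews, words dropped from the start of long ones). `pad_or_truncate` is a hypothetical helper.

```python
import numpy as np

def pad_or_truncate(sequences, maxlen, value=0):
    """Left-pad short sequences and left-truncate long ones to length maxlen."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # nothing to copy; the row stays all padding
        trimmed = seq[-maxlen:]              # keep only the last maxlen ids
        out[row, -len(trimmed):] = trimmed   # right-align, padding on the left
    return out

toy = [[5, 6, 7], [1, 2, 3, 4, 5, 6]]
padded = pad_or_truncate(toy, maxlen=4)
print(padded)
# [[0 5 6 7]
#  [3 4 5 6]]
```

The result is a 2D array with one row per review, which is the shape the model expects.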
    

In [0]:
num_words = 20000
maxlen = 200

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_words)


#Preprocess the data so every review has the same length (padding/truncating as necessary)
x_train = None
x_test = None



## Creating the model:
          
Keras is a high-level deep learning framework that makes it really easy to create our own model - in this workshop we will be using the **Sequential API**.

There are four steps to creating a model in Keras:
 * Define 
 * Compile
 * Fit
 * Evaluate
 
 
 We'll look at each stage in turn:

### Define the model:

This is where we specify the layers in our model.

We are using the Sequential API, so to start we'll define the model:

      model = Sequential()
      
This is the base to start on. We can then sequentially add the layers to the model using 

            model.add(layer object)
      
 e.g. `Dense()` - this is a fully-connected layer
 
 so you might have:
 
          model = Sequential()
          model.add(Dense(units=128, activation='relu'))
          model.add(Dense(units=10, activation='softmax'))
 
 and now *model* would be a 2-layer neural network.
 
 The layers take different arguments - in the example above, `Dense()` took as arguments the number of neurons (128 and 10 respectively) and the activation function used.
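
If you're wondering what those two `Dense` layers compute, here is a rough NumPy sketch: each layer is just `activation(x @ W + b)`. The 20 input features and the random weights are assumptions for illustration; Keras initialises and trains its own weights.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense(x, W, b, activation):
    """One fully-connected layer: activation(x @ W + b)."""
    return activation(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 20))                        # batch of 2 inputs, 20 features (made up)
W1, b1 = rng.normal(size=(20, 128)), np.zeros(128)  # first layer: 128 neurons
W2, b2 = rng.normal(size=(128, 10)), np.zeros(10)   # second layer: 10 neurons

hidden = dense(x, W1, b1, relu)         # shape (2, 128)
probs = dense(hidden, W2, b2, softmax)  # shape (2, 10); each row sums to 1
print(hidden.shape, probs.shape)
```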
 
For our LSTM we'll need 3 different layers:

* [Embeddings](https://keras.io/layers/embeddings/) - this will map each of the words in the lexicon to its word embedding. It takes two arguments `(input_dim, output_dim)`: `input_dim` is the number of words in the lexicon (`num_words`), and `output_dim` is the length of each word vector, which we can set arbitrarily to 128.

* [LSTM](https://keras.io/layers/recurrent/) - this is the recurrent part of the model itself. Its first argument is the number of units, i.e. the size of the LSTM's hidden state (128, the same as the length of the word vectors, is a reasonable choice). We also pass an additional parameter, `recurrent_dropout=0.2`, which applies dropout to the recurrent connections to help prevent the model *overfitting*.

* [Dense](https://keras.io/layers/core/#dense) has arguments `units=n` (the number of neurons in the layer) and `activation=__` (`'relu'`, `'softmax'`, `'sigmoid'` and `'tanh'` are possible activations). We are predicting a single value (the probability that the review is positive), so we want one neuron, and since it is a probability, we'd like the sigmoid function.
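
To help make sense of the output of `lstm.summary()`, here is a back-of-the-envelope sketch of how many trainable parameters each of the three layers should have. It assumes 128 LSTM units and the standard LSTM layout (four sets of weights, each with input weights, recurrent weights and a bias); the helper functions are hypothetical, not part of Keras.

```python
def embedding_params(input_dim, output_dim):
    # one output_dim-long vector per word in the lexicon
    return input_dim * output_dim

def lstm_params(input_dim, units):
    # input, forget and output gates plus the candidate cell state:
    # each has input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input per neuron, plus one bias per neuron
    return input_dim * units + units

num_words, embed_dim, lstm_units = 20000, 128, 128  # lstm_units = 128 is an assumption

counts = {
    "embedding": embedding_params(num_words, embed_dim),
    "lstm": lstm_params(embed_dim, lstm_units),
    "dense": dense_params(lstm_units, 1),
}
total = sum(counts.values())
print(counts, total)
# {'embedding': 2560000, 'lstm': 131584, 'dense': 129} 2691713
```

Notice that almost all of the parameters live in the embedding layer, which is why `num_words` has such a large effect on model size.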
      

In [0]:
def initLSTM():
  lstm = None #fill in the layers
  return lstm


In [0]:
lstm = initLSTM()
lstm.summary() #print the description of the layers in the model

### Compiling the Model:

This is where we specify the loss function we are using, the optimizer we are using and the metrics we want to track when training, as well as a bunch of other information pertaining to training.

        model.compile(arguments)
        
 where the arguments we are interested in are
 
      loss='binary_crossentropy',
      optimizer='adam',
      metrics=['accuracy']
       
       
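  `'binary_crossentropy'` is the loss function for a model that outputs a single probability (here, the probability of positive sentiment) against a 0/1 label.
  
 `'adam'` refers to the Adam optimizer, a variant of gradient descent that adapts the step size for each parameter and typically converges faster than plain gradient descent.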
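
To get a feel for what the loss measures, here is a small NumPy sketch of binary cross-entropy. It is an illustration of the formula, not the Keras implementation; the clipping constant `eps` is an assumption to avoid taking `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over the examples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

labels = [1, 0, 1]
confident_right = binary_crossentropy(labels, [0.9, 0.1, 0.8])  # small loss
confident_wrong = binary_crossentropy(labels, [0.1, 0.9, 0.2])  # large loss
print(confident_right, confident_wrong)
```

The loss is small when the predicted probability is close to the true label, and grows quickly when the model is confidently wrong.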
  

In [0]:
# compile the model




#Extension: try compiling using different optimizers and different optimizer configs





### Fitting the model

This is where we actually train our model. Feel free to alter the hyperparameters (e.g. increase the number of epochs to train it for longer).

      model.fit(x=, y=, epochs=, batch_size=, validation_data=(x_test, y_test))

`x` and `y` are the training set inputs and labels.

The `validation_data` argument takes as input the **validation dataset**. Here we have only split the dataset into train and test sets; typically we would split it into train, validation and test sets. The purpose of the validation data is to tune the hyperparameters (so we try to maximise performance on the validation set), leaving the test set untouched for the final evaluation.

For today, we'll just use `x_test, y_test`, though I encourage you (as an extension) to split the data up so you have a separate validation set.
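
For the extension, here is one possible sketch of carving a validation set out of the training data. It assumes `x` and `y` are NumPy arrays (i.e. the reviews have already been padded); `train_val_split`, the 20% fraction and the toy arrays are illustrative choices.

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.2, seed=0):
    """Shuffle the examples, then hold out a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# Toy stand-ins for the padded x_train and y_train
x_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10) % 2

(x_tr, y_tr), (x_val, y_val) = train_val_split(x_toy, y_toy, val_fraction=0.2)
print(x_tr.shape, x_val.shape)  # (8, 2) (2, 2)
```

You would then pass `(x_val, y_val)` as `validation_data` and keep `x_test, y_test` for the evaluation step.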


In [0]:
#fit the model 



### Evaluating the model

Finally, we can use the `model.evaluate` function to evaluate how well the model has done on the test set. Here `x` and `y` are the test set inputs and labels respectively.

        loss, accuracy = model.evaluate(x=, y=)
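
The accuracy reported here is the fraction of reviews whose predicted label matches the true label. As a rough sketch (assuming predictions above 0.5 count as positive; `binary_accuracy` is a hypothetical helper, not a Keras call):

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Made-up labels and predicted probabilities: two right, two wrong
acc_toy = binary_accuracy([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.6])
print(acc_toy)  # 0.5
```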
        
   

In [0]:
loss, acc = None, None #fill in using lstm.evaluate
print("Test accuracy: " + str(acc))

## See specific predictions:

If you're curious to see how the model performs on random reviews from the test set, run the following cells.

We need a print_review function, since the vector x_train actually contains the *ids* of the words in the lexicon, not the words themselves.

We map the ids to the words, and then iterate through the review to print the overall review.
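
To make the id offset concrete, here is a toy sketch with a made-up three-word vocabulary. It assumes the same convention as the cell below: ids 0 to 3 are reserved for special tokens, so a word with rank `r` in the word index is stored as id `r + 3`. `build_idx_to_word` and `decode_review` are hypothetical helpers.

```python
def build_idx_to_word(word_index, index_from=3):
    """Invert a word -> rank mapping, shifting ranks past the special tokens."""
    idx_to_word = {rank + index_from: word for word, rank in word_index.items()}
    idx_to_word.update({0: "<PAD>", 1: "<START>", 2: "<UNK>", 3: "<UNUSED>"})
    return idx_to_word

def decode_review(ids, idx_to_word):
    """Turn a sequence of ids back into text; unseen ids become <UNK>."""
    return " ".join(idx_to_word.get(i, "<UNK>") for i in ids)

toy_word_index = {"the": 1, "movie": 2, "great": 3}  # made-up vocabulary
toy_lookup = build_idx_to_word(toy_word_index)
decoded = decode_review([0, 0, 1, 4, 5, 2, 6], toy_lookup)
print(decoded)  # <PAD> <PAD> <START> the movie <UNK> great
```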


In [0]:

#create a mapping from the index to the word
idx_to_word = {(v+3):k for k,v in imdb.get_word_index().items()}
idx_to_word.update({0:"<PAD>", 1: "<START>", 2: "<UNK>",3:"<UNUSED>"}) #first 4 indices are special tokens 

vocab_size = np.max(list(idx_to_word.keys()))


#this is a helper function - good to debug performance of model during training
def print_review(x):
    text = ""
    for idx in x:
        text += idx_to_word.get(idx, "<UNK>") + " " #if word not in dictionary it is unknown
    print(text)
  
  


In [0]:
#choose a random review 
review_num = np.random.randint(0,x_test.shape[0])
review = x_test[review_num]
review = np.reshape(review, (1, x_test.shape[1]))

print("The predicted sentiment is: " + str((lstm.predict(review))))
print_review(review[0])
print("The actual sentiment is: " + str(y_test[review_num]))