# LSTM Recurrent Neural Network
**Author**: Adi Bronshtein, DC

**Background:** We're going to use a _Recurrent Neural Network_ (RNN) for text classification. The key feature of RNNs is that information loops back into the network: the state computed at one step of a sequence is fed in again at the next step. This gives an RNN a type of "memory" that it can use to better understand sequential data. A popular type of RNN is the _Long Short-Term Memory_ (LSTM) network, which uses gates to control what information is kept, updated, or forgotten as it loops through the network.
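
To make the "memory" idea concrete, here is a minimal sketch of a plain recurrent step in pure Python. It is not an LSTM and not how Keras implements one; the function name `run_simple_rnn` and the weights `w_x`, `w_h`, and `b` are illustrative assumptions. The only point is that the hidden state `h` computed at one step is fed back in at the next step.

```python
import math

def run_simple_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence of numbers.

    Hypothetical toy example: the weights are arbitrary, not learned.
    Returns the hidden state after every time step.
    """
    h = 0.0  # the "memory" starts out empty
    states = []
    for x in inputs:
        # the new state depends on the current input AND the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same final input (1.0) leads to different states depending on what came before it
states_a = run_simple_rnn([0.0, 0.0, 1.0])
states_b = run_simple_rnn([1.0, 1.0, 1.0])
print(states_a[-1], states_b[-1])
```

Both sequences end with the same value, yet the final states differ because the earlier inputs are still reflected in `h`.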

In [2]:
# the regular imports 
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import imdb # check out the dataset info at https://keras.io/datasets/
from keras.models import Sequential # import the type of model we'll use
from keras.layers import Dense, LSTM, Dropout # import the layers
from keras.layers import Embedding # import another kind of layer (this import path also works in recent Keras versions)
from keras.preprocessing import sequence

# set random seed for reproducibility
np.random.seed(42)

## Simple LSTM for Sequence Classification

### The IMDB Movie Review Dataset
Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that, for instance, the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words."
As a convention, "0" does not stand for a specific word. With the default arguments of `load_data()`, "0" is reserved for padding, "1" marks the start of a review, and "2" encodes any unknown (out-of-vocabulary) word.
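
The filtering operation quoted above can be sketched in a few lines of plain Python. This is an illustration under the simplifying assumption that each integer is a frequency rank; `filter_by_rank` is a hypothetical helper, not part of Keras, although its arguments are named after the `num_words` and `skip_top` arguments of `load_data()`. The out-of-vocabulary index is passed in explicitly.

```python
def filter_by_rank(review, num_words, skip_top, oov_index):
    """Keep word indexes with skip_top <= index < num_words; replace the rest.

    Hypothetical helper: it treats each integer as a frequency rank,
    as described in the text above.
    """
    return [w if skip_top <= w < num_words else oov_index for w in review]

# Made-up encoded review, for illustration only
review = [5, 25, 12000, 300, 19]
filtered = filter_by_rank(review, num_words=10000, skip_top=20, oov_index=2)
print(filtered)  # prints [2, 25, 2, 300, 2]
```

Very common words (rank below 20) and rare words (rank 10,000 or above) are both mapped to the same placeholder index, so the review keeps its original length.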

Check out the full description of the dataset and how to use the `load_data()` function in the [Keras Documentation](https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification).

#### Load Dataset On Movie Review Text

In [8]:
# load in the dataset using load_data, but only keep the top 5000 words. Less frequent words are replaced by the out-of-vocabulary index (2 by default).

# imdb.load_data returns two tuples: (x_train, y_train) and (x_test, y_test). See the (link to) documentation above!

In [9]:
# truncate/pad the input sequences so that each observation has 500 features (you can change that value)
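
What padding and truncating do to a single review can be sketched without Keras. The helper below, `pad_or_truncate`, is hypothetical; it imitates the default behavior of Keras's `pad_sequences`, which pads and truncates at the start of each sequence (`padding='pre'`, `truncating='pre'`) and pads with 0.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Return a list of exactly `maxlen` items.

    Hypothetical helper: pads at the front and, for long sequences,
    keeps only the last `maxlen` items.
    """
    if len(seq) >= maxlen:
        return list(seq[len(seq) - maxlen:])  # drop items from the front
    return [value] * (maxlen - len(seq)) + list(seq)  # pad at the front

print(pad_or_truncate([7, 8, 9], maxlen=5))           # prints [0, 0, 7, 8, 9]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], maxlen=4))  # prints [3, 4, 5, 6]
```

After this step every review has the same length, which is what lets the reviews be stacked into one 2D array for the network.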


#### View First Observation’s Raw Data

In [None]:
# View first observation


#### View First Observation’s Feature Data


In [5]:
# View first observation


### Create LSTM Neural Network Architecture

In [9]:
# set the vector length

# instantiate the neural network

# first layer - Embedding layer that represents each word as a vector of length 32

# second layer - LSTM (long short-term memory) layer with 100 neurons

# last layer - a dense (fully connected) layer with a sigmoid activation function (binary classification)

# compile the network, using binary cross-entropy (log loss) as the loss, Adam as the optimizer, and accuracy as our metric

# show all the model's information
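
As a sanity check on the summary output, the parameter counts can be worked out by hand. The sketch below assumes the sizes mentioned in this notebook's comments (a 5,000-word vocabulary, vectors of length 32, 100 LSTM neurons, one sigmoid output) and a standard LSTM layer with four sets of weights, each with its own bias vector. The variable names are illustrative.

```python
vocab_size = 5000     # top words kept when loading the data
embedding_dim = 32    # length of each word vector
lstm_units = 100      # neurons in the LSTM layer

# Embedding: one vector of length embedding_dim per word index
embedding_params = vocab_size * embedding_dim

# LSTM: 4 weight sets (input gate, forget gate, cell candidate, output gate),
# each with input weights, recurrent weights, and a bias vector
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)

# Dense output: one weight per LSTM unit plus one bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# prints 160000 53200 101 213301
```

If the summary of your model reports different totals, one of the layer sizes differs from the assumptions above.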






### Train LSTM Neural Network Architecture

In [None]:
'''fit the model and assign the result to a history object (to get info from the fitted model later),
fitting on the training data and using the test data as validation/evaluation.
when you have more time, try epochs=3 (or more!)'''
# Train neural network


### Evaluate the Model

In [4]:
# evaluation of the model using the accuracy score 
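
For reference, the two numbers involved in evaluating this kind of model (the log loss and the accuracy) can be computed by hand from predicted probabilities. This is a pure-Python sketch with hypothetical helper names and made-up inputs; it is not the Keras implementation, and the 0.5 decision threshold is an assumption.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels and predicted probabilities, for illustration only
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
print(accuracy(y_true, y_prob))             # prints 0.5
print(binary_crossentropy(y_true, y_prob))
```

Accuracy only counts whether each prediction lands on the right side of the threshold, while the log loss also penalizes confident wrong answers more heavily than hesitant ones.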


### Visualize Neural Network Performance History 

For the sake of time, we're only running one epoch during the lesson, so this visualization would be meaningless: one epoch gives a single point per curve.  
Try running this code later, when you have time, after training for, say, 10 or 15 epochs, to compare the training and testing accuracy.

In [None]:
# Get training and test accuracy histories


# Create count of the number of epochs

# Visualize accuracy history
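
One detail that often trips people up here is the name of the accuracy key in the history dictionary, which depends on the Keras version. The helper below, `accuracy_curves`, is a hypothetical sketch that handles both spellings and builds the epoch count; the dictionary passed to it contains placeholder numbers, not results from this notebook's model.

```python
def accuracy_curves(history_dict):
    """Return (epochs, train_acc, val_acc) from a Keras-style history dict.

    Hypothetical helper. Newer Keras versions use the keys 'accuracy' and
    'val_accuracy'; older versions use 'acc' and 'val_acc'.
    """
    train_key = "accuracy" if "accuracy" in history_dict else "acc"
    train_acc = list(history_dict[train_key])
    val_acc = list(history_dict["val_" + train_key])
    epochs = list(range(1, len(train_acc) + 1))  # epoch numbers start at 1
    return epochs, train_acc, val_acc

# Placeholder numbers only, NOT results from this notebook's model
placeholder_history = {"accuracy": [0.1, 0.2, 0.3], "val_accuracy": [0.1, 0.2, 0.2]}
epochs, train_acc, val_acc = accuracy_curves(placeholder_history)
print(epochs)  # prints [1, 2, 3]
```

With a fitted model, the dictionary would come from the `history` attribute of the object returned by `fit`, and the three lists can be passed to `plt.plot(epochs, train_acc)` and `plt.plot(epochs, val_acc)`.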


## LSTM For Sequence Classification With Dropout

As we discussed earlier, RNNs like LSTMs are quite prone to overfitting. We can add a Dropout layer between the Embedding and LSTM layers and between the LSTM and Dense output layers. During training, each Dropout layer randomly drops a user-defined proportion (a hyperparameter) of the units in the previous layer for every batch. Remember that in Keras the input layer is assumed to be the first layer and is not added using `add`. Therefore, if we want to apply dropout to the input layer, the first layer we add to our network is a Dropout layer. This layer contains both the proportion of the input layer’s units to drop (0.2) and `input_shape`, which defines the shape of the observation data. Next, we add a Dropout layer with a proportion of 0.5 after each of the hidden layers.
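
What "dropping a proportion of units" means can be sketched in pure Python. The `dropout` function below is a hypothetical helper showing so-called inverted dropout, in which the surviving values are scaled up by `1 / (1 - rate)`; it illustrates the idea and is not the Keras layer itself.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the survivors.

    Hypothetical helper showing "inverted" dropout: kept values are divided
    by (1 - rate) so the expected sum stays the same.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(42)  # fixed seed so the masks are repeatable
activations = [1.0] * 10
print(dropout(activations, rate=0.2, rng=rng))  # a new mask is drawn on every call
print(dropout(activations, rate=0.5, rng=rng))
```

Because a different random mask is drawn on each call, the network cannot rely on any single unit, which is the mechanism that discourages overfitting. Dropout is only active during training; at prediction time all units are kept.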

### Create LSTM Neural Network Architecture (with Dropout)

In [None]:
# Start neural network

# Add a dropout layer for input layer

# Add fully connected layer with a ReLU activation function

# Add a dropout layer for previous hidden layer

# Add fully connected layer with a ReLU activation function

# Add a dropout layer for previous hidden layer

# Add fully connected layer with a sigmoid activation function


### Compile the Network

In [None]:
# Compile neural network


### Train LSTM Neural Network Architecture

In [None]:
# Train neural network


### Evaluate the Model

In [None]:
# Get the accuracy score


### Visualize Neural Network Performance History 

For the sake of time, we're only running one epoch during the lesson, so this visualization would be meaningless: one epoch gives a single point per curve.  
Try running this code later, when you have time, after training for, say, 10 or 15 epochs, to compare the training and testing accuracy.

In [None]:
# Get training and test accuracy histories


# Create count of the number of epochs

# Visualize accuracy history
