# LSTM Sentiment Analysis with Keras
How can we take a bunch of movie reviews from IMDB and write code that classifies each review as positive or negative?

This notebook goes along with the code in [Learning Intelligence 25]. 

For further reference, check out the [Keras example](https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py). 

## What is Sentiment Analysis?

Sentiment analysis looks at a body of information and decides whether it's good or bad. 

For example, IMDB is a movie review site with millions of reviews. However, some reviewers don't leave star ratings. How could IMDB automatically assign star ratings to reviews? Sentiment analysis. 

What words could you look for in a review that would tell you whether it's good or bad?

Perhaps looking for the word 'great' in a review would lead to a high rating. 

"This movie was great, if you think lemon juice in the eyes is great."

Not so fast. Sarcasm like this is one of the hard problems of natural language processing (NLP): taking text or language in its natural form and analysing it.
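To make the problem concrete, here's a minimal sketch of the naive "look for great" approach in plain Python. The word lists and the `naive_sentiment` function are hypothetical, made up for this illustration; they aren't part of Keras.

```python
# A deliberately naive sentiment "classifier": count positive and negative
# keywords. The word lists below are hypothetical examples.
POSITIVE_WORDS = {"great", "good", "wonderful", "loved"}
NEGATIVE_WORDS = {"bad", "awful", "boring", "hated"}


def naive_sentiment(review: str) -> str:
    """Return 'positive' or 'negative' based on keyword counts alone."""
    # Lowercase and strip basic punctuation so "great," matches "great"
    words = [w.strip('.,!?"\'') for w in review.lower().split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if score > 0 else "negative"


sarcastic = "This movie was great, if you think lemon juice in the eyes is great."
print(naive_sentiment(sarcastic))  # prints "positive", which is wrong
```

The keyword counter sees "great" twice and calls the review positive. The model we build in this notebook learns from the order and context of words rather than a fixed word list, which at least gives it a chance with reviews like this one.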

Despite this hard problem, using Keras we can quickly build a model which gets close to 90% accuracy at predicting whether a review is positive or negative.

## What is Keras?

Keras is a deep learning library used to build deep learning models quickly. 

Keras is written in Python.

Because deep learning is a very empirical science (lots of trial and error), Keras is great for building an initial prototype and iterating quickly.

See more at [Keras.io](https://keras.io/).

## Installing what's required

If you've never used Keras before, you'll have to install it on your computer. 

On a Mac, open Terminal. On a Windows PC, open the equivalent command line (such as Command Prompt).

The terminal starts in your home directory (basically the big folder on your computer where everything lives).

On my computer it looks like this: ![picture of home directory in terminal](images/terminal_at_home.png)

## Installing Jupyter Notebooks and Keras

If you've never used Jupyter Notebooks or Keras before, you can install them quite easily. 

Run these two commands in your terminal window.

`pip install keras`

`pip install jupyter`

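Don't worry too much about what pip means: it's Python's package installer, a tool for installing Python libraries on your computer. Keras also needs a backend library to do the actual computation; TensorFlow is the usual choice and can be installed with `pip install tensorflow`.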

It should look something like this: ![picture of terminal command to install Keras](images/installing_keras.png)


## Changing into another folder

I created a folder on my desktop called "Sentiment-Analysis-with-Keras". 

I can get there with the command `cd Desktop/Sentiment-Analysis-with-Keras`. Here, `cd` stands for "change directory" (directory is a fancy name for folder).

After running that command, we're in the "Sentiment-Analysis-with-Keras" folder.

![picture of being in the right folder in a terminal window](images/in_right_folder.png)

## Starting a Jupyter Notebook
To start Jupyter Notebook, use the command `jupyter notebook`. It opens the Jupyter home page in your web browser.

![picture of entering the command jupyter notebook into terminal](images/jupyter_notebook.png)

## Creating a new notebook

To create a new notebook like this one, click New in the top-right corner. Mine already has some notebooks because I've been working on this project.

Choose Python 3 when the menu drops down. 

![picture of jupyter notebook homescreen](images/jupyter_notebook_homepage.png)

## Now to start coding!

You've done all the steps necessary to build this model for yourself, except the actual code!

If you've never used a Jupyter Notebook before, refer to [this guide](https://www.datacamp.com/community/blog/jupyter-notebook-cheat-sheet) for how to get code to run.

The only command we'll need in this notebook is Shift + Enter, which runs the cell you're currently in.

## Importing the dependencies

Imagine you're starting an assignment. The only way to get information for your assignment is by going to the library and getting books which relate to your project. 

This is what we're doing here. We need to use a number of things from the Keras library so we're importing (borrowing) them here. 

What does the dot in `keras.datasets` mean?

Keras is a big library. The dot means we're going into the `datasets` section (module) of the library, much like you would go to the science section of a regular library.

So the statement `from keras.datasets import imdb` is actually saying: Go to the datasets section of the keras library and borrow the imdb dataset.

```python
# Import the dependencies
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding
from keras.preprocessing import sequence

print("Imported dependencies.")
```

```
Imported dependencies.
```


## Setting up the data

In deep learning, data is usually split into a training set and a test set.

The training set is used for your model to learn on. Essentially, we show our model a bunch of examples and it begins to learn the patterns in those examples. 

Once it knows the patterns, we can test how accurate those patterns are on the test set (a section of data the model has never seen before).

So the line
`(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_words)` is actually saying:

Load the IMDB dataset, which comes already split into a training set and a test set, and only keep the 5000 most frequent words. Any rarer word is replaced by a special "unknown word" marker. `X_train` and `X_test` hold the reviews; `y_train` and `y_test` hold their labels.

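The IMDB dataset is built into the Keras library. It contains 50,000 movie reviews, split into 25,000 for training and 25,000 for testing. Each review is labelled positive (1) or negative (0) and has already been turned into a list of integers, where each integer stands for a word.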
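Keras has already done the word-to-integer conversion for us, but a simplified sketch of the idea helps explain what `num_words` does. The toy reviews and the `build_vocab` and `encode` helpers are hypothetical. The reserved indices (1 marks the start of a review, 2 marks an unknown word, real words start at 3) are modelled on the defaults of `imdb.load_data`, but this is an illustration, not Keras's actual code.

```python
from collections import Counter

PAD, START, UNKNOWN = 0, 1, 2  # reserved indices (assumed, mirroring Keras defaults)


def build_vocab(texts, num_words):
    """Give the most frequent words integer indices; keep only indices below num_words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {}
    # The most common word gets index 3, the next gets 4, and so on
    for rank, (word, _) in enumerate(counts.most_common()):
        index = rank + 3
        if index >= num_words:
            break
        vocab[word] = index
    return vocab


def encode(text, vocab):
    """Turn a review into a list of integers, marking rare or unseen words as UNKNOWN."""
    return [START] + [vocab.get(word, UNKNOWN) for word in text.lower().split()]


reviews = ["the movie was great", "the plot was dull", "the acting was great"]
vocab = build_vocab(reviews, num_words=6)
print(vocab)                                      # {'the': 3, 'was': 4, 'great': 5}
print(encode("the soundtrack was great", vocab))  # [1, 3, 2, 4, 5]
```

With `num_words=6` only "the", "was" and "great" make the cut, so "soundtrack" becomes the unknown marker 2. With `num_words=5000`, the same happens to every word outside the most frequent ones.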

```python
# Define the number of words you want to use
max_words = 5000

# Define the training and test dataset
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_words)

print("Created test and training data.")
```

```
Created test and training data.
```


## Padding the input sequences

Not every review is the same length. 

Some reviews could even be one word long.

"nice"

Deep learning models need all of the input data to be the same shape. Imagine trying to fit 1000 different size marbles through the same size hole.

`pad_sequences` will add 0's to any review shorter than 500 words (500 is the maximum length we chose; you can increase it). By default, the 0's go at the start of the review.

For example, our one-word review above would become:
"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... nice" (500 entries in total)

The same goes for any reviews longer than 500 words: they will be shortened to 500. By default, the extra words are cut from the start, so the last 500 words are kept.

So what `X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)` is saying is: take the reviews in the X_train dataset; if they are shorter than 500 words, add 0's to the start, and if they're longer than 500 words, cut them down to 500.

```python
# Define the maximum length of a review
max_review_length = 500

# Pad the input sequences with 0's to make them all the same length
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)

print("Padded the input sequences with 0's to all be the same length.")
```

```
Padded the input sequences with 0's to all be the same length.
```
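To see exactly what `pad_sequences` does with its default settings (`padding='pre'` and `truncating='pre'`), here's a minimal pure-Python sketch for a single review. The `pad_review` function is hypothetical and only mimics those defaults; the real function also handles whole batches and other options.

```python
def pad_review(review, maxlen, value=0):
    """Pad or truncate one review the way pad_sequences does by default.

    Assumed defaults: padding='pre' and truncating='pre', so zeros are added
    to the front and long reviews lose words from the front.
    """
    if len(review) >= maxlen:
        # Keep only the last `maxlen` words
        return review[len(review) - maxlen:]
    # Add zeros to the front until the review is `maxlen` long
    return [value] * (maxlen - len(review)) + review


print(pad_review([42], maxlen=5))                   # [0, 0, 0, 0, 42]
print(pad_review([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
```

If you'd rather have the 0's at the end, `pad_sequences` accepts `padding='post'`.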


## Creating the model

Now our data is ready for some modelling!

Deep learning models have layers. 

The first layer takes in the data we've just prepared, the middle layers do some math on this data and the final layer produces an output we can hopefully make use of.

In our case, our model has three layers: an Embedding layer, an LSTM layer and a Dense layer.

Our model begins with the line `model = Sequential()`. Think of this as simply stating "our model will flow from input to output layer in a sequential manner" or "our model goes one step at a time".

### Embedding layer

The Embedding layer is a lookup table that turns each word into a list of numbers. During training, these numbers are adjusted so that they capture relationships between words.

`model.add(Embedding(max_words, embedding_vector_length, input_length=max_review_length))` is saying: add an Embedding layer to our model and use it to turn each of our words into a list of 32 numbers which have some mathematical relationship to each other.

So each of our words will become a vector of 32 numbers.

For example, "the" might become [0.556433, 0.223122, 0.789654, ...].

Don't worry about how these numbers are computed for now; Keras learns them for us during training.
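Under the hood, an embedding layer is essentially a table with one row of numbers per word. Here's a minimal pure-Python sketch of that lookup. The `embed` helper and the random starting values (between -0.05 and 0.05) are assumptions for illustration; in Keras the table is stored as the layer's weights and adjusted during training.

```python
import random

vocab_size = 5000     # same as max_words
embedding_dim = 32    # same as embedding_vector_length

# One row of 32 numbers per word index. Here the rows stay random;
# in a real model they start random and are learned.
random.seed(0)
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(vocab_size)
]


def embed(review):
    """Replace each word index in a review with its row from the table."""
    return [embedding_table[index] for index in review]


vectors = embed([0, 0, 1, 14, 22])
print(len(vectors), len(vectors[0]))  # 5 32
```

Both padding 0's map to the same vector, because the lookup depends only on the word's index. A 5000 x 32 table like this one is where the embedding layer's parameters come from.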

### LSTM layer

`model.add(LSTM(100))` is saying: add an LSTM layer after our embedding layer in our model and give it 100 units.

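LSTM stands for long short-term memory. An LSTM reads a review one word at a time and keeps a memory of what it has read so far. Think of an LSTM as a set of taps (called gates) which decide what information flows through the model and what doesn't. This layer has 100 units, each with its own taps, working together to decide which words matter most in each review.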
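Here's a simplified sketch of the "taps" idea: one LSTM unit reading a sequence one number at a time. It's deliberately tiny. A real unit in `LSTM(100)` reads the 32 embedding numbers plus the previous outputs of all 100 units, and its weights are learned; the `weights` dict below is a hypothetical set of constants.

```python
import math


def sigmoid(x):
    # Same as 1 / (1 + e^-x), written with tanh so it never overflows
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def lstm_step(x, h, c, w):
    """One time step of a single LSTM unit with a scalar input.

    x: current input, h: previous output, c: previous memory (cell state),
    w: dict of weights (hypothetical constants here; learned in a real model).
    """
    # Each gate is a "tap" between 0 (closed) and 1 (open)
    forget = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])   # keep old memory?
    let_in = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])   # let new info in?
    let_out = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])  # show memory?
    candidate = math.tanh(w["wc"] * x + w["uc"] * h + w["bc"])  # proposed new info

    c = forget * c + let_in * candidate  # update the memory
    h = let_out * math.tanh(c)           # output for this step
    return h, c


weights = {name: 0.5 for name in
           ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wc", "uc", "bc"]}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 2.0]:  # a toy "review" of three numbers
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

The forget, input and output gates are the taps: each is a number between 0 and 1 that scales how much information passes through. After the last word, `h` is this unit's summary of the whole sequence.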

### Dense layer

`model.add(Dense(1, activation='sigmoid'))` is saying: add a Dense layer to the end of our model and use a sigmoid activation function to produce a meaningful output. 

A dense layer is also known as a fully-connected layer. This layer connects the outputs of the 100 LSTM units in the previous layer to 1 unit. This last unit then combines all this information and runs it through a sigmoid function.

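Essentially, the sigmoid function squashes the information passed through by the LSTM layer into a single number between 0 and 1. Close to 1 means positive and close to 0 means negative, so a value like 0.9 means the model is fairly sure the review is positive. When Keras measures accuracy, anything above 0.5 counts as positive.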
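A minimal sketch of the sigmoid function and the 0.5 cut-off in plain Python. The `to_label` helper is hypothetical; the Keras model itself just outputs the probability.

```python
import math


def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form so math.exp never overflows
    e = math.exp(x)
    return e / (1.0 + e)


def to_label(probability, threshold=0.5):
    """Turn a probability into 1 (positive) or 0 (negative)."""
    return 1 if probability > threshold else 0


for score in [-4.0, 0.0, 3.0]:
    p = sigmoid(score)
    print(f"input {score:+.1f} -> probability {p:.3f} -> label {to_label(p)}")
```

Large positive inputs end up near 1 and large negative inputs near 0, which is why sigmoid suits a yes/no question like "is this review positive?".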


```python
# Define how long the embedding vector will be
embedding_vector_length = 32

# Define the layers in the model
model = Sequential()
model.add(Embedding(max_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))

print("Model created.")
```

```
Model created.
```


## Compiling the model

The layers of our model are done. But we still have to put some finishing touches on it before it's ready to run.

`model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])` is saying: get our model ready for training, using a binary crossentropy loss function and the Adam optimizer, and track the accuracy metric.

### Binary crossentropy

Binary crossentropy is our loss function: it measures how far the model's predictions (numbers between 0 and 1) are from the true labels. Binary = 0 or 1. We only want 0 or 1 as the output, because we only care about positive (1) or negative (0). If we cared about more than two categories, we would use a different loss function (such as categorical crossentropy).
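For a single review, binary crossentropy is -(y x log(p) + (1 - y) x log(1 - p)), where y is the true label and p is the model's predicted probability that the review is positive. Here's a minimal sketch that averages it over a batch. The function is our own, and the clipping value `eps` is an assumption to keep `log` finite (Keras applies a similar small clip).

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average binary crossentropy over a batch of labels and predictions.

    y_true: 0 or 1 labels; y_pred: predicted probabilities of "positive".
    Predictions are clipped away from exactly 0 and 1 so log() stays finite.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1], [0.9]))  # confident and right: small loss
print(binary_crossentropy([1], [0.1]))  # confident and wrong: large loss
```

Being confident and wrong costs far more than being confident and right, and that difference is what pushes the model towards better predictions.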

### Adam

The optimizer uses the loss to adjust the model's parameters, so that its predictions move a little closer to the right answer (0 or 1) after each batch of reviews. Adam is the name of a popular optimization algorithm.
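Here's a minimal sketch of the Adam update rule on a toy problem with a single parameter: finding the w that minimises (w - 3)^2. The beta values are Adam's commonly used defaults; the learning rate of 0.05 is chosen for this toy example (Keras's default is 0.001), and `adam_minimize` is a hypothetical helper, not Keras's implementation.

```python
import math


def adam_minimize(grad, w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a one-parameter function with the Adam update rule.

    grad: function returning the gradient (slope) of the loss at w.
    """
    m, v = 0.0, 0.0  # running averages of the gradient and the squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # correct the early bias towards 0
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy loss (w - 3)^2 has its minimum at w = 3; its gradient is 2 * (w - 3)
best_w = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(best_w)  # close to 3
```

Adam keeps running averages of the gradient (`m`) and its square (`v`), so the size of each step adapts to how large and how consistent the recent gradients have been. In our model, the same idea is applied to all of the model's parameters at once.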

### Model metrics

Tracking the accuracy metrics will show us some live stats on how our model is doing during training (more on this soon).
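Accuracy is simply the fraction of reviews the model gets right once its probabilities are turned into 0 or 1. A minimal sketch, assuming the usual 0.5 threshold; `binary_accuracy` here is our own helper, written to mirror the idea behind the metric Keras tracks.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    correct = sum(
        (1 if p > threshold else 0) == y for y, p in zip(y_true, y_pred)
    )
    return correct / len(y_true)


labels = [1, 0, 0, 1]
predictions = [0.9, 0.2, 0.6, 0.7]  # the third review is misclassified
print(binary_accuracy(labels, predictions))  # 0.75
```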

```python
# Compile the model and define the loss and optimization functions
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

print("Model compiled, ready to be fit to the training data.")
```

```
Model compiled, ready to be fit to the training data.
```


## Summarize the model

Making a summary of the model will give us an idea of what's happening at each layer. 

In the embedding layer, each of our words is turned into a list of 32 numbers. Because there are 5000 words (`max_words`), there are 160,000 parameters (5000 x 32).

Parameters are the individual numbers (weights) the model adjusts as it learns. Layer by layer, the model condenses each 500-word review into something we can understand and make use of: a single number between 0 and 1.

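The LSTM layer has 53,200 parameters. Each of its 100 units has four gates, and each gate has a weight for each of the 32 embedding numbers, a weight for each of the 100 LSTM outputs from the previous word, and a bias: 4 x (100 x (32 + 100) + 100) = 53,200.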

The final dense layer connects each of the 100 LSTM outputs to one unit: one weight per output plus one bias (100 + 1 = 101 parameters).
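These three counts can be checked with a little arithmetic. This sketch uses hypothetical helper functions and assumes the standard layer formulas (one bias per gate per unit for the LSTM); the totals should match the Param # column of the model summary.

```python
def embedding_params(vocab_size, dim):
    # One vector of `dim` numbers per word
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias per unit
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input per unit, plus one bias per unit
    return units * (input_dim + 1)


counts = [embedding_params(5000, 32), lstm_params(32, 100), dense_params(100, 1)]
print(counts, sum(counts))  # [160000, 53200, 101] 213301
```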

```python
# Summarize the different layers in the model
# (summary() prints the table itself, so there's no need to wrap it in print)
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
embedding_8 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
lstm_8 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 101
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
```


## Fitting the model to the training data

Now that our model is compiled, it's ready to be set loose on our training data.

We're going to run 3 cycles (`epochs=3`) on groups of 64 reviews at a time (`batch_size=64`).

Because of our loss and optimization functions, the model's accuracy should improve after each cycle.

`model.fit(X_train, y_train, epochs=3, batch_size=64)` is saying: fit the model we've built on the training dataset for 3 cycles and go over 64 reviews at a time.

Feel free to change the number of epochs (more cycles) or batch_size (more or less information each step) to see how the accuracy changes.

This will take a little time depending on how powerful your computer is. On my MacBook Pro, it took around 10 minutes.
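How many steps is that? A quick back-of-the-envelope calculation, assuming the 25,000-review training set and the `batch_size` and `epochs` used here. Keras updates the parameters once per batch, and the last batch of each epoch is smaller when the numbers don't divide evenly.

```python
import math

num_train_reviews = 25_000  # size of the IMDB training set
batch_size = 64
epochs = 3

batches_per_epoch = math.ceil(num_train_reviews / batch_size)
print(batches_per_epoch)           # 391 (the last batch holds only 40 reviews)
print(batches_per_epoch * epochs)  # 1173 parameter updates in total
```

A bigger `batch_size` means fewer (but larger) updates per epoch; a smaller one means more frequent, noisier updates.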

```python
# Fit the model to the training data
model.fit(X_train, y_train, epochs=3, batch_size=64)
```

```
Epoch 1/3
Epoch 2/3
Epoch 3/3

<keras.callbacks.History at 0x130e19630>
```

## Evaluating the model on the test data

Our model is now trained, finishing with an accuracy of 90.21% on the training set. Not bad!

The final step is to find out how well our trained model does on the test dataset. This data comes from the same IMDB dataset, but the model has never seen it.

Think of this as studying for an exam. Your teacher tells you the exam will be on things you've learned in class. Training the model is like studying the things you've learned in class. Evaluating the model is like taking the exam. 

`model_scores = model.evaluate(X_test, y_test, verbose=0)` is saying: take our trained model and see how it performs on the test dataset. We don't want the fancy progress bars, so verbose is set to 0. `evaluate` returns the loss followed by the accuracy, so `model_scores[1]` is the accuracy.

```python
# Evaluate the trained model on the test data
model_scores = model.evaluate(X_test, y_test, verbose=0)

# Print out the accuracy of the model on the test set
print("Model accuracy on the test dataset: {0:.2f}%".format(model_scores[1]*100))
```

```
Model accuracy on the test dataset: 87.25%
```


## Summary

Our model finished with 87.25% accuracy on the test dataset!

That means that on reviews like the ones in the test set, our model decides correctly whether the review is positive or negative about 87% of the time.

Learning from 25,000 movie reviews and reaching accuracy that high in about 10 minutes of training is pretty good.

How fast could you read 25,000 movie reviews?

That's the power of deep learning. And this is only the beginning.

There are ways to make our model even faster (run on a GPU) and get better results (make the model more complex). But I'll leave these for you to figure out. 

## Extensions

What could you do to improve this model?

What questions do you have?

Could this notebook be improved? Or does it have any errors?

If you do any of these, leave a comment on one of my videos or send me an email: daniel@mrdbourke.com. I'd love to take a look!

As always,

Keep learning.