# Sentiment analysis
In this assignment, we will learn to build a [neural network](https://en.wikipedia.org/wiki/Neural_network) model for [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis).

### Import Google Colab drive helper and mount the Google drive.

In [0]:
from google.colab import drive

drive.mount('/content/drive')

In [0]:
import os

current_path = '/content/drive/My Drive/compare/keras/'
os.chdir(current_path)

!ls -al

### Import required Python modules.

In [0]:
import numpy as np
np.random.seed(7)

from keras.datasets import imdb

### Load the IMDb dataset.

Keras has a built-in [IMDb movie reviews dataset](https://keras.io/datasets). We will use this dataset.
- Set vocabulary size = 5000.
- Set embedding size = 32.
- Load IMDb dataset.

### Hints
- [IMDb movie reviews dataset](https://keras.io/datasets)

In [0]:
vocabulary_size = 5000
embedding_size = 32

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocabulary_size)
print('Loaded IMDB dataset with {} training samples and {} test samples.'.format(len(X_train), len(X_test)))

In [0]:
sample_index = np.random.randint(0, X_train.shape[0])

print('Review ', X_train[sample_index])
print('Label ', y_train[sample_index])

In [0]:
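word_to_identifier = imdb.get_word_index()
# load_data() shifts every word index by 3 (index_from=3) because 0, 1 and 2
# are reserved for padding, start-of-sequence and unknown words.
identifier_to_word = {i + 3: word for word, i in word_to_identifier.items()}

print('Review with words' , [identifier_to_word.get(i, ' ') for i in X_train[sample_index]])
print('Label ' , y_train[sample_index])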

In [0]:
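# X_train + X_test would join reviews pairwise, so measure every review on its own.
review_lengths = [len(review) for review in X_train] + [len(review) for review in X_test]
print('Maximum review length: {}'.format(max(review_lengths)))
print('Minimum review length: {}'.format(min(review_lengths)))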

### Pad sequences

To feed this data into our neural network model, all input documents must have the same length. We will fix the review length at max_words = 500 by truncating longer reviews and padding shorter reviews with a null value (0). We can accomplish this using the [pad_sequences(...)](https://keras.io/preprocessing/sequence/) function in Keras.


### Hints
- [pad_sequences()](https://keras.io/preprocessing/sequence/)
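
To make the padding and truncation rule concrete, here is a minimal NumPy-only sketch. The helper `pad_or_truncate` and the toy reviews are hypothetical names introduced for illustration; the helper imitates what `pad_sequences` does with its default settings (padding and truncating at the start of each sequence) and is not the Keras implementation itself.

```python
import numpy as np

def pad_or_truncate(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep the last `maxlen` items of long ones."""
    padded = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # nothing to copy, the row stays fully padded
        trimmed = seq[-maxlen:]  # drop items from the front if the sequence is too long
        padded[row, -len(trimmed):] = trimmed  # right-align, so the padding sits on the left
    return padded

# Two toy "reviews" (made-up word identifiers): one too short, one too long.
toy_reviews = [[1, 14, 22], [1, 5, 9, 7, 3, 8]]
toy_padded = pad_or_truncate(toy_reviews, maxlen=4)
print(toy_padded)
```

The short review gains a leading 0 and the long review loses its first two identifiers, so every row ends up with exactly `maxlen` entries.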

In [0]:
from keras.preprocessing import sequence

max_words = 500

X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)

In [0]:
validation_size = 1000

X_val, y_val = X_train[:validation_size], y_train[:validation_size]
X_train, y_train = X_train[validation_size:], y_train[validation_size:]

### Use Adam optimizer.

*   Import the Adam optimizer from Keras optimizers. See [link](https://keras.io/optimizers/) for details.
*   Create an Adam optimizer object with a learning rate of 0.001.
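
To give a feel for what the learning rate controls, the following is a rough NumPy sketch of one textbook Adam update step. The function `adam_step`, the toy weights and gradients, and the values of `beta_1`, `beta_2` and `epsilon` are assumptions chosen for illustration; the Keras optimizer may differ in implementation details.

```python
import numpy as np

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one textbook Adam update and return the new parameter and moment estimates."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of the gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of the squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction for the early steps
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy weights and gradients (made-up numbers).
weights = np.array([0.5, -0.3])
gradients = np.array([0.2, -0.4])
new_weights, m, v = adam_step(weights, gradients,
                              m=np.zeros(2), v=np.zeros(2), t=1)
print(new_weights)
```

On the very first step the update is close to `learning_rate` times the sign of the gradient, so each weight moves by roughly 0.001 regardless of how large its gradient is.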

In [0]:
from keras.optimizers import Adam

learning_rate = 0.001
# Keras 2.3 and later use `learning_rate`; the older `lr` argument is deprecated.
optimizer = Adam(learning_rate=learning_rate)

### Import required Python modules.

In [0]:
from helper_functions import *

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Embedding

### Create a simple neural network model using Keras.

1. Create a sequential model. See [link](https://keras.io/models/sequential/) for details.

2. Add an embedding layer to the model. See [link](https://keras.io/layers/embeddings) for more details.
Set input_dim = vocabulary_size, output_dim = embedding_size and input_length = max_words.

3. Add a flatten layer to the model. See [link](https://keras.io/layers/core/) for details.

4. Add a dense layer to the model. See [link](https://keras.io/layers/core/) for details. Set units = 64, activation = relu for the layer.

5. Add a dense layer to the model. Set units = 1, activation = sigmoid for the layer.
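
As a sanity check on the architecture described in the steps above, the number of trainable parameters can be worked out by hand. This is a small arithmetic sketch using the sizes set in this notebook; the intermediate variable names are introduced here for illustration only.

```python
vocabulary_size = 5000
embedding_size = 32
max_words = 500
hidden_units = 64

# Embedding: one vector of length embedding_size per vocabulary entry.
embedding_params = vocabulary_size * embedding_size

# Flatten has no parameters; it reshapes (max_words, embedding_size) into one long vector.
flattened_size = max_words * embedding_size

# Dense layers: one weight per input-output pair plus one bias per unit.
hidden_params = flattened_size * hidden_units + hidden_units
output_params = hidden_units * 1 + 1

total_params = embedding_params + hidden_params + output_params
print(embedding_params, hidden_params, output_params, total_params)
# prints: 160000 1024064 65 1184129
```

Most of the parameters sit in the first dense layer, because flattening 500 embedded words produces a 16,000-dimensional input.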

In [0]:
nn_model = Sequential()
nn_model.add(Embedding(input_dim=vocabulary_size, output_dim=embedding_size, input_length=max_words))
nn_model.add(Flatten())
nn_model.add(Dense(units=64, activation='relu'))
nn_model.add(Dense(units=1, activation='sigmoid'))

### Compile the simple neural network model.

1. See [link](https://keras.io/models/model/) for details.
2. Set loss = binary_crossentropy.
3. Set optimizer = optimizer (the Adam optimizer created earlier).
4. Set metrics = accuracy.
5. Print model summary for visualization.
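
For intuition about the loss chosen above, here is a minimal NumPy sketch of binary cross-entropy. The function below, its `epsilon` clipping value and the toy labels and predictions are assumptions made for illustration; it follows the standard formula rather than reproducing the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip the probabilities so that log(0) never occurs.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), epsilon, 1 - epsilon)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

toy_labels = [1, 0, 1, 0]
confident_loss = binary_crossentropy(toy_labels, [0.9, 0.1, 0.8, 0.2])
unsure_loss = binary_crossentropy(toy_labels, [0.5, 0.5, 0.5, 0.5])
print(confident_loss, unsure_loss)
```

Predictions that are confident and correct give a loss near 0, while predicting 0.5 for every review gives a loss of ln 2, about 0.693.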

In [0]:
nn_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
nn_model.summary()  # summary() prints the table itself and returns None

### Train the simple neural network model.

1. Set the number of epochs to a value between 10 and 20.
2. Set the batch size to a multiple of 32, e.g. 32 or 64.
3. Call the fit(...) function on the model.
4. See [link](https://keras.io/models/model/) for details.
5. Pass X_train and y_train to train the model.
6. Pass X_val and y_val as validation data.
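
To see how the epoch and batch-size settings translate into weight updates, here is a short arithmetic sketch. It assumes the standard Keras IMDb training split of 25,000 reviews, with 1,000 of them held out for validation as in this notebook; the variable names are introduced for illustration.

```python
import math

num_training_samples = 25000 - 1000  # assumed IMDb training split minus the validation set
batch_size = 64
num_epochs = 10

# One epoch visits every training sample once, one batch per weight update.
steps_per_epoch = math.ceil(num_training_samples / batch_size)
total_updates = steps_per_epoch * num_epochs
print(steps_per_epoch, total_updates)
# prints: 375 3750
```

Halving the batch size to 32 would double the number of updates per epoch, which usually makes each epoch slower but gives the optimizer more steps.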

In [0]:
num_epochs = 10
batch_size = 64

In [0]:
history = nn_model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=batch_size, epochs=num_epochs)
show_graph(history)

### Print the model accuracy.

In [0]:
scores = nn_model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

**Expected Output**:

<table>
    <tr>
        <td>
            Test accuracy
        </td>
        <td>
           between 75.0% and 95.0%
        </td>
    </tr>
</table>
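
The accuracy reported for this model is the share of reviews whose predicted probability falls on the correct side of a threshold. The sketch below illustrates that calculation with NumPy; the `binary_accuracy` helper, the 0.5 threshold and the toy values are assumptions for illustration, not output from the trained model.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    predictions = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(predictions == np.asarray(y_true)))

# Toy labels and sigmoid outputs (made-up numbers).
toy_labels = [1, 0, 1, 1, 0]
toy_probabilities = [0.91, 0.20, 0.45, 0.77, 0.60]
toy_accuracy = binary_accuracy(toy_labels, toy_probabilities)
print("Accuracy: %.2f%%" % (toy_accuracy * 100))
# prints: Accuracy: 60.00%
```

Here the third and fifth toy reviews land on the wrong side of 0.5, so three of five predictions are correct.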