# Classifying Movie Reviews from IMDb
### -Marty VanHoof

## Contents

- [Introduction](#intro)
- [Load and prepare the data](#load)
- [How to recover the original review?](#recover)
- [Sequence padding](#padding)
- [Using GridSearchCV to find optimal hyperparameters](#grid_search)
  - [1. Tune batch size and number of epochs (and two optimizers)](#batch_epochs)
  - [2. Tune the training optimization algorithm](#opt_algorithm)
  - [3. Tune learn rate and learn rate decay for adagrad](#tune_adagrad)
  - [4. Tune the activation function in the hidden layers](#activation)
  - [5. Tune dropout regularization](#dropout)
  - [6. Tune the number of nodes in the hidden layers](#neurons)

<a id='intro'></a>
## Introduction

The Internet Movie Database (IMDb) is an online database consisting of movie, TV, and video game information.  Our goal is to analyze a dataset of movie reviews from IMDb and use the Python neural network library [Keras](https://keras.io/) to predict the sentiment of a movie review.  The dataset consists of 25,000 movie reviews for training (and the same number for testing) from IMDb, labeled by sentiment (positive review or negative review). Reviews have been preprocessed, and each review is encoded as a sequence of word indices (integers).  Words are indexed by overall frequency in the dataset, so for example the integer 1 corresponds to the most frequent word in the dataset, 2 corresponds to the 2nd most frequent word, etc.  Therefore, a sentence is represented by a sequence of integers, and each movie review can be encoded as an integer vector.
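To make the frequency-based encoding concrete, here is a minimal sketch on a made-up three-review corpus. The corpus, the alphabetical tie-breaking rule, and the variable names are assumptions for illustration only; the real IMDb preprocessing was done ahead of time and may differ in its details.

```python
from collections import Counter

# Toy corpus (hypothetical, not taken from the IMDb data)
reviews = [
    "the film was great",
    "the film was bad",
    "the plot was great fun",
]

# Count how often each word occurs across all reviews
counts = Counter(word for review in reviews for word in review.split())

# Rank words by descending frequency (ties broken alphabetically),
# so that index 1 goes to the most frequent word
ranked = sorted(counts, key=lambda w: (-counts[w], w))
word_to_index = {word: rank for rank, word in enumerate(ranked, start=1)}

# Encode each review as a list of integer ranks
encoded = [[word_to_index[w] for w in review.split()] for review in reviews]

print(word_to_index)
print(encoded)
```

Each review becomes a list of integers of varying length, which is the same shape of data that the IMDb loader returns.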

For humans, it is relatively easy to predict the sentiment of a review.  This is something we want to train the model to do.  Let's start by loading the necessary libraries and modules.

In [2]:
import numpy as np
import pandas as pd
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.optimizers import Adagrad
from keras.preprocessing import sequence
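from keras.constraints import max_norm
# GridSearchCV is used by the grid_search function defined later on
from sklearn.model_selection import GridSearchCV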
# fix random seed for reproducibility
np.random.seed(13)

<a id='load'></a>
## Load and prepare the data

With Keras, this dataset comes [preloaded](https://keras.io/datasets/), so a single command gives us access to the training and testing data. The parameter `num_words` specifies how many of the most frequent words to keep; less frequent words are replaced by a placeholder index.  The training and testing data are loaded into `X_train` and `X_test` (numpy arrays whose entries are integer-encoded reviews of varying length), and the corresponding labels (`y_train`/`y_test`) are vectors of binary integers, where 0 represents a negative review and 1 represents a positive review.

In [3]:
# Load the data and split it into 50% train, 50% test sets
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=5000)

print('number of training examples: ', len(X_train))
print('number of testing examples: ', len(X_test))
print('classes: ', np.unique(y_train))

number of training examples:  25000
number of testing examples:  25000
classes:  [0 1]


We can look at the first training example, which is a movie review encoded as an array of integers:

In [4]:
print(X_train[0])

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]


<a id='recover'></a>
## How to recover the original review?

The above array is how the algorithm will see the review, but it would be nice to see the review in human-readable form as well, so we can get an idea of the review's sentiment.  There is a function in Keras called `get_word_index()` that returns a dictionary mapping words to their corresponding indices.  For example, we can have a look at (say, 10) random entries from the dictionary with the restriction that these 10 entries come from the top 5000 most frequent words.

In [3]:
# print out 10 random entries from the top 5000 most frequent words in the word_index dict
word_index = imdb.get_word_index()
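# word indices start at 1, so sample 10 distinct values from 1..5000
rand_indices = set(np.random.choice(np.arange(1, 5001), size=10, replace=False).tolist())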
for k,v in word_index.items():
    if v in rand_indices:
        print(k,v)

robert 667
until 363
trailers 4238
bitter 2916
carol 3687
nyc 4841
attracted 3621
diamond 3640
hero 629
psychological 1984


We can use this word index and write a function that translates an integer-encoded review back into human-readable form.  For more information, see the first answer to this [stackoverflow post](https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset).  

In [5]:
def get_review(review_number):
    '''
    Put the review back in a form that humans can understand
    '''
    word_index = imdb.get_word_index()
    word_index = {k:v+3 for k, v in word_index.items()}
    word_index["<PAD>"] = 0
    word_index["<START>"] = 1
    word_index["<UNK>"] = 2
    index_word = {v:k for k, v in word_index.items()}
    return ' '.join( index_word[i] for i in X_train[review_number] )

# print out the first two reviews and their labels
print('The first review is clearly positive, so it should get a label of {} :\n'.format(y_train[0]))
print(get_review(0), '\n')
print('The second review is clearly negative, so it should get a label of {} :\n'.format(y_train[1]))
print(get_review(1))

The first review is clearly positive, so it should get a label of 1 :

<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly <UNK> was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little <UNK> that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big <UNK> for the whol
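The `+3` shift in `get_review` exists because, with the default settings of `imdb.load_data`, the integers 0, 1, and 2 are reserved for padding, the start-of-review marker, and out-of-vocabulary words. The sketch below illustrates the idea on a made-up three-word index; `raw_word_index` and `decode` are hypothetical names used only for this example.

```python
# Hypothetical miniature word index, in the same shape as imdb.get_word_index()
raw_word_index = {"the": 1, "film": 2, "brilliant": 3}

# Shift every rank by 3 to make room for the reserved tokens 0, 1 and 2
shifted = {word: rank + 3 for word, rank in raw_word_index.items()}
shifted.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2})

# Invert the mapping so that integers can be looked up
index_to_word = {index: word for word, index in shifted.items()}


def decode(encoded_review):
    """Map integers back to tokens, falling back to <UNK> for unknown indices."""
    return " ".join(index_to_word.get(i, "<UNK>") for i in encoded_review)


print(decode([1, 4, 5, 2, 6]))
```

Using `dict.get` with a fallback is a small design choice that avoids a `KeyError` if an index is missing from the dictionary.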

<a id='padding'></a>
## Sequence padding

The neural network model requires that the input vectors all have the same length, so we need to set a fixed length for the inputs and then find a way to deal with the movie reviews that are too long or too short.  This can be done using [sequence.pad_sequences](https://keras.io/preprocessing/sequence/) in the `keras.preprocessing` module, which by default truncates longer reviews to a fixed length (dropping words from the beginning) and pads shorter reviews with zeros at the front.

In [4]:
X_train = sequence.pad_sequences(X_train, maxlen=500)
X_test = sequence.pad_sequences(X_test, maxlen=500)

print('train/test set dimensions: ', X_train.shape, X_test.shape, '\n')
print('first sentence in training set, encoded and padded:\n\n ', X_train[0], '\n')

train/test set dimensions:  (25000, 500) (25000, 500) 

first sentence in training set, encoded and padded:

  [   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0   
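The padding and truncation behaviour can be sketched in plain Python. This is not the Keras implementation, just a small stand-in that mimics the default `'pre'` padding and `'pre'` truncation on a single list; the function name `pad_or_truncate` is made up for this example.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic 'pre' padding and 'pre' truncation on a single list of ints."""
    if len(seq) >= maxlen:
        # too long: keep only the last maxlen items
        return seq[-maxlen:]
    # too short: fill the front with the padding value
    return [value] * (maxlen - len(seq)) + seq


short_review = [1, 14, 22]
long_review = [1, 14, 22, 16, 43, 530, 973]

print(pad_or_truncate(short_review, 5))
print(pad_or_truncate(long_review, 5))
```

This is why the padded first review above begins with a long run of zeros: the original words sit at the end of the 500-element vector.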

<a id='grid_search'></a>
## Using GridSearchCV to find optimal hyperparameters

First I want to give a shout-out to Dr. Jason Brownlee, whose [article on grid-searching with Keras](https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/) is immensely helpful.

[GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV) is a very useful class in scikit-learn that performs an exhaustive search over a specified grid of hyperparameter values and finds the combination with the best cross-validated score.  It saves a lot of time, since we don't have to tune the hyperparameters manually.  Keras does not have grid search capabilities built in, but it does have a [scikit-learn wrapper](https://keras.io/scikit-learn-api/) that we can use to interface with the scikit-learn API.

An important thing to note is that we perform the grid search by tuning a few hyperparameters at a time and carrying the best values forward to the next step.  This is not the best way to grid search, because the hyperparameters may interact, so the best value of one can depend on the values of the others.  There are two reasons I'm doing it this way:  1) it's too time (and CPU) consuming to tune many hyperparameters at once on my little laptop; 2) tuning the hyperparameters this way the first time is more illustrative and better for my learning process.
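To show what "exhaustive" means here, the sketch below enumerates every combination in a parameter grid using only the standard library. It is a simplified stand-in for what a grid search iterates over, not scikit-learn's own code; `expand_grid` is a hypothetical helper, and the grid values are the ones used in the first tuning step of this notebook.

```python
from itertools import product

# Parameter grid in the dict-of-lists shape that GridSearchCV expects
param_grid = {
    "optimizer": ["rmsprop", "adam"],
    "batch_size": [100, 200, 500],
    "epochs": [2, 5, 10],
}


def expand_grid(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combinations = list(expand_grid(param_grid))
print(len(combinations))  # 2 * 3 * 3 combinations
print(combinations[0])
```

Every combination is trained and scored with cross-validation, so the cost grows multiplicatively with each hyperparameter added to the same grid. That is the practical reason for tuning only a few at a time.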

First we build and compile the model in Keras using the `build_model` function below.  Then we can pass this function as an argument to `KerasClassifier` in order to interface with scikit-learn and use `GridSearchCV`.
The `grid_search` function below takes the sklearn-wrapped Keras model and a parameter grid as arguments and then performs the grid search.  It returns a dictionary with some useful summary statistics of the grid combinations and it also prints the accuracy score and the hyperparameters of the optimal model.  We will eventually load the dictionary into a Pandas dataframe to display the results in a nice way.  Most of these steps are implemented in the `main` function in the [imdb_models.py](imdb_models.py) file.  The `main` function will also save the weights from the best model to a `.hdf5` file, and it will write the grid search results to a `.csv` file.  

If you are not using some form of cloud computing, then trying to train neural network models in a Jupyter Notebook on your local machine is frustrating, as the process tends to be very slow and will often just crash.  Running the `.py` file directly in the terminal is faster, but in general it's not a good idea to try to train neural network models that are too big on a laptop CPU.  A good option is some form of cloud computing, such as [Amazon Web Services](https://aws.amazon.com/).

The model is a sequential model, which is just a linear stack of layers.  The first layer is an [Embedding](https://keras.io/layers/embeddings/) layer that maps each integer-encoded word to a dense vector in a 32-dimensional space; these vectors are learned during training, with the aim that words used in similar ways end up at nearby points.  This kind of representation is known as a word embedding, and [word2vec](https://www.tensorflow.org/tutorials/word2vec) is a well-known method for learning such embeddings.

In [8]:
def build_model(optimizer='adagrad', learn_rate=0.01, learn_rate_decay=0.01, activation='relu',
                dropout_rate=0.1, weight_constraint=4, neurons=250, input_dim=5000, output_dim=32,
                max_review_length=500):
    '''
    Setup the model architecture and compile the model.
    '''
    model = Sequential()

    # embedding layer
    model.add(Embedding(input_dim, output_dim, input_length=max_review_length))
    model.add(Flatten())
    
    # first hidden layer 
    model.add(Dense(neurons, activation=activation, kernel_constraint=max_norm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    
    # second hidden layer
    model.add(Dense(neurons, activation=activation, kernel_constraint=max_norm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    
    # output layer
    model.add(Dense(1, activation='sigmoid'))
    
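    # compile the model
    # note: the `optimizer` argument is overridden here, so the model is always
    # compiled with Adagrad using the given learn rate and decay
    optimizer = Adagrad(lr=learn_rate, decay=learn_rate_decay)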
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    
    return model

def grid_search(model, param_grid):
    '''
    Performs a grid search of model hyperparameters and returns the best model,
    along with some summary statistics of the different grid combinations.
    '''
    
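    # set up and perform the grid search; return_train_score=True is needed in
    # newer scikit-learn versions for cv_results_ to include mean_train_score
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1,
                        return_train_score=True)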
    grid_result = grid.fit(X_train, y_train)

    # the cv_results_ attribute is a dict that summarizes many important results
    # from each model
    test_score_means = grid_result.cv_results_['mean_test_score']
    train_score_means = grid_result.cv_results_['mean_train_score']
    train_times = grid_result.cv_results_['mean_fit_time']
    params = grid_result.cv_results_['params']

    # store these results in a smaller dict for easy loading into a pandas dataframe
    final_results = dict(mean_test_score=test_score_means, mean_train_score=train_score_means,
                         mean_fit_time=train_times, params=params)

    print('Best score {} using hyperparameters {}'.format(grid_result.best_score_,
                                                          grid_result.best_params_))
    
    return grid_result, final_results

m = build_model()
m.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
flatten_2 (Flatten)          (None, 16000)             0         
_________________________________________________________________
dense_4 (Dense)              (None, 250)               4000250   
_________________________________________________________________
dropout_3 (Dropout)          (None, 250)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 250)               62750     
_________________________________________________________________
dropout_4 (Dropout)          (None, 250)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 251       
Total para
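The parameter counts in the summary can be reproduced by hand from the default arguments of `build_model`. The short calculation below is only a worked check of that arithmetic; the variable names are made up for the example.

```python
# Default sizes taken from the build_model signature
vocab_size, embed_dim, review_len, neurons = 5000, 32, 500, 250

# Embedding: one 32-dimensional vector per word in the vocabulary
embedding_params = vocab_size * embed_dim

# Flatten turns the (500, 32) embedding output into one long vector
flat_size = review_len * embed_dim

# Dense layers: one weight per input-output pair, plus one bias per unit
dense1_params = flat_size * neurons + neurons
dense2_params = neurons * neurons + neurons
output_params = neurons * 1 + 1

print(embedding_params, flat_size, dense1_params, dense2_params, output_params)
```

Almost all of the weights sit in the first hidden layer, because it is fully connected to the 16,000-dimensional flattened embedding output.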

<a id='batch_epochs'></a>
## 1. Tune batch size and number of epochs (and two optimizers)

**Best score using hyperparameters {'optimizer': 'adam', 'batch_size': 200, 'epochs': 2}**

We ran the grid search in the [imdb_models.py](imdb_models.py) file with the following parameter grid:

```
optimizer = ['rmsprop', 'adam']
batch_size = [100, 200, 500]
epochs = [2, 5, 10]
```

The function below will display some summary statistics of the different grid combinations, ordered by mean_test_score in descending order.  Only the first 10 results are shown.  

In [9]:
def display_results(csv_file, num_rows=0):
    '''
    Display the model results in a Pandas dataframe
    '''
    from IPython.display import display
    
    # load the results from the csv file, sort the values, and reset the index
    results = pd.read_csv(csv_file)
    results.sort_values(by=['mean_test_score'], inplace=True, ascending=False)
    results.reset_index(drop=True, inplace=True)
    
    # display the full contents of the params column without truncation
    # (None means no limit in pandas >= 1.0; older versions used -1)
    pd.set_option('display.max_colwidth', None)
    
    # convert mean_fit_time from seconds to minutes
    results['mean_fit_time'] = pd.to_numeric(results.mean_fit_time) / 60
    
    print('\nTotal training time: {} minutes'.format(round(np.sum(results.mean_fit_time), 2)))
    
    if num_rows:
        display(results.head(num_rows))
    else:
        display(results)
    
display_results('imdb_results/grid_batch_epoch_results.csv', 10)


Total training time: 83.01 minutes


| | mean_fit_time (min) | mean_test_score | mean_train_score | params |
|---|---|---|---|---|
| 0 | 1.57346 | 0.86724 | 0.97924 | {'optimizer': 'adam', 'batch_size': 200, 'epochs': 2} |
| 1 | 1.627119 | 0.86104 | 0.98642 | {'optimizer': 'rmsprop', 'batch_size': 100, 'epochs': 2} |
| 2 | 1.803518 | 0.859 | 0.9924 | {'optimizer': 'adam', 'batch_size': 100, 'epochs': 2} |
| 3 | 4.999639 | 0.85636 | 1.0 | {'optimizer': 'adam', 'batch_size': 500, 'epochs': 10} |
| 4 | 7.485352 | 0.85552 | 1.0 | {'optimizer': 'adam', 'batch_size': 200, 'epochs': 10} |
| 5 | 1.476564 | 0.85532 | 0.95674 | {'optimizer': 'rmsprop', 'batch_size': 200, 'epochs': 2} |
| 6 | 3.65329 | 0.85204 | 0.99998 | {'optimizer': 'adam', 'batch_size': 200, 'epochs': 5} |
| 7 | 4.410074 | 0.85148 | 0.99998 | {'optimizer': 'adam', 'batch_size': 100, 'epochs': 5} |
| 8 | 8.487769 | 0.85136 | 0.99996 | {'optimizer': 'adam', 'batch_size': 100, 'epochs': 10} |
| 9 | 6.102054 | 0.8506 | 0.99998 | {'optimizer': 'rmsprop', 'batch_size': 500, 'epochs': 10} |


### Check accuracy on the training and test sets

We can see that the training accuracy is a lot higher than the accuracy on the test set.  This means that the model has overfit, and this is only after 2 epochs.

In [10]:
def check_model(hdf5_file):
    '''
    Load the weights from the grid search and check the accuracy
    on the training and test sets.
    '''
    from imdb_models import build_model

    model = build_model()

    # load the weights that yielded the best validation score from grid search
    model.load_weights(hdf5_file)

    # evaluate train/test accuracy
    train_score = model.evaluate(X_train, y_train, verbose=0)
    test_score = model.evaluate(X_test, y_test, verbose=0)
    train_accuracy = 100*train_score[1]
    test_accuracy = 100*test_score[1]
    print('\nTraining accuracy: {}%'.format(round(train_accuracy, 2)))
    print('Test set accuracy: {}%'.format(round(test_accuracy, 2)))
    
check_model('imdb_results/imdb_batch_epoch_best.hdf5')


Training accuracy: 98.02%
Test set accuracy: 86.43%


<a id='opt_algorithm'></a>
## 2. Tune the training optimization algorithm

**Best score using hyperparameters {'batch_size': 200, 'epochs': 2, 'optimizer': 'adagrad'}**

Letting the optimizer vary and fixing the batch size and number of epochs at the best values from step 1, we used the following parameter grid:

```
optimizer = ['adam', 'rmsprop', 'sgd', 'adagrad', 'adadelta', 'adamax', 'nadam']
```

In [6]:
display_results('imdb_results/grid_optimizer_results.csv')


Total training time: 9.69 minutes


| | mean_fit_time (min) | mean_test_score | mean_train_score | params |
|---|---|---|---|---|
| 0 | 1.322234 | 0.87756 | 0.97092 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'adagrad'} |
| 1 | 1.460754 | 0.8672 | 0.98194 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'adam'} |
| 2 | 1.278613 | 0.86424 | 0.98962 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'nadam'} |
| 3 | 1.346801 | 0.86328 | 0.96962 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'rmsprop'} |
| 4 | 1.458367 | 0.84208 | 0.922919 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'adamax'} |
| 5 | 1.614987 | 0.55312 | 0.581459 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'adadelta'} |
| 6 | 1.207428 | 0.50816 | 0.51354 | {'batch_size': 200, 'epochs': 2, 'optimizer': 'sgd'} |


### Check accuracy on the training and test sets

There still seems to be overfitting, but the training and test accuracy are a little closer together this time.

In [11]:
check_model('imdb_results/imdb_optimizer_best.hdf5')


Training accuracy: 96.91%
Test set accuracy: 87.98%


<a id='tune_adagrad'></a>
## 3. Tune learn rate and learn rate decay for adagrad

**Best score using hyperparameters {'learn_rate_decay': 0.01, 'learn_rate': 0.01}**

This time we used adagrad (along with the same setting for batch size and epochs from above), and the following parameter grid:

```
learn_rate = [0.001, 0.01, 0.1, 0.3]
learn_rate_decay = [0.0, 0.1, 0.01, 0.001]
```
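To get a feel for what the decay values in this grid do, the sketch below assumes that the `decay` argument follows the time-based schedule used by the older Keras optimizers, where the learn rate is divided by `1 + decay * iteration` and an iteration is one mini-batch update. The function name is made up, and this covers only the decay schedule, not the full Adagrad update.

```python
def decayed_learn_rate(learn_rate, learn_rate_decay, iteration):
    """Time-based decay: the rate shrinks as 1 / (1 + decay * iteration)."""
    return learn_rate / (1.0 + learn_rate_decay * iteration)


# Effective learn rate at the start and after 100 and 200 updates
for decay in [0.0, 0.001, 0.01, 0.1]:
    rates = [decayed_learn_rate(0.01, decay, t) for t in (0, 100, 200)]
    print(decay, [round(r, 5) for r in rates])
```

Under this assumption, a decay of 0.1 cuts the learn rate by more than a factor of ten within the first 100 updates, while a decay of 0.01 only halves it over the same span.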

In [15]:
display_results('imdb_results/grid_learn_rate_results.csv', 10)


Total training time: 21.8 minutes


| | mean_fit_time (min) | mean_test_score | mean_train_score | params |
|---|---|---|---|---|
| 0 | 1.375478 | 0.87204 | 0.94322 | {'learn_rate_decay': 0.01, 'learn_rate': 0.01} |
| 1 | 1.481537 | 0.86612 | 0.95816 | {'learn_rate_decay': 0.0, 'learn_rate': 0.01} |
| 2 | 1.423921 | 0.86516 | 0.94918 | {'learn_rate_decay': 0.001, 'learn_rate': 0.01} |
| 3 | 1.351528 | 0.66376 | 0.72554 | {'learn_rate_decay': 0.0, 'learn_rate': 0.001} |
| 4 | 1.39906 | 0.64552 | 0.70422 | {'learn_rate_decay': 0.1, 'learn_rate': 0.01} |
| 5 | 1.384608 | 0.59864 | 0.656881 | {'learn_rate_decay': 0.001, 'learn_rate': 0.001} |
| 6 | 1.455008 | 0.58128 | 0.63766 | {'learn_rate_decay': 0.01, 'learn_rate': 0.001} |
| 7 | 1.44948 | 0.52924 | 0.55574 | {'learn_rate_decay': 0.1, 'learn_rate': 0.001} |
| 8 | 1.273191 | 0.50412 | 0.49794 | {'learn_rate_decay': 0.0, 'learn_rate': 0.3} |
| 9 | 1.275629 | 0.50344 | 0.49828 | {'learn_rate_decay': 0.1, 'learn_rate': 0.3} |


### Check accuracy

The test set accuracy is a tad higher and the training accuracy has gone down again.  This is a good sign as it means there is less overfitting.

In [12]:
check_model('imdb_results/imdb_learn_rate_best.hdf5')


Training accuracy: 94.75%
Test set accuracy: 88.22%


<a id='activation'></a>
## 4. Tune the activation function in the hidden layers

**Best score using hyperparameters {'activation': 'relu'}**

The activation functions introduce non-linearity into the network and allow the model to learn more complicated functions.  Next we tried the grid search with all the different choices for the activation function in Keras.  Our neural network has 2 hidden layers, and the same activation function is applied in both layers.  The following parameter grid was used:

```
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
```

In [11]:
display_results('imdb_results/grid_activation_results.csv')


Total training time: 10.27 minutes


| | mean_fit_time (min) | mean_test_score | mean_train_score | params |
|---|---|---|---|---|
| 0 | 1.266569 | 0.8788 | 0.95198 | {'activation': 'relu'} |
| 1 | 1.281021 | 0.87812 | 0.92214 | {'activation': 'hard_sigmoid'} |
| 2 | 1.269449 | 0.86808 | 0.9179 | {'activation': 'sigmoid'} |
| 3 | 1.311112 | 0.86636 | 0.89932 | {'activation': 'softplus'} |
| 4 | 1.275075 | 0.86048 | 0.96844 | {'activation': 'softsign'} |
| 5 | 1.348187 | 0.85704 | 0.88972 | {'activation': 'softmax'} |
| 6 | 1.263466 | 0.74256 | 0.814615 | {'activation': 'tanh'} |
| 7 | 1.255789 | 0.73008 | 0.787662 | {'activation': 'linear'} |


### Check accuracy

The best performance comes from the relu activation function $g(z) = \max\{0, z\}$, which is the setting we were already using, so nothing really changed.  This is consistent with the advice given in the [deep learning text](http://www.deeplearningbook.org/contents/mlp.html) (by Goodfellow, Bengio, and Courville) that rectified linear units tend to give better performance and are also easier to optimize because their behavior is closer to linear.

In [13]:
check_model('imdb_results/imdb_activation_best.hdf5')


Training accuracy: 94.65%
Test set accuracy: 88.28%
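For reference, a few of the activation functions from this grid can be written out as scalar functions in plain Python. These are the textbook definitions, given here only as a sketch; Keras applies the equivalent operations elementwise to tensors.

```python
import math


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function, squashing inputs into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    """Smooth approximation of relu: log(1 + exp(z))."""
    return math.log1p(math.exp(z))


def linear(z):
    """Identity: no non-linearity at all."""
    return z


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 4), round(math.tanh(z), 4),
          round(softplus(z), 4), linear(z))
```

For positive inputs relu behaves exactly like the linear activation, which is the "closer to linear" behavior mentioned above, while still being non-linear overall because negative inputs are clipped to zero.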


<a id='dropout'></a>
## 5. Tune dropout regularization

**Best score using hyperparameters {'weight_constraint': 4, 'dropout_rate': 0.1}**

As we have seen, overfitting can be a problem with deep neural networks.  Dropout is a regularization technique introduced in a [2014 paper](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf) by Srivastava et al.  The idea is that randomly selected nodes are dropped during training: their contribution to the network is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass.

We have already been using a dropout value of 0.2 applied to the first and second hidden layers of our network.  The dropout value specifies the probability of a randomly selected node being dropped from the network.  Dropout is also typically applied with a weight constraint that specifies a maximum value for the norms of the weights in each hidden layer, so we also let the weight constraint vary along with the dropout rate:  

```
dropout_rate = [0.1, 0.2, 0.3, 0.4]
weight_constraint = [1, 2, 3, 4] 
```
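The two ideas being tuned here can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Keras implementation: it uses the common "inverted dropout" formulation, in which surviving activations are rescaled during training, and it applies the max-norm constraint to a single weight vector (for example, the incoming weights of one hidden unit). The function names are chosen for this example.

```python
import math
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


def max_norm(weights, limit):
    """Rescale a weight vector so that its L2 norm does not exceed `limit`."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= limit:
        return list(weights)
    return [w * limit / norm for w in weights]


rng = random.Random(13)
print(dropout([1.0] * 10, rate=0.2, rng=rng))

# A vector of norm 5 is scaled down so that its norm becomes 4
print(max_norm([3.0, 4.0], limit=4))
```

Rescaling the survivors keeps the expected activation the same during training and testing, so nothing needs to change at prediction time, when dropout is switched off.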

In [16]:
display_results('imdb_results/grid_dropout_results.csv', 10)


Total training time: 26.84 minutes


| | mean_fit_time (min) | mean_test_score | mean_train_score | params |
|---|---|---|---|---|
| 0 | 1.513227 | 0.8772 | 0.95082 | {'weight_constraint': 4, 'dropout_rate': 0.1} |
| 1 | 1.515296 | 0.87612 | 0.93998 | {'weight_constraint': 3, 'dropout_rate': 0.2} |
| 2 | 1.503684 | 0.8752 | 0.95252 | {'weight_constraint': 2, 'dropout_rate': 0.1} |
| 3 | 1.526284 | 0.87324 | 0.9512 | {'weight_constraint': 1, 'dropout_rate': 0.1} |
| 4 | 1.552568 | 0.87232 | 0.93172 | {'weight_constraint': 3, 'dropout_rate': 0.4} |
| 5 | 1.506878 | 0.87104 | 0.95004 | {'weight_constraint': 2, 'dropout_rate': 0.2} |
| 6 | 1.531431 | 0.87048 | 0.93962 | {'weight_constraint': 1, 'dropout_rate': 0.3} |
| 7 | 1.538994 | 0.86808 | 0.93732 | {'weight_constraint': 1, 'dropout_rate': 0.4} |
| 8 | 1.569696 | 0.86752 | 0.93222 | {'weight_constraint': 4, 'dropout_rate': 0.3} |
| 9 | 1.502582 | 0.86652 | 0.93718 | {'weight_constraint': 3, 'dropout_rate': 0.1} |


### Check accuracy

In [17]:
check_model('imdb_results/imdb_dropout_best.hdf5')


Training accuracy: 95.46%
Test set accuracy: 88.07%


<a id='neurons'></a>
## 6. Tune the number of nodes in the hidden layers (along with batch size and epochs)

**Best score using hyperparameters {'batch_size': 200, 'neurons': 150, 'epochs': 3}**

In [16]:
display_results('imdb_results/grid_neurons_results.csv', 10)


Total training time: 46.1 minutes


| | mean_fit_time (min) | mean_test_score | mean_train_score | params |
|---|---|---|---|---|
| 0 | 1.430316 | 0.87864 | 0.97022 | {'batch_size': 200, 'neurons': 150, 'epochs': 3} |
| 1 | 3.005411 | 0.87712 | 0.97524 | {'batch_size': 200, 'neurons': 250, 'epochs': 4} |
| 2 | 1.869987 | 0.87684 | 0.97738 | {'batch_size': 200, 'neurons': 150, 'epochs': 4} |
| 3 | 3.08061 | 0.87652 | 0.9653 | {'batch_size': 100, 'neurons': 250, 'epochs': 3} |
| 4 | 0.948743 | 0.87632 | 0.95834 | {'batch_size': 100, 'neurons': 50, 'epochs': 4} |
| 5 | 0.651717 | 0.8762 | 0.96714 | {'batch_size': 500, 'neurons': 50, 'epochs': 4} |
| 6 | 1.879045 | 0.87592 | 0.96042 | {'batch_size': 100, 'neurons': 150, 'epochs': 3} |
| 7 | 1.597687 | 0.87564 | 0.96538 | {'batch_size': 500, 'neurons': 150, 'epochs': 4} |
| 8 | 1.300412 | 0.87484 | 0.9533 | {'batch_size': 100, 'neurons': 150, 'epochs': 2} |
| 9 | 2.322612 | 0.87452 | 0.96606 | {'batch_size': 200, 'neurons': 250, 'epochs': 3} |


### Check accuracy

In [14]:
check_model('imdb_results/imdb_neurons_best.hdf5')


Training accuracy: 95.68%
Test set accuracy: 88.38%


### Best accuracy:  88.38%
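As a quick recap, the train/test accuracies reported by `check_model` after each tuning step can be collected and compared. The numbers below are copied from the outputs above; the step labels and variable names are just for this sketch.

```python
# (step, training accuracy %, test accuracy %) as printed by check_model
results = [
    ("1. batch size / epochs", 98.02, 86.43),
    ("2. optimizer", 96.91, 87.98),
    ("3. learn rate / decay", 94.75, 88.22),
    ("4. activation", 94.65, 88.28),
    ("5. dropout", 95.46, 88.07),
    ("6. neurons", 95.68, 88.38),
]

# The gap between training and test accuracy is a rough indicator of overfitting
for step, train_acc, test_acc in results:
    print('{:<24} train {:.2f}%  test {:.2f}%  gap {:.2f}'.format(
        step, train_acc, test_acc, train_acc - test_acc))

best_step = max(results, key=lambda r: r[2])
smallest_gap = min(results, key=lambda r: r[1] - r[2])
print('best test accuracy:', best_step[0])
print('smallest train/test gap:', smallest_gap[0])
```

The gap narrows considerably over the first few steps, although the training accuracy remains several points above the test accuracy throughout.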