# Objective

1. Build a CNN model for the IMDB dataset
2. Adjust the parameters for better accuracy, such as the number of layers, the number of nodes in each layer, the optimizer, the learning rate, etc.

# Prepare Environment

In [1]:
%env KERAS_BACKEND=tensorflow
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

env: KERAS_BACKEND=tensorflow


# Prepare Data
1. Load data

In [2]:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

Using TensorFlow backend.


2. Pad sequences
    - Make all the reviews the same length (100 words).
    - If a review is too long, truncate it; otherwise, pad it with zeros.

In [3]:
from keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=100)
x_test = sequence.pad_sequences(x_test, maxlen=100)
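
The padding rule above can be sketched in plain Python. This is only an illustration, not the Keras implementation: the helper name `pad_or_truncate` is hypothetical, and it assumes the default `pad_sequences` behaviour of padding and truncating at the front of the sequence.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic front ('pre') padding and truncation on one list of word indices."""
    # Keep only the last `maxlen` items when the review is too long.
    trimmed = seq[-maxlen:]
    # Put the padding value in front when the review is too short.
    return [value] * (maxlen - len(trimmed)) + trimmed


short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_or_truncate(short_review, maxlen=5))  # [0, 0, 11, 22, 33]
print(pad_or_truncate(long_review, maxlen=5))   # [3, 4, 5, 6, 7]
```

With `maxlen=100`, every review ends up as exactly 100 integers, which is what the later reshape step relies on.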

3. Normalization
    - Each value is a word index between 0 and 9999. Scale it to the range 0 to 1.

In [4]:
x_train = x_train / 10000
x_test = x_test / 10000

4. Reshape
    - Conv1D expects each sample to be two-dimensional (steps, channels), so add a channel axis of size 1

In [5]:
print('=== before reshape ===')
print('x_train.shape:', x_train.shape)
print('x_test.shape:', x_test.shape)
x_train = x_train.reshape(25000, 100, 1)
x_test = x_test.reshape(25000, 100, 1)
print('=== after reshape ===')
print('x_train.shape:', x_train.shape)
print('x_test.shape:', x_test.shape)

=== before reshape ===
x_train.shape: (25000, 100)
x_test.shape: (25000, 100)
=== after reshape ===
x_train.shape: (25000, 100, 1)
x_test.shape: (25000, 100, 1)
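
As a small check of the last two steps, the sketch below applies the same division and reshape to a toy array. The array `x` is made-up data, not the IMDB set, and using `-1` in `reshape` is just one possible way to avoid hard-coding the sample count.

```python
import numpy as np

# Toy stand-in for the padded data: 2 reviews of 4 word indices each.
x = np.array([[0, 1, 5000, 9999],
              [0, 0, 42, 7]])

# Same division as in the notebook; the values now lie in [0, 1).
x_scaled = x / 10000

# Add a channel axis of size 1; -1 lets NumPy infer the number of samples.
x_3d = x_scaled.reshape(-1, x.shape[1], 1)

print(x_scaled.min(), x_scaled.max())
print(x_3d.shape)
```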


# Build CNN

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

def three_layer_cnn(filters=(32, 64, 128), kernel_size=3, pool_size=2, activation='relu', loss='binary_crossentropy', optimizer='adam', batch_size=32, epochs=15):
    model = Sequential()
    model.add(Conv1D(filters=filters[0], kernel_size=kernel_size, padding='same', input_shape=(100, 1), activation=activation))
    model.add(MaxPooling1D(pool_size=pool_size))

    model.add(Conv1D(filters=filters[1], kernel_size=kernel_size, padding='same', activation=activation))
    model.add(MaxPooling1D(pool_size=pool_size))

    model.add(Conv1D(filters=filters[2], kernel_size=kernel_size, padding='same', activation=activation))
    model.add(MaxPooling1D(pool_size=pool_size))

    model.add(Flatten())
    model.add(Dense(200, activation=activation))
    model.add(Dense(1, activation='sigmoid'))

    model.summary()
    
    model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
    train_accuracy = model.evaluate(x_train, y_train)[1]
    test_accuracy = model.evaluate(x_test, y_test)[1]
    return (train_accuracy, test_accuracy)

### Default setting from class
- Three sets of convolution layer and max pooling layer, followed by a hidden layer and an output layer
- Filter size of convolution 1D: 3
- Pool size of max pooling: 2
- Activation function of each convolution layer and hidden layer: relu
- Loss function: binary crossentropy
- Optimizer: adam
- Batch size: 32
- Epochs: 15

In [7]:
three_layer_cnn()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_1 (Conv1D)            (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 50, 32)            0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 25, 64)            0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 12, 128)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1536)              0         
__________

(0.79776, 0.50856)
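
The shapes and parameter counts in the summary above can be reproduced by hand. The sketch below is a rough check rather than anything Keras itself runs; the helper names `conv1d_params` and `summarize` are hypothetical, and it assumes `padding='same'` convolutions followed by non-overlapping max pooling, as in `three_layer_cnn`.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel) plus one bias, per filter.
    return (kernel_size * in_channels + 1) * filters


def summarize(filters=(32, 64, 128), kernel_size=3, pool_size=2, steps=100, channels=1):
    rows = []
    for f in filters:
        params = conv1d_params(kernel_size, channels, f)
        # padding='same' keeps the step count; pooling then floor-divides it.
        steps = steps // pool_size
        channels = f
        rows.append((params, steps, channels))
    # Flatten multiplies the remaining steps by the number of channels.
    return rows, steps * channels


rows, flat = summarize()
print(rows)  # [(128, 50, 32), (6208, 25, 64), (24704, 12, 128)]
print(flat)  # 1536
```

Calling `summarize(kernel_size=2)` or `summarize(kernel_size=8)` gives the parameter counts that appear in the filter-size experiments.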

### Summary
- As shown, the test accuracy is only 0.51, which is essentially the same as random guessing.
- Even the training accuracy reaches only 0.80 after 15 epochs.
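
To make the "guessing" baseline concrete, the sketch below computes the accuracy of always predicting the most frequent label. It assumes the split is balanced (12,500 positive and 12,500 negative reviews), and the helper name `majority_baseline_accuracy` is hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    # Accuracy obtained by always predicting the most frequent label.
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Assumed balanced labels: half negative (0), half positive (1).
balanced = [0] * 12500 + [1] * 12500
print(majority_baseline_accuracy(balanced))  # 0.5
```

Under that assumption, a test accuracy of 0.51 is only about one percentage point above a model that ignores its input.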


# Tune Parameter
1. Change filter size
    - Try filter sizes between 2 and 9
    - Results
        * filter size = 2 : (0.73376, 0.51368)
        * filter size = 3 : (0.73912, 0.51072)
        * filter size = 4 : (0.71476, 0.51404)
        * filter size = 5 : (0.62932, 0.5154)
        * filter size = 6 : (0.74752, 0.51608)
        * filter size = 7 : (0.68644, 0.51532)
        * filter size = 8 : (0.7202, 0.52288)
        * filter size = 9 : (0.75204, 0.51608)
    - As shown, changing the filter size does not make much difference: test accuracy stays between about 0.51 and 0.52


In [9]:
for kernel_size in range(2, 10):
    print('filter size =', kernel_size, ':', three_layer_cnn(kernel_size=kernel_size))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_13 (Conv1D)           (None, 100, 32)           96        
_________________________________________________________________
max_pooling1d_13 (MaxPooling (None, 50, 32)            0         
_________________________________________________________________
conv1d_14 (Conv1D)           (None, 50, 64)            4160      
_________________________________________________________________
max_pooling1d_14 (MaxPooling (None, 25, 64)            0         
_________________________________________________________________
conv1d_15 (Conv1D)           (None, 25, 128)           16512     
_________________________________________________________________
max_pooling1d_15 (MaxPooling (None, 12, 128)           0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 1536)              0         
__________

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
filter size = 7 : (0.68644, 0.51532)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_31 (Conv1D)           (None, 100, 32)           288       
_________________________________________________________________
max_pooling1d_31 (MaxPooling (None, 50, 32)            0         
_________________________________________________________________
conv1d_32 (Conv1D)           (None, 50, 64)            16448     
_________________________________________________________________
max_pooling1d_32 (MaxPooling (None, 25, 64)            0         
_________________________________________________________________
conv1d_33 (Conv1D)           (None, 25, 128)           65664     
_________________________________________________________________
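
One way to compare the runs without reading through the training logs is to collect the returned tuples in a dictionary. The sketch below uses the numbers reported for the filter-size experiment; the variable names are illustrative only.

```python
# (train_accuracy, test_accuracy) per filter size, copied from the results above.
results = {
    2: (0.73376, 0.51368),
    3: (0.73912, 0.51072),
    4: (0.71476, 0.51404),
    5: (0.62932, 0.5154),
    6: (0.74752, 0.51608),
    7: (0.68644, 0.51532),
    8: (0.7202, 0.52288),
    9: (0.75204, 0.51608),
}

# Pick the filter size with the highest test accuracy.
best = max(results, key=lambda k: results[k][1])

# Measure how far apart the best and worst test accuracies are.
test_scores = [test for _, test in results.values()]
spread = max(test_scores) - min(test_scores)

print(best, results[best][1])
print(round(spread, 5))
```

The best and worst test accuracies differ by little more than one percentage point, which supports the conclusion that the filter size hardly matters here.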


2. Change activation function
    - Results (5 epochs each)
        * activation = softmax : (0.5, 0.5)
        * activation = elu : (0.55984, 0.51588)
        * activation = selu : (0.55148, 0.5148)
        * activation = softplus : (0.5, 0.5)
        * activation = softsign : (0.55916, 0.51976)
        * activation = relu : (0.54028, 0.52088)
        * activation = tanh : (0.55116, 0.513)
        * activation = sigmoid : (0.5, 0.5)
        * activation = hard_sigmoid : (0.5, 0.5)
        * activation = exponential : (0.5, 0.5)
        * activation = linear : (0.5546, 0.51764)
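    - As shown, changing the activation function does not make much difference: test accuracy never exceeds 0.53, and softmax, softplus, sigmoid, hard_sigmoid and exponential stay at exactly 0.5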

In [10]:
for activation in ['softmax', 'elu', 'selu', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'exponential', 'linear']:
    print('activation =', activation, ':', three_layer_cnn(activation=activation, epochs=5))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_37 (Conv1D)           (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_37 (MaxPooling (None, 50, 32)            0         
_________________________________________________________________
conv1d_38 (Conv1D)           (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_38 (MaxPooling (None, 25, 64)            0         
_________________________________________________________________
conv1d_39 (Conv1D)           (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_39 (MaxPooling (None, 12, 128)           0         
_________________________________________________________________
flatten_13 (Flatten)         (None, 1536)              0         
__________

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
activation = sigmoid : (0.5, 0.5)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_61 (Conv1D)           (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_61 (MaxPooling (None, 50, 32)            0         
_________________________________________________________________
conv1d_62 (Conv1D)           (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_62 (MaxPooling (None, 25, 64)            0         
_________________________________________________________________
conv1d_63 (Conv1D)           (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_63 (MaxPooling (None, 12, 128)           0         
__________________________________________________________

3. Change loss function
    - Results (5 epochs each)
        * loss = mean_squared_error : (0.544, 0.5232)
        * loss = mean_absolute_error : (0.51572, 0.51108)
        * loss = mean_absolute_percentage_error : (0.5, 0.5)
        * loss = squared_hinge : (0.5, 0.5)
        * loss = hinge : (0.5, 0.5)
        * loss = categorical_hinge : (0.5, 0.5)
        * loss = logcosh : (0.54332, 0.52524)
        * loss = sparse_categorical_crossentropy : (0.0, 0.0)
        * loss = binary_crossentropy : (0.54516, 0.52352)
        * loss = kullback_leibler_divergence : (0.5, 0.5)
        * loss = poisson : (0.55212, 0.5284)
        * loss = cosine_proximity : (0.5, 0.5)
    - Summary
        * Categorical crossentropy is omitted because it does not fit the single-output scenario and raises an exception.
        * Although mean_absolute_percentage_error, squared_hinge, hinge, categorical_hinge, sparse_categorical_crossentropy, kullback_leibler_divergence and cosine_proximity did not raise an exception, the results show that they are not suitable for this scenario. They are generally intended for other kinds of problems, such as categorical ones.
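
For reference, the default loss can be written out for a single prediction. The sketch below is a plain-Python version of the standard binary cross-entropy formula, not the Keras code; the `eps` clipping value is an assumption made only to keep `log` well defined.

```python
import math


def binary_crossentropy(y_true, p, eps=1e-7):
    # Clip the predicted probability so log() never receives 0.
    p = min(max(p, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


print(binary_crossentropy(1, 0.5))  # uninformative prediction, equals log(2)
print(binary_crossentropy(1, 0.9))  # confident and correct, smaller loss
print(binary_crossentropy(1, 0.1))  # confident and wrong, larger loss
```

A model that always outputs 0.5 sits at a loss of log(2), roughly 0.693, whatever the label is.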

In [12]:
for loss in ['mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'squared_hinge', 'hinge', 'categorical_hinge', 'logcosh', 'sparse_categorical_crossentropy', 'binary_crossentropy', 'kullback_leibler_divergence', 'poisson', 'cosine_proximity']:
    print('loss =', loss, ':', three_layer_cnn(loss=loss, epochs=5))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_94 (Conv1D)           (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_94 (MaxPooling (None, 50, 32)            0         
_________________________________________________________________
conv1d_95 (Conv1D)           (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_95 (MaxPooling (None, 25, 64)            0         
_________________________________________________________________
conv1d_96 (Conv1D)           (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_96 (MaxPooling (None, 12, 128)           0         
_________________________________________________________________
flatten_32 (Flatten)         (None, 1536)              0         
__________

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
loss = sparse_categorical_crossentropy : (0.0, 0.0)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_118 (Conv1D)          (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_118 (MaxPoolin (None, 50, 32)            0         
_________________________________________________________________
conv1d_119 (Conv1D)          (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_119 (MaxPoolin (None, 25, 64)            0         
_________________________________________________________________
conv1d_120 (Conv1D)          (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_120 (MaxPoolin (None, 12, 128)           0         
________________________________________

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
loss = cosine_proximity : (0.5, 0.5)


4. Change optimizer
    - Results (5 epochs each)
        * optimizer = sgd : (0.52352, 0.51852)
        * optimizer = rmsprop : (0.55028, 0.51808)
        * optimizer = adagrad : (0.55248, 0.5264)
        * optimizer = adadelta : (0.53608, 0.52188)
        * optimizer = adam : (0.5544, 0.53284)
        * optimizer = adamax : (0.55652, 0.5228)
        * optimizer = nadam : (0.54472, 0.52824)
    - As shown, changing the optimizer does not make much difference

In [13]:
for optimizer in ['sgd', 'rmsprop', 'adagrad', 'adadelta', 'adam', 'adamax', 'nadam']:
    print('optimizer =', optimizer, ':', three_layer_cnn(optimizer=optimizer, epochs=5))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_130 (Conv1D)          (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_130 (MaxPoolin (None, 50, 32)            0         
_________________________________________________________________
conv1d_131 (Conv1D)          (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_131 (MaxPoolin (None, 25, 64)            0         
_________________________________________________________________
conv1d_132 (Conv1D)          (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_132 (MaxPoolin (None, 12, 128)           0         
_________________________________________________________________
flatten_44 (Flatten)         (None, 1536)              0         
__________

Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
optimizer = adadelta : (0.53608, 0.52188)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_142 (Conv1D)          (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_142 (MaxPoolin (None, 50, 32)            0         
_________________________________________________________________
conv1d_143 (Conv1D)          (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_143 (MaxPoolin (None, 25, 64)            0         
_________________________________________________________________
conv1d_144 (Conv1D)          (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_144 (MaxPoolin (None, 12, 128)           0         
____________________________________________________________

5. Change batch size
    - Results (5 epochs each)
        * batch_size = 10 : (0.51064, 0.5062)
        * batch_size = 32 : (0.54372, 0.52628)
        * batch_size = 100 : (0.54696, 0.52464)
        * batch_size = 200 : (0.5368, 0.511)
    - As shown, changing the batch size does not make much difference

In [14]:
for batch_size in [10, 32, 100, 200]:
    print('batch_size =', batch_size, ':', three_layer_cnn(batch_size=batch_size, epochs=5))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_151 (Conv1D)          (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_151 (MaxPoolin (None, 50, 32)            0         
_________________________________________________________________
conv1d_152 (Conv1D)          (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_152 (MaxPoolin (None, 25, 64)            0         
_________________________________________________________________
conv1d_153 (Conv1D)          (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_153 (MaxPoolin (None, 12, 128)           0         
_________________________________________________________________
flatten_51 (Flatten)         (None, 1536)              0         
__________
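
The batch size mainly changes how many weight updates happen per epoch. The short sketch below computes that number for the 25,000 training reviews, assuming the last, smaller batch is still used.

```python
import math

n_train = 25000  # number of training reviews

# Weight updates per epoch for each batch size tried above.
steps_per_epoch = {b: math.ceil(n_train / b) for b in [10, 32, 100, 200]}

print(steps_per_epoch)  # {10: 2500, 32: 782, 100: 250, 200: 125}
```

So with the same 5 epochs, a batch size of 10 performs 20 times as many updates as a batch size of 200.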

6. Change number of epochs
    - Results
        * epochs = 5 : (0.54932, 0.52664)
        * epochs = 10 : (0.6086, 0.51436)
        * epochs = 15 : (0.82016, 0.51284)
        * epochs = 20 : (0.93604, 0.51464)
    - Training for more epochs makes the model fit the training data better but does not help it fit the testing data at all.

In [15]:
for epochs in [5, 10, 15, 20]:
    print('epochs =', epochs, ':', three_layer_cnn(epochs=epochs))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_163 (Conv1D)          (None, 100, 32)           128       
_________________________________________________________________
max_pooling1d_163 (MaxPoolin (None, 50, 32)            0         
_________________________________________________________________
conv1d_164 (Conv1D)          (None, 50, 64)            6208      
_________________________________________________________________
max_pooling1d_164 (MaxPoolin (None, 25, 64)            0         
_________________________________________________________________
conv1d_165 (Conv1D)          (None, 25, 128)           24704     
_________________________________________________________________
max_pooling1d_165 (MaxPoolin (None, 12, 128)           0         
_________________________________________________________________
flatten_55 (Flatten)         (None, 1536)              0         
__________

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
epochs = 20 : (0.93604, 0.51464)
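
The gap between training and testing accuracy gives a rough measure of how much the model overfits. The sketch below computes that gap from the numbers reported for this experiment; the variable names are illustrative.

```python
# (train_accuracy, test_accuracy) per epoch count, copied from the results above.
results = {
    5: (0.54932, 0.52664),
    10: (0.6086, 0.51436),
    15: (0.82016, 0.51284),
    20: (0.93604, 0.51464),
}

# Train-minus-test gap; a growing gap suggests the model is memorising the training set.
gaps = {epochs: round(train - test, 5) for epochs, (train, test) in results.items()}

for epochs, gap in gaps.items():
    print(epochs, gap)
```

The gap grows from about 0.02 at 5 epochs to about 0.42 at 20 epochs while the test accuracy stays near 0.51.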


# Summary
Many parameters were tested, yet the testing accuracy usually stays around 50%.
A more accurate result might be found if more parameter combinations were tested or the CNN structure were designed better.
However, I think this is sufficient to show that a CNN is not a model that naturally suits this problem.