# Machine Learning Project Series 1:
# IMDB Movie Review Sentiment Classification 
# Episode 5: Convolutional Neural Network
This episode fits and evaluates a Convolutional Neural Network (CNN) on the data. As in the earlier episodes, words are represented with a learned word embedding.

## I. Importing Libraries

In [4]:
import numpy as np
import os
import pathlib
import tensorflow as tf
from tensorflow.keras import regularizers
# Import layers, optimizers and models from tensorflow.keras only,
# so that every object comes from the same Keras implementation.
from tensorflow.keras.layers import Input, Embedding, Dropout, Conv1D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model

## II. Extracting Data

In [5]:
# Load the IMDB reviews as lists of word indices, already split into training and testing sets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data()


In [6]:
word_dict = tf.keras.datasets.imdb.get_word_index(path="imdb_word_index.json")

In [7]:
vocab_len = len(word_dict)
print("Total words count:", vocab_len)

Total words count: 88584
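
To sanity-check the word index, an encoded review can be mapped back to text. The sketch below assumes the default `load_data()` settings, where indices 0, 1 and 2 are reserved for padding, start-of-sequence and out-of-vocabulary tokens, so every rank from `get_word_index()` is shifted by 3. The helper `decode_review` and the tiny `toy_word_dict` are hypothetical names used only for illustration.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map a list of integer tokens back to words (hypothetical helper)."""
    # Invert the word -> rank mapping and apply the offset used by load_data()
    reverse = {rank + index_from: word for word, rank in word_index.items()}
    special = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    return " ".join(special.get(i, reverse.get(i, "<unk>")) for i in encoded)

# Toy stand-in for word_dict, so the example runs without downloading anything
toy_word_dict = {"the": 1, "movie": 2, "great": 3}
decoded = decode_review([1, 4, 5, 2, 6], toy_word_dict)
print(decoded)
```

With the real data, the same call would be `decode_review(x_train[0], word_dict)`.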


## III. Data Preprocessing

In [8]:
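chosen_cmt_len = 2000  # every review is truncated or zero-padded to this many tokens
max_index = 25000      # only word indices below this value are kept

def padding(initial_x):
    # Start from all zeros, so padded positions and rare words stay 0
    output = np.zeros((chosen_cmt_len))
    for i in range(min(chosen_cmt_len, len(initial_x))):
        if initial_x[i] < max_index:
            output[i] = initial_x[i]
    return output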

In [9]:
x_train_padded = np.zeros((len(x_train), chosen_cmt_len))
for i in range(len(x_train)):
    x_train_padded[i] = padding(x_train[i])

In [10]:
x_test_padded = np.zeros((len(x_test), chosen_cmt_len))
for i in range(len(x_test)):
    x_test_padded[i] = padding(x_test[i])
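
The padding step can also be written with NumPy slicing instead of a per-token loop. This is a minimal sketch of the same rule (truncate to a fixed length, pad with zeros on the right, replace indices at or above `max_index` with 0); `pad_batch` is a hypothetical name, not part of the original notebook.

```python
import numpy as np

def pad_batch(sequences, length, max_index):
    """Truncate or right-pad each sequence with zeros and zero out rare words."""
    out = np.zeros((len(sequences), length))
    for row, seq in enumerate(sequences):
        clipped = np.asarray(seq[:length])
        # Indices >= max_index are replaced by 0, as in padding()
        clipped = np.where(clipped < max_index, clipped, 0)
        out[row, :len(clipped)] = clipped
    return out

# Small made-up batch: one sequence too long with a rare word, one too short
demo = pad_batch([[1, 7, 30000, 4], [5, 6]], length=3, max_index=25000)
print(demo)
```

Keras also provides `pad_sequences`, which with `padding='post'` and `truncating='post'` should give the same layout, although it does not zero out rare words by itself.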

## IV. Machine Learning Model:

In [11]:
e_s = 20  # embedding size: each word index is mapped to a 20-dimensional vector

In [12]:
# Creating model:
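def build_model():
    
    # Retrieving inputs
    X_input = Input(shape=(chosen_cmt_len,))
    
    # Embedding meanings: each word index becomes a vector of size e_s
    embedding = Embedding(max_index, e_s)(X_input)
    
    # Dropout with rate 0.9 zeroes 90% of the activations during training
    drop = Dropout(0.9)(embedding)
    
    # A single filter spanning 5 tokens; causal padding keeps the sequence length at 2000
    conv_1 = Conv1D(filters=1, kernel_size=5, strides=1, activation='tanh', padding='causal')(drop)
    
    drop = Dropout(0.9)(conv_1)
    
    flatten = Flatten()(drop)
    
    # Sigmoid output for binary sentiment, with L1/L2 regularization
    output = Dense(1, activation='sigmoid',
                   kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                   bias_regularizer=regularizers.l2(1e-4),
                   activity_regularizer=regularizers.l2(1e-5)
                  )(flatten)

    model = Model(inputs=X_input, outputs=output)
    
    return model

In [13]:
# The builder has its own name, so re-running this cell does not try to call the model object
model = build_model()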
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 2000)]            0         
_________________________________________________________________
embedding (Embedding)        (None, 2000, 20)          500000    
_________________________________________________________________
dropout (Dropout)            (None, 2000, 20)          0         
_________________________________________________________________
conv1d (Conv1D)              (None, 2000, 1)           101       
_________________________________________________________________
dropout_1 (Dropout)          (None, 2000, 1)           0         
_________________________________________________________________
flatten (Flatten)            (None, 2000)              0         
_________________________________________________________________
dense (Dense)                (None, 1)                 2001  
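
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below recomputes the three counts from the hyperparameters and uses a small NumPy stand-in for a one-filter causal convolution to show why the sequence length stays at 2000. The function `causal_conv1d` is a hypothetical illustration, not the Keras implementation.

```python
import numpy as np

seq_len, emb_dim, vocab = 2000, 20, 25000   # chosen_cmt_len, e_s, max_index
kernel_size, filters = 5, 1

# Parameter counts, layer by layer
embedding_params = vocab * emb_dim                        # one vector per word index
conv_params = kernel_size * emb_dim * filters + filters   # kernel weights + bias
dense_params = seq_len * filters + 1                      # flattened inputs + bias
total_params = embedding_params + conv_params + dense_params
print(embedding_params, conv_params, dense_params, total_params)

def causal_conv1d(x, kernel, bias=0.0):
    """x: (steps, channels); kernel: (kernel_size, channels); a single filter."""
    k = kernel.shape[0]
    # Left-pad with k-1 zero rows so that output step t only sees inputs <= t
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    out = np.array([np.sum(padded[t:t + k] * kernel) for t in range(x.shape[0])])
    return np.tanh(out + bias)

x = np.random.rand(seq_len, emb_dim)         # one embedded review (random stand-in)
kernel = np.random.rand(kernel_size, emb_dim)
y = causal_conv1d(x, kernel)
print(y.shape)
```

Almost all of the roughly 502,000 parameters sit in the embedding layer; the convolution and the output layer together contribute only about 2,100.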

In [14]:
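# Optimizer for the model
learning_rate = 5e-3
# `learning_rate` replaces the deprecated `lr` argument.
# Note: `decay` is a legacy argument that newer Keras releases may not accept.
opt = Adam(learning_rate=learning_rate, decay=1e-5)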
model.compile(optimizer=opt, 
              loss='binary_crossentropy', 
              metrics=[tf.keras.metrics.BinaryAccuracy(name="binary_accuracy", threshold=0.5)])

In [15]:
# Storing histories
histories = []
testings  = []

# Track testing accuracy
prev_acc = 0
curr_acc = 0.01

# Max testing accuracy
max_acc = 0

# Fitting and evaluating the model after epochs
epoch = 1

# Train for 20 epochs, evaluating on the testing set after every epoch
while epoch < 21:
    # Fitting
    print("Epoch:", epoch)
    print("Fitting data:")
    history = model.fit(x = x_train_padded, y = np.array(y_train).reshape(25000, 1), epochs=1, batch_size=1000)
    
    # Evaluating
    print("Testing data:")
    testing = model.evaluate(x_test_padded, np.array(y_test).reshape(25000, 1))
    
    # Assigning max accuracy
    if testing[1] > max_acc:
        max_acc = testing[1]
        
    # Assigning test accuracy
    prev_acc = curr_acc
    curr_acc = testing[1]
        
    # Reduce the learning rate tenfold whenever testing accuracy drops
    if prev_acc > curr_acc:
        learning_rate /= 10
        opt = Adam(learning_rate=learning_rate, decay=1e-5)
        model.compile(optimizer=opt, 
              loss='binary_crossentropy', 
              metrics=[tf.keras.metrics.BinaryAccuracy(name="binary_accuracy", threshold=0.5)])
    
    # Storing
    histories.append(history)
    testings.append(testing)
    
    epoch += 1
    print('\n')

print("Optimal testing accuracy is: {:.2f}%".format(max_acc * 100))

Epoch: 1
Fitting data:
Testing data:


Epoch: 2
Fitting data:
Testing data:


Epoch: 3
Fitting data:
Testing data:


Epoch: 4
Fitting data:
Testing data:


Epoch: 5
Fitting data:
Testing data:


Epoch: 6
Fitting data:
Testing data:


Epoch: 7
Fitting data:
Testing data:


Epoch: 8
Fitting data:
Testing data:


Epoch: 9
Fitting data:
Testing data:


Epoch: 10
Fitting data:
Testing data:


Epoch: 11
Fitting data:
Testing data:


Epoch: 12
Fitting data:
Testing data:


Epoch: 13
Fitting data:
Testing data:


Epoch: 14
Fitting data:
Testing data:


Epoch: 15
Fitting data:
Testing data:


Epoch: 16
Fitting data:
Testing data:


Epoch: 17
Fitting data:
Testing data:


Epoch: 18
Fitting data:
Testing data:


Epoch: 19
Fitting data:
Testing data:


Epoch: 20
Fitting data:
Testing data:


Optimal testing accuracy is: 88.60%
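
The learning-rate rule in the loop above can be replayed without training anything. The sketch below is a pure-Python simulation of that rule (divide the rate by 10 whenever testing accuracy falls below the previous epoch's value). The helper `simulate_lr` is hypothetical, and the accuracy values are made up for illustration; they are not results from this notebook.

```python
def simulate_lr(test_accuracies, initial_lr=5e-3, factor=10.0):
    """Replay the learning-rate rule from the training loop (hypothetical helper)."""
    lr, prev_acc, curr_acc = initial_lr, 0.0, 0.01
    used = []
    for acc in test_accuracies:
        used.append(lr)            # rate in effect while fitting this epoch
        prev_acc, curr_acc = curr_acc, acc
        if prev_acc > curr_acc:    # accuracy dropped: shrink the rate
            lr /= factor
    return used

# Made-up accuracies, for illustration only
rates = simulate_lr([0.80, 0.85, 0.84, 0.86, 0.855])
print(rates)
```

Two caveats about the loop itself: recompiling the model creates a fresh optimizer, which likely discards Adam's accumulated moment estimates, and steering the learning rate with the testing set means the reported testing accuracy is somewhat optimistic. Keras ships a `ReduceLROnPlateau` callback that implements a similar idea on a validation metric without recompiling.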


In [16]:
model.evaluate(x_train_padded, np.array(y_train).reshape(25000, 1))



[0.24780705571174622, 0.9339600205421448]

## V. Summary:
The Convolutional Network performed best among the models in this series, with a similar number of parameters (approximately 500,000) and very little training time.

|Dataset|Loss|Accuracy|Sample size|
|-|-|-|-|
|Training|0.25|93.4%|25,000|
|Testing|0.33|88.6%|25,000|

The data in this sample seem to have a simple overall structure, since shallower neural networks worked much better than deeper ones (this was a side experiment that is not shown in this report).

## VI. Thank you:
Thank you for viewing my project. This is the final episode of this series. One of the most important things I learnt from it is that a deeper neural network does not always mean better performance: once the depth and complexity of the network exceed some threshold, performance suffers. The right model should be chosen based on an analysis of the complexity and structure of the given data. At the beginning of the project, I was trying to create a model that generalizes well to data from various distributions, but that model-centric approach turned out to be less efficient than I expected. A data-centric approach may still be more useful in most cases, as stated by Professor Andrew Ng.

That's it for this series. Hope to see you in the next one.