# Homework 3
## Sentiment analysis using Neural Networks

Total: 50 Points


In this homework we will perform sentiment analysis using a few simple Neural Network based architectures.
For this problem we use the IMDB Large Movie Review Dataset. The train and test sets each contain 25,000 highly polar movie reviews: 12,500 positive (rating of 7/10 or higher) and 12,500 negative (rating of 4/10 or lower).

Use "https://keras.io/" for keras documentation. Please use Python 3. GPU is not required but it will help improve the training speed for each problem.

Please save the notebook with your cell outputs. You will not be graded if your outputs are not present below the homework cell. Also note that your outputs will be unique, since you will be using the last digits of your uni as your random seed (in the third cell). Make sure you submit this iPython file with the saved outputs. The submission format must be 'hw3/hw3.ipynb'. You will not submit any other files. If you do save your model weights, do not submit them. You must, however, make sure your model weights get saved in the 'weights' folder and can be retrieved from there as well.

Please fill your details below.



Name: Rahul Rana

Uni: rr3087

Email: rr3087@columbia.edu


In [4]:
from os import listdir
import random
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Dense, Dropout, Reshape, Merge, BatchNormalization, TimeDistributed, Lambda, Activation, LSTM, Flatten, Convolution1D, GRU, MaxPooling1D
from keras.regularizers import l2
from keras.callbacks import Callback, ModelCheckpoint, EarlyStopping
#from keras import initializers
from keras import backend as K
from keras.optimizers import SGD
from keras.optimizers import Adadelta
from keras.utils import np_utils
from keras.preprocessing import sequence
from keras import optimizers
import numpy as np
import h5py
from numpy import argmax
from sklearn.metrics import accuracy_score

In [2]:


#we retrieve train and test file names

train_dir = "./aclImdb/train/"
test_dir = "./aclImdb/test/"
tr_review = [re_filename for re_filename in listdir(train_dir)]
te_review = [re_filename for re_filename in listdir(test_dir)]

#we initialize the train and test arrays

tr_X = []
tr_Y = []
te_X = []
te_Y = []

#we arrange the reviews into the train and test arrays 

for review_file in tr_review:
    #read the first line of the review; the with block closes the file afterwards
    with open(train_dir+review_file, "r") as f_review:
        str_review = f_review.readline()
    tr_X.append(str_review)
    #the rating is the part of the file name after the underscore
    y_truth = int (review_file.split('.')[0].split('_')[1])
    if y_truth>=7:
        tr_Y.append(1)
    else:
        tr_Y.append(0)
        
for review_file in te_review:
    with open(test_dir+review_file, "r") as f_review:
        str_review = f_review.readline()
    te_X.append(str_review)
    y_truth = int (review_file.split('.')[0].split('_')[1])
    if y_truth>=7:
        te_Y.append(1)
    else:
        te_Y.append(0)
        

We will now create the validation set from the train set.

Use the last 4 digits of your uni as the seed value to ensure all answers remain unique.

In [5]:
#replace 2 (SEED) with the last 4 numbers of your Uni
#Uni: 
SEED = 3087
seed_counter = 0
while(1):

    shuffle_combine = list(zip(tr_X, tr_Y))
    random.seed(SEED+seed_counter)
    seed_counter+=1
    random.shuffle(shuffle_combine)

    tr_X, tr_Y = zip(*shuffle_combine)

    val_X = tr_X[:5000]
    val_Y = tr_Y[:5000]

    counter = 0
    for label in val_Y:
        counter+=label

    print (counter)
    print (seed_counter)
    if(counter>2400 and counter <2600):
        tr_X = tr_X[5000:]
        tr_Y = tr_Y[5000:]
        break

2497
1


In [6]:


print("Length of Train review set : " + str(len(tr_X)))
print("Length of Train label set : " + str(len(tr_Y)))
print("Length of Validation review set : " + str(len(val_X)))
print("Length of Validation label set : " + str(len(val_Y)))
print("Length of Test review set : " + str(len(te_X)))
print("Length of Test label set : " + str(len(te_Y)))
print("*****************************************")
print("Some sample Reviews Train sets and their labels")
print(tr_X[0][:150])
print(tr_Y[0])
print(tr_X[1][:150])
print(tr_Y[1])
print(tr_X[2][:150])
print(tr_Y[2])
print(tr_X[3][:150])
print(tr_Y[3])
print(tr_X[4][:150])
print(tr_Y[4])

Length of Train review set : 20000
Length of Train label set : 20000
Length of Validation review set : 5000
Length of Validation label set : 5000
Length of Test review set : 25000
Length of Test label set : 25000
*****************************************
Some sample Reviews Train sets and their labels
I'm a Don Johnson fan, but this is undoubtedly the WORST movie, done by anybody, that I've ever seen. The acting was bad, as was the cinematography. D
0
This is a fan-made short film that pretends to be a preview for a new movie that pairs Batman and Superman! It's the sort of film that fans adore and 
1
***SPOILERS*** ***SPOILERS*** Juggernaut is a British made "thriller" released in the US by First National. Karloff is Dr. Sartorius who has to leave 
0
I could write a big enough comment on any one of the characters in Gundam Wing, they could each lead the series with their internal conflicts. Instead
1
Cabin Fever is the first feature film directed by Eli Roth.Roth and Randy Pearlstein 

In [7]:
#we collect all the reviews from the train, validation and test sets to fit the tokenizer on
texts = []
texts += tr_X 
texts += te_X 
texts += val_X
len(texts)



#we clip the sentence length to first 250 words. 
MAX_SEQUENCE_LENGTH = 250

#length of vocab, Tokenizer will only use vocab_len most common words
vocab_len = 25000

#we tokenize the texts and convert all the words to tokens
tokenizer = Tokenizer(num_words=vocab_len)
tokenizer.fit_on_texts(texts)

token_tr_X = tokenizer.texts_to_sequences(tr_X)
token_te_X = tokenizer.texts_to_sequences(te_X)
token_val_X = tokenizer.texts_to_sequences(val_X)

#to ensure all reviews have the same length, we pad the smaller reviews with 0, 
#and cut the larger reviews to a max length 
#(we clip from the top, as the end of the reviews generally have a conclusion which provides better features)
x_train = sequence.pad_sequences(token_tr_X, maxlen=MAX_SEQUENCE_LENGTH)
x_test = sequence.pad_sequences(token_te_X, maxlen=MAX_SEQUENCE_LENGTH)
x_val = sequence.pad_sequences(token_val_X, maxlen=MAX_SEQUENCE_LENGTH)


#changes the labels to one-hot encoding
y_train = np_utils.to_categorical(tr_Y)
y_test = np_utils.to_categorical(te_Y)
y_val = np_utils.to_categorical(val_Y)


In [8]:
print('X_train shape:', x_train.shape)
print('X_test shape:', x_test.shape)
print('X_val shape:', x_val.shape)

print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
print('y_val shape:', y_val.shape)


print("*****************************************")
print("Tokenized Reviews Train sets and their labels")
print(x_train[0][:20])
print(y_train[0])
print()
print(x_train[1][:20])
print(y_train[1])
print()
print(x_train[2][:20])
print(y_train[2])
print()
print(x_train[3][:20])
print(y_train[3])
print()
print(x_train[4][:20])
print(y_train[4])
print()

X_train shape: (20000, 250)
X_test shape: (25000, 250)
X_val shape: (5000, 250)
y_train shape: (20000, 2)
y_test shape: (25000, 2)
y_val shape: (5000, 2)
*****************************************
Tokenized Reviews Train sets and their labels
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1.  0.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]

[ 122  188   24  496   36  156 1481 2672    3  171  483  159    1  851    2
  688   41    1   80   45]
[ 1.  0.]

[ 7833     4  5196     2 17949    12    94    22   101    42   125   199
  2791     7     7     1  1497   201     1  1072]
[ 0.  1.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 1.  0.]



********************************************

As you can see, the reviews have now been transformed into sequences of indices into the tokenizer's vocabulary, and the labels have been converted to one-hot encoding. We can now go ahead and feed these sequences to neural network models.

********************************************
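
As a rough illustration of what the padding and one-hot steps do, the sketch below re-implements them in plain Python. The helper names `pad_pre` and `one_hot` are hypothetical and are not part of Keras; they only mimic, for small inputs, the default behaviour of `pad_sequences` (pad and truncate at the front) and of `to_categorical`. The token indices are made up.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` tokens (maxlen > 0)."""
    seq = list(seq)[-maxlen:]                 # truncate from the front
    return [value] * (maxlen - len(seq)) + seq


def one_hot(label, num_classes=2):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


short_review = [12, 7, 93]                    # made-up token indices
long_review = [5, 8, 13, 21, 34, 55, 89]

print(pad_pre(short_review, maxlen=5))        # [0, 0, 12, 7, 93]
print(pad_pre(long_review, maxlen=5))         # [13, 21, 34, 55, 89]
print(one_hot(0), one_hot(1))                 # [1.0, 0.0] [0.0, 1.0]
```

Front padding is why the first 20 entries of several tokenized reviews printed above are all zeros: those reviews are shorter than 250 tokens, so their actual words sit at the end of the row.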

# Part A

Building your first model (5 Points)

Construct this sequential model using Keras :

![title](img/model1.jpg)

In [11]:
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=vocab_len, output_dim=128, input_length=250))
model.add(Flatten())
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 250, 128)          3200000   
_________________________________________________________________
flatten_2 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 200)               6400200   
_________________________________________________________________
activation_3 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 402       
_________________________________________________________________
activation_4 (Activation)    (None, 2)                 0         
Total params: 9,600,602
Trainable params: 9,600,602
Non-trainable params: 0
___________________________________________________

In [12]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7fec974cfd68>

# Part B

Stacking Fully Connected Layers (5 points)

Construct this sequential model using Keras :

![title](img/model2.jpg)

In [14]:
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=vocab_len, output_dim=128, input_length=250))
model.add(Flatten())
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 250, 128)          3200000   
_________________________________________________________________
flatten_3 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_5 (Dense)              (None, 200)               6400200   
_________________________________________________________________
activation_5 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 200)               40200     
_________________________________________________________________
activation_6 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 2)                 402   

In [15]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7fec9aef5748>

# Part C

Using LSTM-based networks (5 Points)

Construct this sequential model using Keras :

![title](img/model3.jpg)

In [16]:
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=vocab_len, output_dim=128, input_length=250))
model.add(LSTM(128))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 250, 128)          3200000   
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_8 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_8 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_9 (Activation)    (None, 2)                 0         
Total params: 3,348,354
Trainable params: 3,348,354
Non-trainable params: 0
___________________________________________________

In [17]:

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fec3da59a20>

# Part D

Adding Pretrained Word Embeddings (10 Points)

Construct this sequential model using Keras :

Correction: The Embedding Layer Dimension (1st box) is 300, not 128.

![title](img/model4.jpg)

In [9]:
import codecs

#dimension of Glove Embeddings.
EMBEDDING_DIM = 300

word_index = tokenizer.word_index
print('Found %s unique tokens' % len(word_index))

#load glove embeddings
gembeddings_index = {}
with codecs.open('glove.42B.300d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split(' ')
        word = values[0]
        gembedding = np.asarray(values[1:], dtype='float32')
        gembeddings_index[word] = gembedding
#no explicit f.close() is needed: the with block above closes the file
print('G Word embeddings:', len(gembeddings_index))

# nb_words contains the total length of vocab
nb_words = len(word_index) +1

#get glove embeddings for each word in tokenizer.
#g_word_embedding_matrix holds the embeddings dictionary
g_word_embedding_matrix = np.zeros((nb_words, EMBEDDING_DIM))

for word, i in word_index.items():
    gembedding_vector = gembeddings_index.get(word)
    if gembedding_vector is not None:
        g_word_embedding_matrix[i] = gembedding_vector
        
#total words in the tokenizer not in Embedding matrix
print('G Null word embeddings: %d' % np.sum(np.sum(g_word_embedding_matrix, axis=1) == 0))



Found 124252 unique tokens
G Word embeddings: 1917494
G Null word embeddings: 35772


In [13]:
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=nb_words, output_dim=300, input_length=250, weights=[g_word_embedding_matrix]))
model.add(LSTM(128, recurrent_dropout=0.2))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## to use the GloVe embeddings, your embedding layer takes the vocab size as the input dimension, 
## the GloVe embedding dimension as the output dimension,
## and the embedding matrix as the 'weights' parameter (!important).


## compile it here according to instructions
model.compile(optimizer='Adam',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               219648    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_2 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_3 (Activation)    (None, 2)                 0     

In [14]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fbb94232710>

# Don't attempt this

Stacking LSTM layers

Unfortunately this takes very long to train, but be aware that we can stack LSTMs on top of each other like this.
This requires the bottom LSTM to return a sequence instead of a single vector, which becomes the input for the top LSTM.


![title](img/model5.jpg)
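
To make the shape requirement concrete, here is a minimal sketch that uses a toy tanh recurrent layer in plain Python as a stand-in for an LSTM. The function `toy_recurrent_layer`, its random weights, and the input values are all hypothetical; only the output shapes matter. In Keras the same switch is the `return_sequences` argument of the `LSTM` layer.

```python
import math
import random


def toy_recurrent_layer(inputs, units, return_sequences, seed=0):
    """Tiny tanh RNN over a list of input vectors (one vector per time step)."""
    rng = random.Random(seed)
    in_dim = len(inputs[0])
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(units)]
    w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(units)]
    state = [0.0] * units
    outputs = []
    for x in inputs:
        # new state depends on the current input and the previous state
        state = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[u], x))
                + sum(w * si for w, si in zip(w_rec[u], state))
            )
            for u in range(units)
        ]
        outputs.append(state)
    # either one vector per time step, or only the final vector
    return outputs if return_sequences else outputs[-1]


steps = [[0.1, 0.2, 0.3] for _ in range(6)]   # 6 time steps, 3 features each

bottom = toy_recurrent_layer(steps, units=4, return_sequences=True)
top = toy_recurrent_layer(bottom, units=2, return_sequences=False, seed=1)

print(len(bottom), len(bottom[0]))  # 6 4: one 4-dim vector per time step
print(len(top))                     # 2: a single vector for the whole sequence
```

If the bottom layer returned only its last vector, the top layer would have no time axis to iterate over, which is why the lower layer must emit the full sequence.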

# Part E

Using Convolutional Networks (10 points)

Construct the model shown below. Use the same loss function and optimizer as before.

Correction: The Embedding Layer Dimension (1st box) is 300, not 128.

![title](img/model6.jpg)

In [10]:
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=nb_words, output_dim=300, input_length=250, weights=[g_word_embedding_matrix]))
model.add(Convolution1D(filters=128, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=64, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=32, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 248, 128)          115328    
_________________________________________________________________
dropout_1 (Dropout)          (None, 248, 128)          0         
_________________________________________________________________
activation_1 (Activation)    (None, 248, 128)          0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 246, 64)           24640     
_________________________________________________________________
dropout_2 (Dropout)          (None, 246, 64)           0         
_________________________________________________________________
activation_2 (Activation)    (None, 246, 64)           0     

In [11]:

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f1edd5d9d30>

# Part F

Model constructed : (5 points)

Test Accuracy Over 87.5%: (5 Points)

Bonus: Min(10, Square of (test_score - 88%))

Create your best model. Use the validation score to choose it, then check its accuracy on the test set.


In [24]:
### 1st Model
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=nb_words, output_dim=300, input_length=250, weights=[g_word_embedding_matrix]))
model.add(Convolution1D(filters=128, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=64, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=32, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))


## compile it here according to instructions
model.compile(optimizer='Adadelta',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_17 (Conv1D)           (None, 248, 128)          115328    
_________________________________________________________________
dropout_21 (Dropout)         (None, 248, 128)          0         
_________________________________________________________________
activation_25 (Activation)   (None, 248, 128)          0         
_________________________________________________________________
conv1d_18 (Conv1D)           (None, 246, 64)           24640     
_________________________________________________________________
dropout_22 (Dropout)         (None, 246, 64)           0         
_________________________________________________________________
activation_26 (Activation)   (None, 246, 64)           0     

You can keep saving models under different names in model_name, so you can retrieve their weights again for testing without having to retrain (you would have to initialize the model definition again).

In [10]:
wt_dir = "./weights/"
model_name = 'model_best1'
early_stopping =EarlyStopping(monitor='val_acc', patience=2)
bst_model_path = wt_dir + model_name + '.h5'
model_checkpoint = ModelCheckpoint(bst_model_path, monitor='val_acc', save_best_only=True, save_weights_only=True)

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=7,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True,
         callbacks=[early_stopping, model_checkpoint])



Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7


<keras.callbacks.History at 0x7fd736b9e198>

In [25]:
### model1
model.load_weights('./weights/model_best1.h5')
scores = model.evaluate(x_test, y_test, verbose=1)
print(" Accuracy: %.2f%%" % (scores[1]*100))

pred1 = argmax(model.predict(x_test), axis=1)

 Accuracy: 90.04%


In [21]:
### 2nd Model. Slightly different architecture from model-1
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=nb_words, output_dim=300, input_length=250, weights=[g_word_embedding_matrix]))
model.add(Convolution1D(filters=512, kernel_size=3))
model.add(Dropout(0.4))
model.add(Activation('relu'))
model.add(Convolution1D(filters=128, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=64, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=32, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))


## compile it here according to instructions
model.compile(optimizer='Adadelta',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_13 (Conv1D)           (None, 248, 512)          461312    
_________________________________________________________________
dropout_16 (Dropout)         (None, 248, 512)          0         
_________________________________________________________________
activation_19 (Activation)   (None, 248, 512)          0         
_________________________________________________________________
conv1d_14 (Conv1D)           (None, 246, 128)          196736    
_________________________________________________________________
dropout_17 (Dropout)         (None, 246, 128)          0         
_________________________________________________________________
activation_20 (Activation)   (None, 246, 128)          0     

In [20]:
wt_dir = "./weights/"
model_name = 'model_best2'
early_stopping =EarlyStopping(monitor='val_acc', patience=2)
bst_model_path = wt_dir + model_name + '.h5'
model_checkpoint = ModelCheckpoint(bst_model_path, monitor='val_acc', save_best_only=True, save_weights_only=True)

print('Train...')
model.fit(x_train, y_train,
          batch_size=16,
          epochs=10,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True,
         callbacks=[early_stopping, model_checkpoint])

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fbe6df53828>

If you plan on using Ensemble averaging, feel free to edit the code below or add multiple models.

Make sure they get saved and can be retrieved when executing serially.

In [22]:
### model2
model.load_weights('./weights/model_best2.h5')
scores = model.evaluate(x_test, y_test, verbose=1)
print(" Accuracy: %.2f%%" % (scores[1]*100))

pred2 = argmax(model.predict(x_test), axis=1)



In [27]:
### 3rd Model. Same architecture as model-1. Training 2nd time. 
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(input_dim=nb_words, output_dim=300, input_length=250, weights=[g_word_embedding_matrix]))
model.add(Convolution1D(filters=128, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=64, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Convolution1D(filters=32, kernel_size=3))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.2))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))


## compile it here according to instructions
model.compile(optimizer='Adadelta',
             loss = 'categorical_crossentropy',
             metrics=['accuracy'])

model.summary()

print("Model Built")


Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_20 (Conv1D)           (None, 248, 128)          115328    
_________________________________________________________________
dropout_25 (Dropout)         (None, 248, 128)          0         
_________________________________________________________________
activation_30 (Activation)   (None, 248, 128)          0         
_________________________________________________________________
conv1d_21 (Conv1D)           (None, 246, 64)           24640     
_________________________________________________________________
dropout_26 (Dropout)         (None, 246, 64)           0         
_________________________________________________________________
activation_31 (Activation)   (None, 246, 64)           0     

In [28]:
wt_dir = "./weights/"
model_name = 'model_best3'
early_stopping =EarlyStopping(monitor='val_acc', patience=2)
bst_model_path = wt_dir + model_name + '.h5'
model_checkpoint = ModelCheckpoint(bst_model_path, monitor='val_acc', save_best_only=True, save_weights_only=True)

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=7,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True,
         callbacks=[early_stopping, model_checkpoint])

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7


<keras.callbacks.History at 0x7fac9b130278>

In [29]:
# model3
model.load_weights('./weights/model_best3.h5')
scores = model.evaluate(x_test, y_test, verbose=1)
print(" Accuracy: %.2f%%" % (scores[1]*100))

pred3 = argmax(model.predict(x_test), axis=1)



In [34]:
### Ensemble by majority vote over the predicted labels of the 3 best performing models. 2 have the same architecture and 1 is different

pred = np.zeros((len(x_test), 1))

for i in range(len(x_test)):
    if pred1[i]+pred2[i]+pred3[i] in (0,1):
        pred[i]=0
    elif pred1[i]+pred2[i]+pred3[i] in (2,3):
        pred[i]=1
    
pred_ensemble = np_utils.to_categorical(pred, 2)

### This is the final accuracy score

print(" Accuracy Score: ", accuracy_score(y_test, pred_ensemble))



 Accuracy Score:  0.90476
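
The ensemble cell above combines the hard 0/1 predictions of the three models. A common alternative is to average the predicted class probabilities before thresholding. The sketch below contrasts the two on made-up probabilities; the function names and the numbers are hypothetical and do not come from the trained models.

```python
def to_labels(probs, threshold=0.5):
    """Turn positive-class probabilities into hard 0/1 labels."""
    return [1 if p > threshold else 0 for p in probs]


def majority_vote(*label_lists):
    """Hard voting: each model contributes one 0/1 label per example."""
    n_models = len(label_lists)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*label_lists)]


def average_probabilities(*prob_lists):
    """Soft voting: average the positive-class probability, then threshold at 0.5."""
    n_models = len(prob_lists)
    return [1 if sum(p) / n_models > 0.5 else 0 for p in zip(*prob_lists)]


# Made-up positive-class probabilities from three hypothetical models.
probs_a = [0.90, 0.40, 0.55]
probs_b = [0.80, 0.45, 0.10]
probs_c = [0.30, 0.95, 0.60]

hard = majority_vote(to_labels(probs_a), to_labels(probs_b), to_labels(probs_c))
soft = average_probabilities(probs_a, probs_b, probs_c)

print(hard)  # [1, 0, 1]
print(soft)  # [1, 1, 0]
```

The two schemes disagree when one model is very confident and the other two are only mildly on the opposite side, so which one works better on this dataset would have to be checked on the validation set.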


# Part G

Explain how Dense, LSTM and Convolution Layers work.

Explain how ReLU, Dropout, and Softmax work.

Analyze the architectures you constructed, with the accuracies you achieved and the training time it took. 

What are some insights you gained with these experiments? 

(5 Points)


1.
- In Dense Layers (Fully Connected Layers) every node in the layer is connected to every node in the previous layer. The final layer in a network is often a Dense Layer, which contains a single node for each target class in the model.
- LSTMs are a special kind of Recurrent Neural Network capable of learning long-term dependencies. Their gated cell state helps preserve the error signal as it is backpropagated through time, which mitigates vanishing gradients and lets the network relate causes and effects that are many time steps apart. A forget gate decides what to discard from the cell state, while input and output gates decide what to store and what to expose, which together maintain a long short-term memory. They work well with sequential data.
- Convolution Layers slide a set of learned filters over the input and pass the results to the next layer; each filter produces its own output channel. Because the same filter is applied at every position, convolution layers can pick up local features, such as informative short word sequences, directly from the raw input.
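
The parameter counts reported by `model.summary()` in Parts A to E can be reproduced by hand, which is a useful check on how each layer type is wired. The sketch below uses the usual counting formulas, assuming default settings (biases enabled, and stride 1 with no padding for the convolution); the function names are hypothetical helpers, not Keras APIs.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim                       # one vector per vocabulary entry


def dense_params(n_in, n_out):
    return n_in * n_out + n_out                   # weights + biases


def lstm_params(n_in, units):
    # 4 weight sets (3 gates + cell candidate), each with input weights,
    # recurrent weights and a bias
    return 4 * (units * (n_in + units) + units)


def conv1d_params(kernel_size, channels_in, filters):
    return kernel_size * channels_in * filters + filters


def conv1d_output_length(length_in, kernel_size):
    # no padding, stride 1
    return length_in - kernel_size + 1


print(embedding_params(25000, 128))      # 3200000  (Part A embedding)
print(dense_params(250 * 128, 200))      # 6400200  (Part A first Dense)
print(lstm_params(128, 128))             # 131584   (Part C LSTM)
print(lstm_params(300, 128))             # 219648   (Part D LSTM)
print(conv1d_params(3, 300, 128))        # 115328   (Part E first Conv1D)
print(conv1d_output_length(250, 3))      # 248      (Part E first Conv1D length)
```

The numbers show where the weights actually live: the Dense layer that follows `Flatten` dominates the first two models, while the LSTM reuses the same weights at every time step and so stays comparatively small.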

2.
- ReLU is a non-linear activation function applied to the outputs of a layer: negative values are replaced with 0 and positive values are passed through unchanged. Because its gradient is exactly 1 for positive inputs, it does not saturate the way sigmoid or tanh do, which helps against vanishing gradients.
- Dropout is a technique where randomly selected neurons are ignored during training, i.e. their outputs are not propagated forward and their weights are not updated during the backward pass for that step. This is done only during training, not testing, as a regularization technique to reduce overfitting on the training data.
- Softmax is a non-linear activation function (the normalized exponential function) that maps an N-dimensional vector of real values to another N-dimensional vector whose entries lie in the range [0,1] and sum to 1. It is often used in the final layer of a neural network classifier.
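
The three operations above are small enough to write out directly. The following is a minimal plain-Python sketch, not the Keras implementation; the function names are hypothetical, and the dropout shown is the "inverted" variant, which rescales the surviving values during training so that nothing needs to change at test time.

```python
import math
import random


def relu(values):
    """Replace negative values with 0 and keep positive values as they are."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Normalized exponential; the max is subtracted for numerical stability."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, rate, training, rng):
    """Zero each value with probability `rate` while training and rescale
    the survivors by 1 / (1 - rate); do nothing at test time."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


print(relu([-2.0, 0.0, 3.5]))                 # [0.0, 0.0, 3.5]

probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))                      # three values in [0, 1] summing to ~1

rng = random.Random(0)
print(dropout([1.0] * 8, rate=0.5, training=True, rng=rng))    # mix of 0.0 and 2.0
print(dropout([1.0] * 8, rate=0.5, training=False, rng=rng))   # unchanged
```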

3.
- The simple Dense layer network trains very fast even though it has about 9.6M parameters (most of them in the first Dense layer after Flatten), but it overfits easily and quickly.
- Stacking Dense layers does not make much difference in performance. The run-time is about the same, since the increase in parameters (about 40K) is insignificant. It also overfits quickly.
- The LSTM network has far fewer parameters (about 3.3M) than the pure Dense layer networks. The accuracy stays around the same, and the model continues to overfit. However, the training time is many times that of the previous networks. This can be treated as a trade-off when choosing models.
- Using pre-trained word embeddings pushed the accuracy up by 2-3 points, as expected. The training time increased significantly over the previous LSTM network, and dropout reduced the overfitting only slightly. The parameter count also grew more than tenfold (to about 37.5M), because the embedding matrix covers the full vocabulary at 300 dimensions, which means a higher memory footprint.
- The 3-layer stacked CNN seems to have worked the best. It has a slight increase in parameters over the previous model, but the training time is much lower. Dropout again did not seem to reduce the overfitting much.
- In Part F, when experimenting with different architectures, I tried optimizer='Adadelta'. The training time increased by a few seconds, but the reduction in overfitting was very significant, and there was a slight improvement in accuracy.
 
    