# Homework 3
## Sentiment analysis using Neural Networks

Total: 50 Points


In this homework we will perform sentiment analysis using a few simple Neural Network based architectures.
For this problem we use the IMDB Large Movie Review Dataset. The train and test sets each contain 25,000 highly polar movie reviews: 12,500 positive (rating of 7/10 or higher) and 12,500 negative (rating of 4/10 or lower).

Use "https://keras.io/" for keras documentation. Please use Python 3. GPU is not required but it will help improve the training speed for each problem.

Please save the notebook with your cell outputs. You will not be graded if your outputs are not present below the homework cell. Also note that your outputs will be unique, since you will be using the last four digits of your UNI as your random seed (in the third cell). Make sure you submit this IPython notebook with the saved outputs. The submission format must be 'hw3/hw3.ipynb'. Do not submit any other files. If you save your model weights, do not submit them; do, however, make sure they are saved in the 'weights' folder and can be retrieved from there.

Please fill your details below.



Name: Ravie Lakshmanan

Uni: RL2857

Email: rl2857@columbia.edu


In [1]:
from os import listdir
import random
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Dense, Dropout, Reshape, Merge, BatchNormalization, TimeDistributed, Lambda, Activation, LSTM, Flatten, Convolution1D, GRU, MaxPooling1D
from keras.regularizers import l2
from keras.callbacks import Callback, ModelCheckpoint, EarlyStopping
#from keras import initializers
from keras import backend as K
from keras.optimizers import SGD
from keras.optimizers import Adadelta
from keras.utils import np_utils
from keras.preprocessing import sequence
from keras import optimizers
import numpy as np

Using TensorFlow backend.


In [2]:


#we retrieve train and test file names

train_dir = "./aclImdb/train/"
test_dir = "./aclImdb/test/"
tr_review = [re_filename for re_filename in listdir(train_dir)]
te_review = [re_filename for re_filename in listdir(test_dir)]

#we initialize the train and test arrays

tr_X = []
tr_Y = []
te_X = []
te_Y = []

#we arrange the reviews into the train and test arrays 

for review_file in tr_review:
    with open(train_dir+review_file, "r") as f_review:
        str_review = f_review.readline()
    str_review = " ".join(str_review.split(' '))
    tr_X.append(str_review)
    y_truth = int (review_file.split('.')[0].split('_')[1])
    if y_truth>=7:
        tr_Y.append(1)
    else:
        tr_Y.append(0)
        
for review_file in te_review:
    with open(test_dir+review_file, "r") as f_review:
        str_review = f_review.readline()
    str_review = " ".join(str_review.split(' '))
    te_X.append(str_review)
    y_truth = int (review_file.split('.')[0].split('_')[1])
    if y_truth>=7:
        te_Y.append(1)
    else:
        te_Y.append(0)
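
The labelling rule used in the two loops above can be isolated into a small helper. This is a minimal sketch: `label_from_filename` is a hypothetical name, and it assumes file names follow the `<id>_<rating>.txt` pattern implied by the parsing code.

```python
def label_from_filename(filename):
    """Return 1 for a positive review (rating >= 7), else 0.

    Assumes names look like '<id>_<rating>.txt', e.g. '127_8.txt'.
    """
    rating = int(filename.split('.')[0].split('_')[1])
    return 1 if rating >= 7 else 0


print(label_from_filename("127_8.txt"))   # rating 8 -> 1
print(label_from_filename("4715_3.txt"))  # rating 3 -> 0
```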
        

We will now create the validation set from the train set.

Use the last 4 digits of your UNI as the seed value so that all answers remain unique.

In [3]:
#replace 2 (SEED) with the last 4 numbers of your Uni
#Uni: RL2857
SEED = 2857
seed_counter = 0
while(1):

    shuffle_combine = list(zip(tr_X, tr_Y))
    random.seed(SEED+seed_counter)
    seed_counter+=1
    random.shuffle(shuffle_combine)

    tr_X, tr_Y = zip(*shuffle_combine)

    val_X = tr_X[:5000]
    val_Y = tr_Y[:5000]

    counter = 0
    for label in val_Y:
        counter+=label

    print (counter)
    print (seed_counter)
    if(counter>2400 and counter <2600):
        tr_X = tr_X[5000:]
        tr_Y = tr_Y[5000:]
        break

2464
1
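
The seeded shuffle-and-split idea in the cell above can be tried on a toy dataset. This is only an illustrative sketch; the six texts and labels are made up, and the split size of 2 stands in for the 5,000 used in the notebook.

```python
import random

texts = ["good", "bad", "great", "awful", "fine", "poor"]
labels = [1, 0, 1, 0, 1, 0]

pairs = list(zip(texts, labels))
random.seed(2857)          # same seed -> same shuffle order on every run
random.shuffle(pairs)

shuffled_texts, shuffled_labels = zip(*pairs)
val_labels = shuffled_labels[:2]      # first slice becomes validation
train_labels = shuffled_labels[2:]    # remainder stays in training

# Each text still carries its original label after shuffling.
lookup = dict(zip(texts, labels))
assert all(lookup[t] == l for t, l in pairs)
print(sum(val_labels), "positive reviews out of", len(val_labels))
```

Zipping the reviews and labels before shuffling is what keeps each review attached to its label.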


In [4]:


print("Length of Train review set : " + str(len(tr_X)))
print("Length of Train label set : " + str(len(tr_Y)))
print("Length of Validation review set : " + str(len(val_X)))
print("Length of Validation label set : " + str(len(val_Y)))
print("Length of Test review set : " + str(len(te_X)))
print("Length of Test label set : " + str(len(te_Y)))
print("*****************************************")
print("Some sample Reviews Train sets and their labels")
print(tr_X[0][:150])
print(tr_Y[0])
print(tr_X[1][:150])
print(tr_Y[1])
print(tr_X[2][:150])
print(tr_Y[2])
print(tr_X[3][:150])
print(tr_Y[3])
print(tr_X[4][:150])
print(tr_Y[4])

Length of Train review set : 20000
Length of Train label set : 20000
Length of Validation review set : 5000
Length of Validation label set : 5000
Length of Test review set : 25000
Length of Test label set : 25000
*****************************************
Some sample Reviews Train sets and their labels
I dont know why people think this is such a bad movie. Its got a pretty good plot, some good action, and the change of location for Harry does not hur
1
back in my high school days in Salina Kansas, they filmed something called "The Brave Young Men Of Weinberg" locally, and the film crews were rather p
1
"Shall We Dance?", a light-hearted flick from Japan, tells of an overworked accountant and family man who is attracted to a dance studio by a beautifu
1
Just read the original story which is written by Pu in 18th century. Strikingly, the movie despict the original spirit very well, though the plot was 
1
It was in 1988, when I saw "The Ronnie and Nancy Show" for the first time (on Austria

In [5]:
#we collect all the reviews from the train, validation and test sets to build the tokenizer vocabulary
texts = []
texts += tr_X 
texts += te_X 
texts += val_X
len(texts)



#we clip the sentence length to first 250 words. 
MAX_SEQUENCE_LENGTH = 250

#length of vocab, Tokenizer will only use vocab_len most common words
vocab_len = 25000

#we tokenize the texts and convert all the words to tokens
tokenizer = Tokenizer(num_words=vocab_len)
tokenizer.fit_on_texts(texts)

token_tr_X = tokenizer.texts_to_sequences(tr_X)
token_te_X = tokenizer.texts_to_sequences(te_X)
token_val_X = tokenizer.texts_to_sequences(val_X)

#to ensure all reviews have the same length, we pad the smaller reviews with 0, 
#and cut the larger reviews to a max length 
#(we clip from the start, as the end of a review generally has a conclusion which provides better features)
x_train = sequence.pad_sequences(token_tr_X, maxlen=MAX_SEQUENCE_LENGTH)
x_test = sequence.pad_sequences(token_te_X, maxlen=MAX_SEQUENCE_LENGTH)
x_val = sequence.pad_sequences(token_val_X, maxlen=MAX_SEQUENCE_LENGTH)


#changes the labels to one-hot encoding
y_train = np_utils.to_categorical(tr_Y)
y_test = np_utils.to_categorical(te_Y)
y_val = np_utils.to_categorical(val_Y)


In [6]:
print('X_train shape:', x_train.shape)
print('X_test shape:', x_test.shape)
print('X_val shape:', x_val.shape)

print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
print('y_val shape:', y_val.shape)


print("*****************************************")
print("Tokenized Reviews Train sets and their labels")
print(x_train[0][:20])
print(y_train[0])
print()
print(x_train[1][:20])
print(y_train[1])
print()
print(x_train[2][:20])
print(y_train[2])
print()
print(x_train[3][:20])
print(y_train[3])
print()
print(x_train[4][:20])
print(y_train[4])
print()

X_train shape: (20000, 250)
X_test shape: (25000, 250)
X_val shape: (5000, 250)
y_train shape: (20000, 2)
y_test shape: (25000, 2)
y_val shape: (5000, 2)
*****************************************
Tokenized Reviews Train sets and their labels
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0.  1.]



********************************************

As you can see, the reviews have now been transformed into indices into the tokenizer's vocabulary, and the labels have been converted to one-hot encoding. We can now go ahead and feed these sequences to neural network models.

********************************************
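
The leading zeros in the tokenized samples come from padding at the front, which is the default behaviour of `pad_sequences` (it also truncates from the front by default). The following is a plain-Python sketch of that behaviour and of one-hot labels; `pad_pre` and `one_hot` are hypothetical helper names, not Keras functions.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` tokens of long ones."""
    seq = seq[-maxlen:]                          # truncate from the front
    return [value] * (maxlen - len(seq)) + seq   # pad at the front


def one_hot(label, num_classes=2):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    row = [0.0] * num_classes
    row[label] = 1.0
    return row


print(pad_pre([5, 9, 2], maxlen=5))              # [0, 0, 5, 9, 2]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
print(one_hot(1))                                # [0.0, 1.0]
```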

# Part A

Building your first model (5 Points)

Construct this sequential model using Keras :

![title](img/model1.jpg)

In [7]:
print('Build model...')

## implement model here
## constructing a sequential model by employing 
## a fully connected dense layer
model = Sequential()
model.add(Embedding(20000,128,input_length=250))
model.add(Flatten())
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 250, 128)          2560000   
_________________________________________________________________
flatten_1 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               6400200   
_________________________________________________________________
activation_1 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 402       
_________________________________________________________________
activation_2 (Activation)    (None, 2)                 0         
Total params: 8,960,602
Trainable params: 8,960,602
Non-trainable params: 0
___________________________________________________
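
The parameter counts in the summary above can be reproduced by hand. A small sketch of the arithmetic follows; the helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # one `dim`-sized vector per vocabulary entry
    return vocab_size * dim


def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out


emb = embedding_params(20000, 128)     # 2,560,000
d1 = dense_params(250 * 128, 200)      # Flatten gives 32,000 inputs -> 6,400,200
d2 = dense_params(200, 2)              # 402
print(emb, d1, d2, emb + d1 + d2)      # total 8,960,602
```

Most of the weights sit in the first dense layer, because flattening the 250 x 128 embedding output produces 32,000 inputs.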

In [8]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f20a6ee19b0>

# Part B

Stacking Fully Connected Layers (5 points)

Construct this sequential model using Keras :

![title](img/model2.jpg)

In [9]:
print('Build model...')

## implement model here
## constructing a sequential model by employing 
## a series of fully connected dense layer
model = Sequential()
model.add(Embedding(20000,128,input_length=250))
model.add(Flatten())
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 250, 128)          2560000   
_________________________________________________________________
flatten_2 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 200)               6400200   
_________________________________________________________________
activation_3 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 200)               40200     
_________________________________________________________________
activation_4 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 2)                 402   

In [10]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=4,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7fe6257ac160>

# Part C

Using LSTM-based networks (5 Points)

Construct this sequential model using Keras :

![title](img/model3.jpg)

In [11]:
print('Build model...')

## implement model here
## constructing a sequential model by employing 
## a LSTM layer and a fully connected dense layer
model = Sequential()
model.add(Embedding(20000,128,input_length=250))
model.add(LSTM(128))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 250, 128)          2560000   
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_6 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_6 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_7 (Activation)    (None, 2)                 0         
Total params: 2,708,354
Trainable params: 2,708,354
Non-trainable params: 0
___________________________________________________
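
The LSTM parameter count in the summary can be checked with a short calculation. This sketch assumes the standard LSTM parameterisation with four weight sets (three gates plus the candidate), each with input weights, recurrent weights and a bias; `lstm_params` is a hypothetical helper name.

```python
def lstm_params(input_dim, units):
    # 4 weight sets, each: (input_dim + units) * units weights + units biases
    return 4 * ((input_dim + units) * units + units)


print(lstm_params(128, 128))  # 131584, matches lstm_1 above
print(lstm_params(300, 128))  # 219648, the count for a 300-dimensional input
```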

In [12]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f4d3c0e4f60>

# Part D

Adding Pretrained Word Embeddings (10 Points)

Construct this sequential model using Keras :

Correction: The Embedding Layer Dimension (1st box) is 300, not 128.

![title](img/model4.jpg)

In [7]:
import codecs

#dimension of Glove Embeddings.
EMBEDDING_DIM = 300

word_index = tokenizer.word_index
print('Found %s unique tokens' % len(word_index))

#load glove embeddings
gembeddings_index = {}
with codecs.open('glove.42B.300d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split(' ')
        word = values[0]
        gembedding = np.asarray(values[1:], dtype='float32')
        gembeddings_index[word] = gembedding
# no explicit close needed: the with-block closes the file automatically
print('G Word embeddings:', len(gembeddings_index))

# nb_words contains the total length of vocab
nb_words = len(word_index) +1

#get glove embeddings for each word in tokenizer.
#g_word_embedding_matrix holds the embeddings dictionary
g_word_embedding_matrix = np.zeros((nb_words, EMBEDDING_DIM))

for word, i in word_index.items():
    gembedding_vector = gembeddings_index.get(word)
    if gembedding_vector is not None:
        g_word_embedding_matrix[i] = gembedding_vector
        
#total words in the tokenizer not in Embedding matrix
print('G Null word embeddings: %d' % np.sum(np.sum(g_word_embedding_matrix, axis=1) == 0))



Found 124252 unique tokens
G Word embeddings: 1917494
G Null word embeddings: 35772
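
The embedding-matrix construction above can be traced on a toy vocabulary. This is a sketch with made-up vectors: `word_index` and `pretrained` below are tiny stand-ins for `tokenizer.word_index` and the parsed GloVe file.

```python
# Toy stand-ins for tokenizer.word_index and the parsed GloVe file (values are made up).
word_index = {"good": 1, "bad": 2, "zzyzx": 3}
pretrained = {"good": [0.1, 0.2, 0.3], "bad": [-0.1, -0.2, -0.3]}
dim = 3

# Row 0 is reserved for padding, so the matrix has len(word_index) + 1 rows.
matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
for word, i in word_index.items():
    vector = pretrained.get(word)
    if vector is not None:
        matrix[i] = list(vector)

# Rows left at zero: the padding row plus every word missing from the pretrained set.
null_rows = sum(1 for row in matrix if all(v == 0.0 for v in row))
print(null_rows)  # 2 (padding row and "zzyzx")
```

Words without a pretrained vector keep an all-zero row, which is what the "Null word embeddings" count in the output reports.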


In [10]:
print('Build model...')

## implement model here

model = Sequential()

## to use the glove embeddings, your embedding layer would take the vocab size as input dimension, 
## Glove embedding dimension as the output dimension
## and you will provide the  embedding dictionary as the 'weights' parameter (!important) to the embedding layer.
model.add(Embedding(nb_words,EMBEDDING_DIM,weights=[g_word_embedding_matrix]))
model.add(LSTM(128, dropout=0.2))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 300)         37275900  
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               219648    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_3 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 258       
_________________________________________________________________
activation_4 (Activation)    (None, 2)                 0     

In [11]:
print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f20a5df3a20>

# Don't attempt this

Stacking LSTM layers

Unfortunately it takes very long to train, but be aware that we can stack LSTMs on top of each other like this.
This requires the bottom LSTM to return a sequence instead of a single vector, which becomes the input for the top LSTM.


![title](img/model5.jpg)
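
The difference between returning a full sequence and returning a single vector can be illustrated with a toy recurrent layer. This is a scalar sketch under simplifying assumptions, not an LSTM: `simple_rnn` and its weights `w` and `u` are hypothetical, and the `return_sequences` flag only mimics the Keras argument of the same name.

```python
import math


def simple_rnn(inputs, w=0.5, u=0.5, return_sequences=False):
    """Toy scalar recurrent layer: h_t = tanh(w * x_t + u * h_{t-1})."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states if return_sequences else states[-1]


x = [1.0, 0.5, -0.5, 2.0]
bottom = simple_rnn(x, return_sequences=True)   # one state per time step
top = simple_rnn(bottom)                        # only the final state

print(len(bottom))          # 4
print(type(top).__name__)   # float
```

The top layer needs one input per time step, which is why the bottom layer must return its whole sequence of states.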

# Part E

Using Convolutional Networks (10 points)

Construct the model shown below. Use the same loss function and optimizer as before.

Correction: The Embedding Layer Dimension (1st box) is 300, not 128.

![title](img/model6.jpg)

In [12]:
from keras.layers.convolutional import Conv1D

print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(nb_words,EMBEDDING_DIM,input_length=250, weights=[g_word_embedding_matrix]))
model.add(Conv1D(128, 3))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Activation('relu'))
model.add(Conv1D(64, 3))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Activation('relu'))
model.add(Conv1D(32, 3))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 248, 128)          115328    
_________________________________________________________________
dropout_2 (Dropout)          (None, 248, 128)          0         
_________________________________________________________________
activation_5 (Activation)    (None, 248, 128)          0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 246, 64)           24640     
_________________________________________________________________
dropout_3 (Dropout)          (None, 246, 64)           0         
_________________________________________________________________
activation_6 (Activation)    (None, 246, 64)           0     
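
The output lengths and parameter counts of the `Conv1D` layers in the summary follow from simple arithmetic. This sketch assumes 'valid' padding and stride 1, which are the Keras defaults used above; the helper names are hypothetical.

```python
def conv1d_output_length(input_length, kernel_size):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input
    return input_length - kernel_size + 1


def conv1d_params(in_channels, filters, kernel_size):
    # each filter spans kernel_size steps across all input channels, plus one bias
    return kernel_size * in_channels * filters + filters


print(conv1d_output_length(250, 3), conv1d_params(300, 128, 3))  # 248 115328
print(conv1d_output_length(248, 3), conv1d_params(128, 64, 3))   # 246 24640
```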

In [13]:

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True)

Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f1f60835438>

# Part F

Model constructed : (5 points)

Test Accuracy Over 87.5%: (5 Points)

Bonus: Min(10, Square of (test_score - 88%))

Create your best model, use Validation score to judge your best model and check accuracy on test set


In [14]:
print('Build model...')

## implement model here

model = Sequential()
model.add(Embedding(nb_words,EMBEDDING_DIM,input_length=250, weights=[g_word_embedding_matrix]))
model.add(GRU(units=128, return_sequences=True))
model.add(Dense(128))
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Dense(128))
model.add(Dropout(0.2, noise_shape=None, seed=None))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

## compile it here according to instructions
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

print("Model Built")

Build model...




_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 250, 300)          37275900  
_________________________________________________________________
gru_3 (GRU)                  (None, 250, 128)          164736    
_________________________________________________________________
dense_8 (Dense)              (None, 250, 128)          16512     
_________________________________________________________________
flatten_3 (Flatten)          (None, 32000)             0         
_________________________________________________________________
dense_9 (Dense)              (None, 256)               8192256   
_________________________________________________________________
dropout_4 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 128)               32896     
__________

You can keep saving models under different names in model_name, so you can retrieve their weights again for testing without retraining
(you would have to run the model definition again).

In [15]:
#saving the weights in the weight directory
wt_dir = "./weights/"
model_name = 'model_best'
early_stopping =EarlyStopping(monitor='val_acc', patience=2)
bst_model_path = wt_dir + model_name + '.h5'
model_checkpoint = ModelCheckpoint(bst_model_path, monitor='val_acc', save_best_only=True, save_weights_only=True)

print('Train...')
model.fit(x_train, y_train,
          batch_size=32,
          epochs=7,
          validation_data=(x_val, y_val),
          verbose = 1,
         shuffle = True,
         callbacks=[early_stopping, model_checkpoint])



Train...
Train on 20000 samples, validate on 5000 samples
Epoch 1/7
Epoch 2/7
Epoch 3/7


<keras.callbacks.History at 0x7faf20a87ba8>
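
Training stopped after 3 of the 7 epochs because of the early-stopping callback. The following is a simplified sketch of patience-based stopping, not Keras's exact implementation; `stopping_epoch` is a hypothetical helper and the validation accuracies are made up for illustration.

```python
def stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training stops, or None if it runs to the end."""
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, wait = score, 0      # improvement: remember it and reset the counter
        else:
            wait += 1                  # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies: best at epoch 1, then two epochs without improvement.
print(stopping_epoch([0.89, 0.88, 0.885], patience=2))  # 3
```

Combined with `save_best_only=True` in the checkpoint, the weights on disk are those of the best epoch rather than the last one.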

If you plan on using Ensemble averaging, feel free to edit the code below or add multiple models.

Make sure they get saved and can be retrieved when executing serially.

In [16]:
model.load_weights(bst_model_path)
scores = model.evaluate(x_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 88.90%
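
The ensemble averaging mentioned above can be sketched as follows: average the softmax outputs of several models for each review, then pick the class with the highest mean probability. The two "models" here are made-up probability tables, and both helper names are hypothetical.

```python
def ensemble_average(prob_sets):
    """Average per-class probabilities across models, sample by sample."""
    n_models = len(prob_sets)
    return [
        [sum(model[i][c] for model in prob_sets) / n_models
         for c in range(len(prob_sets[0][i]))]
        for i in range(len(prob_sets[0]))
    ]


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    predictions = [row.index(max(row)) for row in probs]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up softmax outputs from two hypothetical models on three reviews.
model_a = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
model_b = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
labels = [0, 1, 1]

avg = ensemble_average([model_a, model_b])
print(accuracy(model_a, labels), accuracy(avg, labels))
```

In this toy case the average corrects the third review, where the first model is wrong with low confidence and the second is right with high confidence.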


# Part G

Explain how Dense, LSTM and Convolution Layers work.

Explain how ReLU, Dropout, and Softmax work.

Analyze the architectures you constructed, with the accuracies you achieved and the training time it took. 

What are some insights you gained with these experiments? 

(5 Points)


**Explain how Dense, LSTM and Convolution layers work**

A dense layer is one in which every input is connected to every output unit; it can be visualized as a fully connected network. The layer computes `output = activation(input · W + b)`, where the weight matrix W and the bias b are trainable parameters that get updated during backpropagation. Because W has shape (inputs, units), a dense layer also changes the dimensionality of its input vector.
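
A dense layer's forward pass can be written in a few lines. This is a plain-Python sketch without an activation function; `dense`, `W` and `b` are illustrative names and the numbers are arbitrary.

```python
def dense(x, weights, bias):
    """y_j = sum_i x_i * W[i][j] + b_j; every input feeds every output unit."""
    return [
        sum(x_i * w_row[j] for x_i, w_row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


x = [1.0, 2.0, 3.0]                 # 3 input features
W = [[1.0, 0.0],                    # shape (3, 2): maps 3 inputs to 2 outputs
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.5, -0.5]
print(dense(x, W, b))               # [4.5, 4.5]
```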

LSTM (Long Short-Term Memory) units are a type of recurrent neural network (RNN). RNNs take into account not only the current input sample but also what they have seen at earlier time steps. This helps with problems such as word prediction, where the next word depends on the words that come before it. But as sequences get longer, the gap between the relevant information and the step where it is needed can become very large. LSTMs address this problem of long-term dependencies and vanishing gradients (during backpropagation through time) with a gated cell state that preserves the error signal across many time steps.
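
One step of an LSTM cell can be sketched with scalars in place of vectors. This follows the standard gate equations under simplifying assumptions: all names are hypothetical, and the parameter values are arbitrary rather than learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # proposed new content
    c = f * c_prev + i * g           # additive update of the cell state
    h = o * math.tanh(c)
    return h, c


params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state `c` is updated by addition rather than repeated multiplication through a squashing function, which is what lets gradients survive over many time steps.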

A convolutional layer slides a set of learnable filters (kernels) over its input and passes the result to the next layer. Its parameters are the filter weights, and its main hyperparameters are the number of filters and the filter size. At each position the layer computes the dot product between a filter and the input window beneath it, producing an activation map that shows where in the input each filter's feature is detected. Because the same filter is reused at every position, the layer needs far fewer parameters than a dense layer over the same input.
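
The sliding dot product can be shown on a one-dimensional signal. This is a minimal sketch with a single filter, no padding and stride 1; `conv1d_valid` is a hypothetical helper name.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]                   # responds to local differences
print(conv1d_valid(signal, kernel))   # [-2, -2, -2]
```

The output has `len(signal) - len(kernel) + 1` entries, the same rule that turns 250 time steps into 248 in Part E.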

**Explain how ReLU, Dropout and Softmax work**

The ReLU function is f(x) = max(0, x), where x is the input to a neuron in a neural network. It is an activation function applied to the weighted sum of a neuron's inputs to decide how strongly the neuron is activated: if the weighted sum is less than zero the output is zero, otherwise the output is the weighted sum itself.

A dropout layer is a regularization technique that sets the activations of a random subset of nodes to zero during training to prevent overfitting. Because no unit can rely on any particular neighbour being present, the network is pushed toward more robust features. Dropout is only active during training; at test time all units are used. The fraction of units to drop depends on the problem and is specified using the `rate` parameter in Keras.
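
Dropout can be sketched in plain Python. This version uses the common "inverted dropout" formulation, in which surviving activations are scaled up by 1 / (1 - rate) so that their expected value is unchanged; the function and argument names are hypothetical.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                 # fixed seed so the mask is reproducible
out = dropout([1.0] * 10, rate=0.2, rng=rng)
print(out)                             # each entry is either 0.0 or 1.25
print(dropout([1.0, 2.0], rate=0.2, training=False))  # [1.0, 2.0]
```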

Softmax is used as the final layer of a neural network to output a categorical probability distribution: it exponentiates each score and normalizes the results so that the outputs are positive and sum to 1, giving the probability of each class. It is most commonly used for classification problems.
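
A short softmax sketch follows. It subtracts the maximum score before exponentiating, a common trick to avoid overflow that does not change the result; the input scores are arbitrary.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.0])
print(probs)                     # two positive values that sum to 1
print(probs.index(max(probs)))   # 0, the predicted class
```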

**Analyze the architectures you constructed, with the accuracies you achieved and the training time it took.**

Looking at (A), each epoch trains relatively quickly; however, the validation accuracy is just 55%. This implies the model wasn't good enough to capture the relevant features.

On the other hand, adding an additional dense layer in (B) boosts the accuracy to ~85% for each epoch.

In a similar vein, training an LSTM-based neural network (C) appears to be equivalent to having two dense layers, as it results in almost the same accuracy as (B).

But as evidenced in Parts (D) and (E), adding dropout layers with a dropout rate of 0.2 improves the validation accuracy significantly, with the values reaching 90%. This way overfitting is remedied to a certain extent. Parts (D) and (E) also switch to pretrained GloVe embeddings, so the gain cannot be attributed to dropout alone.

However, using an LSTM network can also have a significant impact on runtime, with each epoch taking approximately 4 minutes to run. This brings the total time to train the model to about 20 minutes.

**What are some insights you gained from these experiments?**

These experiments give us an indication of how each layer performs and what impact it has on training time and accuracy. They also give us a sense of how and when to apply different layers when designing a model, and of the kinds of problems they can be used to solve.

Another crucial factor to consider is the parameters of these layers, be it the dimensions or the dropout rate, as they affect the outcome of the model that is being designed.

Running through the different architectures, it becomes clear that LSTM is very useful for sentiment prediction, yielding a high accuracy of approximately 89-90%.