
# Sentiment Classification


## Loading the dataset

In [2]:
from keras.datasets import imdb

vocab_size = 10000  # keep only the 10,000 most frequent words

# num_words caps the vocabulary: words are ranked by frequency and only the top vocab_size are kept.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

Using TensorFlow backend.
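
To make the effect of `num_words` concrete, here is a minimal pure-Python sketch of the capping step. It assumes the Keras default out-of-vocabulary id of 2 and uses a made-up encoded review; `cap_vocabulary` is a hypothetical helper for illustration, not part of Keras.

```python
OOV_CHAR = 2  # assumption: Keras default id for out-of-vocabulary words


def cap_vocabulary(sequence, num_words, oov_char=OOV_CHAR):
    """Replace every index >= num_words with the out-of-vocabulary id."""
    return [w if w < num_words else oov_char for w in sequence]


# Made-up encoded review for illustration, not taken from the dataset.
review = [1, 14, 22, 16543, 43, 10000, 9999]
capped = cap_vocabulary(review, num_words=10000)
print(capped)  # [1, 14, 22, 2, 43, 2, 9999]
```

After capping, every id is below `vocab_size`, so rare words all share one id.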


In [0]:
from keras.preprocessing.sequence import pad_sequences
vocab_size = 10000  # vocabulary size
maxlen = 300  # number of words kept from each review

## Train test split

In [0]:
# Load the dataset; each review is a list of integer word ids
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
# Pad or truncate every review to exactly maxlen ids
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

In [5]:
x_train.shape

(25000, 300)
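
The shape above follows from how `pad_sequences` treats each review. The NumPy sketch below mimics its default behaviour, where both padding and truncation happen at the start of the sequence. `pad_pre` is a hypothetical stand-in written for illustration, and the toy sequences are made up.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep only the last `maxlen` ids of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # leave the row as all padding
        trunc = seq[-maxlen:]          # 'pre' truncation drops the beginning
        out[i, -len(trunc):] = trunc   # 'pre' padding leaves zeros on the left
    return out


toy = [[5, 6], [1, 2, 3, 4, 5, 6]]
padded = pad_pre(toy, maxlen=4)
print(padded)
# [[0 0 5 6]
#  [3 4 5 6]]
```

With `maxlen=300`, short reviews therefore start with a run of zeros and long reviews keep only their last 300 words.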

## Build Keras Embedding Layer Model
We can think of the Embedding layer as a dictionary that maps the index assigned to a word to a word vector. This layer is very flexible and can be used in a few ways:

* The embedding layer can be used at the start of a larger deep learning model.
* We can also load pre-trained word embeddings into the embedding layer when we create our model.
* We can use the embedding layer to train our own word2vec models.

The Keras embedding layer doesn't require us to one-hot encode our words; instead, we give each word a unique integer id. For the IMDB dataset we've loaded, this has already been done; if it hadn't been, we could use scikit-learn's [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).
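
The "dictionary" view can be checked with a small NumPy sketch: indexing a weight table by integer id gives the same vectors as multiplying a one-hot matrix by that table, which is why one-hot encoding is unnecessary. The sizes and the random table are toy assumptions, not the notebook's weights.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 6, 3                      # toy sizes, not the notebook's 10001 x 64
table = rng.normal(size=(vocab, dim))  # stands in for the embedding weights

ids = np.array([4, 0, 4, 2])           # integer word ids for one short "review"

looked_up = table[ids]                 # row lookup, as an embedding layer does
one_hot = np.eye(vocab)[ids]           # explicit one-hot encoding, shape (4, 6)
via_matmul = one_hot @ table           # same vectors via a wasteful matmul

print(looked_up.shape)                     # (4, 3)
print(np.allclose(looked_up, via_matmul))  # True
```

Repeated ids (here the two 4s) map to the same row, so the same word always gets the same vector.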

In [0]:
import tensorflow as tf
model = tf.keras.Sequential()

In [0]:
# Different activation functions (tanh, relu, sigmoid) were tried.
# For SGD, different values of lr and decay were tried.
Initializer = tf.keras.initializers.RandomNormal(mean=0., stddev=1.)
model.add(tf.keras.layers.Embedding(vocab_size + 1, 64, input_length=maxlen))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1, activation='sigmoid', kernel_initializer=Initializer))
optimizer = tf.keras.optimizers.SGD(lr=1e-4, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
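
As a sanity check on the architecture, the trainable parameter count can be worked out by hand from the sizes used above. This is plain arithmetic rather than a call into Keras, so it should agree with what `model.summary()` reports for these three layers.

```python
vocab_size, maxlen, embed_dim = 10000, 300, 64

embedding_params = (vocab_size + 1) * embed_dim   # one 64-d vector per index
flatten_units = maxlen * embed_dim                # Flatten has no weights
dense_params = flatten_units * 1 + 1              # one weight per input plus a bias

print(embedding_params)                 # 640064
print(flatten_units)                    # 19200
print(dense_params)                     # 19201
print(embedding_params + dense_params)  # 659265
```

Almost all of the parameters sit in the embedding table; the classifier on top is a single logistic unit over 19,200 inputs.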

In [8]:
# The model reached good accuracy in the first epochs, then accuracy flattened out
# at 0.5 for many hyperparameter combinations,
# so callbacks were added to checkpoint the weights and to stop training early.
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
checkpoint = ModelCheckpoint("segmodel-{loss:.2f}.h5", monitor="loss", verbose=1, 
                              mode="min", period=1)
stop = EarlyStopping(monitor="loss", patience=2, mode="min")
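
The patience rule behind `EarlyStopping(monitor="loss", patience=2, mode="min")` can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not Keras's actual implementation; `stopping_epoch` is a hypothetical helper and the loss curves are made up.

```python
def stopping_epoch(losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, wait = loss, 0   # improvement: remember it and reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Made-up loss curves for illustration.
print(stopping_epoch([0.9, 0.7, 0.6, 0.61, 0.62, 0.5]))  # 5
print(stopping_epoch([0.9, 0.7, 0.6, 0.5]))              # None
```

Note that the callback here watches the training loss, so it reacts to stalled optimisation rather than to overfitting on held-out data.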



In [9]:
model.fit(x_train, y_train, epochs=50,callbacks=[checkpoint,stop],verbose=1)

Epoch 1/50
Epoch 00001: saving model to segmodel-1.36.h5
Epoch 2/50
Epoch 00002: saving model to segmodel-1.13.h5
Epoch 3/50
Epoch 00003: saving model to segmodel-0.98.h5
Epoch 4/50
Epoch 00004: saving model to segmodel-0.88.h5
Epoch 5/50
Epoch 00005: saving model to segmodel-0.79.h5
Epoch 6/50
Epoch 00006: saving model to segmodel-0.72.h5
Epoch 7/50
Epoch 00007: saving model to segmodel-0.66.h5
Epoch 8/50
Epoch 00008: saving model to segmodel-0.61.h5
Epoch 9/50
Epoch 00009: saving model to segmodel-0.56.h5
Epoch 10/50
Epoch 00010: saving model to segmodel-0.52.h5
Epoch 11/50
Epoch 00011: saving model to segmodel-0.49.h5
Epoch 12/50
Epoch 00012: saving model to segmodel-0.45.h5
Epoch 13/50
Epoch 00013: saving model to segmodel-0.43.h5
Epoch 14/50
Epoch 00014: saving model to segmodel-0.40.h5
Epoch 15/50
Epoch 00015: saving model to segmodel-0.38.h5
Epoch 16/50
Epoch 00016: saving model to segmodel-0.36.h5
Epoch 17/50
Epoch 00017: saving model to segmodel-0.34.h5
Epoch 18/50
Epoch 00018

<tensorflow.python.keras.callbacks.History at 0x7ff120b66198>
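
The file names in the log come from filling the `{loss:.2f}` placeholder in the checkpoint path with each epoch's loss. The sketch below reproduces that formatting with Python's `str.format`; the loss values are hypothetical, chosen only so that they round to the first three names in the log.

```python
template = "segmodel-{loss:.2f}.h5"

# Hypothetical per-epoch losses; the exact values are not shown in the log.
example_losses = [1.3649, 1.1312, 0.9841]

names = [template.format(epoch=i + 1, loss=l) for i, l in enumerate(example_losses)]
print(names)
# ['segmodel-1.36.h5', 'segmodel-1.13.h5', 'segmodel-0.98.h5']
```

Because the name depends only on the rounded loss, two epochs whose losses round to the same two decimals would write to the same file.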


## Retrieve the output of each layer in Keras for a single test sample from the trained model

In [0]:
import numpy as np

extractor = tf.keras.Model(inputs=model.inputs,
                        outputs=[layer.output for layer in model.layers])
# Select the test sample by slicing x_test; the slice 100:101 keeps the batch dimension
features = extractor.predict(x_test[100:101])

In [59]:
print('embedding layer::',features[0])
print('flatten layer::',features[1])
print('dense layer layer::',features[2])

embedding layer:: [[[-0.00241878 -0.00198121 -0.02949657 ...  0.00719413 -0.0301381
   -0.00125196]
  [-0.00241878 -0.00198121 -0.02949657 ...  0.00719413 -0.0301381
   -0.00125196]
  [-0.00241878 -0.00198121 -0.02949657 ...  0.00719413 -0.0301381
   -0.00125196]
  ...
  [ 0.01114866  0.01198614  0.00740292 ... -0.02444486  0.00656367
   -0.0006521 ]
  [-0.01190376  0.02656314  0.02684777 ... -0.01247837 -0.01696602
    0.03347691]
  [ 0.03468628  0.03626407  0.0078885  ... -0.02263929  0.05523968
    0.05705073]]]
flatten layer:: [[-0.00241878 -0.00198121 -0.02949657 ... -0.02263929  0.05523968
   0.05705073]]
dense layer layer:: [[0.00732397]]
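
The relationship between the three printed outputs can be reproduced with a toy NumPy forward pass that records every intermediate result, much like the multi-output `extractor` does. All sizes and weights below are made-up assumptions and unrelated to the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, maxlen, dim = 20, 5, 4                      # toy sizes, not 10001 / 300 / 64
emb_table = rng.normal(scale=0.05, size=(vocab, dim))
dense_w = rng.normal(size=(maxlen * dim, 1))
dense_b = np.zeros(1)

layers = [
    ("embedding", lambda x: emb_table[x]),                       # (1, 5) -> (1, 5, 4)
    ("flatten", lambda x: x.reshape(x.shape[0], -1)),            # (1, 5, 4) -> (1, 20)
    ("dense", lambda x: 1.0 / (1.0 + np.exp(-(x @ dense_w + dense_b)))),  # -> (1, 1)
]

sample = np.array([[0, 0, 7, 3, 12]])              # one left-padded toy review
outputs = []
x = sample
for name, fn in layers:
    x = fn(x)
    outputs.append(x)                              # keep every intermediate result
    print(name, x.shape)

# Flatten only reshapes: its values are the embedding values laid end to end.
print(np.array_equal(outputs[1].ravel(), outputs[0].ravel()))  # True
# Both padding positions (id 0) share one embedding row.
print(np.array_equal(outputs[0][0, 0], outputs[0][0, 1]))      # True
```

This mirrors the printed output above: the leading embedding rows are identical because they all correspond to the padding id, and the flatten output starts and ends with the same values as the embedding output.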
