# NLP - Basic Sentiment Classification Model

#### Natural Language Processing - Basic sentiment classification using the Keras IMDB dataset
1. Data pre-processing: define the vocabulary size, load the train/test split, pad the sequences
2. Keras model design with an embedding layer, dense layers, dropout, and a sigmoid output
3. Train the model and print the accuracy
4. Retrieve the output of each layer in Keras for a single test sample



### Import Necessary Libraries

In [1]:
import os  
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
#get_ipython().magic(u'matplotlib inline')
#plt.style.use('ggplot')

import tensorflow as tf

from keras import models, regularizers, layers, optimizers, losses, metrics
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import Flatten
from keras.utils import to_categorical
 
from keras.datasets import imdb

Using TensorFlow backend.


### Load and Split the Dataset into Train and Test

In [0]:
vocab_size = 10000  # keep only the 10,000 most frequent words

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
# num_words keeps the vocab_size most frequent words; rarer words are replaced by the out-of-vocabulary index (2 by default).

In [0]:
# Make all sequences the same length
from keras.preprocessing.sequence import pad_sequences
maxlen = 300  # number of words kept from each review
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
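
By default `pad_sequences` pads and truncates at the front of each sequence, which is why the sample printed further down starts with a run of zeros. The following is a minimal pure-Python sketch of that default behavior; `pad_pre` is a hypothetical helper written for illustration and is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                    # truncate from the front
        padding = [value] * (maxlen - len(seq))      # zeros needed to reach maxlen
        padded.append(padding + seq)                 # pad at the front
    return padded


# A short and a long toy "review" (word indices are illustrative)
example = pad_pre([[1, 14, 22], [5, 6, 7, 8, 9, 10]], maxlen=4)
print(example)  # [[0, 1, 14, 22], [7, 8, 9, 10]]
```

With `maxlen = 300`, reviews shorter than 300 words gain leading zeros and longer reviews lose their earliest words.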

### Check dataset shapes and sample data

In [4]:
x_train.shape

(25000, 300)

In [5]:
x_test.shape

(25000, 300)

In [6]:
y_train.shape

(25000,)

In [7]:
y_test.shape

(25000,)

In [8]:
x_train[0]

array([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    1,   14,   22,   16,   43,  530,
        973, 1622, 1385,   65,  458, 4468,   66, 3941,    4,  173,   36,
        256,    5,   25,  100,   43,  838,  112,   50,  670,    2,    9,
         35,  480,  284,    5,  150,    4,  172,  112,  167,    2,  336,
        385,   39,    4,  172, 4536, 1111,   17,  546,   38,   13,  447,
          4,  192,   50,   16,    6,  147, 2025,   19,   14,   22,    4,
       1920, 4613,  469,    4,   22,   71,   87,   

In [9]:
y_train[0]

1

### Design the Model - including Keras Embedding Layer

In [10]:
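model = models.Sequential()
# 10001 embedding rows cover every word index below vocab_size (10000), with one spare row
model.add(Embedding(10001, 512, input_length=maxlen))
# Dense layers act on the last axis, so each of the 300 positions is transformed independently
model.add(layers.Dense(256, kernel_regularizer=regularizers.l1(0.001), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(64, kernel_regularizer=regularizers.l1(0.001), activation='relu'))
model.add(layers.Dropout(0.3))
# Flatten (300, 64) into a single vector before the sigmoid output unit
model.add(layers.Flatten())
model.add(layers.Dense(1, activation='sigmoid'))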







### Compile Model and Print Summary

In [11]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

model.summary()  # summary() prints the table itself and returns None





Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 300, 512)          5120512   
_________________________________________________________________
dense_1 (Dense)              (None, 300, 256)          131328    
_________________________________________________________________
dropout_1 (Dropout)          (None, 300, 256)          0         
_________________________________________________________________
dense_2 (Dense)              (None, 300, 64)           16448     
_________________________________________________________________
dropout_2 (Dropout)          (None, 300, 64)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 19200)             0         
________________________
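
The parameter counts and output shapes in the summary can be reproduced by hand from the layer sizes in the model definition. The sketch below is plain arithmetic, not a Keras call; the variable names are chosen for illustration only.

```python
vocab_rows, embed_dim, maxlen = 10001, 512, 300

embedding_params = vocab_rows * embed_dim   # one 512-dimensional vector per word index
dense_1_params = embed_dim * 256 + 256      # weights plus one bias per unit
dense_2_params = 256 * 64 + 64              # weights plus one bias per unit
flatten_width = maxlen * 64                 # (300, 64) flattened into one vector

print(embedding_params)  # 5120512
print(dense_1_params)    # 131328
print(dense_2_params)    # 16448
print(flatten_width)     # 19200
```

Because the dense layers share their weights across all 300 positions, their parameter counts do not depend on `maxlen`; only the flattened width does.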

### Train the Model and Print Accuracy Results

In [12]:

NumEpochs = 30
BatchSize = 512

history = model.fit(x_train, y_train, epochs=NumEpochs, batch_size=BatchSize, validation_data=(x_test, y_test))

results = model.evaluate(x_test, y_test)
print("_"*80)
print("Validation Loss and Accuracy:")
print(results)




Train on 25000 samples, validate on 25000 samples
Epoch 1/30
...
Epoch 30/30
________________________________________________________________________________
Validation Loss and Accuracy:
[0.6242404225540161, 0.84012]
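
The two numbers returned by `evaluate` are the loss and the accuracy, in that order. As a rough illustration of how the two can diverge, here is a minimal pure-Python sketch of binary cross-entropy and thresholded accuracy. The labels and probabilities are made-up values for illustration, not outputs of this model, and the helper functions are hypothetical. In Keras the reported loss normally also includes the L1 penalty terms from the regularized layers, so it is not pure cross-entropy.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Illustrative values only: three correct predictions and one confident mistake
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.999]

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(loss, acc)
```

In this toy case a single confident mistake dominates the loss while the accuracy stays at 0.75, which is one way a model can show a relatively high loss alongside a reasonable accuracy.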


## Retrieve the Output of Each Layer in Keras for a Single Test Sample

### Get Model Layer names and select layers

In [13]:
layer_names_list = [layer.name for layer in model.layers]
print("layer names list: ", layer_names_list)

layer names list:  ['embedding_1', 'dense_1', 'dropout_1', 'dense_2', 'dropout_2', 'flatten_1', 'dense_3']


In [14]:
selected_layers = ['embedding_1', 'dense_1', 'dropout_1', 'dense_2', 'dropout_2', 'flatten_1', 'dense_3']
# Indices of the layers whose names appear in selected_layers
matched_indices = [i for i, item in enumerate(layer_names_list) if item in selected_layers]
print(matched_indices)

[0, 1, 2, 3, 4, 5, 6]


In [0]:
# Collect the output tensor of every selected layer
selected_layers_outputs = [model.layers[i].output for i in matched_indices]


### Build Model for Display of Layer Outputs 

In [0]:
from keras.models import Model
display_model = Model(inputs = model.input, outputs = selected_layers_outputs)

### Select an Input from the 25000 Test Data Points

In [0]:
# Slice (rather than index) so the batch dimension is kept: shape (1, 300)
selected_feature_maps = display_model.predict(x_test[24300:24301])

Output of Layer **'embedding_1'**

In [18]:
print(selected_feature_maps[0])

[[[-0.0108036   0.13904348 -0.00908325 ... -0.00131345  0.0097856
    0.07019526]
  [-0.0108036   0.13904348 -0.00908325 ... -0.00131345  0.0097856
    0.07019526]
  [-0.0108036   0.13904348 -0.00908325 ... -0.00131345  0.0097856
    0.07019526]
  ...
  [-0.02235573 -0.07827283  0.02279053 ... -0.05432862 -0.00015859
   -0.03288679]
  [ 0.07425639  0.01446936 -0.02833833 ...  0.02202661 -0.00790284
   -0.01863442]
  [ 0.03605311  0.03661082 -0.06151279 ...  0.03023439 -0.01297506
   -0.00981976]]]


Output of Layer **'dense_1'**



In [19]:
print(selected_feature_maps[1])

[[[0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  ...
  [0.         0.         0.         ... 0.17173481 0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]]]


Output of Layer **'dropout_1'**

In [20]:
print(selected_feature_maps[2])

[[[0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  ...
  [0.         0.         0.         ... 0.17173481 0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]]]
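
The `dropout_1` output is identical to the `dense_1` output because dropout is only active during training; at prediction time the layer passes its input through unchanged. The following is a minimal pure-Python sketch of that behavior, assuming the usual inverted-dropout scaling of kept values by `1 / (1 - rate)`. The `dropout` function is a hypothetical stand-in for illustration, not the Keras implementation.

```python
import random


def dropout(values, rate, training, rng=None):
    """Inverted dropout: zero a fraction of values in training, identity otherwise."""
    if not training:
        return list(values)                # inference: values pass through unchanged
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Drop each value with probability `rate`; rescale survivors to keep the expected sum
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [0.0, 0.17173481, 0.5, 1.0]  # illustrative activations
inference_out = dropout(activations, rate=0.3, training=False)
training_out = dropout(activations, rate=0.3, training=True, rng=random.Random(0))

print(inference_out)  # same values as the input
print(training_out)   # each value is either zeroed or scaled by 1 / 0.7
```

The same reasoning applies to `dropout_2`, whose printed output matches `dense_2`.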


Output of Layer **'dense_2'**

In [21]:
print(selected_feature_maps[3])

[[[0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  ...
  [0.07406334 0.08916754 0.05824891 ... 0.09130906 0.056501   0.07584336]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]]]


Output of Layer **'dropout_2'**

In [22]:
print(selected_feature_maps[4])

[[[0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]
  ...
  [0.07406334 0.08916754 0.05824891 ... 0.09130906 0.056501   0.07584336]
  [0.         0.         0.         ... 0.         0.         0.        ]
  [0.         0.         0.         ... 0.         0.         0.        ]]]


Output of Layer **'flatten_1'**

In [23]:
print(selected_feature_maps[5])

[[0. 0. 0. ... 0. 0. 0.]]


Output of Layer **'dense_3'**

In [24]:
print(selected_feature_maps[6])

[[0.99334717]]
