# Sentiment Classification
## Objective:
To train an IMDb sentiment classifier whose first layer learns word embeddings, and to retrieve the output of each layer of the trained Keras model.
## Dataset:
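IMDb movie review dataset bundled with Keras (keras.datasets.imdb): 25,000 training and 25,000 test reviews, each labeled positive or negative.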

## Loading the dataset

In [1]:
from keras.datasets import imdb

vocab_size = 10000 

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size) 

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
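`imdb.load_data` returns each review as a list of integer word ids rather than text. With its default arguments (`start_char=1`, `oov_char=2`, `index_from=3`), id 0 is reserved for padding, 1 marks the start of a review, 2 stands for any word outside the `num_words` vocabulary, and a word with frequency rank r (as given by `imdb.get_word_index()`) is stored as r + 3. This is also why the maximum id printed later is 9999: ids at or above `vocab_size` are replaced by the out-of-vocabulary id. The sketch below decodes ids back into words; the tiny `toy_word_index` dictionary and the `decode_review` helper are hypothetical, standing in for the real index so the example runs offline.

```python
# Hypothetical toy word index with the same shape as the dict returned by
# keras.datasets.imdb.get_word_index(): word -> frequency rank (1 = most frequent).
toy_word_index = {"the": 1, "and": 2, "movie": 17, "great": 84, "boring": 350}

# Default reserved ids of imdb.load_data: 0 = padding, 1 = start of review,
# 2 = out-of-vocabulary word; real words are shifted by index_from=3.
PAD, START, OOV, INDEX_FROM = 0, 1, 2, 3


def decode_review(encoded, word_index, index_from=INDEX_FROM):
    """Map a sequence of imdb-style integer ids back to words."""
    # Invert the index and apply the same offset that load_data used.
    id_to_word = {rank + index_from: word for word, rank in word_index.items()}
    id_to_word[PAD] = "<pad>"
    id_to_word[START] = "<start>"
    id_to_word[OOV] = "<unk>"
    # Ids missing from the (toy) index are shown as unknown words.
    return " ".join(id_to_word.get(i, "<unk>") for i in encoded)


encoded = [START, 1 + INDEX_FROM, 17 + INDEX_FROM, 5000, 84 + INDEX_FROM]
print(decode_review(encoded, toy_word_index))
# → <start> the movie <unk> great
```

With the real index, `decode_review(x_train[0], imdb.get_word_index())` should print the first training review as text.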


In [0]:
from keras.preprocessing.sequence import pad_sequences
vocab_size = 10000  # keep only the 10,000 most frequent words
maxlen = 300        # number of word ids kept per review (pad or truncate to this length)

In [0]:
# Load the dataset as lists of integer word ids
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
# Pad or truncate every review to exactly maxlen ids
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
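`pad_sequences` is called with only `maxlen`, so its defaults `padding='pre'` and `truncating='pre'` apply: shorter reviews receive zeros on the left, and longer reviews lose their beginning, keeping their last 300 ids. A minimal NumPy sketch of just those defaults (the function name `pad_like_keras` is hypothetical and assumes `maxlen > 0`):

```python
import numpy as np


def pad_like_keras(sequences, maxlen, value=0):
    """Sketch of pad_sequences(..., padding='pre', truncating='pre')."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]        # truncating='pre': keep the last maxlen ids
        if kept:
            out[row, -len(kept):] = kept  # padding='pre': zeros stay on the left
    return out


demo = pad_like_keras([[5, 6], [1, 2, 3, 4, 5]], maxlen=4)
print(demo)
# [[0 0 5 6]
#  [2 3 4 5]]
```

Passing `truncating='post'` to `pad_sequences` would instead keep the opening words of long reviews.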

In [4]:
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

(25000, 300) (25000,) (25000, 300) (25000,)


In [5]:
print("Max val: ", max([max(sequence) for sequence in x_train]))
print("Max len: ", max([len(sequence) for sequence in x_train]))

Max val:  9999
Max len:  300


## Build a Keras model with an Embedding layer

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, Flatten

# Create the model:
# Embedding maps each of the maxlen word ids to a 32-dimensional vector,
# Flatten concatenates those vectors, and two Dense layers output
# the probability that the review is positive.
model = Sequential()
model.add(Embedding(vocab_size, 32, input_length=maxlen))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 300, 32)           320000    
_________________________________________________________________
flatten_1 (Flatten)          (None, 9600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 250)               2400250   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 251       
Total params: 2,720,501
Trainable params: 2,720,501
Non-trainable params: 0
_________________________________________________________________
None
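The parameter counts in the summary follow from two formulas: an `Embedding` layer stores one `output_dim`-long vector per word id, and a `Dense` layer stores an inputs-by-units kernel plus one bias per unit. The arithmetic can be checked in plain Python (the helper names are illustrative):

```python
def embedding_params(vocab_size, output_dim):
    # One trainable vector of length output_dim per word id; no bias.
    return vocab_size * output_dim


def dense_params(inputs, units):
    # Kernel of shape (inputs, units) plus one bias per unit.
    return inputs * units + units


vocab_size, embed_dim, maxlen = 10000, 32, 300
flat = maxlen * embed_dim  # Flatten: (300, 32) -> 9600 values, no parameters

counts = {
    "embedding_1": embedding_params(vocab_size, embed_dim),
    "flatten_1": 0,
    "dense_1": dense_params(flat, 250),
    "dense_2": dense_params(250, 1),
}
print(counts, sum(counts.values()))
# → {'embedding_1': 320000, 'flatten_1': 0, 'dense_1': 2400250, 'dense_2': 251} 2720501
```

Roughly 88% of the weights sit in `dense_1`, because Flatten feeds it all 300 × 32 = 9,600 embedding values; this large layer is one plausible reason the model overfits within a few epochs.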


In [7]:
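# Fit the model (note: the test set doubles as validation data here,
# so val_acc is not an independent estimate of generalization)
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5, batch_size=32, verbose=2)
# Final evaluation of the model on the test set
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))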




Train on 25000 samples, validate on 25000 samples
Epoch 1/5
 - 38s - loss: 0.3943 - acc: 0.8050 - val_loss: 0.2902 - val_acc: 0.8775
Epoch 2/5
 - 37s - loss: 0.1019 - acc: 0.9637 - val_loss: 0.3986 - val_acc: 0.8542
Epoch 3/5
 - 38s - loss: 0.0183 - acc: 0.9938 - val_loss: 0.6060 - val_acc: 0.8535
Epoch 4/5
 - 38s - loss: 0.0072 - acc: 0.9978 - val_loss: 0.7482 - val_acc: 0.8549
Epoch 5/5
 - 38s - loss: 0.0172 - acc: 0.9940 - val_loss: 0.7907 - val_acc: 0.8433
Accuracy: 84.33%
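The log shows a typical overfitting pattern: training accuracy climbs to about 0.99, while validation loss is lowest after epoch 1 (0.2902) and rises on every later epoch. Keras offers `keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)` in recent Keras 2 releases to stop at such a point; ideally it would monitor a held-out split (for example via `validation_split` in `fit`) rather than the test set. The sketch below is a simplified version of the patience counting such a callback performs, applied to the `val_loss` values copied from the log (the function name is hypothetical):

```python
def early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (epoch at which training would stop, best epoch), both 1-based."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never ran out of patience: training runs to the last epoch.
    return len(val_losses), best_epoch


val_loss = [0.2902, 0.3986, 0.6060, 0.7482, 0.7907]  # from the training log
print(early_stopping(val_loss, patience=2))
# → (3, 1)
```

With `patience=2`, training would stop after epoch 3, and restoring the best weights would keep the epoch-1 model.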


In [8]:
# Mount Google Drive so the trained model can be saved there
from google.colab import drive
drive.mount('/mnt/drive')

Mounted at /mnt/drive


In [0]:
import os
# Save the architecture, weights and optimizer state to an HDF5 file in Drive
os.chdir('/mnt/drive/My Drive/Colab Notebooks')
model.save('Sequential_NLP_model.h5')

## Retrieve the output of each layer in Keras for a single test sample from the trained model

In [11]:
# Continue training the already-trained model for 10 more epochs (15 in total)
model.fit(x_train, y_train, batch_size=32, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7faa748c7f60>
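The cell above continues training but does not itself extract per-layer outputs. One common Keras approach is to build a second model that shares the trained layers, e.g. `keras.models.Model(inputs=model.input, outputs=[layer.output for layer in model.layers])`, and call `predict` on a single sample such as `x_test[:1]`. To make the computation explicit, the sketch below replays the same four layers in NumPy from a weight list. It assumes the weights arrive in the order `model.get_weights()` returns them for this architecture (`[embedding, W1, b1, W2, b2]`); `layer_outputs` is a hypothetical helper, and random toy-sized weights are used so it runs without the trained model.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_outputs(sample, weights):
    """Replay Embedding -> Flatten -> Dense(relu) -> Dense(sigmoid) for one sample.

    Assumes weights = [embedding, W1, b1, W2, b2], the order in which
    model.get_weights() lists them for this Sequential model.
    """
    emb, w1, b1, w2, b2 = weights
    embedded = emb[sample]            # Embedding: one row per word id, (maxlen, dim)
    flat = embedded.reshape(-1)       # Flatten: (maxlen * dim,)
    hidden = relu(flat @ w1 + b1)     # dense_1: (units,)
    prob = sigmoid(hidden @ w2 + b2)  # dense_2: (1,), probability of a positive review
    return [embedded, flat, hidden, prob]


# Toy sizes and random weights so the sketch runs without the trained model.
rng = np.random.default_rng(0)
vocab, dim, maxlen, units = 20, 4, 6, 3
weights = [
    rng.normal(size=(vocab, dim)),
    rng.normal(size=(maxlen * dim, units)), np.zeros(units),
    rng.normal(size=(units, 1)), np.zeros(1),
]
sample = rng.integers(0, vocab, size=maxlen)
for name, out in zip(["embedding", "flatten", "dense_1", "dense_2"],
                     layer_outputs(sample, weights)):
    print(name, out.shape)
# embedding (6, 4)
# flatten (24,)
# dense_1 (3,)
# dense_2 (1,)
```

With the trained model, `layer_outputs(x_test[0], model.get_weights())` should yield arrays of shape (300, 32), (9600,), (250,) and (1,), and the final value should agree with `model.predict(x_test[:1])` up to floating-point differences.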

## Accuracy of the model

In [12]:
# evaluate returns [loss, accuracy]; "score" is the binary cross-entropy loss
score, acc = model.evaluate(x_test, y_test,
                            batch_size=32)
print('Test score:', score)
print('Test accuracy:', acc)

Test score: 1.2472254811473191
Test accuracy: 0.84796
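Because "Test score" is the binary cross-entropy rather than a second accuracy figure, the two numbers can diverge: cross-entropy punishes confident mistakes heavily, which is a likely explanation for a loss of about 1.25 next to roughly 85% accuracy after the extra training epochs. A NumPy sketch of both metrics, clipping probabilities with a small epsilon in the way Keras' cross-entropy does; `bce_and_accuracy` and the predictions are made-up illustration values, not outputs of this model:

```python
import numpy as np


def bce_and_accuracy(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy (the 'Test score') and accuracy at a 0.5 threshold."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) for probabilities of exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    acc = np.mean((y_prob > 0.5) == (y_true == 1))
    return loss, acc


# Three confident correct predictions and one confident mistake (label 0, p = 0.999).
labels = [1, 0, 1, 0]
probs = [0.95, 0.10, 0.90, 0.999]
loss, acc = bce_and_accuracy(labels, probs)
print(round(float(loss), 2), float(acc))
# → 1.79 0.75
```

The single wrong prediction contributes about 6.9 of the 7.17 summed loss, so accuracy stays at 75% while the loss exceeds 1.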


## Summary:
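1. Imported the IMDb dataset, keeping the 10,000 most frequent words.
2. Loaded its predefined train and test splits (25,000 reviews each) and padded every review to 300 word ids.
3. Built an embedding layer over the word index, which maps each word to an integer word id.
4. Retrieved the output of each layer and measured the model's accuracy on the test set (about 84.8%).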