# Sentiment Classification


## Loading the dataset

In [1]:
from keras.datasets import imdb
import numpy as np

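# Workaround for older Keras versions: imdb.load_data calls np.load without
# allow_pickle=True, which newer NumPy releases reject by default.
np_load_old = np.load
np.load = lambda *a, **k: np_load_old(*a, allow_pickle=True, **k)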

vocab_size = 10000  # keep only the 10,000 most frequent words

# num_words limits the vocabulary to the vocab_size most frequent words in the dataset.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

Using TensorFlow backend.


In [3]:
from keras.preprocessing.sequence import pad_sequences
vocab_size = 10000  # vocabulary size
maxlen = 300  # number of words kept from each review

## Train/test split and padding

In [4]:
# Load the dataset; each review is a list of integer word ids.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
# Pad or truncate every review to the same length.
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
print(x_train.shape)
print(y_train.shape)

(25000, 300)
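
As a rough illustration of what `pad_sequences` does with its default settings (pad at the front, truncate from the front), here is a minimal pure-Python sketch. The helper name `pad_pre` is hypothetical and not part of Keras; it only mimics the default behaviour on a single list.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad or left-truncate a list of ints to exactly maxlen items."""
    seq = list(seq)[-maxlen:]                    # keep the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq   # fill the front with zeros

short = pad_pre([5, 8, 2], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 5, 8, 2]
print(long)   # [3, 4, 5, 6, 7]
```

Short reviews therefore start with a run of zeros, and long reviews lose their earliest words.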


## Creating key-value pair for word/word-id

In [6]:
word_index = imdb.get_word_index()  # word -> frequency rank
print(word_index.get('big'))
index_word = {v: k for k, v in word_index.items()}  # rank -> word
print(index_word.get(191))
# Encoded ids are shifted by 3 to make room for reserved ids, so subtract 3 before lookup.
review = ' '.join([index_word.get(i - 3, '') for i in x_train[1]])
print('Movie Review:', review)
print('Sentiment:')
print('Positive' if y_train[1] == 1 else 'Negative')

191
big
Movie Review:                                                                                                                 big hair big boobs bad music and a giant safety pin these are the words to best describe this terrible movie i love cheesy horror movies and i've seen hundreds but this had got to be on of the worst ever made the plot is paper thin and ridiculous the acting is an abomination the script is completely laughable the best is the end showdown with the cop and how he worked out who the killer is it's just so damn terribly written the clothes are sickening and funny in equal  the hair is big lots of boobs  men wear those cut  shirts that show off their  sickening that men actually wore them and the music is just  trash that plays over and over again in almost every scene there is trashy music boobs and  taking away bodies and the gym still doesn't close for  all joking aside this is a truly bad film whose only charm is to look back on the disaster that was the 
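
The leading whitespace and the gaps in the decoded review come from reserved ids: with the default `load_data` settings, 0 is padding, 1 marks the start of a review, and 2 replaces out-of-vocabulary words, so real words begin at their rank plus 3. The sketch below replays that decoding on a toy vocabulary; `toy_word_index` and `decode` are hypothetical names used only for illustration.

```python
# Hypothetical toy vocabulary in the same format as imdb.get_word_index():
# word -> frequency rank, starting at 1.
toy_word_index = {"the": 1, "movie": 2, "bad": 3}
toy_index_word = {rank: word for word, rank in toy_word_index.items()}

def decode(ids, index_word, offset=3):
    """Map encoded ids back to words; reserved ids decode to ''."""
    return " ".join(index_word.get(i - offset, "") for i in ids)

# Two padding zeros, the start marker 1, then "the movie bad" shifted by 3.
encoded = [0, 0, 1, 4, 5, 6]
decoded = decode(encoded, toy_index_word)
print(repr(decoded))  # '   the movie bad'
```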

## Build Keras Embedding Layer Model
We can think of the Embedding layer as a dictionary that maps the index assigned to a word to a word vector. This layer is very flexible and can be used in a few ways:

* The embedding layer can be used at the start of a larger deep learning model.
* We can load pre-trained word embeddings into the embedding layer when we create our model.
* We can use the embedding layer to train our own word2vec-style embeddings.

The Keras embedding layer doesn't require us to one-hot encode our words; instead, we give each word a unique integer id. For the IMDB dataset we've loaded, this has already been done; otherwise we could use sklearn's [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).
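
To make the "dictionary" view concrete, the following NumPy sketch shows that looking up rows of a weight matrix by integer id gives the same result as multiplying one-hot vectors by that matrix. The toy sizes and the random `weights` matrix are assumptions for illustration; in the real model the matrix is learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_vocab, toy_dim = 6, 4                         # hypothetical toy sizes
weights = rng.normal(size=(toy_vocab, toy_dim))   # one row per word id

ids = np.array([0, 3, 3, 5])                      # an encoded "review"

# Embedding lookup: pick the row that belongs to each id.
looked_up = weights[ids]                          # shape (4, 4)

# Equivalent one-hot formulation, which the lookup avoids building.
one_hot = np.eye(toy_vocab)[ids]                  # shape (4, 6)
via_matmul = one_hot @ weights

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```

Indexing skips the large, mostly zero one-hot matrix, which is why integer ids are enough as input.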

In [7]:
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

embedding_dim=300

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()






_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 300, 300)          3000000   
_________________________________________________________________
flatten_1 (Flatten)          (None, 90000)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 90001     
Total params: 3,090,001
Trainable params: 3,090,001
Non-trainable params: 0
_________________________________________________________________
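
The parameter counts in the summary can be checked by hand. This short sketch redoes the arithmetic from the values used above (`vocab_size`, `embedding_dim`, `maxlen`); it does not query the model itself.

```python
vocab_size, embedding_dim, maxlen = 10000, 300, 300

embedding_params = vocab_size * embedding_dim   # one 300-d vector per word id
flatten_units = maxlen * embedding_dim          # 300 positions x 300 dimensions
dense_params = flatten_units * 1 + 1            # one weight per unit plus a bias

total = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total)
# 3000000 90000 90001 3090001
```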


In [8]:
epochs=5
batch_size=500
model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size)




Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x21d7a9249c8>

In [9]:
loss, accuracy = model.evaluate(x_test, y_test)
print('Accuracy: %.2f%%' %(accuracy*100))
print('Test Loss: %.4f' % (loss))

Accuracy: 88.02%
Test Loss: 0.2885


## Retrieve the output of each layer in Keras for a single test sample from the trained model

In [31]:
from keras import backend as K

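def each_layer_output(sample):
    """Print the name, shape, and output of every layer for one encoded review."""
    outputs = [layer.output for layer in model.layers]
    # Build a backend function that maps the model input to all layer outputs.
    function = K.function([model.input, K.learning_phase()], outputs)
    # Add a batch dimension; learning phase 0 selects inference (test) mode.
    layer_output = function([np.array([sample]), 0])

    for idx, result in enumerate(layer_output):
        print('Layer number: ', idx)
        print('Layer name: ', model.layers[idx].name)
        print('Shape:', result.shape)
        print(result)

each_layer_output(x_test[1])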

Layer number:  0
Layer name:  embedding_1
Shape: (1, 300, 300)
[[[ 0.01171399  0.01169814  0.00175016 ... -0.00040218 -0.00799094
   -0.00196678]
  [ 0.01171399  0.01169814  0.00175016 ... -0.00040218 -0.00799094
   -0.00196678]
  [ 0.01171399  0.01169814  0.00175016 ... -0.00040218 -0.00799094
   -0.00196678]
  ...
  [-0.06888095 -0.01494558  0.01287549 ... -0.00928971 -0.04334632
   -0.0438768 ]
  [ 0.01691806  0.00250852  0.07085334 ...  0.00812192 -0.01599471
    0.02348586]
  [ 0.06348959 -0.08087334  0.01560126 ...  0.0494609  -0.01860689
   -0.0002296 ]]]
Layer number:  1
Layer name:  flatten_1
Shape: (1, 90000)
[[ 0.01171399  0.01169814  0.00175016 ...  0.0494609  -0.01860689
  -0.0002296 ]]
Layer number:  2
Layer name:  dense_1
Shape: (1, 1)
[[0.99919134]]


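## Conclusion
The dataset was fit to a Keras Sequential model whose embedding layer maps each of the 10,000 vocabulary ids to a 300-dimensional vector, producing a 300x300 output for each padded review.

The model reached a test accuracy of 88.02% after 5 epochs.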