# Sentiment Classification


### Loading the dataset (5 points)

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # must be set before TensorFlow is imported to silence its C++ logs

import tensorflow
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from nltk.corpus import stopwords

In [2]:
# Loading dataset
vocab_size = 10000  # keep only the 10,000 most frequent words
# Words ranked below vocab_size by frequency are replaced with the out-of-vocabulary index.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

## Word and Word_ID pair

In [3]:
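# Word ID -> word dictionary (get_word_index() returns word -> ID, so invert it)
# Note: load_data() shifts every index by 3 by default (index_from=3),
# so these raw IDs do not line up one-to-one with the values in x_train.
word_to_id = tensorflow.keras.datasets.imdb.get_word_index()
Imdb_id = {id_: word for word, id_ in word_to_id.items()}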

Imdb_id

{34701: 'fawn',
 52006: 'tsukino',
 52007: 'nunnery',
 16816: 'sonja',
 63951: 'vani',
 1408: 'woods',
 16115: 'spiders',
 2345: 'hanging',
 2289: 'woody',
 52008: 'trawling',
 52009: "hold's",
 11307: 'comically',
 40830: 'localized',
 30568: 'disobeying',
 52010: "'royale",
 40831: "harpo's",
 52011: 'canet',
 19313: 'aileen',
 52012: 'acurately',
 52013: "diplomat's",
 25242: 'rickman',
 6746: 'arranged',
 52014: 'rumbustious',
 52015: 'familiarness',
 52016: "spider'",
 68804: 'hahahah',
 52017: "wood'",
 40833: 'transvestism',
 34702: "hangin'",
 2338: 'bringing',
 40834: 'seamier',
 34703: 'wooded',
 52018: 'bravora',
 16817: 'grueling',
 1636: 'wooden',
 16818: 'wednesday',
 52019: "'prix",
 34704: 'altagracia',
 52020: 'circuitry',
 11585: 'crotch',
 57766: 'busybody',
 52021: "tart'n'tangy",
 14129: 'burgade',
 52023: 'thrace',
 11038: "tom's",
 52025: 'snuggles',
 29114: 'francesco',
 52027: 'complainers',
 52125: 'templarios',
 40835: '272',
 52028: '273',
 52130: 'zaniacs',

## Original Text Corpus

In [4]:
sentence = ''
for i in x_train[1]:
    sentence = sentence + Imdb_id[i] +' '
print(sentence)

the thought solid thought senator do making to is spot nomination assumed while he of jack in where picked as getting on was did hands fact characters to always life thrillers not as me can't in at are br of sure your way of little it strongly random to view of love it so principles of guy it used producer of where it of here icon film of outside to don't all unique some like of direction it if out her imagination below keep of queen he diverse to makes this stretch and of solid it thought begins br senator and budget worthwhile though ok and awaiting for ever better were and diverse for budget look kicked any to of making it out and follows for effects show to show cast this family us scenes more it severe making senator to and finds tv tend to of emerged these thing wants but and an beckinsale cult as it is video do you david see scenery it in few those are of ship for with of wild to one is very work dark they don't do dvd with those them 


In [5]:
sentence = ''
for i in x_test[1]:
    sentence = sentence + Imdb_id[i] +' '
print(sentence)

the as you world's is quite br mankind most that quest are chase to being quickly of little it time hell to plot br of something long put are of every place this consequence and of interplay storytelling being nasty not of you warren in is failed club i i of films pay so sequences and film okay uses to received and if time done for room sugar viewer as cartoon of gives to forgettable br be because many these of reflection sugar contained gives it wreck scene to more was two when had find as you another it of themselves probably who interplay storytelling if itself by br about 1950's films not would effects that her box to miike for if hero close seek end is very together movie of wheel got say kong sugar fred close bore there is playing lot of and pan place trilogy of lacks br of their time much this men as on it is telling program br silliness okay and to frustration at corner and she of sequences to political clearly in of drugs keep guy i i was throwing room sugar as it by br be plo
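
The decoded reviews above read as scrambled text because `imdb.load_data` by default reserves index 0 for padding, 1 for the start marker and 2 for unknown words, and shifts every real word index by 3 (`index_from=3`). A minimal sketch of a decoder that undoes this shift is given below. The tiny `word_index` dictionary and the `decode_review` helper are hypothetical stand-ins for illustration; with the real data one would pass the result of `imdb.get_word_index()` instead.

```python
# Toy stand-in for imdb.get_word_index(), which maps word -> frequency rank (1-based).
word_index = {"the": 1, "and": 2, "a": 3, "movie": 4, "great": 5}

INDEX_FROM = 3  # default offset applied by imdb.load_data

# Shift every rank by the offset and add the three reserved tokens.
id_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
id_to_word.update({0: "<pad>", 1: "<start>", 2: "<unk>"})


def decode_review(encoded, id_to_word=id_to_word):
    """Turn a list of integer IDs back into a space-separated string."""
    return " ".join(id_to_word.get(i, "<unk>") for i in encoded)


# Hypothetical encoded review: <start> the movie <unk> great
encoded = [1, 4, 7, 2, 8]
decoded = decode_review(encoded)
print(decoded)
```

Padded positions (index 0) decode to `<pad>` here, which also avoids a `KeyError` when a review shorter than `maxlen` is decoded after padding.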

## Test Train Split

In [6]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
maxlen = 200  # number of words kept from each review

In [7]:
# Make all sequences the same length: shorter reviews are zero-padded, longer ones truncated
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

from tensorflow.keras.utils import to_categorical

# One-hot encode the binary labels: column 0 = negative, column 1 = positive
y_train = to_categorical(y_train, num_classes=2)
y_test = to_categorical(y_test, num_classes=2)
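
To make the two preprocessing steps concrete, here is a small NumPy sketch of what they do to the data. It mirrors the default behaviour of `pad_sequences` (zeros are added at the front, and over-long sequences lose their earliest tokens) and of `to_categorical`. The helpers `pad_pre` and `one_hot` are hypothetical names written for this illustration, not Keras functions.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value` and keep only the last `maxlen` tokens of each sequence."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # an empty review stays all padding
        trimmed = seq[-maxlen:]
        out[row, -len(trimmed):] = trimmed
    return out


def one_hot(labels, num_classes):
    """Map integer class labels to one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]


padded = pad_pre([[5, 6], [1, 2, 3, 4, 5]], maxlen=4)
encoded_labels = one_hot([0, 1, 1], num_classes=2)
print(padded)
print(encoded_labels)
```

The short review gains leading zeros while the long one keeps its final four tokens, so every row ends up with the same length.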

## Build Keras Embedding Layer Model (30 points)
We can think of the Embedding layer as a dictionary that maps the integer index assigned to a word to a word vector. This layer is flexible and can be used in a few ways:

* The embedding layer can be used at the start of a larger deep learning model.
* We can load pre-trained word embeddings into the embedding layer when we create our model.
* We can use the embedding layer to train our own word2vec-style embeddings.

The Keras Embedding layer doesn't require us to one-hot encode our words; instead, we give each word a unique integer ID. For the IMDB dataset we've loaded, this has already been done. If it hadn't, we could use scikit-learn's [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).
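
The dictionary view of the embedding layer can be illustrated with plain NumPy. The sketch below uses a small random matrix as a stand-in for trained weights (the sizes and names are made up for the example) and shows that looking up rows by integer ID gives the same result as multiplying one-hot vectors by the matrix, which is why the one-hot step can be skipped.

```python
import numpy as np

rng = np.random.default_rng(0)

toy_vocab_size, toy_embed_dim = 6, 4  # illustrative sizes, far smaller than the notebook's
embedding_matrix = rng.normal(size=(toy_vocab_size, toy_embed_dim))

# A batch of two "reviews", each three token IDs long.
token_ids = np.array([[0, 3, 5],
                      [2, 2, 1]])

# Embedding lookup: pick one row of the matrix per token ID.
looked_up = embedding_matrix[token_ids]           # shape (2, 3, 4)

# Equivalent but wasteful: one-hot encode, then matrix-multiply.
one_hot_ids = np.eye(toy_vocab_size)[token_ids]   # shape (2, 3, 6)
via_matmul = one_hot_ids @ embedding_matrix       # shape (2, 3, 4)

print(looked_up.shape, np.allclose(looked_up, via_matmul))
```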

In [8]:
# Building model
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Embedding, LSTM, Dropout, Bidirectional

embed_dim = 128  # size of each word vector
lstm_out = 200   # number of LSTM units

# The tensor is named `Input` because later cells reference it; the layer is
# built via its full path so the Keras `Input` class itself is not shadowed.
Input = tensorflow.keras.layers.Input(shape=(maxlen,))
# input_length must be the sequence length (maxlen), not the number of training samples
embedding = Embedding(vocab_size, embed_dim, input_length=maxlen)(Input)
dropout = Dropout(0.2)(embedding)
# lstm = Bidirectional(LSTM(lstm_out, recurrent_dropout=0.2))(dropout)  # Trying Bidirectional for higher accuracy
lstm_1 = LSTM(lstm_out, recurrent_dropout=0.2)(dropout)  # Selected the more efficient unidirectional LSTM
hidden_dense = Dense(20, activation='relu')(lstm_1)
output = Dense(2, activation='softmax')(hidden_dense)

model = Model(inputs=Input, outputs=output)

In [9]:
# Optimizer = tensorflow.keras.optimizers.Adam(lr = 1e-2,beta_1 = 0.9,decay = 1e-5)

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.summary()  # summary() prints the table itself and returns None

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 200)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 200, 128)          1280000   
_________________________________________________________________
dropout (Dropout)            (None, 200, 128)          0         
_________________________________________________________________
lstm (LSTM)                  (None, 200)               263200    
_________________________________________________________________
dense (Dense)                (None, 20)                4020      
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 42        
Total params: 1,547,262
Trainable params: 1,547,262
Non-trainable params: 0
____________________________________________
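
The parameter counts in the summary can be checked by hand. The short calculation below uses the standard formulas: an embedding table has one row per vocabulary entry, an LSTM has four gates that each see the input and the previous hidden state plus a bias, and a dense layer has a weight per input-output pair plus a bias per output. The variable names are chosen for this example only.

```python
vocab_size, embed_dim, lstm_units = 10000, 128, 200
hidden_units, num_classes = 20, 2

# Embedding: one embed_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# Dense layers: weights plus one bias per output unit.
dense_params = lstm_units * hidden_units + hidden_units
output_params = hidden_units * num_classes + num_classes

total_params = embedding_params + lstm_params + dense_params + output_params
print(embedding_params, lstm_params, dense_params, output_params, total_params)
```

The embedding table accounts for most of the weights, so reducing `vocab_size` or `embed_dim` is the most direct way to shrink this model.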

In [10]:
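# Defining Callbacks
# Stop when val_loss has not improved by at least 0.01 for 2 consecutive epochs
EarlyStopping = tensorflow.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, min_delta=0.01)

# Save weights only when val_loss improves, so the file name and the monitored metric agree
ModelCheckpoint = tensorflow.keras.callbacks.ModelCheckpoint("model-{val_loss:.2f}.h5", monitor='val_loss', save_best_only=True, save_weights_only=True)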
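
The stopping rule configured above (`patience=2`, `min_delta=0.01` on `val_loss`) can be sketched in a few lines of plain Python. This is a simplified model of the rule, not the Keras implementation: it ignores options such as `baseline` and `restore_best_weights`, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.01):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement large enough to count: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, not taken from the training run in this notebook.
made_up_losses = [0.40, 0.31, 0.33, 0.35, 0.30]
stop_epoch = early_stopping_epoch(made_up_losses)
print(stop_epoch)
```

With these numbers the loss stops improving after the second epoch, and two non-improving epochs later the loop exits before the fifth epoch is ever run.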

In [12]:
# Fitting model
model.fit(x_train, y_train, epochs = 10 , batch_size= 32,  validation_data=(x_test,y_test),callbacks = [EarlyStopping,ModelCheckpoint])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10


<tensorflow.python.keras.callbacks.History at 0x10d8c26a340>

## Accuracy of the model & Retrieve the output of each layer in Keras for a given single test sample from the trained model you built (10 Points)

In [13]:
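# Selecting the best model: load the checkpoint with the lowest validation loss
# Final_model is built from the same layers as `model`, so loading weights here updates both
Final_model = Model(inputs=Input, outputs=output)
Final_model.load_weights('model-0.31.h5')
Final_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])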

In [14]:
# Accuracy and Loss
score,acc = Final_model.evaluate(x_test, y_test, verbose = 2, batch_size = 32)
print("Loss on Test Data : %.2f" % (score))
print("Accuracy on test Data : %.2f" % (acc))

782/782 - 52s - loss: 0.3068 - accuracy: 0.8844
Loss on Test Data : 0.31
Accuracy on test Data : 0.88


## Positive Negative percentage

In [15]:
import numpy as np

class Pos_neg_calculator():
    """Tracks prediction accuracy separately for positive and negative reviews."""

    def __init__(self):
        self.Positive_count = 0
        self.Negative_count = 0
        self.Positive_correct = 0
        self.Negative_correct = 0

    def process(self, x, x_test=x_test, y_test=y_test, model=model):
        # Predict the review at index x and update the counters for its true class
        try:
            result = model.predict(x_test[x].reshape(1, x_test.shape[1]), batch_size=1, verbose=0)[0]
            actual = np.argmax(y_test[x])  # 0 = negative, 1 = positive
            correct = int(np.argmax(result) == actual)

            if actual == 0:
                self.Negative_count += 1
                self.Negative_correct += correct
            else:
                self.Positive_count += 1
                self.Positive_correct += correct

        except Exception as e:
            print(e)

    def result(self):
        print("Positive Prediction Accuracy", self.Positive_correct / self.Positive_count * 100, "%")
        print("Negative Prediction Accuracy", self.Negative_correct / self.Negative_count * 100, "%")
        

In [17]:
import numpy as np

cal = Pos_neg_calculator()

# Evaluate the first quarter of the test set, one review at a time.
# The counters live on `cal` in this process, so the calls run in a plain loop.
for test_element in range(len(x_test) // 4):
    cal.process(test_element)

cal.result()


Positive Prediction Accuracy 92.17731421121252 %
Negative Prediction Accuracy 84.66373350094281 %
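
The same per-class figures can be computed in one pass once predictions for a whole batch are available. The sketch below assumes a one-hot label array and a matching array of predicted probabilities. The helper `per_class_accuracy` and the toy arrays are hypothetical; in the notebook the probabilities would come from a single `model.predict` call on the test slice.

```python
import numpy as np


def per_class_accuracy(y_true_onehot, y_prob):
    """Fraction of samples of each true class that were predicted as that class."""
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return {
        int(cls): float(np.mean(pred_cls[true_cls == cls] == cls))
        for cls in np.unique(true_cls)
    }


# Toy data: two negative reviews (class 0) followed by three positive ones (class 1).
toy_labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
toy_probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])

accuracy_by_class = per_class_accuracy(toy_labels, toy_probs)
print(accuracy_by_class)
```

Batching the predictions this way avoids one `predict` call per review, which is usually the dominant cost of the per-sample loop.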


## Review Sentence

In [18]:
sentence = ''
for i in x_train[10]:
    sentence = sentence + Imdb_id[i] +' '
print(sentence)

red startling to recently in successfully much unfortunately going dan and stuck is him sequences but of you of enough for its br that beautiful put reasons of chris chemistry wing and for of you red time trivia to as companion payoff of chris less br of subplots torture in low alive in gay some br of wing if time actual in also side any if name takes for of friendship it of 10 for had and great to as you students for movie of going and for bad well best had at woman br musical when it caused of gripping to as gem in updated for and look end gene in at world aliens of you it meet but is quite br western ideas of chris little of films he an time done this were right too to of enough for of ending become family beautiful are make right being it time much bit especially craig for of you parts bond who of here parts at due given movie of once give find actor to recently in at world dolls loved and it is video him fact you to by br of where br of grown fight culture leads 


## Predicted and actual sentiment 

In [20]:
result = model.predict(x_test[10].reshape(1, x_test.shape[1]), batch_size=1, verbose=0)[0]

if np.argmax(y_test[10]) == 0:
    print('Sentiment : Negative')
else:
    print('Sentiment : Positive')

if np.argmax(result) == 0:
    print('Predicted Sentiment : Negative')
else:
    print('Predicted Sentiment : Positive')

Sentiment : Positive
Predicted Sentiment : Positive


## Output from intermediate layers