<style TYPE=\"text/css\">
    code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
    </style>
    <script type=\"text/x-mathjax-config\">
    MathJax.Hub.Config({
        tex2jax: {
            inlineMath: [['$','$'], ['\\\\(','\\\\)']],
            skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'] // removed 'code' entry
    }
    });
    MathJax.Hub.Queue(function() {
        var all = MathJax.Hub.getAllJax(), i;
        for(i = 0; i < all.length; i += 1) {
            all[i].SourceElement().parentNode.className += ' has-jax';
        }
    });
    </script>
    <script type=\"text/javascript\" src=\"http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML\"></script>

# Sequence Models

## Sequence Data

Why do we need special models for sequence data?

### Motivation

#### Language Modelling: Computing the Probability of a Sentence

* Text data


    The dog jumped over the table.
    The table jumped over the dog.
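One standard way to make "probability of a sentence" precise is the chain rule, which factors the joint probability into a sequence of next-word predictions. The symbols $w_t$ (the $t$-th word) and $T$ (the sentence length) are introduced here for illustration only.

$$ P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}) $$

Under this factorization, a good language model should assign a higher probability to the first sentence than to the second, even though both contain exactly the same words.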

Cons of other models when dealing with sequential data:

* Feed forward networks with the entire sequence as input
    1. The input size isn't fixed. Workaround: pad every sequence up to a maximum length with a special end element. The input then becomes very large, which leads to huge models.
    2. Are we harnessing the sequential information present?
    3. The parameters have to learn to recognize the same pattern separately at every position. Workaround: convolutions, which share parameters across positions. Some models do use CNNs for sequences.


* Feed forward networks processing one element of the sequence at a time.
    1. Are we harnessing the sequential information present? No! In fact, every element is treated the same, regardless of its position or context.
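To see what ignoring order costs, here is a small sketch using the two example sentences from the motivation. The helper `bag_of_words` is a made-up name for illustration; it stands in for any model that only sees which words occur, not where.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count words while ignoring their positions."""
    return Counter(sentence.lower().rstrip(".").split())

s1 = "The dog jumped over the table."
s2 = "The table jumped over the dog."

# The word counts are identical, even though the word order (and meaning) differs.
same_counts = bag_of_words(s1) == bag_of_words(s2)
same_order = s1.lower().split() == s2.lower().split()
print(same_counts, same_order)
```

A model that only receives these counts cannot tell the two sentences apart, which is the gap recurrent models are meant to close.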

## Recurrent Neural Networks

* Way to harness the sequential information.


* Simple addition to the second model. Just pass information from one time step to the other so as to help it understand better.


* Downside: Loss of parallelization!



<img src='./images/RNN.png' width="600"></img>    

### Forward Prop:

<img src='./images/Prop.png' width="500"></img>

$$ h^{<t>} = g ( W \cdot h^{<t-1>} + U \cdot x^{<t>} + b) $$

$$ a^{<t>} = f ( V \cdot h^{<t>} + c) $$

$$ g: \tanh \quad f: \text{softmax} $$
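The forward-propagation equations can be written out as a short loop. This is a minimal, dependency-free sketch rather than how Keras implements it: the toy weights, the sizes, and the all-zero initial hidden state are assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(xs, W, U, V, b, c):
    """Apply h_t = tanh(W h_{t-1} + U x_t + b) and a_t = softmax(V h_t + c) over a sequence."""
    h = [0.0] * len(b)   # initial hidden state, assumed to be all zeros
    outputs = []
    for x in xs:         # steps run in order: each h depends on the previous h
        pre = [wh + ux + b_i for wh, ux, b_i in zip(matvec(W, h), matvec(U, x), b)]
        h = [math.tanh(p) for p in pre]
        a = softmax([vh + c_i for vh, c_i in zip(matvec(V, h), c)])
        outputs.append(a)
    return outputs, h

# Toy sizes (hypothetical): 2 hidden units, 3-dimensional inputs, 2 output classes.
W = [[0.5, -0.1], [0.2, 0.3]]
U = [[0.1, 0.0, -0.2], [0.4, 0.1, 0.0]]
V = [[1.0, -1.0], [-0.5, 0.5]]
b = [0.0, 0.0]
c = [0.0, 0.0]
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

outputs, h_last = rnn_forward(xs, W, U, V, b, c)
print(len(outputs), [round(p, 3) for p in outputs[-1]])
```

The `for` loop is where parallelization is lost: step `t` cannot start until step `t-1` has produced its hidden state.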

### LSTMs and GRUs

* Drawbacks of RNNs: backpropagation through time leads to vanishing/exploding gradients.

* LSTMs and GRUs were developed to handle these drawbacks. They add gates that control what information is kept, forgotten, and passed on.
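A rough way to see why gradients vanish or explode: backpropagating through many time steps multiplies many similar factors together. The sketch below assumes a simplified scalar, linear recurrence `h_t = w * h_{t-1}` (no nonlinearity, no input), which is not a full RNN but shows the effect.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for h_t = w * h_{t-1}: a product of `steps` copies of w."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

small = gradient_through_time(0.5, 50)   # factor below 1: gradient shrinks toward 0
large = gradient_through_time(1.5, 50)   # factor above 1: gradient blows up
print(small, large)
```

With 50 steps, a factor only slightly away from 1 already pushes the gradient to a negligible or an enormous value.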

<img src='./images/GRU_LSTM.jpg'></img>

We've quickly studied the building blocks for sequence models. Now let's look at applications.

## TASK: Binary sentiment classification

In [24]:
import numpy as np 
import pandas as pd 
import os

from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Model
from keras.layers import Dense, Embedding, LSTM, Input
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
import re

In [10]:
data = pd.read_csv('Dataset.csv')
pd.set_option('display.max_colwidth', None)  # None shows the full text; older pandas versions used -1
data.head()

Unnamed: 0,text,stars,sentiment
0,the pizza was okay not the best ive had i prefer biaggios on flamingo fort apache the chef there can make a much better ny style pizza the pizzeria cosmo was over priced for the quality and lack of personality in the food biaggios is a much better pick if youre going for italian family owned home made recipes people that actually care if you like their food you dont get that at a pizzeria in a casino i dont care what you say,2,0
1,i love this place my fiance and i go here atleast once a week the portions are huge food is amazing i love their carne asada they have great lunch specials leticia is super nice and cares about what you think of her restaurant you have to try their cheese enchiladas too the sauce is different and amazing,5,1
2,terrible dry corn bread rib tips were all fat and mushy and had no flavor if you want bbq in this neighborhood go to john mulls roadkill grill trust me,1,0
3,back in 20052007 this place was my favorite thai place ever id go here alllll the time i never had any complaints once they started to get more known and got busy their service started to suck and their portion sizes got cut in half i have a huge problem with paying more for way less food the last time i went there i had the pork pad se ew and it tasted good but i finished my plate and was still hungry i used to know the manager here and she would greet me with a hello melissa nice to see you again diet coke pad thai or pad se ew now a days i know she still knows me but she disregards my presence also i had asked her what was up with the new portion sizes and she had no answer for me great food but not worth the money i havent been back in over a year because i refuse to pay 1015 for dinner and still be hungry after sorry pinkaow you are not what you used to be,2,0
4,delicious healthy food the steak is amazing fish and pork are awesome too service is above and beyond not a bad thing to say about this place worth every penny,5,1


In [11]:
len(data)

10000

* Hyperparameters that can be tuned:
    * num_words
    * max_length_of_text

In [56]:
num_words = 5000
tokenizer = Tokenizer(num_words=num_words, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                                   lower=True,split=' ')
tokenizer.fit_on_texts(data['text'].values)
X = tokenizer.texts_to_sequences(data['text'].values)

word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))

max_length_of_text = 200
X = pad_sequences(X, maxlen=max_length_of_text)

Found 31493 unique tokens.


In [94]:
X[7653]

array([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    3,   22,  269, 1570,
          9,    4,  700,    7, 1610,  104,    1,  1

In [58]:
y = data['sentiment']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 42)
print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)

(8000, 200) (8000,)
(2000, 200) (2000,)


In [86]:
embed_dim = 50
lstm_out = 128
batch_size = 32

inputs = Input((max_length_of_text, ))
x = Embedding(num_words, embed_dim)(inputs)
x = LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2)(x)
x = Dense(1,activation='sigmoid')(x)
model = Model(inputs, x)
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_16 (InputLayer)        (None, 200)               0         
_________________________________________________________________
embedding_11 (Embedding)     (None, 200, 50)           250000    
_________________________________________________________________
lstm_17 (LSTM)               (None, 128)               91648     
_________________________________________________________________
dense_16 (Dense)             (None, 1)                 129       
Total params: 341,777
Trainable params: 341,777
Non-trainable params: 0
_________________________________________________________________
None
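The parameter counts in the summary above can be reproduced by hand. A minimal sketch, assuming the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the helper names are made up for illustration.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """A weight per input-output pair, plus one bias per output."""
    return input_dim * units + units

# num_words = 5000, embed_dim = 50, lstm_out = 128, one sigmoid output
counts = [embedding_params(5000, 50), lstm_params(50, 128), dense_params(128, 1)]
total = sum(counts)
print(counts, total)
```

Most of the parameters sit in the embedding table, so `num_words` and `embed_dim` dominate the model size.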


In [87]:
model.compile(loss = 'binary_crossentropy', optimizer='adam',metrics = ['accuracy'])

In [88]:
model.fit(X_train, y_train, batch_size = batch_size, epochs = 1)

Epoch 1/1


<keras.callbacks.History at 0x7f2155dd0668>

In [89]:
score,acc = model.evaluate(X_test, y_test, batch_size = batch_size)
print("Score: %.2f" % (score))
print("Validation Accuracy: %.2f" % (acc))

Score: 0.37
Validation Accuracy: 0.84
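The `Score` printed above is the loss the model was compiled with (binary cross-entropy), and the second number is the accuracy. The sketch below computes both by hand on four made-up predictions; the labels, probabilities, and the 0.5 decision threshold are illustrative assumptions.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]   # made-up sigmoid outputs

loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(round(loss, 4), acc)
```

The loss penalizes the third example heavily because the model gave a low probability to a positive review, whereas accuracy only records that it was wrong.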


### Using word vectors

* What are word vectors and why use them?
    * One-hot vectors become very large if the vocabulary contains a lot of words.
    * One-hot vectors treat every pair of words as equally dissimilar, which is not the case in practice.
    * "The dog leaped over the fence" vs. "The dog jumped over the fence": ideally, we would want words with similar meanings to have similar representations.
    * Word vectors are a way to encode the semantic meaning of a word.
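The difference between one-hot and dense vectors shows up in cosine similarity. In this sketch the dense 2-d vectors are hand-picked, hypothetical values (not real GloVe or word2vec vectors), chosen so that the two verbs sit close together.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot vectors over a 3-word vocabulary: every pair is orthogonal.
one_hot = {"jumped": [1, 0, 0], "leaped": [0, 1, 0], "fence": [0, 0, 1]}

# Hypothetical dense vectors: similar words get similar coordinates.
dense = {"jumped": [0.9, 0.1], "leaped": [0.8, 0.2], "fence": [0.1, 0.9]}

sim_one_hot = cosine(one_hot["jumped"], one_hot["leaped"])
sim_verbs = cosine(dense["jumped"], dense["leaped"])
sim_other = cosine(dense["jumped"], dense["fence"])
print(sim_one_hot, round(sim_verbs, 3), round(sim_other, 3))
```

With one-hot vectors the similarity between any two distinct words is zero, so the model gets no hint that "jumped" and "leaped" are related.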


* How do we learn them?
    * Representation Learning
    * Skip-gram model, GloVe model
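As a rough illustration of the skip-gram idea, the training data consists of (center word, context word) pairs taken from a sliding window; the model then learns vectors that make these pairs predictable. The sketch below only generates the pairs, and the function name and window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the dog jumped over the fence".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs)
```

Words that appear in similar contexts end up with many pairs in common, which is what pushes their vectors close together during training.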

* Game of Thrones characters:

<img src='./images/WordEmbeddings.png'></img> 

* Word2vec Analogies:

 <img src='./images/WVecAnalogies.jpg' width="500"></img> 
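The analogy picture corresponds to simple vector arithmetic followed by a nearest-neighbour lookup. The vectors below are hypothetical 2-d values (loosely "royalty" and "gender" axes) picked so the arithmetic works out; real embeddings have many more dimensions and are only approximately this regular.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical toy vectors: [royalty, gender]
vectors = {
    "king": [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "table": [-0.5, 0.1],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: squared_distance(vectors[w], target))

# "man is to king as woman is to ?"
answer = analogy("man", "king", "woman", vectors)
print(answer)
```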

In [90]:
embeddings_index = {}
# Each line of the GloVe file holds a word followed by its vector components.
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))

Found 400000 word vectors.


In [91]:
EMBEDDING_DIM = 50
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
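Before training, it can help to check how many of our words actually have a pre-trained vector, since every missing word keeps an all-zero row. The sketch below replays the two cells above on made-up data: the three "GloVe" lines, the toy vocabulary, and the 3-d vectors are all hypothetical.

```python
def parse_glove_line(line):
    """Split one GloVe-format text line into (word, vector)."""
    parts = line.rstrip().split(" ")
    return parts[0], [float(x) for x in parts[1:]]

# Three made-up lines in GloVe text format (3 dimensions instead of 50).
fake_glove = "the 0.1 0.2 0.3\npizza 0.4 0.5 0.6\ngreat 0.7 0.8 0.9"
embeddings = dict(parse_glove_line(ln) for ln in fake_glove.splitlines())

toy_word_index = {"the": 1, "pizza": 2, "biaggios": 3}   # toy tokenizer vocabulary
dim = 3

# Row 0 stays all zeros because index 0 is reserved for padding.
matrix = [[0.0] * dim for _ in range(len(toy_word_index) + 1)]
missing = []
for word, i in toy_word_index.items():
    vec = embeddings.get(word)
    if vec is None:
        missing.append(word)   # out of vocabulary: the row stays all zeros
    else:
        matrix[i] = vec

coverage = 1 - len(missing) / len(toy_word_index)
print(missing, round(coverage, 2))
```

Rare names and misspellings, which are common in reviews, are the usual candidates for the `missing` list.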

* We can fine-tune the embeddings to suit our task: set `trainable=True` in the layer below and compare the results.

In [92]:
from keras.layers import Embedding

embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=max_length_of_text,
                            trainable=False)

In [82]:
inputs2 = Input((max_length_of_text, ))
x2 = embedding_layer(inputs2)
x2 = LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2, return_sequences=False)(x2)
x2 = Dense(1,activation='sigmoid')(x2)
model2 = Model(inputs2, x2)
print(model2.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_15 (InputLayer)        (None, 200)               0         
_________________________________________________________________
embedding_10 (Embedding)     (None, 200, 100)          3149400   
_________________________________________________________________
lstm_16 (LSTM)               (None, 128)               117248    
_________________________________________________________________
dense_15 (Dense)             (None, 1)                 129       
Total params: 3,266,777
Trainable params: 117,377
Non-trainable params: 3,149,400
_________________________________________________________________
None


In [83]:
model2.compile(loss = 'binary_crossentropy', optimizer='adam',metrics = ['accuracy'])

In [84]:
model2.fit(X_train, y_train, batch_size = batch_size, epochs = 1)

Epoch 1/1


<keras.callbacks.History at 0x7f2162936e10>

In [85]:
score,acc = model2.evaluate(X_test, y_test, batch_size = batch_size)
print("Score: %.2f" % (score))
print("Validation Accuracy: %.2f" % (acc))

Score: 0.45
Validation Accuracy: 0.79


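## Why is there a decrease in accuracy?

* Sentiment classification is a fairly easy task, so pre-trained word vectors may add little here.

* Number of parameters: with the embedding layer frozen, the second model has only 117,377 trainable parameters, compared with 341,777 in the first.

* Check the same for the tasks below.

* Read about the impact of word vectors on machine translation.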


* Also see a document classification example:
    * https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html (Using the news20.tar.gz dataset)

## TASKS FOR TODAY'S LAB

### Tune hyperparameters to achieve better accuracy

### Predict the star rating for a review using a model similar to the ones above

In [None]:
inputs2 = Input((max_length_of_text, ))
x2 = embedding_layer(inputs2)
x2 = LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2, return_sequences=False)(x2)
x2 = Dense(5,activation='softmax')(x2)
model2 = Model(inputs2, x2)
print(model2.summary())

In [None]:
model2.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
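One detail to watch for in this task: `categorical_crossentropy` expects one-hot targets with one column per class, while the `stars` column holds integers from 1 to 5. The pure-Python sketch below shows the conversion; `stars_to_one_hot` is a hypothetical helper, and in the notebook itself the imported `to_categorical` should do the same job once the ratings are shifted to start at 0.

```python
def stars_to_one_hot(stars, num_classes=5):
    """Map star ratings 1..num_classes to one-hot rows (class index = stars - 1)."""
    rows = []
    for s in stars:
        row = [0] * num_classes
        row[s - 1] = 1   # shift by one so that 1 star becomes class 0
        rows.append(row)
    return rows

y_stars = stars_to_one_hot([2, 5, 1])
print(y_stars)
```

An alternative is to keep the integer labels (shifted to 0..4) and compile with `sparse_categorical_crossentropy` instead.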