# TensorFlow Keras Tutorial - subwords and Bidirectional LSTM (Part 6)

**What is Keras?** Keras is a wrapper that lets you build deep neural networks without getting into the low-level details of the network. It can use TensorFlow or Theano as its backend. This tutorial series covers Keras from beginner to intermediate level.

YOU CAN CHECK OUT THE REST OF THE TUTORIALS IN THIS SERIES.

[PART 1](https://www.kaggle.com/akashkr/tf-keras-tutorial-neural-network-part-1)<br>
[PART 2](https://www.kaggle.com/akashkr/tf-keras-tutorial-cnn-part-2)<br>
[PART 3](https://www.kaggle.com/akashkr/tf-keras-tutorial-binary-classification-part-3)<br>
[PART 4](https://www.kaggle.com/akashkr/tf-keras-tutorial-pretrained-models-part-4)<br>

<font color=red>IF YOU HAVEN'T GONE THROUGH THE PREVIOUS PART OF THIS TUTORIAL, IT'S RECOMMENDED FOR YOU TO GO THROUGH THAT FIRST.</font><br>
[PART 5](https://www.kaggle.com/akashkr/tf-keras-tutorial-basics-of-nlp-part-5)

In the previous notebooks we worked on image data. Now we turn to text data. Common applications of NLP include document classification, sentiment analysis and chatbots.

In this tutorial we are going to cover:
* Text Preprocessing
* Modelling
    - LSTM classic
    - LSTM Bidirectional
    - Convolutional
    - GRU
    - GloVe

In [None]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
from plotly.offline import init_notebook_mode, iplot, plot
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import plotly.express as px
import tensorflow as tf
from tqdm import tqdm
import pandas as pd
import numpy as np
import pickle
import re

init_notebook_mode(connected=True)

In [None]:
train_df = pd.read_csv('../input/nlp-getting-started/train.csv')

# Overview of Dataset

### Data Format

Each sample in the train and test set has the following information:

* The text of a tweet
* A keyword from that tweet (although this may be blank!)
* The location the tweet was sent from (may also be blank)

### Target

**You are predicting whether a given tweet is about a real disaster or not**. If so, predict a 1. If not, predict a 0.

### Columns

* `id` - a unique identifier for each tweet
* `text` - the text of the tweet
* `location` - the location the tweet was sent from (may be blank)
* `keyword` - a particular keyword from the tweet (may be blank)
* `target` - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)

> NOTE: **We will be using just the text and target features of the data**

In [None]:
train_df.head()

In [None]:
print(f'Shape of data: {train_df.shape}')
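# Column dtypes and non-null counts (reveals missing values).
# info() prints its report itself and returns None, so it is not wrapped in print().
train_df.info()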

# Preprocessing

In [None]:
# Splitting the data into 2/3 as train and 1/3 as test
X_train, X_test, y_train, y_test = train_test_split(train_df['text'], train_df['target'], test_size=0.33, random_state=42)

In [None]:
vocab_size = 10000
embedding_dim = 16
max_length = 50
trunc_type='post'
oov_tok = "<OOV>"

# Tokenization
tokenizer = Tokenizer(num_words = vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(X_train)

word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(X_train)
testing_sequences = tokenizer.texts_to_sequences(X_test)

# Padding
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
# Use the same truncation side as for the training data
testing_padded = pad_sequences(testing_sequences, maxlen=max_length, truncating=trunc_type)

In [None]:
print(f'Train padded shape: {padded.shape}')
print(f'Test padded shape: {testing_padded.shape}')
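To make the tokenize-then-pad step less of a black box, here is a minimal pure-Python sketch of the idea. It is not the Keras implementation: the helper names (`fit_word_index`, `texts_to_sequences`, `pad`) and the toy sentences are made up for illustration, and the real `Tokenizer` also strips punctuation and honours `num_words`. It does mirror three behaviours used above: index 0 is kept for padding, index 1 is the `<OOV>` token, and zeros are added at the front of short sequences.

```python
from collections import Counter

def fit_word_index(texts, oov_token="<OOV>"):
    """Map each word to an integer; more frequent words get smaller indices."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Index 0 is reserved for padding, index 1 for the out-of-vocabulary token.
    word_to_id = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_to_id[word] = rank
    return word_to_id

def texts_to_sequences(texts, word_to_id, oov_token="<OOV>"):
    """Replace every word by its index; unseen words become the OOV index."""
    oov_id = word_to_id[oov_token]
    return [[word_to_id.get(w, oov_id) for w in text.lower().split()]
            for text in texts]

def pad(sequences, maxlen, truncating="post"):
    """Left-pad with zeros; cut long sequences at the end ('post') or start ('pre')."""
    result = []
    for seq in sequences:
        if len(seq) > maxlen:
            seq = seq[:maxlen] if truncating == "post" else seq[-maxlen:]
        result.append([0] * (maxlen - len(seq)) + seq)
    return result

toy_train = ["fire in the forest", "the forest is calm"]
toy_test = ["flood in the city"]  # 'flood' and 'city' were never seen in training

toy_index = fit_word_index(toy_train)
train_ids = pad(texts_to_sequences(toy_train, toy_index), maxlen=5)
test_ids = pad(texts_to_sequences(toy_test, toy_index), maxlen=5)
print(train_ids)
print(test_ids)
```

Note that the word index is built from the training sentences only, so the unseen test words map to the OOV index instead of leaking test vocabulary into the model.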

# Modelling
### LSTM - Long Short Term Memory networks

LSTMs address the long-term dependency problem of plain recurrent networks.

Let's take a sentence - `We live in Patna, which has a hot and humid climate`<br>
In this sentence "hot" and "humid" describe "Patna". But with our previous encoding, we don't take into account the relation between a word and the words before and after it. An LSTM addresses this by feeding its output for the word at position `t` into the step for the word at position `t+1`, so information about earlier words is carried along the sequence.

There are many types of LSTM. We are going to cover a few.

#### LSTM Classic
In this LSTM, the output of step `t` is fed as input to the node at step `t+1`. Here is a diagram of a **2-layer LSTM**.
![](https://adventuresinmachinelearning.com/wp-content/uploads/2018/02/Keras-LSTM-tutorial-architecture.png)
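As a rough illustration of "the output of `t` is fed into `t+1`", here is a sketch of a single LSTM step with scalar input and scalar state, using the standard gate equations. The weight names and values in `toy_weights` are hypothetical (a trained `tf.keras.layers.LSTM` learns weight matrices instead of scalars), so treat this as a teaching aid rather than what Keras executes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM time step for scalar input, hidden state and cell state."""
    f = sigmoid(w["wf"] * x_t + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x_t + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x_t + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x_t + w["ug"] * h_prev + w["bg"])  # candidate value
    c_t = f * c_prev + i * g   # keep part of the old memory, add new information
    h_t = o * math.tanh(c_t)   # output that is handed to step t+1
    return h_t, c_t

# Hypothetical weights, all set to 0.5 for the demo; training would learn them.
toy_weights = {name: 0.5 for name in
               ("wf", "uf", "bf", "wi", "ui", "bi",
                "wo", "uo", "bo", "wg", "ug", "bg")}

def run_lstm(inputs, w):
    """Unroll the cell over a sequence, carrying (h, c) from one step to the next."""
    h, c = 0.0, 0.0
    outputs = []
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, w)
        outputs.append(h)
    return outputs

toy_outputs = run_lstm([1.0, 0.0, -1.0], toy_weights)
print(toy_outputs)
```

Because `h` and `c` are carried across iterations, the output at a given step depends on everything that came before it, not only on the current input.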

#### Bidirectional LSTM
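In this LSTM, the sequence is processed in both directions: a forward layer feeds the output of step `t` to step `t+1`, a backward layer feeds it to step `t-1`, and the outputs of the two layers are combined at every step (concatenated by default in Keras).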
![](https://www.i2tutorials.com/wp-content/uploads/2019/05/Deep-Dive-into-Bidirectional-LSTM-i2tutorials.jpg)
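The bidirectional idea can be sketched in a few lines. To keep it short, this example swaps the LSTM cell for a plain `tanh` recurrence (`simple_rnn`, a stand-in and not an LSTM), and the weights are hypothetical constants. What it shows is only the wiring: one pass left to right, one pass right to left, and the two states paired up at each position.

```python
import math

def simple_rnn(inputs, w_x, w_h):
    """Plain tanh recurrence used as a stand-in for an LSTM; one state per step."""
    h = 0.0
    states = []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)
        states.append(h)
    return states

def bidirectional(inputs):
    # Forward pass: state at position t summarises inputs 0..t.
    forward = simple_rnn(inputs, w_x=0.5, w_h=0.5)
    # Backward pass: run on the reversed sequence, then flip the states back
    # so that position t summarises inputs t..end. It has its own weights.
    backward = simple_rnn(inputs[::-1], w_x=0.3, w_h=0.7)[::-1]
    # Pair (concatenate) the two states at every position.
    return list(zip(forward, backward))

toy_states = bidirectional([1.0, 0.0, -1.0])
for t, (fwd, bwd) in enumerate(toy_states):
    print(t, round(fwd, 3), round(bwd, 3))
```

With this wiring, the state at the first position already carries information about the last word through its backward half, which a one-directional layer cannot provide.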

### LSTM Classic

In [None]:
# Model with simple LSTM
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

num_epochs = 10
history_lstm = model.fit(padded, y_train, epochs=num_epochs, validation_data=(testing_padded, y_test))

### Bidirectional LSTM

In [None]:
# Two bidirectional LSTM layers
# Note that if you want to connect one LSTM to another, you have to pass return_sequences=True
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

num_epochs = 10
history_bi = model.fit(padded, y_train, epochs=num_epochs, validation_data=(testing_padded, y_test))

### Convolutional Neural Network

In [None]:
# Convolutional layer 
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64, input_length=max_length),
    tf.keras.layers.Conv1D(128, 5, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

num_epochs = 10
history_conv = model.fit(padded, y_train, epochs=num_epochs, validation_data=(testing_padded, y_test))

### GRU
A gated recurrent unit (GRU) is a gating mechanism in recurrent neural networks (RNN) similar to a long short-term memory (LSTM) unit but without an output gate. GRUs try to solve the vanishing gradient problem that can come with standard recurrent neural networks.
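For comparison with the LSTM, here is a sketch of one GRU step with scalar input and state, following the commonly cited update-gate and reset-gate formulation. The weight names and values are hypothetical, and libraries differ on whether `z` or `1 - z` multiplies the old state, so this is one convention rather than a description of what `tf.keras.layers.GRU` does internally.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, w):
    """One GRU time step for scalar input and scalar hidden state."""
    z = sigmoid(w["wz"] * x_t + w["uz"] * h_prev + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x_t + w["ur"] * h_prev + w["br"])  # reset gate
    # Candidate state; the reset gate decides how much of the past it may use.
    h_cand = math.tanh(w["wh"] * x_t + w["uh"] * (r * h_prev) + w["bh"])
    # Blend old state and candidate. There is no separate cell state
    # and no output gate, unlike the LSTM.
    return (1.0 - z) * h_prev + z * h_cand

# Hypothetical weights, all set to 0.5 for the demo.
toy_gru_weights = {name: 0.5 for name in
                   ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}

h = 0.0
gru_outputs = []
for x_t in [1.0, 0.0, -1.0]:
    h = gru_step(x_t, h, toy_gru_weights)
    gru_outputs.append(h)
print(gru_outputs)
```

The GRU has three weight groups per unit where the LSTM has four, which is why a GRU layer of the same width has fewer parameters.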

In [None]:
# Using Gated Recurrent Unit
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()

num_epochs = 10
history_gru = model.fit(padded, y_train, epochs=num_epochs, validation_data=(testing_padded, y_test))

# GloVe

GloVe stands for Global Vectors for Word Representation. For more about GloVe - [Read this](https://medium.com/analytics-vidhya/word-vectorization-using-glove-76919685ee0b).

We will follow 3 steps to build a model using GloVe vectors.
* Load the GloVe model, which has 2.2M word embeddings, each of length 300
* Get the embeddings of the words that are in our corpus and construct an embedding matrix of size **number of words in data x 300** (plus one extra row for the padding index 0)
* Assign the matrix as the weights of our first layer and make them non-trainable
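The three steps above can be sketched on a toy scale before running them on the real 300-dimensional vectors. Everything in this example is made up for illustration: `toy_pretrained` stands in for the loaded GloVe dictionary, `toy_word_index` for the tokenizer's word index, and `embed` for the lookup a frozen embedding layer performs.

```python
# Step 1: hypothetical 3-dimensional "pretrained" vectors
# standing in for the 300-d GloVe ones.
toy_pretrained = {
    "fire": [0.9, 0.1, 0.0],
    "forest": [0.2, 0.8, 0.1],
}

# Word index as a tokenizer would produce it (0 is reserved for padding).
# 'wildfireee' is a misspelling with no pretrained vector.
toy_word_index = {"<OOV>": 1, "fire": 2, "forest": 3, "wildfireee": 4}
toy_dim = 3

# Step 2: one row per index; words without a pretrained vector stay all-zero.
toy_matrix = [[0.0] * toy_dim for _ in range(len(toy_word_index) + 1)]
for word, idx in toy_word_index.items():
    vector = toy_pretrained.get(word)
    if vector is not None:
        toy_matrix[idx] = list(vector)

# Step 3: a frozen embedding layer only looks rows up; it never updates them.
def embed(token_ids, matrix):
    """Return the matrix row for every token id in the sequence."""
    return [matrix[i] for i in token_ids]

print(len(toy_matrix), "rows")
print(embed([0, 2, 4], toy_matrix))
```

Rows that stay at zero (padding, OOV and words missing from the pretrained set) carry no signal when the layer is frozen, so on real data it can be worth counting how many words of the corpus were actually found.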

In [None]:
# Load Glove model
glove_pickle = '../input/pickled-glove840b300d-for-10sec-loading/glove.840B.300d.pkl'

with open(glove_pickle,'rb') as f:
    embeddings_index = pickle.load(f)
    
print(f'Number of words in Glove: {len(embeddings_index)}')

In [None]:
# create an embedding matrix for the words we have in the dataset
embedding_matrix = np.zeros((len(word_index) + 1, 300))
for word, i in tqdm(word_index.items()):
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
        
print(f'Shape of Embedding: {embedding_matrix.shape}')

In [None]:
# A simple LSTM with glove embeddings and one dense layer
model = tf.keras.Sequential()
model.add(
    tf.keras.layers.Embedding(
        len(word_index) + 1, 300, weights=[embedding_matrix],
        input_length=max_length, trainable=False
    )
)
model.add(tf.keras.layers.LSTM(100, dropout=0.3, recurrent_dropout=0.3))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
    
model.summary()

num_epochs = 10
history_glove = model.fit(padded, y_train, epochs=num_epochs, validation_data=(testing_padded, y_test))

# Visualization

In [None]:
import plotly.graph_objects as go
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=("Bi LSTM", "Convolution", "GRU", "Glove"),
    shared_yaxes=True,
    shared_xaxes=True,
    vertical_spacing=0.1,
    horizontal_spacing=0.03)

epochs = list(range(1, len(history_lstm.history['accuracy'])+1))

fig.add_trace(
    go.Scatter(x=epochs, y=history_bi.history['accuracy'], name='accuracy'),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_bi.history['val_accuracy'], name='val_accuracy'),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=epochs, y=history_conv.history['accuracy'], name='accuracy'),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_conv.history['val_accuracy'], name='val_accuracy'),
    row=1, col=2
)

fig.add_trace(
    go.Scatter(x=epochs, y=history_gru.history['accuracy'], name='accuracy'),
    row=2, col=1
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_gru.history['val_accuracy'], name='val_accuracy'),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(x=epochs, y=history_glove.history['accuracy'], name='accuracy'),
    row=2, col=2
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_glove.history['val_accuracy'], name='val_accuracy'),
    row=2, col=2
)

fig.update_layout(title='Accuracy', showlegend=False)

In [None]:
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=("Bi LSTM", "Convolution", "GRU", "Glove"),
    shared_yaxes=True,
    shared_xaxes=True,
    vertical_spacing=0.1,
    horizontal_spacing=0.03)

epochs = list(range(1, len(history_lstm.history['loss'])+1))

fig.add_trace(
    go.Scatter(x=epochs, y=history_bi.history['loss'], name='loss'),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_bi.history['val_loss'], name='val_loss'),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=epochs, y=history_conv.history['loss'], name='loss'),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_conv.history['val_loss'], name='val_loss'),
    row=1, col=2
)

fig.add_trace(
    go.Scatter(x=epochs, y=history_gru.history['loss'], name='loss'),
    row=2, col=1
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_gru.history['val_loss'], name='val_loss'),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(x=epochs, y=history_glove.history['loss'], name='loss'),
    row=2, col=2
)
fig.add_trace(
    go.Scatter(x=epochs, y=history_glove.history['val_loss'], name='val_loss'),
    row=2, col=2
)

fig.update_layout(title='Loss', showlegend=False)

So that was all about some basic modelling for text classification. In the next tutorial we'll be looking at text generation! Sounds interesting? Here it is:
> ## PART 7: [Text Generation](https://www.kaggle.com/akashkr/tf-keras-tutorial-rnn-text-generation-part-7)