# CS470 Introduction to Artificial Intelligence
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

### 4-3. Text classification with an RNN

Let's build a text classification model with an RNN on the IMDB dataset for sentiment analysis.

In [None]:
try:
    %tensorflow_version 2.x
except Exception:
    pass

import tensorflow_datasets as tfds
import tensorflow as tf

#### Set up the input pipeline

The IMDB large movie review dataset is a binary classification dataset—all the reviews have either a positive or negative sentiment.

Let's download the dataset using [`TensorFlow Datasets`](https://www.tensorflow.org/datasets).

In [None]:
dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

Downloading and preparing dataset imdb_reviews/subwords8k/1.0.0 (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0...
Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0. Subsequent calls will reuse this data.


Since the dataset comes with a built-in subword tokenizer, we can use the tokenizer to convert any string into tokens.

In [None]:
tokenizer = info.features['text'].encoder
print (f'Vocabulary size: {tokenizer.vocab_size}')

Vocabulary size: 8185


In [None]:
tokenizer.subwords

['the_',
 ', ',
 '. ',
 'a_',
 'and_',
 'of_',
 'to_',
 's_',
 'is_',
 'br',
 'in_',
 'I_',
 'that_',
 'this_',
 'it_',
 ' /><',
 ' />',
 'was_',
 'The_',
 'as_',
 't_',
 'with_',
 'for_',
 '.<',
 'on_',
 'but_',
 'movie_',
 ' (',
 'are_',
 'his_',
 'have_',
 'film_',
 'not_',
 'ing_',
 'be_',
 'ed_',
 'you_',
 ' "',
 'it',
 'd_',
 'an_',
 'he_',
 'by_',
 'at_',
 'one_',
 'who_',
 'y_',
 'from_',
 'e_',
 'or_',
 'all_',
 'like_',
 'they_',
 '" ',
 'so_',
 'just_',
 'has_',
 ') ',
 'her_',
 'about_',
 'out_',
 'This_',
 'some_',
 'ly_',
 'movie',
 'film',
 'very_',
 'more_',
 'It_',
 'would_',
 'what_',
 'when_',
 'which_',
 'good_',
 'if_',
 'up_',
 'only_',
 'even_',
 'their_',
 'had_',
 'really_',
 'my_',
 'can_',
 'no_',
 'were_',
 'see_',
 'she_',
 '? ',
 'than_',
 '! ',
 'there_',
 'get_',
 'been_',
 'into_',
 ' - ',
 'will_',
 'much_',
 'story_',
 'because_',
 'ing',
 'time_',
 'n_',
 'we_',
 'ed',
 'me_',
 ': ',
 'most_',
 'other_',
 'don',
 'do_',
 'm_',
 'es_',
 'how_',
 'also

In [None]:
sample_string = "Yesterday was my grandmother's birthday."

# Encode the sample string to integers
tokenized_string = tokenizer.encode(sample_string)
print (f'Tokenized string is {tokenized_string}')

# Decode the encoded integers to the string 
original_string = tokenizer.decode(tokenized_string)
print (f'The original string: {original_string}')

assert original_string == sample_string

Tokenized string is [1071, 487, 414, 18, 82, 1481, 1300, 7968, 8, 3534, 606, 7975]
The original string: Yesterday was my grandmother's birthday.


If a word is not in the tokenizer's vocabulary, the tokenizer encodes it by breaking it into subwords that are in the vocabulary.

In [None]:
for ts in tokenized_string:
    print (f'{ts} ----> {tokenizer.decode([ts])}')

1071 ----> Yes
487 ----> ter
414 ----> day 
18 ----> was 
82 ----> my 
1481 ----> grand
1300 ----> mother
7968 ----> '
8 ----> s 
3534 ----> birth
606 ----> day
7975 ----> .
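The splitting above can be imitated with a greedy longest-match rule. The following is only a minimal sketch under stated assumptions: `TOY_VOCAB`, `encode`, and `decode` are hypothetical names, the vocabulary is made up for this one sentence, and the real tokenizer's algorithm and ids differ. The ids are 1-based, like the ids printed above, which leaves 0 free for padding.

```python
# Hypothetical toy vocabulary (not the real 8185-entry one).
# A trailing space marks a subword that ends a word.
TOY_VOCAB = ["Yes", "ter", "day ", "day", "was ", "my ",
             "grand", "mother", "'", "s ", "birth", "."]


def encode(text, vocab=TOY_VOCAB):
    """Split text into subword ids by always taking the longest match."""
    ids = []
    pos = 0
    while pos < len(text):
        # Longest vocabulary entry that matches at the current position.
        match = max(
            (piece for piece in vocab if text.startswith(piece, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"No subword covers {text[pos]!r} at position {pos}")
        ids.append(vocab.index(match) + 1)  # 1-based ids
        pos += len(match)
    return ids


def decode(ids, vocab=TOY_VOCAB):
    """Concatenate the subwords behind each id to rebuild the string."""
    return "".join(vocab[i - 1] for i in ids)


sample = "Yesterday was my grandmother's birthday."
ids = encode(sample)
print(ids)
print([decode([i]) for i in ids])
assert decode(ids) == sample
```

In this sketch, "day " (with a trailing space) and "day" are separate entries, which is why the two occurrences of "day" in the sentence receive different ids, just as they do in the real output above (414 and 606).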


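Now, let's combine consecutive elements of this dataset into padded batches using [`tf.data.Dataset.padded_batch()`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#padded_batch). Because the reviews differ in length, each batch is padded with zeros up to the length of its longest sequence.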

In [None]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(BATCH_SIZE, ((None,),()))
test_dataset = test_dataset.padded_batch(BATCH_SIZE,  ((None,),())) 
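To make the effect of padding concrete, here is a minimal pure-Python sketch of what happens to the token sequences in one batch when the padded shape is `(None,)`. The helper `pad_batch` is a hypothetical name used only for illustration; it is not part of `tf.data`.

```python
def pad_batch(sequences, pad_value=0):
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


# Three token sequences of different lengths (ids taken from the example above).
batch = [[1071, 487, 414], [18, 82], [1481, 1300, 7968, 8]]
padded = pad_batch(batch)
for row in padded:
    print(row)
```

Every row now has the same length, so the batch can be stored as a single rectangular tensor. Different batches may still have different lengths, which is why the sequence dimension is left as `None`.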

#### Build and train the model
Let's build a recurrent neural network using `tf.keras.Sequential`. Here, we will use `tf.keras.layers.LSTM` as the recurrent layer. The LSTM-based model has:

1. Embedding layer as input
2. LSTM layer whose output dimension is 64
3. Dense layer with 64 nodes (use ReLU as activation)
4. Dense layer with 1 node (use sigmoid as activation for binary classification)

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(tokenizer.vocab_size, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 64)          523840    
_________________________________________________________________
lstm (LSTM)                  (None, 64)                33024     
_________________________________________________________________
dense (Dense)                (None, 64)                4160      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 561,089
Trainable params: 561,089
Non-trainable params: 0
_________________________________________________________________
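The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias. The variable names are chosen for illustration only.

```python
vocab_size, embed_dim, lstm_units, dense_units = 8185, 64, 64, 64

# One embedding vector of size embed_dim per vocabulary entry.
embedding_params = vocab_size * embed_dim

# Four gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# Fully connected layers: weight matrix plus bias.
dense_params = lstm_units * dense_units + dense_units
output_params = dense_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params + output_params
print(embedding_params, lstm_params, dense_params, output_params, total_params)
```

These values match the `Param #` column above: 523,840 + 33,024 + 4,160 + 65 = 561,089.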


Compile the model to configure the training process.

In [None]:
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
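As a rough illustration of what the `binary_crossentropy` loss measures, the following pure-Python sketch computes the mean binary cross-entropy for a handful of predictions. The function name, the clipping constant `eps`, and the sample values are assumptions made for this example; the Keras implementation operates on tensors and may differ in detail.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1]              # 1 = positive review, 0 = negative review
confident = [0.9, 0.1, 0.8]     # sigmoid outputs close to the labels
unsure = [0.5, 0.5, 0.5]        # model cannot tell the classes apart

print(binary_crossentropy(labels, confident))
print(binary_crossentropy(labels, unsure))
```

The loss is lower when the sigmoid outputs are close to the true labels. A model that always outputs 0.5 gets a loss of ln 2 (about 0.693), which is a useful baseline when reading the training log.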

Then, train the model on `train_dataset`, using `test_dataset` as the validation data.

In [None]:
history = model.fit(
    train_dataset, 
    epochs=10,
    validation_data=test_dataset)

Let's evaluate the trained model.

In [None]:
test_loss, test_acc = model.evaluate(test_dataset)

print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_acc}')

In [None]:
text = 'The movie was cool. The animation and the graphics were out of this world. I would recommend this movie.'

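# Add a batch dimension: the model expects input of shape (batch_size, sequence_length)
predictions = model.predict(
    tf.constant([tokenizer.encode(text)])
)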
print(predictions)

#### Bidirectional LSTM layer
When you wrap a recurrent layer with `tf.keras.layers.Bidirectional`, the input is processed both forward and backward through the layer, and the two outputs are combined (concatenated by default). This can help the RNN learn long-range dependencies.

![Bidirectional](https://github.com/keai-kaist/CS470/blob/main/Lab3/May%2011/images/bidirectional.jpg?raw=true)

#### Why can a bidirectional LSTM be effective?

Let's say that we want our model to find a word to fill in the blank in the following sentence.

* I like to eat [________] because today is too hot.

In this sentence, the words after the blank matter more for predicting it than the words before it. Why?
Although countless foods could follow the word 'eat', the foods someone wants to eat because the weather is hot are probably limited to cold foods. In other words, information later in the input can help predict something earlier in the input. In an RNN-based model, a second pass that reads the sequence backward can therefore be a great help in inference, and the bidirectional LSTM was designed for this purpose.
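The idea can be sketched with a toy scalar RNN in pure Python. This is only an illustration under simplifying assumptions: `run_rnn` and `bidirectional` are hypothetical helpers, the weights are arbitrary numbers, and a real LSTM cell is considerably more complex. One copy of the recurrence reads the inputs left to right, a second copy reads them right to left, and the results are concatenated.

```python
import math


def run_rnn(inputs, w_in, w_rec):
    """Toy recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1}); returns all h_t."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    return outputs


def bidirectional(inputs, fwd_weights, bwd_weights, return_sequences=False):
    """Run one RNN forward and one over the reversed input, then concatenate."""
    fwd = run_rnn(inputs, *fwd_weights)
    bwd = run_rnn(inputs[::-1], *bwd_weights)
    if return_sequences:
        # Re-reverse the backward outputs so both directions align per timestep.
        return [[f, b] for f, b in zip(fwd, bwd[::-1])]
    # Final state of each direction: each one has seen the whole sequence.
    return [fwd[-1], bwd[-1]]


inputs = [0.2, -0.5, 0.9, 0.1]
print(bidirectional(inputs, (0.7, 0.3), (0.4, 0.6)))
print(bidirectional(inputs, (0.7, 0.3), (0.4, 0.6), return_sequences=True))
```

With `return_sequences=True`, the backward value at each timestep summarizes everything to the right of that position, which is exactly the information needed for the blank in the example sentence. The output also has twice as many features as a single direction.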

In [None]:
model_bidirectional = tf.keras.Sequential([
    tf.keras.layers.Embedding(tokenizer.vocab_size,64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

In [None]:
model_bidirectional.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

In [None]:
history_bidirectional = model_bidirectional.fit(
    train_dataset,
    epochs=10,
    validation_data=test_dataset   
)

In [None]:
test_loss, test_acc = model_bidirectional.evaluate(test_dataset)

print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_acc}')

#### Stack two or more LSTM layers

In general, making a neural network deeper tends to improve performance more effectively than excessively increasing the number of nodes in a single hidden layer.

Stacking LSTM layers can be seen as a similar way to increase the capacity of the model so that it can solve more complex tasks.

Keras recurrent layers have two output modes, controlled by the `return_sequences` constructor argument:
- `return_sequences=True`: return the full sequence of outputs, one per timestep, with shape `(batch_size, timesteps, output_features)`
- `return_sequences=False` (the default): return only the last output for each input sequence, with shape `(batch_size, output_features)`

To stack two or more LSTM layers, every LSTM layer that feeds another LSTM layer must set `return_sequences=True`, so that the next layer receives a full sequence as input.
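The shape bookkeeping behind this rule can be sketched without TensorFlow. The helper `recurrent_output_shape` below is hypothetical and only mirrors the shape rules described above. It assumes a bidirectional wrapper concatenates both directions and therefore doubles the feature dimension.

```python
def recurrent_output_shape(input_shape, units, return_sequences=False, bidirectional=False):
    """Output shape of a recurrent layer given a (batch, timesteps, features) input."""
    if len(input_shape) != 3:
        raise ValueError(
            f"Recurrent layers expect (batch, timesteps, features), got {input_shape}"
        )
    batch, timesteps, _ = input_shape
    features = units * 2 if bidirectional else units
    if return_sequences:
        return (batch, timesteps, features)
    return (batch, features)


embedded = (None, None, 64)  # output shape of the embedding layer

# First layer keeps the time dimension, so a second recurrent layer can follow.
first = recurrent_output_shape(embedded, 64, return_sequences=True, bidirectional=True)
second = recurrent_output_shape(first, 32, bidirectional=True)
print(first)
print(second)

# Without return_sequences=True the first layer drops the time dimension,
# and the second recurrent layer no longer receives a sequence.
try:
    collapsed = recurrent_output_shape(embedded, 64, bidirectional=True)
    recurrent_output_shape(collapsed, 32, bidirectional=True)
except ValueError as err:
    print(err)
```

The second shape, `(None, 64)`, is what the following `Dense` layer receives: one vector per review.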

In [None]:
model_stacked = tf.keras.Sequential([
    tf.keras.layers.Embedding(tokenizer.vocab_size,64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid') 
])

In [None]:
model_stacked.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

In [None]:
history_stacked = model_stacked.fit(
    train_dataset,
    epochs=10,
    validation_data=test_dataset  
)

In [None]:
test_loss, test_acc = model_stacked.evaluate(test_dataset)

print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_acc}')

In [None]:
text = 'The movie was cool. The animation and the graphics were out of this world. I would recommend this movie.'

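# Add a batch dimension: the model expects input of shape (batch_size, sequence_length)
predictions = model_stacked.predict(
    tf.constant([tokenizer.encode(text)])
)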
print(predictions)