# Text classification with an RNN

This notebook trains a recurrent neural network (RNN) text classifier on the IMDB large movie review dataset for sentiment analysis.

_source_: https://www.tensorflow.org/text/tutorials/text_classification_rnn

## Design of the Model
![Bidirectional RNN architecture](https://www.tensorflow.org/text/tutorials/images/bidirectional.png)

In [1]:
import numpy as np

import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow_datasets as tfds

In [2]:
tfds.disable_progress_bar()

## Helper functions

In [3]:
def plot_graphs(history, metric):
    """Plot a training metric and its validation counterpart per epoch."""
    plt.plot(history.history[metric])
    plt.plot(history.history['val_' + metric], '')
    plt.xlabel('Epochs')
    plt.ylabel(metric)
    plt.legend([metric, 'val_' + metric])

## Setup input pipeline

In [4]:
dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

In [5]:
train_dataset.element_spec

(TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

In [None]:
for example, label in train_dataset.take(1):
    print("text: ", example.numpy())
    print("label: ", label.numpy())

### Shuffle the training data and create batches of (text, label) pairs

In [6]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

In [7]:
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
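
As a rough illustration of what `shuffle(BUFFER_SIZE)` followed by `batch(BATCH_SIZE)` does, the sketch below mimics the idea in plain Python: fill a fixed-size buffer, emit a randomly chosen element, and refill that slot with the next input. The helper names `buffered_shuffle` and `batch` are hypothetical, and this is not how `tf.data` is implemented internally; it only mirrors the documented behavior on a toy scale.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in pseudo-random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current  # the final, possibly smaller, batch is kept


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
batches = list(batch(shuffled, batch_size=3))
print(batches)
```

With a buffer smaller than the dataset the shuffle is only approximate: the first emitted element can only come from the first `buffer_size` inputs. A `BUFFER_SIZE` of 10000 is therefore a trade-off between memory and how thoroughly the reviews are mixed.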

In [8]:
for example, label in train_dataset.take(1):
    print("texts: ", example.numpy()[:3])
    print()
    print("labels: ", label.numpy()[:3])

texts:  [b'One of the worst films I have ever seen. How to define "worst?" I would prefer having both eye balls yanked out and then be forced to tap dance on them than ever view this pitiful dreck again. Somehow, One-Hit Wonder Zwick manages a film that simultaneously offends Elvis fans, Mary Kay saleswomen, Las Vegas, gays, FBI agents and the rest of humanity with any intelligence with a shoddy, sloppy farce so forced it deserves to be forsaken ed. How Elvis Presley Enterprises could allow the rights of actual Elvis songs to be used in a film with a central premise that seems to be "The only good Elvis Presley Imitator is a dead one" is beyond me. The worst part of this mess - and that takes some work - is the mangled script: In 1958, Elvis\' words and songs that he would speak/perform in the 1970\'s are quoted! Worst special effect? That Oscar would go to the moron who decided that Elvis\' grave, potentially the most photographed/recognizable grave in the world, resembles a pyramid w

## Create the text encoder

In [9]:
VOCAB_SIZE = 1000

encoder_layer = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
encoder_layer.adapt(train_dataset.map(lambda text, label: text))

In [10]:
vocab = np.array(encoder_layer.get_vocabulary())
vocab[:20]

array(['', '[UNK]', 'the', 'and', 'a', 'of', 'to', 'is', 'in', 'it', 'i',
       'this', 'that', 'br', 'was', 'as', 'for', 'with', 'movie', 'but'],
      dtype='<U14')

In [11]:
encoded_example = encoder_layer(example)[:3].numpy()
encoded_example

array([[ 29,   5,   2, ...,   0,   0,   0],
       [  4,   1, 365, ...,   0,   0,   0],
       [ 10, 208,  11, ...,   0,   0,   0]])

In [12]:
for n in range(3):
    print("Original: ", example[n].numpy())
    print("--------------------------------")
    print("Round-trip: ", " ".join(vocab[encoded_example[n]]))
    print()

Original:  b'One of the worst films I have ever seen. How to define "worst?" I would prefer having both eye balls yanked out and then be forced to tap dance on them than ever view this pitiful dreck again. Somehow, One-Hit Wonder Zwick manages a film that simultaneously offends Elvis fans, Mary Kay saleswomen, Las Vegas, gays, FBI agents and the rest of humanity with any intelligence with a shoddy, sloppy farce so forced it deserves to be forsaken ed. How Elvis Presley Enterprises could allow the rights of actual Elvis songs to be used in a film with a central premise that seems to be "The only good Elvis Presley Imitator is a dead one" is beyond me. The worst part of this mess - and that takes some work - is the mangled script: In 1958, Elvis\' words and songs that he would speak/perform in the 1970\'s are quoted! Worst special effect? That Oscar would go to the moron who decided that Elvis\' grave, potentially the most photographed/recognizable grave in the world, resembles a pyramid
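
To make the encoding step less opaque, here is a minimal pure-Python sketch of what the encoder does with its default settings as I understand them: lowercase the text, strip punctuation, split on whitespace, keep the most frequent tokens, and reserve index 0 for padding and index 1 for `[UNK]`. The functions `standardize`, `build_vocab`, `encode`, and `decode` are hypothetical stand-ins, not part of `TextVectorization`, and details such as tie-breaking between equally frequent tokens may differ from the real layer.

```python
import string
from collections import Counter

PAD, UNK = "", "[UNK]"  # index 0 and index 1 in the vocabulary


def standardize(text):
    """Lowercase the text and remove ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def build_vocab(texts, max_tokens):
    """Keep the most frequent tokens; two slots are reserved for PAD and UNK."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    return [PAD, UNK] + [tok for tok, _ in counts.most_common(max_tokens - 2)]


def encode(texts, vocab):
    """Map tokens to indices and zero-pad every row to the longest row."""
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = [[index.get(tok, 1) for tok in standardize(t).split()] for t in texts]
    width = max(len(row) for row in rows)
    return [row + [0] * (width - len(row)) for row in rows]


def decode(row, vocab):
    """Round-trip indices back to tokens; padding becomes empty strings."""
    return " ".join(vocab[i] for i in row)


texts = ["The movie was great, great fun!", "The plot was thin."]
vocab = build_vocab(texts, max_tokens=6)
encoded = encode(texts, vocab)
for row in encoded:
    print(row, "->", decode(row, vocab))
```

Because the vocabulary is capped, rare words come back as `[UNK]` in the round trip, and shorter reviews are padded with zeros up to the length of the longest review in the batch.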

## Create the model

In [13]:
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(encoder_layer.get_vocabulary()),
    output_dim=64,
    mask_zero=True)

In [16]:
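model = tf.keras.Sequential([
    encoder_layer,    # raw strings -> zero-padded sequences of token indices
    embedding_layer,  # token indices -> 64-dimensional vectors, index 0 masked
    # One LSTM reads the sequence forward, one backward; outputs are concatenated.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)  # single logit, no activation
])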
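
As a sanity check on the size of this architecture, the sketch below counts the trainable parameters by hand. It assumes the adapted vocabulary fills all 1000 slots (so the embedding has 1000 rows) and uses the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias. The helper names are hypothetical.

```python
VOCAB_SIZE = 1000   # assumption: the vocabulary reaches max_tokens
EMBED_DIM = 64
LSTM_UNITS = 64
DENSE_UNITS = 64


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


embedding = VOCAB_SIZE * EMBED_DIM
bidirectional = 2 * lstm_params(EMBED_DIM, LSTM_UNITS)   # forward + backward LSTM
hidden = dense_params(2 * LSTM_UNITS, DENSE_UNITS)       # input is the 128-dim concat
output = dense_params(DENSE_UNITS, 1)
total = embedding + bidirectional + hidden + output

print("embedding:    ", embedding)
print("bidirectional:", bidirectional)
print("dense hidden: ", hidden)
print("dense output: ", output)
print("total:        ", total)
```

Calling `model.summary()` after the model has been built should report comparable numbers if these assumptions hold.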

In [19]:
print([layer.supports_masking for layer in model.layers])

[False, True, True, True, True]


### Predict on a sample text without padding

In [33]:
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')

predictions = model.predict(np.array([sample_text]))
print(predictions[0])

[-0.00907584]
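
The final `Dense(1)` layer has no activation, so the printed value is a raw logit rather than a probability. A small sketch of how to read it, assuming the usual convention that a logit of 0 or above means "positive" (the `sigmoid` helper is written out here for illustration):

```python
import math


def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


logit = -0.00907584  # value printed by the untrained model above
probability = sigmoid(logit)
label = "positive" if logit >= 0.0 else "negative"

print(f"p(positive) = {probability:.4f}, predicted: {label}")
```

A logit this close to zero corresponds to a probability just under 0.5, which is what one would expect from a model that has not been trained yet.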


### Predict on a sample text with padding

In [36]:
padding = "the " * 2000
predictions = model.predict(np.array([sample_text, padding]))
print(predictions[0])

[-0.00907583]
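
The two predictions agree up to floating-point rounding because the embedding layer was created with `mask_zero=True`: padded positions (index 0) are masked and the recurrent layer carries its state across them unchanged. The toy recurrence below, a hypothetical `run_rnn` standing in for the LSTM, shows the effect of skipping masked steps versus processing the padding like real input.

```python
def run_rnn(token_ids, use_mask, decay=0.5):
    """Toy recurrence: state = decay * state + token_id at every processed step."""
    state = 0.0
    for token_id in token_ids:
        if use_mask and token_id == 0:
            continue  # masked step: keep the previous state as it is
        state = decay * state + token_id
    return state


short = [29, 5, 2]          # an unpadded sequence
padded = short + [0] * 4    # the same sequence inside a longer batch

print("masked:  ", run_rnn(short, use_mask=True), run_rnn(padded, use_mask=True))
print("unmasked:", run_rnn(short, use_mask=False), run_rnn(padded, use_mask=False))
```

With masking the padded and unpadded results are identical; without it the trailing zeros keep decaying the state, so the output would depend on how long the other texts in the batch happen to be.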


### Compile the Keras model to configure the training process

In [38]:
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(1e-4),
    metrics=['accuracy']
)
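
Since the model emits logits, the loss is configured with `from_logits=True`, which applies the sigmoid inside the loss. A minimal sketch of the per-example computation, using the common numerically stable form (the function name `bce_from_logits` is hypothetical, and this is not claimed to be the exact Keras code path):

```python
import math


def bce_from_logits(label, logit):
    """Binary cross-entropy on a raw logit, written to avoid overflow in exp()."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bce_naive(label, logit):
    """Textbook form: apply the sigmoid first, then take logs."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


chance_loss = bce_from_logits(1, 0.0)
print("loss at logit 0:", chance_loss)
print("stable vs naive:", bce_from_logits(1, 2.0), bce_naive(1, 2.0))
```

A logit of 0 gives a loss of ln 2, roughly 0.693, whichever the true label is. That is why the loss of about 0.693 reported at the very start of training indicates a model that is still guessing at chance level.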

## Train the model

In [None]:
history = model.fit(
    train_dataset, 
    epochs=10,
    validation_data=test_dataset,
    validation_steps=30)

Epoch 1/10
     57/Unknown - 155s 3s/step - loss: 0.6930 - accuracy: 0.5033
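
The progress line reports `Unknown` total steps, presumably because the number of batches is not known until the first pass over the dataset completes. A back-of-the-envelope estimate is easy to do by hand. The sketch below assumes the IMDB training split holds 25,000 reviews (a figure not stated in this notebook) and takes the roughly 3 s/step from the log above at face value.

```python
import math

TRAIN_EXAMPLES = 25_000   # assumption: size of the IMDB training split
BATCH_SIZE = 64
VALIDATION_STEPS = 30
SECONDS_PER_STEP = 3      # rough figure read off the progress line above

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
validation_examples = VALIDATION_STEPS * BATCH_SIZE
epoch_minutes = steps_per_epoch * SECONDS_PER_STEP / 60

print("steps per epoch:          ", steps_per_epoch)
print("validation examples/epoch:", validation_examples)
print("approx. minutes per epoch:", round(epoch_minutes, 1))
```

Under these assumptions, `validation_steps=30` evaluates on only a slice of the test set after each epoch, while the later `model.evaluate(test_dataset)` call covers all of it.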

In [None]:
test_loss, test_acc = model.evaluate(test_dataset)

print("Test Loss:", test_loss)
print("Test Accuracy:", test_acc)