# NLP with RNNs and Attention

## Using Character RNNs

Character RNNs (char-RNNs) are RNNs trained to predict the next character in a sequence, which lets them generate text one character at a time.

### Example of Generating Text

#### Data Prep

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
shakespeare_url = 'https://homl.info/shakespeare'
filepath = tf.keras.utils.get_file('shakespeare.txt', shakespeare_url)
with open(filepath) as f:
    shakespeare_text = f.read()

In [3]:
print(shakespeare_text[:80])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.


In [4]:
## Encode text
text_vec_layer = tf.keras.layers.TextVectorization(split='character', standardize='lower')
text_vec_layer.adapt([shakespeare_text])
encoded = text_vec_layer([shakespeare_text])[0]

In [5]:
encoded -= 2  ### Drop tokens 0 (pad) and 1 (unknown) which we will not use
n_tokens = text_vec_layer.vocabulary_size() - 2 ### Number of distinct chars 
dataset_size = len(encoded) ### Total number of chars
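
To make the encoding step concrete, here is a minimal pure-Python sketch of character-level encoding. The helper names (`build_vocab`, `encode`, `decode`) are hypothetical and are not part of Keras. Unlike `TextVectorization`, this sketch does not reserve IDs 0 and 1 for padding and unknown tokens, so no `- 2` shift is needed.

```python
from collections import Counter


def build_vocab(text):
    """Return the distinct lowercase characters, most frequent first."""
    counts = Counter(text.lower())
    return [char for char, _ in counts.most_common()]


def encode(text, vocab):
    """Map each character of the lowercased text to its integer ID."""
    char_to_id = {char: i for i, char in enumerate(vocab)}
    return [char_to_id[char] for char in text.lower()]


def decode(ids, vocab):
    """Map integer IDs back to a string."""
    return "".join(vocab[i] for i in ids)


sample = "First Citizen: hear me speak."  # illustrative text only
vocab = build_vocab(sample)
ids = encode(sample, vocab)
print(len(vocab), ids[:10])
print(decode(ids, vocab))
```

Decoding the IDs gives back the lowercased text, which is the same round trip the notebook relies on when it looks up `get_vocabulary()[char_id + 2]`.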

In [6]:
### Convert to sequences so the model can guess the next character
def to_dataset(sequence, length, shuffle=False, seed=None, batch_size=32):
    ds = tf.data.Dataset.from_tensor_slices(sequence)
    ds = ds.window(length + 1, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda window_ds: window_ds.batch(length + 1))
    if shuffle:
        ds = ds.shuffle(buffer_size=100_000, seed=seed)
    ds = ds.batch(batch_size)
    return ds.map(lambda window: (window[:, :-1], window[:, 1:])).prefetch(1)
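
The windowing logic in `to_dataset()` can be easier to follow on a tiny sequence. The NumPy sketch below mimics it without `tf.data`, shuffling, or batching. The `make_windows` helper is hypothetical and exists only for illustration. Each window holds `length + 1` items, the inputs drop the last item, and the targets drop the first.

```python
import numpy as np


def make_windows(sequence, length, shift=1):
    """Return (inputs, targets) built from full windows of length + 1 items."""
    sequence = np.asarray(sequence)
    # Only keep windows that fit entirely, like drop_remainder=True.
    starts = range(0, len(sequence) - length, shift)
    windows = np.stack([sequence[s:s + length + 1] for s in starts])
    # Targets are the inputs shifted one step into the future.
    return windows[:, :-1], windows[:, 1:]


inputs, targets = make_windows(np.arange(6), length=3)
print(inputs)
print(targets)
```

Every target row is the matching input row shifted by one position, so at each time step the model is asked to predict the next character.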

In [7]:
length = 100
tf.random.set_seed(42)

train_set = to_dataset(encoded[:1_000_000], length=length, shuffle=True,
                      seed=42)
valid_set = to_dataset(encoded[1_000_000:1_060_000], length=length)
test_set = to_dataset(encoded[1_060_000:], length=length)

#### Building Char-RNN Model

In [8]:
# model = tf.keras.Sequential([
#     tf.keras.layers.Embedding(input_dim=n_tokens, output_dim=16),
#     tf.keras.layers.GRU(128, return_sequences=True),
#     tf.keras.layers.Dense(n_tokens, activation='softmax')
# ])
# model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam',
#              metrics=['accuracy'])
# model_ckpt = tf.keras.callbacks.ModelCheckpoint(
#     'my_shakespeare_model.keras', monitor='val_accuracy', save_best_only=True
# )
# history = model.fit(train_set, validation_data=valid_set, epochs=10,
#                    callbacks=[model_ckpt])

In [9]:
model = tf.keras.models.load_model('my_shakespeare_model.keras')

In [10]:
shakespeare_model = tf.keras.Sequential([
    text_vec_layer,
    tf.keras.layers.Lambda(lambda X: X - 2), ### NO pad or <UNK> tokens
    model
])

In [11]:
# Test Model
test_input = tf.constant(["To be or not to b"])
y_proba = shakespeare_model.predict(test_input)[0, -1]
y_pred = tf.argmax(y_proba) ## Choose the most probable character ID
text_vec_layer.get_vocabulary()[y_pred + 2]

'e'

### Generating Fake Shakespearean Text

In [12]:
log_probas = tf.math.log([[0.5, 0.4, 0.1]])
tf.random.set_seed(42)
tf.random.categorical(log_probas, num_samples=8)

<tf.Tensor: shape=(1, 8), dtype=int64, numpy=array([[0, 0, 1, 1, 1, 0, 0, 0]])>

In [13]:
def next_char(text, temperature=1):
    text = tf.constant([text])
    y_proba = shakespeare_model.predict(text, verbose=0)[0, -1:]
    rescaled_logits = tf.math.log(y_proba) / temperature
    char_id = tf.random.categorical(rescaled_logits, num_samples=1)[0,0]
    return text_vec_layer.get_vocabulary()[char_id + 2]

In [14]:
def extend_text(text, n_chars=50, temperature=1):
    for _ in range(n_chars):
        text += next_char(text, temperature)
    return text

In [15]:
print(extend_text('To be or not to be', temperature=0.01))

To be or not to be the duke?

provost:
i have some service of the du


In [16]:
print(extend_text('To be or not to be', temperature=1))

To be or not to begun broice
and term parce of her jove to duke:
yep


In [17]:
print(extend_text('To be or not to be', temperature=100))

To be or not to beevicm-vilv!?$ez?gmjz :3?ljb'va;!td&
i.ur3l'-j!3emq
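
The three outputs above differ because dividing the log-probabilities by the temperature reshapes the distribution that `tf.random.categorical` samples from. The NumPy sketch below shows the effect on the toy probabilities `[0.5, 0.4, 0.1]` used earlier. The `apply_temperature` helper is hypothetical and only computes the implied distribution; it does not sample.

```python
import numpy as np


def apply_temperature(probas, temperature):
    """Divide the log-probabilities by temperature, then renormalize (softmax)."""
    logits = np.log(probas) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


probas = np.array([0.5, 0.4, 0.1])
cold = apply_temperature(probas, 0.01)  # nearly all mass on the top class
same = apply_temperature(probas, 1)     # distribution unchanged
hot = apply_temperature(probas, 100)    # close to uniform
print(cold.round(3), same.round(3), hot.round(3))
```

A temperature near zero makes sampling almost greedy, which gives repetitive but coherent text, while a very high temperature makes every character nearly equally likely, which gives gibberish.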


### Stateful RNNs

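Stateful RNNs preserve the final state after processing a training batch and use it as the initial state for the next batch. For this to make sense, each batch must pick up the text where the previous one left off, which is why the dataset below uses non-overlapping windows (`shift=length`) and a batch size of 1.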
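
To illustrate why carrying the state matters, here is a minimal NumPy sketch, assuming a plain tanh RNN cell with random weights rather than the GRU used in this notebook. All names in it are hypothetical. Processing consecutive chunks while passing the state along gives the same final state as processing the whole sequence at once, whereas resetting the state before the last chunk does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 3, 4
W_x = rng.normal(size=(n_inputs, n_units))  # input-to-hidden weights
W_h = rng.normal(size=(n_units, n_units))   # hidden-to-hidden weights


def run_rnn(inputs, state):
    """Run a simple tanh RNN over the inputs and return the final state."""
    for x_t in inputs:
        state = np.tanh(x_t @ W_x + state @ W_h)
    return state


sequence = rng.normal(size=(12, n_inputs))
zero_state = np.zeros(n_units)

# Reference: process the whole sequence in one go.
full_state = run_rnn(sequence, zero_state)

# Stateful: three consecutive chunks, each starting from the previous state.
stateful_state = zero_state
for chunk in np.split(sequence, 3):
    stateful_state = run_rnn(chunk, stateful_state)

# Stateless: the last chunk starts again from a zero state.
reset_state = run_rnn(sequence[8:], zero_state)

print(np.allclose(full_state, stateful_state))
print(np.allclose(full_state, reset_state))
```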

In [18]:
def to_dataset_for_stateful_rnn(sequence, length):
    ds = tf.data.Dataset.from_tensor_slices(sequence)
    ds = ds.window(length + 1, shift=length, drop_remainder=True)
    ds = ds.flat_map(lambda window: window.batch(length + 1)).batch(1)
    return ds.map(lambda window: (window[:, :-1], window[:, 1:])).prefetch(1)

stateful_train_set = to_dataset_for_stateful_rnn(encoded[:1_000_000], length)
stateful_valid_set = to_dataset_for_stateful_rnn(encoded[1_000_000:1_060_000], length)
stateful_test_set = to_dataset_for_stateful_rnn(encoded[1_060_000:], length)

In [25]:
# model = tf.keras.Sequential([
#     tf.keras.layers.Embedding(input_dim=n_tokens, output_dim=16,
#                               input_shape=[None]),
#     tf.keras.layers.GRU(128, return_sequences=True, stateful=True),
#     tf.keras.layers.Dense(n_tokens, activation="softmax")
# ])

## Sentiment Analysis

In [26]:
import tensorflow_datasets as tfds

raw_train_set, raw_valid_set, raw_test_set = tfds.load(
    name='imdb_reviews',
    split=['train[:90%]', 'train[90%:]', 'test'],
    as_supervised=True
)
tf.random.set_seed(1)
train_set = raw_train_set.shuffle(5000, seed=42).batch(32).prefetch(1)
valid_set = raw_valid_set.batch(32).prefetch(1)
test_set = raw_test_set.batch(32).prefetch(1)

In [27]:
for review, label in raw_train_set.take(4):
    print(review.numpy().decode('utf-8'))
    print("Label: ", label.numpy())

This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.
Label:  0
I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot developmen

In [28]:
vocab_size = 1000
text_vec_layer = tf.keras.layers.TextVectorization(max_tokens=vocab_size)
text_vec_layer.adapt(train_set.map(lambda reviews, labels: reviews))

In [29]:
embed_size = 128
model = tf.keras.Sequential([
    text_vec_layer,
    tf.keras.layers.Embedding(vocab_size, embed_size),
    tf.keras.layers.GRU(128),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
history = model.fit(train_set, validation_data=valid_set, epochs=2)

Epoch 1/2
704/704 ━━━━━━━━━━━━━━━━━━━━ 19s 24ms/step - accuracy: 0.5041 - loss: 0.6936 - val_accuracy: 0.5016 - val_loss: 0.6929
Epoch 2/2
704/704 ━━━━━━━━━━━━━━━━━━━━ 16s 23ms/step - accuracy: 0.5084 - loss: 0.6929 - val_accuracy: 0.5000 - val_loss: 0.6962


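The model does no better than chance (about 50% accuracy). The likely cause is that the reviews have very different lengths, so `TextVectorization` pads the shorter ones with padding tokens to match the longest review in the batch. After reading a long run of padding tokens, the GRU tends to forget what the review was about. To solve this, we need to use masking.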
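
As a rough illustration of the padding problem, the NumPy sketch below pads two made-up token ID sequences to the same length and builds a boolean mask that marks the real tokens. It assumes, as in `TextVectorization`, that ID 0 is the padding token. The `pad_batch` helper and the token IDs are hypothetical.

```python
import numpy as np


def pad_batch(sequences, pad_id=0):
    """Right-pad variable-length ID sequences to the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    for row, seq in enumerate(sequences):
        batch[row, :len(seq)] = seq
    return batch


# Made-up token IDs: one short review and one longer review.
reviews = [[12, 7, 45], [3, 9, 81, 4, 22, 6, 17, 5]]
batch = pad_batch(reviews)
mask = batch != 0                  # True for real tokens, False for padding
padding_fraction = 1 - mask.mean()
print(batch)
print(mask.sum(axis=1), padding_fraction)
```

In this batch the short review ends with five padding steps. A mask like this lets a recurrent layer skip those steps instead of updating its state on them.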

### Masking