This tutorial guides you on how to generate text using a character-based RNN model. This model helps to generate a character one by one and a longer sequence can execute it repeatedly. **However, such a model can't understand the meanings of word, how to spell them as well, and what a role plays in a sentence.**

On the contrary, this model is simple to capture the details on the character levels, such as capitalization, the conversation format, etc.

In [0]:
!pip install -q tf-nightly

In [2]:
import tensorflow as tf
import numpy as np
import os
import time

print("Tensorflow Version: {}".format(tf.__version__))
print("GPU {} available.".format("is" if tf.config.experimental.list_physical_devices("GPU") else "not"))

Tensorflow Version: 2.2.0-dev20200212
GPU is available.


# Data Preprocessing

Here we are going to use the dataset of Shakespeare's writing from [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).

In [3]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 
                                       'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
path_to_file

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


'/root/.keras/datasets/shakespeare.txt'

## Read the Data

In [4]:
text = open(path_to_file, 'rb').read().decode('UTF-8')
print("Length of text: {} characters".format(len(text)))

Length of text: 1115394 characters


Let's take a look at first few characters.

In [5]:
print(text[:200])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you


Let's calculate the unique characters.

In [6]:
vocab = sorted(set(text))
print("Unique characters: {}".format(len(vocab)))

Unique characters: 65


## Vectorize the text

Before training, we have to convert the strings to the numerical representation.

In [0]:
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

# transform the text with integers
text_as_int = [char2idx[c] for c in text]

In [8]:
for char, _ in zip(char2idx, range(20)):
  print('{:4s}: {:3d}'.format(repr(char), char2idx[char]))

'\n':   0
' ' :   1
'!' :   2
'$' :   3
'&' :   4
"'" :   5
',' :   6
'-' :   7
'.' :   8
'3' :   9
':' :  10
';' :  11
'?' :  12
'A' :  13
'B' :  14
'C' :  15
'D' :  16
'E' :  17
'F' :  18
'G' :  19


Show the first few characters with their mapping integers.

In [9]:
print("{} <---> {}".format(repr(text[:20]), text_as_int[:20]))

'First Citizen:\nBefor' <---> [18, 47, 56, 57, 58, 1, 15, 47, 58, 47, 64, 43, 52, 10, 0, 14, 43, 44, 53, 56]


## Prediction

The purpose of the prediction is to predict the character given an input one. We are going to train a model for the above purpose. The input of the model is a sequence of characters and the prediction is the next character of the same sequence. The attributes of RNNs are memorizing the previously seen data, a stateful concept. The prediction would be what the next character is, given all the characters until this moment.

### Training Examples and Targets

The basic operation idea is to break down the same text into the inputs and outputs. The output is shifted forward one character of the input. For example, if the `sequence_len` is 4 and the sentence is `hello`, the input sequence becomes `hell` and the output (or target) one becomes `ello`.

In [0]:
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

In [0]:
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

In [12]:
for c in char_dataset.take(10):
  print(idx2char[c.numpy()], end=" ")

F i r s t   C i t i 

Here we use `batch()` to split the characters to sequences of the desired size.

In [0]:
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

In [14]:
for seq in sequences.take(5):
  print(repr(''.join(idx2char[seq.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


For each sequence, we need to split it into the training and the target sequence.

In [0]:
def split_input_target(data):
  inputs = data[:-1]
  target = data[1:]
  return inputs, target

dataset = sequences.map(split_input_target)

In [16]:
for input_example, target_example in dataset.take(1):
  print(repr(''.join(idx2char[input_example.numpy()])))
  print(repr(''.join(idx2char[target_example.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


Each index of these sequence vectors represents one time step. For the input character at time 0 is `F` and the target to be predicted is `i`. The RNN model considers the previous step status in addition to the current input character.

In [19]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
  print("Step {}: input {:s}, target {:s}".format(
    i, repr(idx2char[input_idx.numpy()]), repr(idx2char[target_idx.numpy()])))

Step 0: input 'F', target 'i'
Step 1: input 'i', target 'r'
Step 2: input 'r', target 's'
Step 3: input 's', target 't'
Step 4: input 't', target ' '


### Creating Training Batches

In [20]:
BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int32, tf.int32)>

# Build the Model

In [0]:
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024  # for gru cell

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  def _model(inputs):
    embed = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)(inputs)  # [batch_size, None, 256]
    # set `return_sequences=True` for predicting the next character
    gru = tf.keras.layers.GRU(units=rnn_units, return_sequences=True, stateful=True,
                              recurrent_initializer='glorot_uniform')(embed)    # [None, None, 1024]
    outputs = tf.keras.layers.Dense(units=vocan_size)(gru)
    return outputs

  # if the stateful is True, it requires the batch size
  inputs = tf.keras.Input(shape=(None, ), batch_size=batch_size)
  generator = _model(inputs)
  return tf.keras.Model(inputs, generator)

In [59]:
model = build_model(vocab_size=vocab_size, embedding_dim=embedding_dim, 
                    rnn_units=rnn_units, batch_size=BATCH_SIZE)
model.summary()

Model: "model_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_16 (InputLayer)        [(64, None)]              0         
_________________________________________________________________
embedding_11 (Embedding)     (64, None, 256)           16640     
_________________________________________________________________
gru_11 (GRU)                 (64, None, 1024)          3938304   
_________________________________________________________________
dense_4 (Dense)              (64, None, 65)            66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________


The above model can be translated as the below model architecture.

![](https://www.tensorflow.org/tutorials/text/images/text_generation_training.png)

Refer to Tensorflow.org (2020).

# Try the Model

Let's try the model first to check the result.

In [60]:
for input_example_batch, target_example_batch in dataset.take(1):
  prediction_example_batch = model(input_example_batch)
  print(prediction_example_batch.shape)  # [batch_size, sequence_length, vocab_size]

(64, 100, 65)


To get the actual predictions from the model, we need to sample from the output distribution to get the actual indices.

In [147]:
sampled_indices = tf.random.categorical(logits=prediction_example_batch[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
sampled_indices

array([57, 14, 22, 12,  1, 39, 37, 43, 17, 56,  0, 13, 50, 38, 63, 16, 41,
       63, 30, 30,  1, 39, 25, 14, 57, 56, 16, 41,  9, 30, 38, 29, 26, 25,
        8, 15, 51, 31, 44, 41, 24, 58, 60,  1, 19, 45, 20, 21,  9, 40, 11,
       49, 13, 38, 35, 31, 16,  4, 34, 27, 47, 32, 38, 26, 28, 18, 33,  7,
       11, 10, 60, 36, 64, 43, 42, 35, 28, 28, 58, 54, 40, 64, 57, 38, 28,
       53, 55,  4, 33, 62,  9, 38, 34,  4,  6, 10, 13,  4, 16, 64])

In [68]:
print("Input: {}".format(repr(''.join(idx2char[input_example_batch[0].numpy()]))))
print("Output: {}".format(repr(''.join(idx2char[sampled_indices]))))

Input: ";\nThat did but show thee, of a fool, inconstant\nAnd damnable ingrateful: nor was't much,\nThou woulds"
Output: "JXO3DseismNmL!,zmJArGWoHNvNIaA-DzIrV-iLNvQMU$wqmXp\npG'GHveLmRoBcCAGZTaWwcHLoZtEcKReBrwrLMO&zP'reFU;V"


# Train the Model

Before training, we have to decide how to train the model. To this problem, we can regard it as a classification problem. Given the previous RNN state, and the input this timestamp, predict the class of the next character.

## Attach an Optimizer, and Define a Loss Function

The `tf.keras.losses.sparse_categorical_crossentropy` API works in this case because it is applied across the last dimension of the predictions. For example, the target shape is `[batch_size, sequence_length]` and the prediction shape is `[batch_size, sequence_length, vocab_size]`. This API helps you calculate the loss alongside the last dimension of the predictions.

In [0]:
def loss(labels, logits):
  return tf.reduce_mean(
    tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True))

In [96]:
example_batch_loss = loss(target_example_batch, prediction_example_batch)
print("Prediction Shape: {}".format(prediction_example_batch.shape))
print("Scalar Loss: {}".format(example_batch_loss.numpy().mean()))

Prediction Shape: (64, 100, 65)
Scalar Loss: 4.173928737640381


In [97]:
print("Target shapes: {}".format(target_example_batch.shape))
print("Prediction shape: {}".format(prediction_example_batch.shape))

Target shapes: (64, 100)
Prediction shape: (64, 100, 65)


In [0]:
def accuracy(labels, logits):
  logits = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
  labels = tf.cast(labels, tf.int32)
  return tf.reduce_mean(tf.keras.metrics.binary_accuracy(labels, logits))

In [102]:
accuracy(target_example_batch, prediction_example_batch)

<tf.Tensor: shape=(), dtype=float32, numpy=0.15562502>

In [0]:
model.compile(loss=loss, optimizer=tf.keras.optimizers.Adam(), metrics=[accuracy])

## Configure the Checkpoints

In [0]:
ckpt_dir = "./train_ckpt"
ckpt_prefix = os.path.join(ckpt_dir, 'ckpt_{epoch}')

ckpt_callbacks = tf.keras.callbacks.ModelCheckpoint(
  filepath=ckpt_prefix, save_weights_only=True
)

## Start Training

In [109]:
history = model.fit(dataset, epochs=10, callbacks=[ckpt_callbacks])

Train for 172 steps
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Generate Text

## Restore the Latest Checkpoint

To keep the prediction simple, use a batch size of 1. Because of the way the RNN state passed from timestamp to timestamp (stateful=True), the model only accepts the same batch size once it was built.

In [154]:
latest_ckpt = tf.train.latest_checkpoint(ckpt_dir)
latest_ckpt

'./train_ckpt/ckpt_10'

In [0]:
loaded = build_model(vocab_size, embedding_dim, rnn_units, 1)

In [156]:
loaded.load_weights(latest_ckpt)

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f338f269f98>

In [157]:
# the batch size was not reset
loaded.summary()

Model: "model_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_22 (InputLayer)        [(1, None)]               0         
_________________________________________________________________
embedding_17 (Embedding)     (1, None, 256)            16640     
_________________________________________________________________
gru_17 (GRU)                 (1, None, 1024)           3938304   
_________________________________________________________________
dense_10 (Dense)             (1, None, 65)             66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________


## The Prediction Loop

![](https://www.tensorflow.org/tutorials/text/images/text_generation_sampling.png)

Refer to Tensorflow.org (2020).

In [0]:
def generate_text(model, start_string):
  # the number of characters to generate
  num_generate = 1000

  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, axis=0)

  # for the generated characters
  text_generated = []

  # low temperature results in more predictable characters
  # high temperature results in more surprising characters
  # this value requires experimenting
  temperature = 1.0

  # now let's reset state for the first character
  model.reset_states()

  for i in range(num_generate):
    predictions = model(input_eval)

    # remove the batch dimension
    predictions = tf.squeeze(predictions, axis=0)

    # using a categorical distribution to predict the character
    predictions = predictions / temperature
    # [-1, 0]: the output of the final character, select the first character
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
    text_generated.append(idx2char[predicted_id])

    input_eval = tf.expand_dims([predicted_id], axis=0)

  return start_string + ''.join(text_generated)

In [169]:
print(generate_text(loaded, start_string=u"ROMEO: "))

ROMEO: you shall not follow
Our ancamest and turns how men here,
My good condament slilys that that were on you;
And till our converture dear chid and with a Clewere
They restoued between your day.
Forbear him; but, suchard take soon denied,
Which, being alloke of Grey, if I
be t, and not penury
What should prove suit in his mild, a saved
A rament impeach our sudsent on my father's
will'd mother indeed, hath sound our spirit,
Dre my tongue QuQUEEN ELIZABETH:
And be proservey; and, let him be fear,
I'll try us: what he doth curst
And sads the banish'd Henry's langul wings.
Go to, a way o'erwholess ribbons than a frantifle.

YORK
I most not receive and clor to Rome
To fly o' the vigions and lords about this
oars have put on conscarced sensible in your daughter's pleasing in
thy father than the world, thou shalt temper it,
And figrer
Than you have wealy with your title.
How will they come with God's two better nobly fruit, they arr
remedy.

YORK:
Why, thank'dy fault!
I have forged in thy 

You can train the model with more epochs, stack RNN layers to the model, or change the temperature value to check the model performance.

# Customized Training

These advanced operations help you in controlling the training step and help stabilize the model's open-loop output.

In [0]:
adv_model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)

In [0]:
optimizer = tf.keras.optimizers.Adam()

In [0]:
@tf.function
def train_step(inputs, targets):
  with tf.GradientTape() as tape:
    predictions = adv_model(inputs)
    loss = tf.reduce_mean(
      tf.keras.losses.sparse_categorical_crossentropy(targets, predictions, from_logits=True)
    )
  grads = tape.gradient(loss, adv_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, adv_model.trainable_variables))
  return loss

In [0]:
def train(epochs=10):
  for epoch in range(epochs):
    # initializing the hidden state at the beginning of every epoch
    # initial hidden state is None
    hidden = adv_model.reset_states()

    for (batch_n, (inputs, targets)) in enumerate(dataset):
      loss = train_step(inputs, targets)

      if batch_n % 100 == 0:
        template = "Epoch {} Batch {} Loss {}"
        print(template.format(epoch, batch_n, loss))
      
    # save the checkpoints
    if (epoch + 1) % 5 == 0:
      adv_model.save_weights(ckpt_prefix.format(epoch=epoch))

    print("Epoch {} Loss {:.4f}".format(epoch + 1, loss))

  adv_model.save_weights(ckpt_prefix.format(epoch=epoch))

In [176]:
train(epochs=10)

Epoch 0 Batch 0 Loss 2.1382858753204346
Epoch 0 Batch 100 Loss 1.9485670328140259
Epoch 1 Loss 1.7835
Epoch 1 Batch 0 Loss 1.8019925355911255
Epoch 1 Batch 100 Loss 1.7318795919418335
Epoch 2 Loss 1.6036
Epoch 2 Batch 0 Loss 1.5783309936523438
Epoch 2 Batch 100 Loss 1.5621082782745361
Epoch 3 Loss 1.5282
Epoch 3 Batch 0 Loss 1.4577150344848633
Epoch 3 Batch 100 Loss 1.4812841415405273
Epoch 4 Loss 1.4588
Epoch 4 Batch 0 Loss 1.416590929031372
Epoch 4 Batch 100 Loss 1.4205050468444824
Epoch 5 Loss 1.3977
Epoch 5 Batch 0 Loss 1.3192777633666992
Epoch 5 Batch 100 Loss 1.352297306060791
Epoch 6 Loss 1.3392
Epoch 6 Batch 0 Loss 1.2877389192581177
Epoch 6 Batch 100 Loss 1.341284155845642
Epoch 7 Loss 1.3229
Epoch 7 Batch 0 Loss 1.2206844091415405
Epoch 7 Batch 100 Loss 1.3187882900238037
Epoch 8 Loss 1.2878
Epoch 8 Batch 0 Loss 1.2013981342315674
Epoch 8 Batch 100 Loss 1.2553431987762451
Epoch 9 Loss 1.2984
Epoch 9 Batch 0 Loss 1.2264318466186523
Epoch 9 Batch 100 Loss 1.2362143993377686
Epo