# Using a character based RNN for text generation

The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

The structure of the output resembles a play—blocks of text generally begin with a speaker name, in all capital letters similar to the dataset.

In [1]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [0]:
import tensorflow as tf
import numpy as np
import os
import time

In [3]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


# Reading the data

In [4]:
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
print("Length of text = %s" %(len(text)))

Length of text = 1115394


In [5]:
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [6]:
vocab = sorted(set(text))
print("%s unique characters are in the text" %(len(vocab)))

65 unique characters are in the text


# Preprocessing the text

**Vectorize the text**

Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [0]:
# Dictionary mapping characters to ids
char2idx = {}
for i, u in enumerate(vocab):
    char2idx[u] = i
idx2char = np.array(vocab)

In [0]:
text2int = []
for c in text:
    text2int.append(char2idx[c])
text2int = np.array(text2int)

In [0]:
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))

# The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task we're training the model to perform. The input to the model will be a sequence of characters, and we train the model to predict the output—the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?

## Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain seq_length characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of seq_length+1. For example, say seq_length is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this first use the tf.data.Dataset.from_tensor_slices function to convert the text vector into a stream of character indices.

In [0]:
seq_length = 100
examples_per_epoch = len(text)
## Create tensorflow data
char_dataset = tf.data.Dataset.from_tensor_slices(text2int)

In [12]:
for i in char_dataset.take(5):
    print(i,idx2char[i.numpy()])

tf.Tensor(18, shape=(), dtype=int64) F
tf.Tensor(47, shape=(), dtype=int64) i
tf.Tensor(56, shape=(), dtype=int64) r
tf.Tensor(57, shape=(), dtype=int64) s
tf.Tensor(58, shape=(), dtype=int64) t


In [0]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

In [14]:
for item in sequences.take(5):
    print(repr(''.join(idx2char[item.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


For each sequence, duplicate and shift it to form the input and target text by using the map method to apply a simple function to each batch:

In [0]:
def split_input_chunk(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

In [0]:
dataset = sequences.map(split_input_chunk)

In [17]:
for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


Each index of these vectors are processed as one time step. For the input at time step 0, the model receives the index for "F" and trys to predict the index for "i" as the next character. At the next timestep, it does the same thing but the RNN considers the previous step context in addition to the current input character.

In [19]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 18 ('F')
  expected output: 47 ('i')
Step    1
  input: 47 ('i')
  expected output: 56 ('r')
Step    2
  input: 56 ('r')
  expected output: 57 ('s')
Step    3
  input: 57 ('s')
  expected output: 58 ('t')
Step    4
  input: 58 ('t')
  expected output: 1 (' ')


# Create data batches

In [0]:
BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

# Build the model

In [0]:
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 256

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]))
    model.add(tf.keras.layers.GRU(rnn_units, stateful=True, return_sequences=True, recurrent_dropout=0.2))
    model.add(tf.keras.layers.GRU(rnn_units, stateful=True, return_sequences=True, recurrent_dropout=0.2))
    model.add(tf.keras.layers.GRU(rnn_units, stateful=True, return_sequences=True, recurrent_dropout=0.2))
    model.add(tf.keras.layers.Dense(vocab_size, activation='softmax'))
    return model

In [0]:
model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)

In [58]:
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (64, None, 256)           16640     
_________________________________________________________________
gru_4 (GRU)                  (64, None, 256)           394752    
_________________________________________________________________
gru_5 (GRU)                  (64, None, 256)           394752    
_________________________________________________________________
gru_6 (GRU)                  (64, None, 256)           394752    
_________________________________________________________________
dense_4 (Dense)              (64, None, 65)            16705     
Total params: 1,217,601
Trainable params: 1,217,601
Non-trainable params: 0
_________________________________________________________________


In [59]:
# Dummy run
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")


(64, 100, 65) # (batch_size, sequence_length, vocab_size)


To get actual predictions from the model we need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.

In [0]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [61]:
sampled_indices

array([14, 47, 44, 63, 62, 59, 10, 40,  4, 10, 55, 35, 49, 30,  5, 63, 32,
       59, 53, 47, 49, 15,  5, 43,  8,  9,  2, 12,  4, 40, 29, 13,  5, 49,
       53, 59,  9, 19, 27, 19,  5, 34, 56, 36, 54, 38, 40, 53, 29,  2, 55,
       23, 30, 52, 32,  3, 47, 15, 30, 56, 54, 56, 50, 25, 15, 48, 12, 16,
       61, 28, 42,  8, 59, 16, 52, 12,  1, 57, 39, 26, 42, 46, 10, 31,  7,
       18, 42, 22,  9, 60, 11,  6, 11, 25, 60, 14, 26,  6, 19, 62])

In [62]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 'earer lord?\nThen, dreadful trumpet, sound the general doom!\nFor who is living, if those two are gone'

Next Char Predictions: 
 "Bifyxu:b&:qWkR'yTuoikC'e.3!?&bQA'kou3GOG'VrXpZboQ!qKRnT$iCRrprlMCj?DwPd.uDn? saNdh:S-FdJ3v;,;MvBN,Gx"


# Time to train

At this point the problem can be treated as a standard classification problem. Given the previous RNN state, and the input this time step, predict the class of the next character.

In [0]:
def loss(labels, logits):
    return(tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True))

In [64]:
example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.174388


In [0]:
model.compile(optimizer='adam', loss=loss)

In [0]:
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [67]:
EPOCHS=10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Train for 172 steps
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [68]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_10'

# Running Predictions

In [69]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f14de18db38>

In [0]:
model.build(tf.TensorShape([1, None]))

In [71]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
gru_7 (GRU)                  (1, None, 256)            394752    
_________________________________________________________________
gru_8 (GRU)                  (1, None, 256)            394752    
_________________________________________________________________
gru_9 (GRU)                  (1, None, 256)            394752    
_________________________________________________________________
dense_5 (Dense)              (1, None, 65)             16705     
Total params: 1,217,601
Trainable params: 1,217,601
Non-trainable params: 0
_________________________________________________________________


The prediction loop

The following code block generates the text:

It Starts by choosing a start string, initializing the RNN state and setting the number of characters to generate.

Get the prediction distribution of the next character using the start string and the RNN state.

Then, use a categorical distribution to calculate the index of the predicted character. Use this predicted character as our next input to the model.

The RNN state returned by the model is fed back into the model so that it now has more context, instead than only one character. After predicting the next character, the modified RNN states are again fed back into the model, which is how it learns as it gets more context from the previously predicted characters.


In [0]:
def generate_text(model, start_string):
    # Evaluating from the learned model
    # number of characters to generate
    num_generate = 1000

    # Vectorize the input string
    input_eval = []
    for s in start_string:
        input_eval.append(char2idx[s])
    
    input_eval = tf.expand_dims(input_eval, 0)

    # Store our generated text results
    text_generated = []

    # Temperature parameter. Lower means more relevant text but less innovation.
    # Higher means suprising text.

    temperature = 0.4

    # here batch size = 1
    model.reset_states()

    for i in range(num_generate):
        predictions = model(input_eval)
        # Remove the batch dimenstion
        predictions = tf.squeeze(predictions, 0)
        # using a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # We pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])
    

    return (start_string + ''.join(text_generated))


In [54]:
print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: dG:JzQW:c:m:zg?:LP--PetA:&kBcEUbqmXNBv3eCB$'3MJDRhH?LZBksLeqRbvN;FBZjzEI g'oISVNcUdxnKmTz!ocScqeYLoVRDD$pFLtuNMkK m-bTLHCe,Ry-W$3uQ-'lDJjDglBDDaGDxLxVTPz$sao!GEe?$XQpxGBkSqpMsADnxX!&EQpqTqYFN'&VZFCFjK,;l:SO$mShK3V,vdhJXSAAR$r:;.?roFr,DJ?geVBcebz;H! Iis3EoJBKtj.O3RFQJSvMmzJN :mm
To'rgxAHqHkefxnVUUPl:&omQAIcysWpl&u n?,EDgiZVSKFji,ZC?Ye:j&To3cFotorlw
pDu x33d:
wHmq-m!ahd,SBMULGnN-
s!.DhZULBsxse.sr&KOg-vOd,Qr&'u&pUQvjdleEMmnCALar$OhDA M
sZtMinEQAN$&;GDR;-fTV'
!SeqcHMN!veI n,-lSjd&AqfxNnBECvEAQrr-fiki:jPMAmE-Dhs?tW,jQ.$mnMFehsBQoDiSigWQ
.SJKhi
WpUNmIC CydMRxygCCN-cefn.sr$fP;n&uks-t
U3Ja:l,IXSyoqO&&!,tS3zocmFQE3?&NfeTSC!KmNk:!z
mjqjCUuOfX&qdj&pJ;YRLiXXZ TZSRaS?e-r!bef fT$EDy
z-bm&W:HmkXv:Yrb$nwpiO.ODIERXEsjysD'.U3$IoGxWfg,jq!qQoepupDcAHJEdDk!&W;Gha'AeuZ3MRdu&mgnCTpGEIpNjAFHe&QqFrWHSw;i
I $;wgvp;aimQUibqNzfnOQRYl.fhhqe:pybTgrdoif'BK&3P
$3AKTBjcHyHgQS&$IHNAETKkzf
d3vJ?PbhHbxIrbJuR&JN$Q$rWsePAl!3e:NlYWQxmBpJY.r ;bRMBBpDFibUYHTyMLyh'oEYk Wg$u.?ID.h!HVV! JT.LtSiX,ikRMHNjwWD?,X
G sgc?fzgfMn

# Custom Training Loop

In [0]:
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [0]:
optimizer = tf.keras.optimizers.Adam()

In [0]:
@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  return loss

In [0]:
epochs = 10
for epoch in range(epochs):
    start = time.time()

    # initializing the hidden state at the start of every epoch
    # initally hidden is None
    hidden = model.reset_states()

    for batch_n, (inp, target) in enumerate(dataset):
        loss = train_step(inp, target)


        if batch_n % 100 == 0:
            template = 'Epoch {} Batch {} Loss {}'
            print(template.format(epoch+1, batch_n, loss))
    
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

    model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 4.042780876159668
Epoch 1 Batch 100 Loss 4.048915863037109
Epoch 1 Loss 4.0508
Time taken for 1 epoch 63.266441106796265 sec

Epoch 2 Batch 0 Loss 4.050436019897461
Epoch 2 Batch 100 Loss 4.046728610992432
Epoch 2 Loss 4.0469
Time taken for 1 epoch 59.98084044456482 sec

Epoch 3 Batch 0 Loss 4.047991752624512
Epoch 3 Batch 100 Loss 4.047666072845459
