# predicting the next character

Given a character, or a sequence of characters, what is the most probable next character? 

RNNs maintain an internal state that depends on previously seen elements, so the question becomes: given all the characters seen so far, what is the next character?

The input to the model is a sequence of characters, and we train the model to predict the output: the following character at each time step.


# import libraries

In [0]:
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)

import os
import time
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
import numpy as np

2.2.0-rc2


# get dataset

In [0]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# view data

In [0]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters


In [0]:
# [:250]= 0 to 249 (first 250 characters)
print('first 250 chars: \n{}'.format(text[:250]))

first 250 chars: 
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [0]:
# get unique characters
vocab = sorted(set(text))
print('unique_chars: {}'.format(len(vocab)))

unique_chars: 65


# preprocess 

### Vectorize the text

Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [0]:
# create a mapping from unique characters to indices (and an array for the reverse lookup)
char2idx = {u:i for i,u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

# Now we have an integer representation for each character. Notice that we mapped the characters to indices from 0 to `len(vocab) - 1`.
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')


{
  '\n':   0,
  ' ' :   1,
  '!' :   2,
  '$' :   3,
  '&' :   4,
  "'" :   5,
  ',' :   6,
  '-' :   7,
  '.' :   8,
  '3' :   9,
  ':' :  10,
  ';' :  11,
  '?' :  12,
  'A' :  13,
  'B' :  14,
  'C' :  15,
  'D' :  16,
  'E' :  17,
  'F' :  18,
  'G' :  19,
  ...
}


In [0]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'First Citizen' ---- characters mapped to int ---- > [18 47 56 57 58  1 15 47 58 47 64 43 52]
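
Because `idx2char` is the inverse of `char2idx`, decoding the integers should recover the original text. The following is a minimal sketch of that round trip, assuming a short stand-in string (`sample`) instead of the full corpus. The names `sample`, `vocab_small`, `char_to_idx` and `idx_to_char` are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical stand-in for the corpus; the notebook uses the full Shakespeare text.
sample = "First Citizen"

# Same construction as char2idx / idx2char, on the small sample.
vocab_small = sorted(set(sample))
char_to_idx = {ch: i for i, ch in enumerate(vocab_small)}
idx_to_char = np.array(vocab_small)

# Encode to integers, then decode back to a string.
encoded = np.array([char_to_idx[ch] for ch in sample])
decoded = "".join(idx_to_char[encoded])

print(encoded)
print(decoded == sample)
```

The indices differ from the ones printed above because the stand-in vocabulary is much smaller than the 65-character corpus vocabulary.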


# create training examples & targets

Next divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of `seq_length+1`.
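
To make the shift concrete, here is a small plain-Python sketch, assuming a toy string and a `seq_length` of 4. The helper `split_input_target_str` and the variable names are illustrative only; the notebook does the same thing on integer tensors with `tf.data`.

```python
def split_input_target_str(chunk):
    """Return (input, target), where target is input shifted by one character."""
    return chunk[:-1], chunk[1:]

toy_seq_length = 4
toy_text = "Hello world"

# Non-overlapping chunks of toy_seq_length + 1 characters; an incomplete
# final chunk is dropped, mirroring drop_remainder=True.
step = toy_seq_length + 1
chunks = [toy_text[i:i + step] for i in range(0, len(toy_text) - toy_seq_length, step)]

pairs = [split_input_target_str(chunk) for chunk in chunks]
for inp, tgt in pairs:
    print(repr(inp), "->", repr(tgt))
```

The first chunk `'Hello'` yields the input `'Hell'` and the target `'ello'`; the trailing `'d'` does not fill a whole chunk and is discarded.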

In [0]:
# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text) // (seq_length+1)

# use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

F
i
r
s
t


In [0]:
# The `batch` method lets us easily convert these individual characters to sequences of the desired size.
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


In [0]:
# For each sequence, duplicate and shift it to form the input and target text
# use the `map` method to apply a simple function to each batch

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


Each index of these vectors is processed as one time step. For the input at time step 0, the model receives the index for "F" and tries to predict the index for "i" as the next character.

At the next time step, it does the same thing, but the `RNN` considers the previous step's context in addition to the current input character.

In [0]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 18 ('F')
  expected output: 47 ('i')
Step    1
  input: 47 ('i')
  expected output: 56 ('r')
Step    2
  input: 56 ('r')
  expected output: 57 ('s')
Step    3
  input: 57 ('s')
  expected output: 58 ('t')
Step    4
  input: 58 ('t')
  expected output: 1 (' ')


In [0]:
# generate training dataset
BATCH_SIZE = 64
BUFFER_SIZE = 10000

# before feeding this data into the model, we need to shuffle the data and pack it into batches.
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
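
As a rough check on the dataset size, the number of sequences and batches per epoch can be worked out from the figures printed earlier. This is a back-of-the-envelope sketch; the variable names are illustrative, and the corpus length is copied from the output above.

```python
corpus_length = 1115394   # "Length of text" printed earlier
chunk_length = 100 + 1    # seq_length + 1 characters per chunk
batch = 64                # BATCH_SIZE

# Both batching steps use drop_remainder=True, hence integer division.
sequences_total = corpus_length // chunk_length
batches_per_epoch = sequences_total // batch

print(sequences_total, batches_per_epoch)
```

Under these assumptions there are 11043 sequences and 172 batches per epoch.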

# setup hyperparameters

In [0]:
vocab_size = len(vocab) # char length
embedding_dim = 256 # output of Embedding input layer
rnn_units = 1024 # number of units in the GRU layer


# build model


* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that maps the index of each character to a vector with `embedding_dim` dimensions.
* `tf.keras.layers.GRU`: A type of RNN with size `units=rnn_units`. (You can also use an LSTM layer here.)
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs.

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = keras.Sequential([
    keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, batch_input_shape=[batch_size, None]),
    keras.layers.GRU(units=rnn_units, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'),
    keras.layers.Dense(units=vocab_size)
  ])
  return model

# instantiate model
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (64, None, 256)           16640     
_________________________________________________________________
gru_3 (GRU)                  (64, None, 1024)          3938304   
_________________________________________________________________
dense_3 (Dense)              (64, None, 65)            66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
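
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras GRU default `reset_after=True`, which gives each of the three gates an input kernel, a recurrent kernel, and two bias vectors. With that assumption the arithmetic matches the figures above.

```python
vocab_size, embedding_dim, rnn_units = 65, 256, 1024

# Embedding: one embedding_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# GRU (assuming reset_after=True): 3 gates, each with an input kernel,
# a recurrent kernel, and two bias vectors.
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: weight matrix plus one bias per output.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```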


For each character the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character:

![A drawing of the data passing through the model](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/text/images/text_generation_training.png?raw=1)

# validate model

To get actual character indices from the model, we need to sample from its output distribution. This distribution is defined by the logits over the character vocabulary.

Note: It is important to _sample_ from this distribution as taking the _argmax_ of the distribution can easily get the model stuck in a loop.
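
The difference between the two strategies can be illustrated with NumPy alone. This is a sketch on made-up logits (`toy_logits`), not the model's output; `softmax` here is a small helper defined for the example.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(seed=0)
toy_logits = np.array([2.0, 1.5, 0.5, 0.1])  # hypothetical logits over 4 characters
probs = softmax(toy_logits)

# argmax is deterministic: the same index every time for the same logits.
greedy_picks = [int(np.argmax(toy_logits)) for _ in range(200)]

# Sampling draws each index in proportion to its probability.
sampled_picks = rng.choice(len(probs), size=200, p=probs).tolist()

print("distinct greedy picks: ", sorted(set(greedy_picks)))
print("distinct sampled picks:", sorted(set(sampled_picks)))
```

Greedy selection always returns index 0 here, whereas sampling also visits the less likely indices, which is what lets generation escape repetitive loops.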


In [0]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


This gives us, at each timestep, a prediction of the next character index:

In [0]:
sampled_indices

array([19, 64,  2, 46, 36, 41, 58, 40, 27, 62, 28, 62, 30, 37, 41, 37, 53,
       27, 28, 25, 38, 21, 50, 17, 34, 26, 49, 16, 61, 35, 24, 24, 33, 14,
       64, 26, 26,  0, 20, 11, 25, 64, 53, 56, 61,  4, 11,  3, 39, 35, 16,
       53, 42, 31, 12, 43,  3,  9, 21, 19, 33, 58, 24, 42, 47, 60, 24, 17,
       10, 36, 10,  4, 42, 59, 55, 40, 16, 49, 13, 59, 39, 56, 34, 35, 23,
       64,  4, 33, 47,  7,  4, 33, 23,  7, 31, 35, 30, 45, 11,  0])

In [0]:
# decode 
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 "t,\nThat runaway's eyes may wink and Romeo\nLeap to these arms, untalk'd of and unseen.\nLovers can see"

Next Char Predictions: 
 'Gz!hXctbOxPxRYcYoOPMZIlEVNkDwWLLUBzNN\nH;Mzorw&;$aWDodS?e$3IGUtLdivLE:X:&duqbDkAuarVWKz&Ui-&UK-SWRg;\n'


# compile the model

In [0]:
# The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case
  # it is applied across the last dimension of the predictions.

def loss(labels, logits):
  return keras.losses.sparse_categorical_crossentropy(y_true=labels, y_pred=logits, from_logits=True)

example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

model.compile(loss=loss, optimizer='adam')

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.1736493
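
The initial loss of about 4.17 is close to ln(65) ≈ 4.174, which is what cross-entropy gives when probability is spread uniformly over 65 characters. That is roughly what one would expect from an untrained model. The sketch below checks this with a NumPy re-implementation of sparse categorical cross-entropy from logits; `sparse_ce_from_logits` and `n_classes` are illustrative names, not Keras APIs.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Per-position cross-entropy between integer labels and unnormalized logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked.squeeze(-1)

n_classes = 65
# All-zero logits correspond to a uniform distribution over the classes.
uniform_logits = np.zeros((2, 5, n_classes))
toy_labels = np.random.default_rng(0).integers(0, n_classes, size=(2, 5))

loss_value = sparse_ce_from_logits(toy_labels, uniform_logits).mean()
print(loss_value, np.log(n_classes))
```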


# train the model

Given the previous RNN state and the input at this time step, the model predicts the class of the next character.

Fit the model so that it learns the weights that minimize the loss.

In [0]:
# setup callbacks: checkpoint saving
checkpoint_dir = './training_checkpoints'

# checkpoint file name
checkpoint_prefix = os.path.join(checkpoint_dir, 'chpt_{epoch}')

# Use a `tf.keras.callbacks.ModelCheckpoint` to ensure that checkpoints are saved during training:
callbacks = keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)

In [0]:
NUM_EPOCHS = 30

# assign the training run to a history variable for later performance queries
history = model.fit(dataset, epochs=NUM_EPOCHS, callbacks=[callbacks])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


# evaluate model: 

### `generating text using the learned model`

---

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.

To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.


In [0]:
tf.train.latest_checkpoint(checkpoint_dir)

# To keep this prediction step simple, use a batch size of 1.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
gru_4 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_4 (Dense)              (1, None, 65)             66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________


### The prediction loop

The following code block generates the text:

* Start by choosing a start string, initializing the RNN state, and setting the number of characters to generate.

* Get the prediction distribution of the next character using the start string and the RNN state.

* Then, sample from a categorical distribution to get the index of the predicted character. Use this predicted character as the next input to the model.

* The RNN state returned by the model is fed back into the model so that it now has more context, instead of only one character. After predicting the next character, the modified RNN state is again fed back into the model, which is how it accumulates context from the previously predicted characters.

Looking at the generated text, you'll see the model knows when to capitalize, make paragraphs, and imitate a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.

![To generate text the model's output is fed back to the input](https://github.com/tensorflow/docs/blob/master/site/en/tutorials/text/images/text_generation_sampling.png?raw=1)


In [0]:
def generate_text(model, start_string):
  # num characters to generate 
  num_generate = 1000

  # tokenize/vectorize= transform starting string to numbers
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # define default text list
  text_generated = []
  # low temperature = more predictable text, high temperature = more surprising text
  temperature = 1.0 

  # batch_size == 1
  model.reset_states()
  for i in range(num_generate):
    predictions = model(input_eval)
    # remove batch dim
    predictions = tf.squeeze(predictions, 0)
    # using a categorical distribution to predict the character returned by the model
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

    # We pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)
    text_generated.append(idx2char[predicted_id])
  return (start_string+''.join(text_generated))
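
The `temperature` value in `generate_text` divides the logits before sampling. A small NumPy sketch on made-up logits shows the effect. `toy_logits` and `probs_by_temp` are illustrative names, and `softmax` is a helper defined for the example.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

toy_logits = np.array([3.0, 1.0, 0.2])  # hypothetical logits over 3 characters

# Dividing by a temperature below 1 sharpens the distribution;
# a temperature above 1 flattens it.
probs_by_temp = {t: softmax(toy_logits / t) for t in (0.5, 1.0, 2.0)}

for t, p in probs_by_temp.items():
    print("temperature", t, "->", np.round(p, 3))
```

Lower temperatures concentrate probability on the most likely character, while higher temperatures give unlikely characters a better chance of being sampled.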

In [0]:
print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: I beseech you, they
ANTELIO:
Let me pass alack, and underetter tailors,
Suppose thy neck to vent our sails that would say,
I would have seen them datch, and relies to Rome,
Who hath destroy'd of?

POMPEY:
Pompey, you shall bear my sense.
If further when I have said 'tis charity.

FROTH:
I think, if it be so, then, at lent
Allays, follows, keep home
Do make thy sea-swain, wilt thou, Northumberland,
When ne'er shall be the king's King Baptists speak for him!
I shall rent me to the under 'ma.
Your first order, I will temper it.

AUFIDIUS:
And pale were virtues ng in mine own.
RY VI
Come, cords, coming, with my soldiers,
You that have turn you:
Forsooth you where you are? Which, as
I had an Edward to learn, and for line own intent
That he shall serve for him: go day: go; pardon, an't like a light.

BAPTISTA:
Let mine condialemous to his praise.

PRINCE EDWARD:
Return an ourself tells me, for even thus we parted withal, I know had thou, the beat be
The traitor Angelo, a poor deputy--

# advanced training

We will use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/guide/eager).

The procedure works as follows:

* First, initialize the RNN state. We do this by calling the `tf.keras.Model.reset_states` method.

* Next, iterate over the dataset (batch by batch) and calculate the *predictions* associated with each.

* Open a `tf.GradientTape`, and calculate the predictions and loss in that context.

* Calculate the gradients of the loss with respect to the model variables using the `tf.GradientTape.gradient` method.

* Finally, take a step downwards by using the optimizer's `tf.keras.optimizers.Optimizer.apply_gradients` method.


In [0]:
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

optimizer = tf.keras.optimizers.Adam()

In [0]:
@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  return loss

In [0]:
# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  # reset the RNN state at the start of every epoch
  # reset_states() returns None, so hidden is None
  hidden = model.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 4.173681259155273
Epoch 1 Batch 100 Loss 2.3300321102142334
Epoch 1 Loss 2.1038
Time taken for 1 epoch 26.867459535598755 sec

Epoch 2 Batch 0 Loss 2.1020798683166504
Epoch 2 Batch 100 Loss 1.894805908203125
Epoch 2 Loss 1.7698
Time taken for 1 epoch 25.631630182266235 sec

Epoch 3 Batch 0 Loss 1.7947996854782104
Epoch 3 Batch 100 Loss 1.6558419466018677
Epoch 3 Loss 1.6081
Time taken for 1 epoch 24.796597242355347 sec

Epoch 4 Batch 0 Loss 1.5326881408691406
Epoch 4 Batch 100 Loss 1.508499264717102
Epoch 4 Loss 1.4955
Time taken for 1 epoch 24.957431316375732 sec

Epoch 5 Batch 0 Loss 1.4661144018173218
Epoch 5 Batch 100 Loss 1.4418679475784302
Epoch 5 Loss 1.4683
Time taken for 1 epoch 25.24788498878479 sec

Epoch 6 Batch 0 Loss 1.3600000143051147
Epoch 6 Batch 100 Loss 1.3908978700637817
Epoch 6 Loss 1.4183
Time taken for 1 epoch 25.38266658782959 sec

Epoch 7 Batch 0 Loss 1.3258235454559326
Epoch 7 Batch 100 Loss 1.3791674375534058
Epoch 7 Loss 1.3506
Time take

# clean up

Terminate the notebook kernel to free up memory resources.

In [0]:
import os, signal

os.kill(os.getpid(), signal.SIGKILL)