
##### Copyright 2018 The TensorFlow Authors.

Licensed under the Apache License, Version 2.0 (the "License").



# Text generation using an RNN with eager execution


<table class="tfo-notebook-buttons" align="left">
<td>
<a target="_blank" href="https://www.tensorflow.org/tutorials/sequences/text_generation"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
</td><td>
<a target="_blank"  href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/sequences/text_generation.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>  
</td><td>
<a target="_blank"  href="https://github.com/tensorflow/docs/blob/master/site/en/tutorials/sequences/text_generation.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a></td></table>

This tutorial demonstrates how to generate text using a character-based RNN. We will work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data ("Shakespear"), train a model to predict the next character in the sequence ("e"). Longer sequences of text can be generated by calling the model repeatedly.

Note: Enable GPU acceleration to execute this notebook faster. In Colab: *Runtime > Change runtime type > Hardware accelerator > GPU*. If running locally, make sure your TensorFlow version is >= 1.11.0.

This tutorial includes runnable code implemented using [tf.keras](https://www.tensorflow.org/programmers_guide/keras) and [eager execution](https://www.tensorflow.org/programmers_guide/eager). The following is sample output when this tutorial is run with the default settings:

<pre>
QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m
</pre>

While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

* The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

* The structure of the output resembles a play—blocks of text generally begin with a speaker name, in all capital letters similar to the dataset.

* As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.

## Setup

### Import TensorFlow and other libraries

In [0]:
import tensorflow as tf
tf.enable_eager_execution()

import numpy as np
import os
import time

### Download the Shakespeare dataset

Change the following line to run this code on your own data.

In [0]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

### Read the data

First, take a look at the text.

In [0]:
text = open(path_to_file).read()
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

In [0]:
# Take a look at the first 1000 characters in text
print(text[:1000])

In [0]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

## Process the text

### Vectorize the text

Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [0]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

Now we have an integer representation for each character. Notice that each character is mapped to an index from 0 to `len(vocab) - 1`.

In [0]:
for char,_ in zip(char2idx, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), char2idx[char]))

In [0]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(text[:13], text_as_int[:13]))
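
The two lookup tables should be inverses of each other. A minimal sketch of that round trip on a toy string follows; the names `toy_text`, `toy_vocab`, `encode`, and `decode` are illustrative assumptions, not part of the tutorial's code.

```python
# Toy text standing in for the Shakespeare corpus (assumption for illustration).
toy_text = "hello world"

# Same construction as in the tutorial: sorted unique characters form the vocabulary.
toy_vocab = sorted(set(toy_text))
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_idx2char = toy_vocab  # a plain list is enough for single-index lookups


def encode(s):
    """Map each character of s to its integer index (hypothetical helper)."""
    return [toy_char2idx[ch] for ch in s]


def decode(ids):
    """Map integer indices back to a string (hypothetical helper)."""
    return ''.join(toy_idx2char[i] for i in ids)


ids = encode("hello")
print(ids)          # [3, 2, 4, 4, 5]
print(decode(ids))  # hello
```

Decoding an encoded string returns the original string, and the largest index is `len(toy_vocab) - 1`.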

### The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task we're training the model to perform. The input to the model will be a sequence of characters, and we train the model to predict the output—the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, the question becomes: given all the characters seen so far, what is the next character?


### Create training examples and targets

Divide the text into training examples and targets. Each training example will contain `seq_length` characters from the text. The corresponding targets contain the same length of text, except shifted one character to the right. For example, say `seq_length` is 4 and our text is "Hello", create one training example "Hell", and one target "ello".
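
The "Hello" example can be checked with plain Python strings. This is only a sketch of the shifting idea; `split_chunk`, `toy_seq_length`, and `toy_text` are hypothetical names used for illustration.

```python
def split_chunk(chunk):
    """Return (input, target); the target is the input shifted by one character."""
    return chunk[:-1], chunk[1:]


toy_seq_length = 4   # assumed value, matching the example in the text
toy_text = "Hello"

# A chunk holds seq_length + 1 characters so that input and target both have seq_length.
chunk = toy_text[:toy_seq_length + 1]
input_text, target_text = split_chunk(chunk)

# Each input character is paired with the character that follows it.
pairs = list(zip(input_text, target_text))

print(input_text, target_text)  # Hell ello
print(pairs)                    # [('H', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```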

Break the text into chunks of `seq_length+1`:

In [0]:
# The maximum length sentence we want for a single input in characters
seq_length = 100

# Create training examples / targets
chunks = tf.data.Dataset.from_tensor_slices(text_as_int).batch(seq_length+1, drop_remainder=True)

for item in chunks.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

Next, create the input and target texts from this chunk:

In [0]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = chunks.map(split_input_target)

Print the input and target text of the first example:

In [0]:
for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Each index of these vectors is processed as one time step. For the input at time step 0, the model receives the character mapped to the number 18 and tries to predict the character mapped to the number 47. At time step 1, it does the same thing, but the RNN also considers the context from the previous step in addition to the current character.

In [0]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

### Creating batches and shuffling them using tf.data

We used [tf.data](https://www.tensorflow.org/guide/datasets) to split the text into chunks. Before feeding this data into the model, we need to shuffle it and pack it into batches.

In [0]:
# Batch size 
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences, 
# so it doesn't attempt to shuffle the entire sequence in memory. Instead, 
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
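
To make the buffer comment above concrete, here is a conceptual sketch of buffer-based shuffling in plain Python. It is not the `tf.data` implementation; `buffered_shuffle` and its `seed` parameter are assumptions made for illustration.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in shuffled order while holding at most buffer_size of them."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        pos = rng.randrange(buffer_size)
        yield buffer[pos]
        buffer[pos] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    for item in buffer:
        yield item


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
print(shuffled)
```

Every element comes out exactly once, but an element can only move as far as the buffer allows, so a larger buffer gives a more thorough shuffle. With `buffer_size=1` the order is unchanged.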

## The Model

### Implement the model

Use the `tf.keras` [model subclassing API](https://www.tensorflow.org/guide/keras) to create the model and change it however we like. There are three layers used to define our model:

* [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) layer: a trainable lookup table that maps the number of each character to a vector with `embedding_dim` dimensions.
* [GRU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU) layer: a type of RNN with `units` units. (You can also use an LSTM layer here.)
* [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer with `vocab_size` outputs.
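
The shapes that flow through these layers can be traced with nested lists. The sketch below covers only the embedding lookup and the dense projection; the GRU is deliberately skipped, the weights are random rather than trained, and all names and sizes (`toy_vocab_size`, `toy_embedding_dim`, `shape`) are assumptions for illustration.

```python
import random

rng = random.Random(0)
toy_vocab_size, toy_embedding_dim = 5, 3   # tiny stand-ins for vocab_size and embedding_dim

# Embedding: one row of toy_embedding_dim numbers per character index.
embedding_table = [[rng.uniform(-1, 1) for _ in range(toy_embedding_dim)]
                   for _ in range(toy_vocab_size)]
# Dense: a (toy_embedding_dim x toy_vocab_size) weight matrix, bias omitted.
dense_weights = [[rng.uniform(-1, 1) for _ in range(toy_vocab_size)]
                 for _ in range(toy_embedding_dim)]

batch = [[0, 3, 1, 4], [2, 2, 0, 1]]       # integer ids, shape (batch_size=2, seq_length=4)

# The lookup replaces every id with its row: (batch_size, seq_length, toy_embedding_dim)
embedded = [[embedding_table[i] for i in seq] for seq in batch]

# GRU skipped here; in the real model it maps each step to `units` features.
# The dense layer is applied at every time step: (batch_size, seq_length, toy_vocab_size)
logits = [[[sum(v[k] * dense_weights[k][j] for k in range(toy_embedding_dim))
            for j in range(toy_vocab_size)]
           for v in seq] for seq in embedded]


def shape(nested):
    """Shape of a regular nested list (hypothetical helper)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


print(shape(embedded))  # (2, 4, 3)
print(shape(logits))    # (2, 4, 5)
```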

In [0]:
class Model(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, units):
    super(Model, self).__init__()
    self.units = units

    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    if tf.test.is_gpu_available():
      self.gru = tf.keras.layers.CuDNNGRU(self.units, 
                                          return_sequences=True, 
                                          recurrent_initializer='glorot_uniform',
                                          stateful=True)
    else:
      self.gru = tf.keras.layers.GRU(self.units, 
                                     return_sequences=True, 
                                     recurrent_activation='sigmoid', 
                                     recurrent_initializer='glorot_uniform', 
                                     stateful=True)

    self.fc = tf.keras.layers.Dense(vocab_size)
        
  def call(self, x):
    embedding = self.embedding(x)
    
    # output at every time step
    # output shape == (batch_size, seq_length, hidden_size) 
    output = self.gru(embedding)
    
    # The dense layer outputs predictions (logits) for every time step
    # output shape after the dense layer == (batch_size, seq_length, vocab_size)
    prediction = self.fc(output)
    
    # The GRU is stateful, so its hidden state is kept inside the layer
    # and only the predictions are returned
    return prediction

### Instantiate the model, optimizer, and the loss function

In [0]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension 
embedding_dim = 256

# Number of RNN units
units = 1024

model = Model(vocab_size, embedding_dim, units)

We'll use the [Adam optimizer](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) with default arguments and [softmax cross entropy](https://www.tensorflow.org/api_docs/python/tf/losses/sparse_softmax_cross_entropy) as the loss function. This loss fits the task because predicting the next character is a classification problem: the model must choose one of a discrete set of `vocab_size` characters.

In [0]:
# Using adam optimizer with default arguments
optimizer = tf.train.AdamOptimizer()

# Using sparse_softmax_cross_entropy so that we don't have to create one-hot vectors
def loss_function(real, preds):
    return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
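
For intuition about the numbers this loss produces, here is a plain-Python sketch of the per-character computation. The helper name `char_cross_entropy` and the vocabulary size of 65 are assumptions for illustration; the TensorFlow function additionally reduces the per-character losses to a single scalar.

```python
import math


def char_cross_entropy(label, logits):
    """Cross entropy between an integer class label and unnormalized logits."""
    # log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # equals -log(softmax(logits)[label])
    return log_sum_exp - logits[label]


# An untrained model with no preference among 65 characters (assumed size)
# has a loss of log(65).
uniform_loss = char_cross_entropy(label=3, logits=[0.0] * 65)

# A confident, correct prediction gives a loss near zero;
# a confident, wrong prediction gives a large loss.
confident_right = char_cross_entropy(label=1, logits=[0.0, 10.0, 0.0])
confident_wrong = char_cross_entropy(label=0, logits=[0.0, 10.0, 0.0])

print(uniform_loss, confident_right, confident_wrong)
```

The label is passed as an integer index, which is why no one-hot vector is needed.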

### Train the model

Here, use a custom training loop with [GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape). You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/guide/eager).

* First, reset the hidden state of the model to zeros, with shape == (batch_size, number of rnn units). We do this by calling `model.reset_states()`, which works because the GRU layer was created with `stateful=True`.

* Next, iterate over the dataset (batch by batch) and calculate the *predictions and the hidden states* associated with that input.

* There are a lot of interesting things happening during training:
  * The model gets the hidden state (initialized with 0), let's call that `H0`, and the first batch of input, let's call that `I0`.
  * The model then returns the predictions `P1` and updates its hidden state to `H1`.
  * For the next batch of input, the model receives `I1` and `H1`.
  * The interesting thing here is that `H1` is passed to the model along with `I1`, which is how context carries over: the context learned from batch to batch is contained in the *hidden state*.
  * Continue doing this until the dataset is exhausted, then start a new epoch and repeat the process.

* After calculating the predictions, calculate the *loss* using the loss function defined above. Then calculate the gradients of the loss with respect to the model variables.

* Finally, take a step in that direction with the help of the *optimizer* using the `apply_gradients` function.

Below is a diagram representing the process described above:

![](https://github.com/mari-linhares/docs/blob/patch-1/site/en/tutorials/sequences/images/text_generation_training.png?raw=true)
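
The way the hidden state carries context from one input to the next can be sketched with a one-unit recurrent cell in plain Python. This is a toy stand-in, not the GRU used in this tutorial: `ToyStatefulCell`, its fixed weights, and the input sequences are all assumptions for illustration, and no training takes place.

```python
import math


class ToyStatefulCell:
    """A one-unit recurrent cell that keeps its hidden state between calls."""

    def __init__(self, w_in=0.5, w_rec=0.8):
        self.w_in = w_in      # input weight (fixed here, not trained)
        self.w_rec = w_rec    # recurrent weight (fixed here, not trained)
        self.state = 0.0      # plays the role of H0

    def reset_states(self):
        self.state = 0.0

    def __call__(self, inputs):
        outputs = []
        for x in inputs:
            # The new state depends on the current input and the previous state.
            self.state = math.tanh(self.w_in * x + self.w_rec * self.state)
            outputs.append(self.state)
        return outputs


cell = ToyStatefulCell()
seq_0, seq_1 = [1.0, 0.0], [0.0, 1.0]   # stand-ins for I0 and I1

cell.reset_states()
cell(seq_0)                        # leaves H1 stored in cell.state
with_context = cell(seq_1)         # starts from H1

cell.reset_states()
without_context = cell(seq_1)      # starts from zeros

print(with_context)
print(without_context)
```

The same input `seq_1` gives different outputs depending on what the cell saw before it, which is the effect of passing `H1` along with `I1`.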

In [0]:
model.build(tf.TensorShape([BATCH_SIZE, seq_length]))

In [0]:
model.summary()

In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")

For brevity, train for just a few epochs:

In [0]:
EPOCHS = 5

In [0]:
# Training loop
for epoch in range(EPOCHS):
    start = time.time()
    
    # resetting the hidden state at the start of every epoch
    # (reset_states() returns None; the state lives inside the stateful GRU layer)
    model.reset_states()
    
    for (batch, (inp, target)) in enumerate(dataset):
          with tf.GradientTape() as tape:
              # the stateful GRU layer carries the hidden state over
              # from the previous batch; this is the interesting step
              predictions = model(inp)
              loss = loss_function(target, predictions)
              
          grads = tape.gradient(loss, model.variables)
          optimizer.apply_gradients(zip(grads, model.variables))

          if batch % 100 == 0:
              print ('Epoch {} Batch {} Loss {:.4f}'.format(epoch+1,
                                                            batch,
                                                            loss))
    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
      model.save_weights(checkpoint_prefix)

    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

In [0]:
model.save_weights(checkpoint_prefix)

### Restore the latest checkpoint

Because the GRU layer is stateful, the model only accepts the fixed batch size it was built with. To run the model with the same weights and a batch size of 1, we need to rebuild the model and restore the weights from the checkpoint.


In [0]:
!ls {checkpoint_dir}

In [0]:
tf.train.latest_checkpoint(checkpoint_dir)

In [0]:
model = Model(vocab_size, embedding_dim, units)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

### Generate text using our trained model

The following code block generates the text:

* Start by choosing a start string, initializing the hidden state and setting the number of characters to generate.

* Get the predictions using the start string and the hidden state.

* Then, sample the index of the predicted character from a multinomial distribution over the predictions, and use this predicted character as the next input to the model.

* The hidden state kept by the model is carried over to the next call, so the model now has more context rather than only one character. After predicting the next character, the updated hidden state is carried over again, which is how the model gets more context from the previously predicted characters.


![](https://github.com/mari-linhares/docs/blob/patch-1/site/en/tutorials/sequences/images/text_generation_sampling.png?raw=true)

Looking at the generated text, you'll see the model knows when to capitalize, makes paragraphs, and imitates a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.

In [0]:
# Evaluation step (generating text using the learned model)

# Number of characters to generate
num_generate = 1000

# You can change the start string to experiment
start_string = 'Q'

# Converting our start string to numbers (vectorizing) 
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)

# Empty string to store our results
text_generated = []

# Low temperatures result in more predictable text.
# Higher temperatures result in more surprising text.
# Experiment to find the best setting.
temperature = 1.0

In [0]:
# Evaluation loop.

# Here batch size == 1
model.reset_states()
for i in range(num_generate):
    predictions = model(input_eval)
    # remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # using a multinomial distribution to sample the character predicted by the model
    predictions = predictions / temperature
    predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()
    
    # We pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)
    
    text_generated.append(idx2char[predicted_id])

print (start_string + ''.join(text_generated))
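
The effect of `temperature` can be seen without TensorFlow. The sketch below divides the logits by the temperature, applies a softmax, and samples an index, which mirrors the idea behind the sampling step above; `softmax_with_temperature`, `sample_id`, and `toy_logits` are hypothetical names and values chosen for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_id(logits, temperature=1.0, rng=random):
    """Draw one index from the categorical distribution defined by the logits."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]   # assumed scores for three characters
cold = softmax_with_temperature(toy_logits, temperature=0.5)
hot = softmax_with_temperature(toy_logits, temperature=2.0)

print(cold)  # mass concentrates on the highest logit
print(hot)   # probabilities move closer together
print(sample_id(toy_logits, temperature=1.0, rng=random.Random(0)))
```

A low temperature makes the most likely character dominate, so the text becomes more predictable; a high temperature flattens the distribution, so less likely characters are sampled more often.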

The easiest thing you can do to improve the results is to train the model for longer (try `EPOCHS=30`).

You can also experiment with a different start character, or try adding another RNN layer to improve the model's accuracy, or adjusting the temperature parameter to generate more or less random predictions.