From: https://github.com/tensorflow/tensorflow/blob/r1.11/tensorflow/contrib/eager/python/examples/generative_examples/text_generation.ipynb

#### This notebook demonstrates how to generate text with an RNN using tf.keras and eager execution. It uses Shakespeare's text and predicts one character at a time.

1. Install unidecode

In [1]:
!pip install --upgrade pip
!pip install unidecode

Requirement already up-to-date: pip in /usr/local/lib/python3.6/dist-packages (18.0)


Import tensorflow and enable eager execution

In [2]:
# Import TensorFlow >= 1.10 and enable eager execution
import tensorflow as tf

# Note: Once you enable eager execution, it cannot be disabled. 
tf.enable_eager_execution()

import numpy as np
import os
import re
import random
import unidecode
import time



Download the dataset

In [3]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Read the dataset:

In [4]:
with open(path_to_file) as f:
    text = unidecode.unidecode(f.read())
# length of text is the number of characters in it
print(len(text))

1115394


Quick exploration of dataset

In [5]:
print(text[:150])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

A


Creating dictionaries to map from characters to their indices and vice-versa, which will be used to vectorize the inputs

In [6]:
# unique contains all the unique characters in the file
unique = sorted(set(text))

# creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(unique)}
idx2char = {i:u for i, u in enumerate(unique)}
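
As a quick sanity check of these two dictionaries, the sketch below encodes a short string to indices and decodes it back. The sample string and the names `char_to_idx`, `idx_to_char`, `encode`, and `decode` are illustrative and are not part of the original notebook.

```python
# Illustrative sample; the notebook builds the same kind of mapping from the full corpus.
sample_text = "hello world"

unique_chars = sorted(set(sample_text))
char_to_idx = {u: i for i, u in enumerate(unique_chars)}
idx_to_char = {i: u for i, u in enumerate(unique_chars)}

def encode(s):
    """Map each character to its integer index."""
    return [char_to_idx[c] for c in s]

def decode(ids):
    """Map integer indices back to a string."""
    return "".join(idx_to_char[i] for i in ids)

ids = encode("hello")
print(ids)
print(decode(ids))
```

Because both dictionaries are built from the same sorted list, decoding an encoded string returns the original string.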

In [31]:
print(char2idx)

{'\n': 0, ' ': 1, '!': 2, '$': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '3': 9, ':': 10, ';': 11, '?': 12, 'A': 13, 'B': 14, 'C': 15, 'D': 16, 'E': 17, 'F': 18, 'G': 19, 'H': 20, 'I': 21, 'J': 22, 'K': 23, 'L': 24, 'M': 25, 'N': 26, 'O': 27, 'P': 28, 'Q': 29, 'R': 30, 'S': 31, 'T': 32, 'U': 33, 'V': 34, 'W': 35, 'X': 36, 'Y': 37, 'Z': 38, 'a': 39, 'b': 40, 'c': 41, 'd': 42, 'e': 43, 'f': 44, 'g': 45, 'h': 46, 'i': 47, 'j': 48, 'k': 49, 'l': 50, 'm': 51, 'n': 52, 'o': 53, 'p': 54, 'q': 55, 'r': 56, 's': 57, 't': 58, 'u': 59, 'v': 60, 'w': 61, 'x': 62, 'y': 63, 'z': 64}


In [38]:
idx2char[np.random.randint(len(unique))]

'H'

In [42]:
# setting the maximum length sentence we want for a single input in characters
max_length = 100

# length of the vocabulary in chars
vocab_size = len(unique)

# the embedding dimension 
embedding_dim = 256

# number of RNN (here GRU) units
units = 48 #1024 OOM

# batch size 
BATCH_SIZE = 2 #64 gave OOM error

# buffer size to shuffle our dataset
BUFFER_SIZE = 10000


In [43]:
print(vocab_size)

65


### Creating the input and output tensors 

We split the text into chunks of *max_length* characters. Each chunk is an input, and its target is the same span of text shifted one character to the right, so at every position the model learns to predict the next character.

For example, if text = 'tensorflow' and max_length = 9:

input = 'tensorflo' and target = 'ensorflow'

After creating the chunks, we convert each character to a number using the char2idx dictionary created above.
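
The 'tensorflow' example can be checked with a few lines of plain Python that use the same slicing as the notebook's loop. The helper name `make_pairs` is an illustrative assumption, not something the notebook defines.

```python
def make_pairs(text, max_length):
    """Split text into (input, target) chunks; the target is shifted by one character."""
    pairs = []
    for start in range(0, len(text) - max_length, max_length):
        inp = text[start:start + max_length]
        targ = text[start + 1:start + 1 + max_length]
        pairs.append((inp, targ))
    return pairs

pairs = make_pairs("tensorflow", 9)
print(pairs)

# Number of chunks for the corpus length and max_length used in this notebook.
num_chunks = len(range(0, 1115394 - 100, 100))
print(num_chunks)
```

With a corpus of 1115394 characters and max_length = 100, the same range yields 11153 chunks, which matches the array shapes printed by the notebook.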

In [44]:
input_text = []
target_text = []

for f in range(0, len(text)-max_length, max_length):
    inps = text[f:f+max_length]
    targ = text[f+1:f+1+max_length]

    input_text.append([char2idx[i] for i in inps])
    target_text.append([char2idx[t] for t in targ])
    
print(np.array(input_text).shape)
print(np.array(target_text).shape)


(11153, 100)
(11153, 100)


### Creating batches and shuffling them using tf.data

In [45]:
dataset = tf.data.Dataset.from_tensor_slices((input_text, target_text)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)


In [46]:
dataset

<BatchDataset shapes: ((2, 100), (2, 100)), types: (tf.int32, tf.int32)>
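
To make the effect of `drop_remainder=True` concrete, here is a plain-Python sketch of shuffling followed by batching. The helper `shuffle_and_batch` and its `seed` parameter are hypothetical. It shuffles the whole list at once, which is a simplification: `shuffle(BUFFER_SIZE)` in tf.data shuffles within a buffer of that size rather than the full dataset.

```python
import random

def shuffle_and_batch(examples, batch_size, seed=0):
    """Shuffle examples, then group them into full batches, dropping any remainder."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_full = len(shuffled) // batch_size
    return [shuffled[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]

batches = shuffle_and_batch(range(7), batch_size=2)
print(len(batches))   # 3 full batches; one example is dropped
print(11153 // 2)     # 5576 full batches for 11153 chunks and BATCH_SIZE = 2
```

Dropping the incomplete final batch keeps every batch at the same shape, which is why the dataset above reports a fixed leading dimension of 2.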

### Creating the model

We use the Model Subclassing API, which gives us full flexibility to build and modify the model. The model has three layers:

- Embedding layer
- GRU layer (you can use an LSTM layer here)
- Fully connected layer

In [47]:
class Model(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, units, batch_size):
        super(Model, self).__init__()
        self.units = units
        self.batch_sz = batch_size

        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

        if tf.test.is_gpu_available():
            # CuDNNGRU is the cuDNN-backed GRU layer and only runs on a GPU
            self.gru = tf.keras.layers.CuDNNGRU(self.units,
                                                return_sequences=True,
                                                return_state=True,
                                                recurrent_initializer='glorot_uniform')
        else:
            # This version of the notebook requires a GPU. A CPU fallback would use
            # tf.keras.layers.GRU with recurrent_activation='sigmoid' and the same
            # remaining arguments.
            raise NotImplementedError('No GPU available')

        self.fc = tf.keras.layers.Dense(vocab_size)
        
    def call(self, x, hidden):
        x = self.embedding(x)

        # output shape == (batch_size, max_length, hidden_size) 
        # states shape == (batch_size, hidden_size)

        # states holds the final hidden state of the GRU; the training loop
        # passes it back in as initial_state on the next call
        output, states = self.gru(x, initial_state=hidden)

        # reshaping the output so that we can pass it to the Dense layer
        # after reshaping the shape is (batch_size * max_length, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # The dense layer will output predictions for every time_steps(max_length)
        # output shape after the dense layer == (max_length * batch_size, vocab_size)
        x = self.fc(output)

        return x, states
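
The reshape in `call` and the later reshape of the targets to a flat vector both rely on row-major flattening, so that row `b * max_length + t` of the predictions lines up with the label for batch element `b` at time step `t`. The sketch below checks that alignment with nested Python lists. The toy sizes and values are assumptions chosen for illustration.

```python
batch_size, max_length = 2, 3

# Toy targets with shape (batch_size, max_length); the values are arbitrary labels.
targets = [[10, 11, 12],
           [20, 21, 22]]

# Toy per-step outputs with shape (batch_size, max_length, hidden_size=1),
# where each entry records which (batch, step) position it came from.
outputs = [[[(b, t)] for t in range(max_length)] for b in range(batch_size)]

# Row-major flattening, analogous to reshaping to (-1, hidden_size) and (-1,).
flat_outputs = [step for row in outputs for step in row]
flat_targets = [label for row in targets for label in row]

for i, (out, label) in enumerate(zip(flat_outputs, flat_targets)):
    b, t = out[0]
    assert i == b * max_length + t      # flat index matches (batch, step)
    assert label == targets[b][t]       # label lines up with its output row

print(len(flat_outputs), len(flat_targets))
```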

Call the model and set the optimizer and the loss function

In [48]:
model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)
optimizer = tf.train.AdamOptimizer()

# using sparse_softmax_cross_entropy so that we don't have to create one-hot vectors
def loss_function(real, preds):
    return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
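
To illustrate why integer labels are enough, the following framework-free sketch compares cross-entropy computed from an integer label with cross-entropy computed from the equivalent one-hot vector. The function names and the example logits are illustrative assumptions. The TensorFlow function also reduces the per-example losses to a single scalar, which this single-example sketch leaves out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(label, logits):
    """Cross-entropy for one example, given an integer class label."""
    return -math.log(softmax(logits)[label])

def one_hot_cross_entropy(one_hot, logits):
    """Cross-entropy for one example, given a one-hot label vector."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

logits = [2.0, 0.5, -1.0]
label = 0
one_hot = [1.0, 0.0, 0.0]

sparse_loss = sparse_cross_entropy(label, logits)
dense_loss = one_hot_cross_entropy(one_hot, logits)
print(sparse_loss, dense_loss)
```

Both values agree because every term of the one-hot sum is zero except the one at the true class.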

In [49]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 model=model)



### Train the model
Use a custom training loop with the help of GradientTape()

- Initialize the hidden state at the start of every epoch. The code calls `model.reset_states()`, which returns `None` here, so the GRU layer starts from its default all-zero state of shape (batch_size, number of RNN units).
- Iterate over the dataset (batch by batch) and calculate the predictions and the hidden states associated with that input.
> There are a lot of interesting things happening here.
>
>    - The model gets the hidden state (initialized with zeros), let's call that H0, and the first batch of input, let's call that I0.
>    - The model then returns the predictions P1 and the hidden state H1.
>    - For the next batch of input, the model receives I1 and H1.
>    - The interesting thing here is that we pass H1 to the model along with I1. The context carried from batch to batch is contained in the hidden state.
>    - We continue doing this until the dataset is exhausted, and then we start a new epoch and repeat.
>

- After calculating the predictions, we calculate the loss using the loss function defined above. Then we calculate the gradients of the loss with respect to the model variables.
- Finally, we apply those gradients to the variables with the optimizer's apply_gradients function, which takes a step that reduces the loss.
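
The way the hidden state is threaded from one batch to the next can be sketched with a toy scalar RNN. This is not the notebook's GRU: `toy_rnn_step` and its weights `w_x` and `w_h` are hypothetical, chosen only to show how carrying the state differs from resetting it on every step.

```python
import math

def toy_rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a scalar RNN: returns (output, new_hidden_state)."""
    h_new = math.tanh(w_x * x + w_h * h)
    return h_new, h_new

inputs = [1.0, 1.0, 1.0]

# Carry the hidden state from one input to the next (H0 -> H1 -> H2 ...).
hidden = 0.0
carried = []
for x in inputs:
    out, hidden = toy_rnn_step(x, hidden)
    carried.append(out)

# Reset the hidden state to zero before every input instead.
reset = [toy_rnn_step(x, 0.0)[0] for x in inputs]

print(carried)
print(reset)
```

With identical inputs, the outputs with a reset state are all the same, while the outputs with a carried state change from step to step because each one depends on what came before.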


In [50]:
# Training step

EPOCHS = 20
# EPOCHS = 1

for epoch in range(EPOCHS):
    start = time.time()
    
    # initializing the hidden state at the start of every epoch
    hidden = model.reset_states()
    
    for (batch, (inp, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # feeding the hidden state back into the model
            # This is the interesting step
            predictions, hidden = model(inp, hidden)

            # reshaping the target because that's how the
            # loss function expects it
            target = tf.reshape(target, (-1,))
            loss = loss_function(target, predictions)

        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(zip(grads, model.variables))

        if batch % 1000 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, loss))
    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        checkpoint.save(file_prefix = checkpoint_prefix)

    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 4.1950
Epoch 1 Batch 1000 Loss 2.4477
Epoch 1 Batch 2000 Loss 2.1120
Epoch 1 Batch 3000 Loss 1.8236
Epoch 1 Batch 4000 Loss 1.9768
Epoch 1 Batch 5000 Loss 2.0564
Epoch 1 Loss 1.8947
Time taken for 1 epoch 67.89263153076172 sec

Epoch 2 Batch 0 Loss 1.8408
Epoch 2 Batch 1000 Loss 1.7586
Epoch 2 Batch 2000 Loss 1.9422
Epoch 2 Batch 3000 Loss 1.7976
Epoch 2 Batch 4000 Loss 1.9543
Epoch 2 Batch 5000 Loss 1.9162
Epoch 2 Loss 1.8788
Time taken for 1 epoch 68.19366955757141 sec

Epoch 3 Batch 0 Loss 1.7400
Epoch 3 Batch 1000 Loss 1.9125
Epoch 3 Batch 2000 Loss 1.8218
Epoch 3 Batch 3000 Loss 1.5420
Epoch 3 Batch 4000 Loss 1.7195
Epoch 3 Batch 5000 Loss 1.8223
Epoch 3 Loss 1.7167
Time taken for 1 epoch 67.27204251289368 sec

Epoch 4 Batch 0 Loss 1.7982
Epoch 4 Batch 1000 Loss 1.7003
Epoch 4 Batch 2000 Loss 1.6080
Epoch 4 Batch 3000 Loss 1.5928
Epoch 4 Batch 4000 Loss 1.5555
Epoch 4 Batch 5000 Loss 1.8167
Epoch 4 Loss 1.7226
Time taken for 1 epoch 67.7898895740509 sec

Epoch

In [51]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.checkpointable.util.CheckpointLoadStatus at 0x7fe58c039828>

### Prediction
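
The generation loop in this section divides the logits by `temperature` before sampling. As a rough, framework-free sketch of why that changes how predictable the output is, the code below applies a softmax to the same logits at several temperatures. The logits and the helper name `softmax_with_temperature` are illustrative assumptions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits that have been divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the highest-scoring character, and a higher temperature spreads it more evenly, so sampling becomes more varied.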

In [59]:
# Evaluation step(generating text using the model learned)

# number of characters to generate
num_generate = 1000

# You can change the start string to experiment, e.g. start_string = 'F'.
# Here we pick a random character from the vocabulary.
start_string = idx2char[np.random.randint(vocab_size)]

# converting our start string to numbers(vectorizing!) 
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)

# empty string to store our results
text_generated = ''

# Low temperatures result in more predictable text.
# Higher temperatures result in more surprising text.
# Experiment to find the best setting.
temperature = 1.0

# hidden state shape == (batch_size, number of rnn units); here batch size == 1
hidden = [tf.zeros((1, units))]
for i in range(num_generate):
    predictions, hidden = model(input_eval, hidden)

    # sampling the next character id from a multinomial distribution
    # over the temperature-scaled logits
    predictions = predictions / temperature
    predicted_id = tf.multinomial(predictions, num_samples=1)[0][0].numpy()

    # We pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)
    
    text_generated += idx2char[predicted_id]

print (start_string + text_generated)

To: begnother, letter seece
Nay, hegor is the pant relace,
A have fish, busit wit e to Capoll that ot;
And or, they Boodig'ld Banget, stroke.
Ime, art nugs oah.

FRIAR ANNES:
Held iffildy: his some?

HAMONSIO:
Shall the dood and I henose's iline,
But these till be countran:
Monrour Manchoses and the mine cloovor the rememons
My taince for it and spoke to jurd is her sweet
WhiUS
At cleast the plascle,
Fret from forly digntian.
Ainsman of had will ding; I mave prese.

BUCKINGHAM:
My pleagred gors, an Gild with Sleasteds Glost bloy, hope she meet,
She set on Macleming all being him,
Now I gond know not thou dounds of sworn
To post: thapother mean she the wan not is
Pargament he pleages is to must my malmiels.
The are of vick but a crangumineal
To sters, it beciribremy, for my dislemmed.

GROSPE:
And pertany all you to presicite, look all:
And he sneother.

SICINIUS:
Who fellay, the jery against thus for then, I did thee, wa;
As fish them noble thee our laceswe command
Is ten and no reewop