<a href="https://colab.research.google.com/github/surfaceowl/google.colab_machinelearning/blob/master/RNN_for_text_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install unidecode

Collecting unidecode
  Downloading https://files.pythonhosted.org/packages/31/39/53096f9217b057cb049fe872b7fc7ce799a1a89b76cf917d9639e7a558b5/Unidecode-1.0.23-py2.py3-none-any.whl (237kB)
    100% |████████████████████████████████| 245kB 4.9MB/s 
Installing collected packages: unidecode
Successfully installed unidecode-1.0.23


In [0]:
# Import TensorFlow >= 1.10 and enable eager execution
import tensorflow as tf

# Note: Once you enable eager execution, it cannot be disabled. 
tf.enable_eager_execution()

import numpy as np
import os
import re
import random
import unidecode
import time

In [3]:
# path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# alternate file - Grimms' Fairy Tales (Project Gutenberg)
path_to_file = tf.keras.utils.get_file('2591-0.txt', 'http://www.gutenberg.org/files/2591/2591-0.txt')

Downloading data from http://www.gutenberg.org/files/2591/2591-0.txt


In [4]:
text = unidecode.unidecode(open(path_to_file).read())
# length of text is the number of characters in it
print (len(text))

540240


In [0]:
# unique contains all the unique characters in the file
unique = sorted(set(text))

# creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(unique)}
idx2char = {i:u for i, u in enumerate(unique)}
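
As a small illustration of how the two mappings are used, the sketch below round-trips a string through them. It uses a toy string instead of the full corpus, and the helper names `encode` and `decode` are hypothetical (they do not appear in the notebook).

```python
# Toy stand-in for the corpus; the notebook builds the same mappings from `text`.
sample = "hello world"

unique = sorted(set(sample))
char2idx = {u: i for i, u in enumerate(unique)}
idx2char = {i: u for i, u in enumerate(unique)}

def encode(s):
    """Hypothetical helper: map each character to its integer index."""
    return [char2idx[c] for c in s]

def decode(ids):
    """Hypothetical helper: map integer indices back to characters."""
    return "".join(idx2char[i] for i in ids)

encoded = encode("hello")
decoded = decode(encoded)
print(encoded, decoded)
```

Because `unique` is sorted, the indices are deterministic for a given text, so a model trained with one mapping can be reused with the same mapping at generation time.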

In [0]:
# setting the maximum length sentence we want for a single input in characters
max_length = 100

# length of the vocabulary in chars
vocab_size = len(unique)

# the embedding dimension 
embedding_dim = 256

# number of RNN (here GRU) units
units = 1024

# batch size 
BATCH_SIZE = 64

# buffer size to shuffle our dataset
BUFFER_SIZE = 10000

In [7]:
input_text = []
target_text = []

for f in range(0, len(text)-max_length, max_length):
    inps = text[f:f+max_length]
    targ = text[f+1:f+1+max_length]

    input_text.append([char2idx[i] for i in inps])
    target_text.append([char2idx[t] for t in targ])
    
print (np.array(input_text).shape)
print (np.array(target_text).shape)

(5402, 100)
(5402, 100)
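
To make the input/target relationship concrete, here is a minimal sketch of the same windowing applied to a short string with a window of 5 characters. The function name `make_windows` is hypothetical; the slicing mirrors the loop above.

```python
def make_windows(text, max_length):
    """Split text into non-overlapping inputs and targets shifted by one character."""
    inputs, targets = [], []
    for start in range(0, len(text) - max_length, max_length):
        inputs.append(text[start:start + max_length])
        targets.append(text[start + 1:start + 1 + max_length])
    return inputs, targets

inputs, targets = make_windows("once upon a time", max_length=5)
for inp, tgt in zip(inputs, targets):
    print(repr(inp), '->', repr(tgt))
```

Each target is its input shifted one character to the right, so at every position the model is asked to predict the character that follows.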


In [0]:
dataset = tf.data.Dataset.from_tensor_slices((input_text, target_text)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
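
A quick arithmetic sketch of what `drop_remainder=True` implies for this dataset, using the 5402 examples reported above and a batch size of 64:

```python
num_examples = 5402   # rows reported by the shape printout above
batch_size = 64       # BATCH_SIZE in the notebook

# drop_remainder=True keeps only full batches and discards the leftover examples
full_batches, leftover = divmod(num_examples, batch_size)
print(full_batches, leftover)
```

With fewer than 100 batches per epoch, a progress message printed only when `batch % 100 == 0` fires once per epoch, at batch 0.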

In [0]:
class Model(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, units, batch_size):
    super(Model, self).__init__()
    self.units = units
    self.batch_sz = batch_size

    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    if tf.test.is_gpu_available():
      self.gru = tf.keras.layers.CuDNNGRU(self.units, 
                                          return_sequences=True, 
                                          return_state=True, 
                                          recurrent_initializer='glorot_uniform')
    else:
      self.gru = tf.keras.layers.GRU(self.units, 
                                     return_sequences=True, 
                                     return_state=True, 
                                     recurrent_activation='sigmoid', 
                                     recurrent_initializer='glorot_uniform')

    self.fc = tf.keras.layers.Dense(vocab_size)
        
  def call(self, x, hidden):
    x = self.embedding(x)

    # output shape == (batch_size, max_length, hidden_size) 
    # states shape == (batch_size, hidden_size)

    # states variable to preserve the state of the model
    # this will be used to pass at every step to the model while training
    output, states = self.gru(x, initial_state=hidden)


    # reshaping the output so that we can pass it to the Dense layer
    # after reshaping the shape is (batch_size * max_length, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # The dense layer will output predictions for every time_steps(max_length)
    # output shape after the dense layer == (max_length * batch_size, vocab_size)
    x = self.fc(output)

    return x, states
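
The reshape inside `call` can be checked with NumPy alone. The sketch below uses small, made-up dimensions and shows that row `b * max_length + t` of the flattened array is the output for example `b` at time step `t`.

```python
import numpy as np

# Made-up sizes, chosen only to keep the arrays small.
batch_size, max_length, hidden_size = 2, 3, 4

output = np.arange(batch_size * max_length * hidden_size).reshape(
    batch_size, max_length, hidden_size)

# Same operation as tf.reshape(output, (-1, output.shape[2]))
flat = output.reshape(-1, output.shape[2])
print(flat.shape)

# Row b * max_length + t holds the vector for example b, time step t.
b, t = 1, 2
row = flat[b * max_length + t]
```

Flattening the targets with `(-1,)` uses the same row order, which is why predictions and targets line up when the loss is computed.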

In [0]:
model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)

In [0]:
optimizer = tf.train.AdamOptimizer()

# using sparse_softmax_cross_entropy so that we don't have to create one-hot vectors
def loss_function(real, preds):
    return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
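
For intuition, here is a NumPy sketch of what a sparse softmax cross-entropy computes: the labels are integer class ids, so no one-hot vectors are needed. The function name `sparse_xent_numpy` is hypothetical, and the sketch assumes the loss is averaged over examples.

```python
import numpy as np

def sparse_xent_numpy(labels, logits):
    """Mean of -log softmax(logits)[label] over all rows (assumed mean reduction)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Uniform logits over 4 classes: every class has probability 1/4.
uniform_loss = sparse_xent_numpy([0, 3], np.zeros((2, 4)))
print(uniform_loss, np.log(4))
```

A model that spreads probability evenly over V classes has a loss of ln V, which is a useful reference point when reading the first loss value of a training run.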

In [0]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 model=model)

Here we use a custom training loop built on tf.GradientTape().

At the start of every epoch we reset the hidden state. In this code the state comes from model.reset_states(), which returns None, so the GRU falls back to its default all-zeros initial state of shape (batch_size, number of RNN units).

Next, we iterate over the dataset batch by batch and compute the predictions and the hidden state for each input batch.

The hidden state is handled as follows:

The model receives the initial hidden state (zeros), call it H0, and the first batch of input, call it I0.
The model then returns the predictions P1 and the new hidden state H1.
For the next batch of input, the model receives I1 and H1.
Passing H1 to the model together with I1 carries context from one batch to the next: the hidden state is where that context is stored.
We continue doing this until the dataset is exhausted, then start a new epoch and repeat.
After calculating the predictions, we calculate the loss using the loss function defined above. Then we calculate the gradients of the loss with respect to the model variables.

Finally, we apply those gradients to the variables with the optimizer's apply_gradients function.

Note: If you are running this notebook in Colab with a Tesla K80 GPU, expect up to about 23 seconds per epoch; the run logged below took roughly 11 to 14 seconds per epoch.
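
The hand-off of hidden state between batches can be sketched with a toy recurrence. This is not the notebook's GRU: it is a plain tanh recurrent cell with random, made-up weights, and `toy_rnn` is a hypothetical name. It only illustrates how the state returned for one batch becomes the initial state for the next.

```python
import numpy as np

rng = np.random.default_rng(0)
units, features = 4, 3                       # made-up sizes
W_x = rng.normal(size=(features, units)) * 0.1
W_h = rng.normal(size=(units, units)) * 0.1

def toy_rnn(inputs, hidden):
    """Plain tanh recurrence over time; returns all outputs and the final state."""
    outputs = []
    for t in range(inputs.shape[1]):
        hidden = np.tanh(inputs[:, t] @ W_x + hidden @ W_h)
        outputs.append(hidden)
    return np.stack(outputs, axis=1), hidden

batch_size, steps = 2, 5
hidden = np.zeros((batch_size, units))       # H0: all zeros
batches = [rng.normal(size=(batch_size, steps, features)) for _ in range(3)]

for inputs in batches:                       # I0, I1, I2
    # The state returned for this batch is the initial state for the next one.
    outputs, hidden = toy_rnn(inputs, hidden)

print(outputs.shape, hidden.shape)
```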

In [13]:
# Training step

EPOCHS = 20

for epoch in range(EPOCHS):
    start = time.time()
    
    # resetting the hidden state at the start of every epoch
    # (reset_states() returns None, so the GRU starts from zeros)
    hidden = model.reset_states()

    for (batch, (inp, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # feeding the hidden state back into the model
            # This is the interesting step
            predictions, hidden = model(inp, hidden)

            # flattening the target to shape (batch_size * max_length,)
            # so that it lines up with the flattened predictions
            target = tf.reshape(target, (-1,))
            loss = loss_function(target, predictions)

        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(zip(grads, model.variables))

        if batch % 100 == 0:
            print ('Epoch {} Batch {} Loss {:.4f}'.format(epoch+1,
                                                          batch,
                                                          loss))
    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        checkpoint.save(file_prefix = checkpoint_prefix)

    print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 4.4301
Epoch 1 Loss 2.3504
Time taken for 1 epoch 13.794904708862305 sec

Epoch 2 Batch 0 Loss 2.3696
Epoch 2 Loss 2.0150
Time taken for 1 epoch 11.403647899627686 sec

Epoch 3 Batch 0 Loss 2.0679
Epoch 3 Loss 1.8547
Time taken for 1 epoch 11.41964054107666 sec

Epoch 4 Batch 0 Loss 1.8692
Epoch 4 Loss 1.6768
Time taken for 1 epoch 11.44180679321289 sec

Epoch 5 Batch 0 Loss 1.6367
Epoch 5 Loss 1.4849
Time taken for 1 epoch 11.583993911743164 sec

Epoch 6 Batch 0 Loss 1.5318
Epoch 6 Loss 1.4849
Time taken for 1 epoch 11.481489896774292 sec

Epoch 7 Batch 0 Loss 1.4408
Epoch 7 Loss 1.3584
Time taken for 1 epoch 11.455058336257935 sec

Epoch 8 Batch 0 Loss 1.3290
Epoch 8 Loss 1.3010
Time taken for 1 epoch 11.458365201950073 sec

Epoch 9 Batch 0 Loss 1.2572
Epoch 9 Loss 1.2576
Time taken for 1 epoch 11.467390298843384 sec

Epoch 10 Batch 0 Loss 1.2363
Epoch 10 Loss 1.1804
Time taken for 1 epoch 11.533094644546509 sec

Epoch 11 Batch 0 Loss 1.2028
Epoch 11 Loss 1.1916


In [14]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.checkpointable.util.CheckpointLoadStatus at 0x7ff05c988978>

The code block below generates text.

We start by choosing a start string, initializing the hidden state, and setting the number of characters we want to generate.

We get predictions using the start_string and the hidden state.

Then we use argmax to pick the index of the predicted character. We use this predicted character as our next input to the model.

The hidden state returned by the model is fed back into the model so that it has more context than just one character. After each prediction, the updated hidden state is fed back in again, so every new character is conditioned on the characters generated before it. No learning happens at this stage; the weights stay fixed.

Looking at the predictions, the model knows when to capitalize and make paragraphs, and the text follows the style of the Grimms' fairy tales it was trained on, which is pretty awesome!
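
The greedy (argmax) decoding loop described above can be sketched without TensorFlow. Here `toy_step` is a hypothetical stand-in for the trained model, its "hidden state" is only a step counter, and the three-character vocabulary is made up; the loop structure is the part that matters.

```python
import numpy as np

vocab = ['a', 'b', 'c']   # made-up vocabulary

def toy_step(char_id, hidden):
    """Stand-in for the model: favours the next character in a fixed cycle."""
    logits = np.zeros(len(vocab))
    logits[(char_id + 1) % len(vocab)] = 1.0
    return logits, hidden + 1   # hidden is just a step counter in this sketch

def greedy_generate(start_id, num_generate):
    hidden = 0
    current = start_id
    generated = []
    for _ in range(num_generate):
        logits, hidden = toy_step(current, hidden)
        current = int(np.argmax(logits))   # always take the most likely character
        generated.append(vocab[current])   # the prediction becomes the next input
    return "".join(generated)

result = greedy_generate(start_id=0, num_generate=5)
print(result)
```

Because argmax always takes the single most likely character, the same start string always produces the same continuation.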

In [19]:
# Evaluation step(generating text using the model learned)

# number of characters to generate
num_generate = 200

# You can change the start string to experiment
start_string = 'Wolf'
# converting our start string to numbers(vectorizing!) 
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)

# empty string to store our results
text_generated = ''

# hidden state shape == (batch_size, number of rnn units); here batch size == 1
hidden = [tf.zeros((1, units))]
for i in range(num_generate):
    predictions, hidden = model(input_eval, hidden)

    # using argmax on the last time step to pick the predicted character
    predicted_id = tf.argmax(predictions[-1]).numpy()
    
    # We pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)
    
    text_generated += idx2char[predicted_id]

print (start_string + text_generated)

Wolf!' said the soldier; 'but I will not be a fine thing for your dearest child, and she said to him: 'You may come out of the window and said: 'I have been drinking a side pate at the spindle of the room
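
A note on `predictions[-1]` in the loop above: with a batch of one, the model returns one row of scores per input character, and only the last row predicts the character that follows the input. The NumPy sketch below uses made-up scores and a made-up vocabulary size of 6 to show the selection.

```python
import numpy as np

seq_len, vocab_size = 4, 6               # e.g. the 4 characters of 'Wolf'; vocab size is made up
predictions = np.zeros((seq_len, vocab_size))
predictions[0, 4] = 9.0                  # scores for earlier time steps are ignored
predictions[-1, 2] = 5.0                 # pretend the last time step favours id 2

# Only the last row is used to choose the next character.
predicted_id = int(np.argmax(predictions[-1]))
print(predicted_id)
```

After the first iteration the input is a single character, so the predictions have a single row and `[-1]` simply selects it.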
