<a href="https://colab.research.google.com/github/elsa9421/Interactive-IPython-Demos/blob/main/RNN__Application_Text_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Application of RNN : to Generate text using character-based RNN

Credit : Code similar to https://www.tensorflow.org/tutorials/text/text_generation 


This Notebook demostrates the use of different Recurrent Neural Networks (RNN) with Keras for Text Generation.

There are three built-in RNN layers in Keras:

* `keras.layers.SimpleRNN` :  a fully-connected RNN where the output from previous timestep is to be fed to next timestep.

* `keras.layers.GRU`: first proposed in [Cho et al., 2014](https://arxiv.org/abs/1406.1078)

* `keras.layers.LSTM`: first proposed in [Hochreiter & Schmidhuber, 1997](https://www.bioinf.jku.at/publications/older/2604.pdf)

<br> The built-in `keras.layers.RNN`, `keras.layers.LSTM`, `keras.layers.GRU` layers enable you to quickly build recurrent models without having to make difficult configuration choices.
<br> More about RNN with Keras [here](https://www.tensorflow.org/guide/keras/rnn)


<br>  The model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.


# Import Libraries

In [None]:
import tensorflow as tf
import numpy as np
import os
import time

# Dataset


## 1. Download Dataset

Project Gutenberg is a library of over 60,000 free eBooks.You will find some of the world's greatest literature here, with focus on older works for which U.S. copyright has expired. Thousands of volunteers digitized and diligently proofread the eBooks, for enjoyment and education.
These texts can be used to create generative models. [Project Gutenberg](https://www.gutenberg.org/)

In this demo we use [Alice’s Adventures in Wonderland by Lewis Carroll](http://www.gutenberg.org/cache/epub/28885/pg28885.txt)




In [None]:
path_to_file = tf.keras.utils.get_file('Alice_In_Wonderland.txt','http://www.gutenberg.org/cache/epub/28885/pg28885.txt')

# Read, then decode for py3 compat. 
# Note that the text on the website  is UTF-8 encoded and 
# therefore we use the `decode` method to convert bytes to unicode code 
# points as we are reading data from a file into strings
#Since Python 3.0, all strings are stored as Unicode in an instance of the str type Encoded strings on the other hand are represented as binary data in the form of instances of the bytes type.
#Conceptually, str refers to text, whereas bytes refers to data. Use str.encode() to go from str to bytes, and bytes.decode() to go from bytes to str
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

# the first 250 characters in text
print(" Printing the first 250 characters:\n",text[:250])
print('Length of text: {} characters'.format(len(text)))




Downloading data from http://www.gutenberg.org/cache/epub/28885/pg28885.txt
 Printing the first 250 characters:
 ﻿Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Projec
Length of text: 177412 characters


##2. Process the data 

In order to prepare the data for modeling by the neural network. We cannot model the characters directly, instead we must convert the characters to integers. This can be done by :
1. Create a set of all of the distinct characters in the book
2. **Vectorize the Text** :Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [None]:
# The unique characters in the file
vocab = sorted(set(text))
print ('\nNo of unique characters={}'.format(len(vocab)))

# Creating a dictionary for mapping from unique characters to indices
char2idx = {ch:idx for idx, ch in enumerate(vocab)}
idx2char = np.array(vocab)

# Represent each character in the text with the appropriate numeric code 
text_as_int = np.array([char2idx[c] for c in text])
print("Total number of characters in text=",len(text_as_int))

# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))


No of unique characters=89
Total number of characters in text= 177412
'\ufeffProject Gute' ---- characters mapped to int ---- > [88 46 77 74 69 64 62 79  2 37 80 79 64]


## Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text. This would correspond to the number of timesteps used for training per batch

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of `seq_length+1`. For example, say `seq_length` is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this first use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices.
i.e creates a dataset with a separate element for each row of the input tensor:
eg:-
<br> `t = tf.constant([[1, 2], [3, 4]])`
<br> `ds = tf.data.Dataset.from_tensor_slices(t)   # [1, 2], [3, 4]`

In [None]:
# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)  # Where each char is a separate tensor

# for i in char_dataset.take(5):
#   print(idx2char[i.numpy()])
  


The `batch` method lets us easily convert these individual characters to sequences of the desired size.

And For each sequence, duplicate and shift it to form the input and target text by using the map method to apply a simple function to each batch




In [None]:

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)   #BatchDataset shapes: (101,)--> groups the tensor slices so that each batch has seq_length+1 tensors
#print(len(list(sequences)))  # There are 1756 such batches
# for i in sequences.take(1):
#   print(repr(''.join(idx2char[i])))

def split_input_target(chunk):
  '''
  Returns input_text, target_text for given chunk (sequence)

  '''
  input_text = chunk[:-1]
  target_text = chunk[1:]
  return input_text, target_text


dataset = sequences.map(split_input_target)  #((100,), (100,))  -->177412/101

# Printing the examples and corresponding target data
print("\n\n")
for input,target in dataset.take(1):
   print("The Input sequence=",repr(''.join(idx2char[input])))
   print("The Target sequence=",repr(''.join(idx2char[target])))
   print("\n")


# TO show input and  expected prediction at each time step

for i, (input_idx, target_idx) in enumerate(zip(input[:5], target[:5])):
    print("Time Step {:4d}".format(i))
    print("  Input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  Expected Output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))





The Input sequence= "\ufeffProject Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll\r\n\r\nThis eBook is for the use"
The Target sequence= "Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll\r\n\r\nThis eBook is for the use "


Time Step    0
  Input: 88 ('\ufeff')
  Expected Output: 46 ('P')
Time Step    1
  Input: 46 ('P')
  Expected Output: 77 ('r')
Time Step    2
  Input: 77 ('r')
  Expected Output: 74 ('o')
Time Step    3
  Input: 74 ('o')
  Expected Output: 69 ('j')
Time Step    4
  Input: 69 ('j')
  Expected Output: 64 ('e')


## Create training batches

We used `tf.data` to split the text into manageable sequences i.e batches. But before feeding this data into the model, we need to shuffle the data and pack it into batches.

`BATCH_SIZE` : Computations are normaly made in batches. The batch size is the number of training samples in one forward/backward pass. The higher the batch size, the more memory space you will need.
<br/>Eg:
<br/>  If there are `1000` training samples. And batch size is `100`, the algorithm takes the first `100 samples (0 to 100)` from the training dataset and train the network, then the next `100 samples (101 to 200)` and train the network again until the end.



In [None]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
# BUFFER_SIZE = 10000
#dataset2 = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset2 = dataset.batch(BATCH_SIZE, drop_remainder=True)   # dataset2 ((64, 100), (64, 100)) --> 27 such sequences (1756/64)

print(len( list(dataset2)))
print(dataset2)

27
<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>


## The model

### Define the different models

Use `tf.keras.Sequential` to define the model. For this simple example three layers are used to define our model:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that will map the numbers of each character to a vector with `embedding_dim` dimensions;  Read more about Embedding layer [here](https://keras.io/api/layers/core_layers/embedding/#embedding)
  <br> - input_dim: This is the size of the vocabulary in the text data. 
  <br> - output_dim: It defines the size of the output vectors from this layer  for each word. 
  <br> - input_length: This is the length of input sequences, as you would define for any input layer of a Keras model. For example, if all of your input documents are comprised of 1000 words, this would be 1000
* `tf.keras.layers.GRU`: A type of RNN with size `units=rnn_units` (You can also use a LSTM layer here.) first proposed in [Cho et al., 2014](https://arxiv.org/abs/1406.1078)
   <br> OR
* `tf.keras.layers.SimpleRNN`  :   a fully-connected RNN where the output from previous timestep is to be fed to next timestep.
   <br> OR
* `tf.keras.layers.LSTM` :first proposed in [Hochreiter & Schmidhuber, 1997](https://www.bioinf.jku.at/publications/older/2604.pdf)

* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs.


<br> For each character the model looks up the embedding, runs the GRU / LSTM /RNN one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character.

In [None]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

def build_model_GRU(vocab_size, embedding_dim, rnn_units, batch_size):
  '''
  1. tf.keras.layers.Embedding :
  Input shape:=(batch_size,None)
  Output Shape:=(batch_size,None,vocab_size)

  2. tf.keras.layers.GRU :
  Input shape:= (batch_size,None,vocab_size)
  Output shape=(batch_size,None,rnn_units)

  3. tf.keras.layers.Dense :
  Input shape=(batch_size,None,rnn_units)
  Output shape=(batch_size, None, vocab_size) 

  '''
  # Keras sequential model is used here since all the layers in the model only have single input and produce single output.
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),  #embedding_dim*vocab_size=trainable parameters
    tf.keras.layers.GRU(rnn_units,                                    # No of feed forward networks g = 3 (GRU has 3 FFNNs)
                        return_sequences=True,                        # Number of trainable parameters= # g × [h(h+i) + h] => 3x[1024(1024+256)+1024]
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)                                #number of parameters in Dense is rnn_units*vocab_size+vocab_size
  ])
  return model

def build_model_SimpleRNN(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),   #embedding_dim*vocab_size=trainable parameters
    tf.keras.layers.SimpleRNN(rnn_units,return_sequences=True,        #number of parameters in simple RNN layer is rnn_units * (rnn_units + embedding_dim + 1)
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)                                 #number of parameters in Dense is rnn_units*vocab_size+vocab_size
  ])
  return model

def build_model_LSTM(vocab_size, embedding_dim, rnn_units, batch_size):    #[for LSTM number of feed forward networks =4 i.e g=4]
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),      # No. of trainable parameters= sembedding_dim*vocab_size
    tf.keras.layers.LSTM(rnn_units,                                       # g × [h(h+i) + h] => 4x[1024(1024+256)+1024]
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)                                     #number of parameters in Dense is rnn_units*vocab_size+vocab_size
  ])
  return model


model_GRU = build_model_GRU(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

model_SimpleRNN = build_model_SimpleRNN(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

model_LSTM = build_model_LSTM(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [None]:
# model_GRU.summary()

# model_SimpleRNN.summary()
# model_LSTM.summary()

In [None]:
# Try the model

for input_example_batch, target_example_batch in dataset2.take(1):
  #print(input_example_batch.shape)  #(64,100)
  example_batch_predictions = model_LSTM(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
  example_batch_predictions = model_SimpleRNN(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
  example_batch_predictions = model_GRU(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
#This gives us, at each timestep, a prediction of the next character index:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)   # picks one from most likely sample from every row i.e (100,1) based on logits
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()   #(100,)


#Decode these to see the text predicted by this untrained model:

print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

(64, 100, 89) # (batch_size, sequence_length, vocab_size)
(64, 100, 89) # (batch_size, sequence_length, vocab_size)
(64, 100, 89) # (batch_size, sequence_length, vocab_size)
Input: 
 "\ufeffProject Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll\r\n\r\nThis eBook is for the use"

Next Char Predictions: 
 'fVh\nuO;$36mz4]JT7]8@EI:MpO*T[5·P&Ov_,vRQRo0Y30Al_9I9Up-Te$:H1m\ufeff10P\ufeffitcMA!%Y7BQ.QqQj#g]nIw#[iD?]MwvDF'


### Optimizer and Loss Function

In [None]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
  #return tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=False)
example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

model_SimpleRNN.compile(optimizer='adam', loss=loss)
model_GRU.compile(optimizer='adam', loss=loss)
model_LSTM.compile(optimizer='adam', loss=loss)

Prediction shape:  (64, 100, 89)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.4923635


### Create checkpoint to save the model

In [None]:
# Directory where the checkpoints will be saved
checkpoint_dir_GRU = './training_checkpoints_GRU'
checkpoint_dir_RNN='./training_checkpoints_RNN'
checkpoint_dir_LSTM='./training_checkpoints_LSTM'
# Name of the checkpoint files
checkpoint_prefix_GRU = os.path.join(checkpoint_dir_GRU, "ckpt_{epoch}")
checkpoint_prefix_RNN = os.path.join('./training_checkpoints_RNN', "ckpt_{epoch}")
checkpoint_prefix_LSTM = os.path.join('./training_checkpoints_LSTM', "ckpt_{epoch}")


## change filepath to select particular model
checkpoint_callback_RNN=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix_RNN,
    save_weights_only=True)

## change filepath to select particular model
checkpoint_callback_GRU=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix_GRU,
    save_weights_only=True)

checkpoint_callback_LSTM=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix_LSTM,
    save_weights_only=True)

## Training
### Train the model 

To keep training time reasonable, use 10 epochs to train the model. In Colab, set the runtime to GPU for faster training.

UNCOMMENT THE FOLLOWING LINES TO TRAIN. Please note, you can also load saved checkpoints from GitHub, see next code block

In [None]:
# EPOCHS=30
# %time history_RNN = model_SimpleRNN.fit(dataset2, epochs=EPOCHS, callbacks=[checkpoint_callback_RNN])
# %time history_GRU = model_GRU.fit(dataset2, epochs=EPOCHS, callbacks=[checkpoint_callback_GRU])
# %time history_LSTM= model_LSTM.fit(dataset2, epochs=EPOCHS, callbacks=[checkpoint_callback_LSTM])

 COMMENT THE FOLLOWING IF YOU HAVE TRAINED THE MODEL

 The checkpoints for the 30th epoch have been save in GitHub, loading the checkpoints

In [None]:
 # UNCOMMENT THE FOLLOWING LINES TO INSTALL git lfs so that large files can be downloaded from GitHub!!!!
 # This is used as Checkpoints are > 25MB
###########

!curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
!sudo apt-get install git-lfs
!git lfs install

##########
!git  clone https://github.com/EECS545-FA2020/RNN_Checkpoints.git Checkpoint

# # load the network weights
checkpoint_dir_RNN = "/content/Checkpoint/SimpleRNN/"

checkpoint_dir_GRU = "/content/Checkpoint/GRU/"
checkpoint_dir_LSTM = "/content/Checkpoint/LSTM/"


Detected operating system as Ubuntu/bionic.
Checking for curl...
Detected curl...
Checking for gpg...
Detected gpg...
Running apt-get update... done.
Installing apt-transport-https... done.
Installing /etc/apt/sources.list.d/github_git-lfs.list...done.
Importing packagecloud gpg key... done.
Running apt-get update... done.

The repository is setup! You can now install packages.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
Need to get 6,877 kB of archives.
After this operation, 16.4 MB of additional disk space will be used.
Get:1 https://packagecloud.io/github/git-lfs/ubuntu bionic/main amd64 git-lfs amd64 2.11.0 [6,877 kB]
Fetched 6,877 kB in 1s (8,497 kB/s)
debconf: unable to ini

### Restore model from checkpoint

To keep this prediction step simple, use a batch size of 1.

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.

To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.


In [None]:
model_SimpleRNN = build_model_SimpleRNN(vocab_size, embedding_dim, rnn_units, batch_size=1)

model_SimpleRNN.load_weights(tf.train.latest_checkpoint(checkpoint_dir_RNN))

model_SimpleRNN.build(tf.TensorShape([1, None]))


model_GRU = build_model_GRU(vocab_size, embedding_dim, rnn_units, batch_size=1)

model_GRU.load_weights(tf.train.latest_checkpoint(checkpoint_dir_GRU))

model_GRU.build(tf.TensorShape([1, None]))



model_LSTM = build_model_LSTM(vocab_size, embedding_dim, rnn_units, batch_size=1)

model_LSTM.load_weights(tf.train.latest_checkpoint(checkpoint_dir_LSTM))

model_LSTM.build(tf.TensorShape([1, None]))

print("Model_SimpleRNN Summary\n")
model_SimpleRNN.summary()
print("\n\n")


print("Model_GRU Summary\n")
model_GRU.summary()
print("\n\n")

print("Model_LSTM Summary\n")
model_LSTM.summary()
print("\n\n")

Model_SimpleRNN Summary

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (1, None, 256)            22784     
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (1, None, 1024)           1311744   
_________________________________________________________________
dense_3 (Dense)              (1, None, 89)             91225     
Total params: 1,425,753
Trainable params: 1,425,753
Non-trainable params: 0
_________________________________________________________________



Model_GRU Summary

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (1, None, 256)            22784     
_________________________________________________________________
gru_1 (GRU)                  (1, None, 10

## Generate Text

In [None]:
def generate_text(model, start_string,num_generate):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  #num_generate = 100

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]   #[len(start_string)]
  input_eval = tf.expand_dims(input_eval, 0)         #(1,len(start_string))
  #print(input_eval.shape)

  # Empty string to store our results
  text_generated = []

  # Low temperatures results in more predictable text.
  # Higher temperatures results in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  if model=='SimpleRNN':
    model=model_SimpleRNN
    print("\n For Simple RNN:\n\n")
  elif model=='GRU':
    model=model_GRU
    print("\n For GRU:\n\n")
  elif model=='LSTM':
    model=model_LSTM
    print("\n For LSTM:\n\n")
  model.reset_states()
  print(start_string,end="")
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)
      print(idx2char[predicted_id],end="")
      
      text_generated.append(idx2char[predicted_id])
  return (start_string + ''.join(text_generated))

In [None]:

types_m=['SimpleRNN','GRU','LSTM']
for m in types_m:
  print("\n","_"*50)
  generate_text(model=m, start_string="Open the window",num_generate=200)
  print("\n","_"*50)


 __________________________________________________

 For Simple RNN:


Open the window, she such hir. IY, addean, soister, so hard every Every,  if you," If I het kild!"

The F ceragis.

[Illuste;"-the King.

Grese, herpent-exs
bleash, so shilk Bill himply
kreapure, so it a ver
 __________________________________________________

 __________________________________________________

 For GRU:


Open the window,"

"Hander dimesty furrieds, "you indeed
to for, then!"

"Stupid eyes very decause he
surpered tone, "she's must be reach nobsiggland by the court,_ as the Gryphon reperied im distribution
rom
 __________________________________________________

 __________________________________________________

 For LSTM:


Open the window, Hatter brimbed and-bringied in
helf unto a pack in or," he began Fromouting fir the rest ard such
it almstards,) the said just not that her varined tire
she spake at up liffering violently
prope
 __________________________________________________
