**Week 10 Lab - RNNs for Text Generation**

This lab was modified from an example originally written by "The TensorFlow Authors" in 2019 and distributed under Apache License 2.0. This lab creates a model that can generate text using a character-based RNN. A character-based RNN learns sequences of characters from a corpus of text. Once the model is trained, one can present an input character sequence and the model will generate the character that it predicts would be most likely to appear next. By repeatedly calling the model for new predictions with the previously built sequence, one can create a string of text that "looks like" sentences from the original training text. Note that depending upon the amount of training material and the details of the training procedure, this generated text may look more or less like gibberish.

Also note that, depending on the specific types of layers you are training, you might want to try enabling GPU acceleration to execute this notebook faster. In Colab: *Runtime > Change runtime type > Hardware accelerator > GPU*. Do this before you start to run any code, because it generally restarts the runtime and your local variables will be lost.

## Setup

### Import TensorFlow and other libraries

In [1]:
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

import numpy as np
import os # To access local files; for saving checkpoints
import time

from urllib import request # We will need this to read from a URL

In [2]:
# TensorFlow 2.0 offers "Eager Execution," a more practical model for
# running tf code. Are we using it here?
tf.executing_eagerly()

True

### Download the Text Data

Load a plain text book from Project Gutenberg:

In [3]:
# Some collected plays by Anton Chekhov
url = "https://www.gutenberg.org/files/7986/7986-0.txt"
response = request.urlopen(url)
text = response.read().decode('utf8')
type(text), len(text)

(str, 411576)

In [4]:
#
# Exercise 10.1: Examine some of the contents of text using slicing
# Write a comment saying what you see. Is there any
# pre-processing that should be done at this stage? :: Removing punctuations?

print(text[:200])




﻿Project Gutenberg’s Plays by Chekhov, Second Series, by Anton Chekhov

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it


In [5]:
# Additional URLs of plain text Chekhov plays for later use
uncle_vanya = "https://www.gutenberg.org/cache/epub/1756/pg1756.txt"
the_seagull = "https://www.gutenberg.org/files/1754/1754-0.txt"

### Read the data

First, look in the text to see what we have. Note that this is a character-based model, so we are interested to know what different characters are used throughout the whole text. Note that this should make the process largely language independent: Any language where words are formed through sequences of characters should work in training this kind of model.

In [6]:
# The unique text characters in the file
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')


94 unique characters


In [7]:
#
# Exercise 10.2: Display the list of unique text characters.
# Write a comment saying what you see.

# letters, numbers, punctuation marks, and special characters

print(vocab)

['\n', '\r', ' ', '!', '#', '$', '%', '(', ')', '*', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '£', 'à', 'è', 'é', 'ê', '‘', '’', '“', '”', '\ufeff']


## Process the text

### Vectorize the text

Before training, we will convert the strings to a numerical representation. The approach in this notebook uses a so-called "ragged tensor." Here's an excerpt from the TensorFlow documentation:

"Ragged tensors are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes, including: Variable-length features, such as the set of actors in a movie; Batches of variable-length sequential inputs, such as sentences or video clips; Hierarchical inputs, such as text documents that are subdivided into sections, paragraphs, sentences, and words."



In [8]:
example_texts = ['abcdefg', 'xyz']

chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8') #this splits each string in input into a sequence of Unicode code points.
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [9]:
print(chars) # In TF 2.0, the print() function can also show tensors

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>


We have now created a ragged tensor. Now or later, you might want to read some of the TensorFlow documentation describing what a tensor actually is and what it means to be ragged. There's a nice tutorial with code here: https://www.tensorflow.org/guide/tensor

Note that each element in the structure above begins with b. That means that the function has returned a "byte-coded" version of the string. This is an alternative representation to plain text. Also, the idea of "ragged" is demonstrated by this example. The first nested list has seven items, whereas the second list has three. Ragged means that different numbers of elements are permissible within each element of a list. The term ragged comes from publishing, where a common way of justifying text in a book is called "ragged right" (meaning that different lines are different lengths).
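
The `b` prefix and the idea of raggedness can both be seen without TensorFlow. The following is a minimal plain-Python sketch (the variable names are illustrative and not part of the lab): it encodes each character as UTF-8 bytes and shows that the resulting nested list has rows of different lengths.

```python
# Plain-Python illustration (no TensorFlow needed); names are hypothetical.
toy_texts = ['abcdefg', 'xyz']

# Split each string into characters, then encode each character as UTF-8 bytes.
rows = [[ch.encode('utf-8') for ch in s] for s in toy_texts]
print(rows)

# "Ragged": the rows have different lengths.
row_lengths = [len(row) for row in rows]
print(row_lengths)

# Non-ASCII characters need more than one byte in UTF-8.
apostrophe = '\u2019'  # the right single quotation mark found in the book
encoded = apostrophe.encode('utf-8')
print(encoded, len(encoded))
print(encoded.decode('utf-8') == apostrophe)
```

This is why some later outputs in this lab show multi-byte tokens such as `b'\xe2\x80\x99'` for a single character.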

Next, create a `preprocessing.StringLookup` layer. This layer maps string features to integer indices or integer indices to string features, depending on the value of its `invert` attribute. If it's `False` (the default), it maps string features to integer indices.

In [10]:
# Note that we are passing in the vocab from parsing the whole book in an
# earlier cell. Examine the first argument closely: https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup
ids_from_chars = preprocessing.StringLookup(vocabulary=list(vocab), mask_token=None)

# Shows the resulting class type and that we now have an instance
type(ids_from_chars), isinstance(ids_from_chars, preprocessing.StringLookup)

(keras.src.layers.preprocessing.string_lookup.StringLookup, True)

In [53]:
#
# Exercise 10.3: Display the vocabulary associated with ids_from_chars. Hint: Use
# the pop-up help, the dir() command, or the TensorFlow doc to locate the
# appropriate bound method for revealing the vocabulary.
#

print(ids_from_chars.vocabulary_size())
print(ids_from_chars.get_vocabulary())

95
['[UNK]', '\n', '\r', ' ', '!', '#', '$', '%', '(', ')', '*', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '£', 'à', 'è', 'é', 'ê', '‘', '’', '“', '”', '\ufeff']


Our instance of `preprocessing.StringLookup` can convert byte-coded character tokens to numeric character IDs:

In [12]:
ids = ids_from_chars(chars)
ids


<tf.RaggedTensor [[59, 60, 61, 62, 63, 64, 65], [82, 83, 84]]>

In [13]:
# Curiosity question: Why these codes for the input characters?
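
One way to reason about the IDs above is to rebuild the mapping by hand. The sketch below is a plain-Python approximation of a StringLookup-style mapping, not the Keras implementation. It uses a hypothetical toy vocabulary and assumes, as the `get_vocabulary()` output earlier suggests, that index 0 is reserved for the out-of-vocabulary token `[UNK]`.

```python
# Plain-Python sketch of a StringLookup-style mapping; names are hypothetical.
# Assumption: index 0 is reserved for the out-of-vocabulary token '[UNK]'.
toy_vocab = sorted(set('hello world'))
toy_vocabulary = ['[UNK]'] + toy_vocab

char_to_id = {ch: i for i, ch in enumerate(toy_vocabulary)}
id_to_char = {i: ch for ch, i in char_to_id.items()}

def encode(s):
    # Unknown characters fall back to ID 0, the '[UNK]' slot.
    return [char_to_id.get(ch, 0) for ch in s]

def decode(id_list):
    return ''.join(id_to_char[i] for i in id_list)

toy_ids = encode('hello')
print(toy_ids)
print(decode(toy_ids))
print(encode('xyz'))  # none of these characters are in the toy vocabulary
```

The same reasoning applies to the lab's vocabulary: `'a'` sits at index 58 of the sorted `vocab` list, and reserving index 0 for `[UNK]` shifts it to ID 59.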

In [14]:
# This creates an instance of the "inverter". Note that the invert is now set to True.
chars_from_ids = tf.keras.layers.experimental.preprocessing.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

chars = chars_from_ids(ids) # Now use the inverter to process our tiny example
chars



<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

Note that we have gotten back the byte codes of the characters from the vectors of IDs, and we are seeing them as a `tf.RaggedTensor` of characters. We can now use another utility, tf.strings.reduce_join, to join the characters back into strings. This is demonstrated here because it will be helpful later when we want to generate new text.

In [15]:
tf.strings.reduce_join(chars, axis=-1).numpy() #https://www.tensorflow.org/api_docs/python/tf/strings/reduce_join

array([b'abcdefg', b'xyz'], dtype=object)

In [16]:
# Let's turn that utility into a function that we can call.
def text_from_ids(ids):
  return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

In [57]:
#
# Exercise 10.4: Create a new small example of example_texts with at least
# three character strings in the list. Include some upper and lower case and
# some numerals. Run the text through the string preprocessor and then recover the original strings.
#

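example_texts = ['Roshan1', 'HelloWorld321', 'nLplaB10']
# Split each string into characters first. Passing whole strings would map
# each one to '[UNK]' because the vocabulary contains single characters only.
my_chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
my_ids = ids_from_chars(my_chars)

recovered_strings = text_from_ids(my_ids)
print(recovered_strings)


tf.Tensor([b'Roshan1' b'HelloWorld321' b'nLplaB10'], shape=(3,), dtype=string)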


### The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task that our RNN model will perform. The input to the model will be sequences of characters from the book we read in at the top of the notebook.


### Create training examples and targets

Next, we will divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text. For each input sequence, the corresponding target contains the same length of text, shifted one character to the right. For example, if the text is "Hello" and `seq_length` is 4, then the input sequence is "Hell" and the target sequence is "ello". Because every position in a sequence supplies its own input character and next-character label, even a single book yields a large number of training examples.

To do this, first use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices.

In [18]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8')) #this splits each string in input into a sequence of Unicode code points.



In [19]:
#
# Exercise 10.5: Use the get_shape() bound method to reveal the shape of the resulting tensor
#

In [20]:
# This creates a dataset whose elements are slices from the original tensor
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids) #https://www.tensorflow.org/api_docs/python/tf/data/Dataset Documentation for tf.data.Dataset
type(ids_dataset)

tensorflow.python.data.ops.from_tensor_slices_op._TensorSliceDataset

In [21]:
#
# Exercise 10.6: Comment each line of the following codes to say what it is doing.   https://www.tensorflow.org/api_docs/python/tf/data/Dataset
# Also comment on the output. What does it mean?
# Add a line of code to convert the output back to the characters
temp = ids_dataset.unique()
for element in temp.as_numpy_iterator():
  print(element)

94
45
76
73
68
63
61
78
3
36
79
72
60
65
91
77
70
59
83
32
66
69
80
11
48
62
67
30
2
1
49
31
64
81
71
13
54
74
12
41
25
44
37
47
52
38
33
17
15
20
56
34
5
22
24
23
21
57
50
16
35
10
39
43
40
51
82
18
42
92
93
8
9
75
19
26
84
28
46
27
14
55
4
88
85
53
58
87
90
86
89
7
29
6


In [22]:
# The take() bound method is helpful for examining data in a Dataset. It
# creates a Dataset with at most `count` elements from this dataset.
for ids in ids_dataset.take(18):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

﻿
P
r
o
j
e
c
t
 
G
u
t
e
n
b
e
r
g


In [59]:
#
# Exercise 10.7: Add the skip() bound method to the expression in the for loop
# in the cell just above. To demonstrate how it works, skip 18 elements and then
# take the next 10 elements. The invocation of skip() in the expression should
# precede the invocation of take(). https://www.tensorflow.org/api_docs/python/tf/data/Dataset#skip
#

for ids in ids_dataset.skip(18).take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

’
s
 
P
l
a
y
s
 
b


In [24]:
# While an RNN can theoretically handle a continuous stream of data,
# here we are considering the data in small groupings whose length
# is controlled by seq_length. Leave it at its current value for now, but
# in the future you might consider making it either shorter or longer in the
# training run.
seq_length = 80 # About one line of standard text
examples_per_epoch = len(text)//(seq_length+1) #the floor division // rounds the result down to the nearest whole number

In [25]:
#
# Exercise 10.8: Explain why seq_length+1 is used in the next line of code.
# The `batch` method lets you easily convert these individual characters to
# sequences of the desired size.
#
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True) # drop_remainder: if set to be True, the last batch should be dropped in the case it has fewer than batch_size elements
for seq in sequences.take(2):
  print(chars_from_ids(seq))

tf.Tensor(
[b'\xef\xbb\xbf' b'P' b'r' b'o' b'j' b'e' b'c' b't' b' ' b'G' b'u' b't'
 b'e' b'n' b'b' b'e' b'r' b'g' b'\xe2\x80\x99' b's' b' ' b'P' b'l' b'a'
 b'y' b's' b' ' b'b' b'y' b' ' b'C' b'h' b'e' b'k' b'h' b'o' b'v' b','
 b' ' b'S' b'e' b'c' b'o' b'n' b'd' b' ' b'S' b'e' b'r' b'i' b'e' b's'
 b',' b' ' b'b' b'y' b' ' b'A' b'n' b't' b'o' b'n' b' ' b'C' b'h' b'e'
 b'k' b'h' b'o' b'v' b'\r' b'\n' b'\r' b'\n' b'T' b'h' b'i' b's' b' ' b'e'
 b'B'], shape=(81,), dtype=string)
tf.Tensor(
[b'o' b'o' b'k' b' ' b'i' b's' b' ' b'f' b'o' b'r' b' ' b't' b'h' b'e'
 b' ' b'u' b's' b'e' b' ' b'o' b'f' b' ' b'a' b'n' b'y' b'o' b'n' b'e'
 b' ' b'a' b'n' b'y' b'w' b'h' b'e' b'r' b'e' b' ' b'a' b't' b' ' b'n'
 b'o' b' ' b'c' b'o' b's' b't' b' ' b'a' b'n' b'd' b' ' b'w' b'i' b't'
 b'h' b'\r' b'\n' b'a' b'l' b'm' b'o' b's' b't' b' ' b'n' b'o' b' ' b'r'
 b'e' b's' b't' b'r' b'i' b'c' b't' b'i' b'o' b'n' b's'], shape=(81,), dtype=string)


It's easier to see what this is doing if you join the tokens back into strings:

In [26]:
for seq in sequences.take(2):
  print(text_from_ids(seq).numpy())

b'\xef\xbb\xbfProject Gutenberg\xe2\x80\x99s Plays by Chekhov, Second Series, by Anton Chekhov\r\n\r\nThis eB'
b'ook is for the use of anyone anywhere at no cost and with\r\nalmost no restrictions'


For training you'll need a dataset of `(input, label)` pairs, where `input` and `label` are sequences. At each time step the input is the current character and the label is the next character. Here's a function that takes a sequence as input, duplicates it, and shifts it to align the input and label for each timestep:

In [27]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

In [28]:
# Let's test the function - We can do the test using
# a regular character string converted to a list.
split_input_target(list("Tensorflow"))

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

In [29]:
# This is a curious construction - i.e., passing a function into the
# sequences.map() bound method:
dataset = sequences.map(split_input_target)
#
# Exercise 10.9: Look up the documentation for the map() bound method
# (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map) and then
# add a comment to explain the line of code in this block.
# map() applies split_input_target to every element of the sequences dataset,
# producing a dataset of (input, target) pairs.

In [30]:
for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b'\xef\xbb\xbfProject Gutenberg\xe2\x80\x99s Plays by Chekhov, Second Series, by Anton Chekhov\r\n\r\nThis e'
Target: b'Project Gutenberg\xe2\x80\x99s Plays by Chekhov, Second Series, by Anton Chekhov\r\n\r\nThis eB'
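
The chunking and shifting steps above can be mimicked on a short string. This is a plain-Python sketch with a hypothetical toy text and a toy sequence length of 4. It imitates `batch(seq_length+1, drop_remainder=True)` followed by `split_input_target`, and is not how `tf.data` is implemented.

```python
# Plain-Python sketch; the toy text and names are illustrative only.
toy_text = 'Hello world'
toy_seq_length = 4

chunk = toy_seq_length + 1
# Keep only full chunks, mimicking drop_remainder=True.
n_chunks = len(toy_text) // chunk
chunks = [toy_text[i * chunk:(i + 1) * chunk] for i in range(n_chunks)]

# Each chunk of seq_length + 1 characters yields an input and a target
# that are both exactly seq_length characters long.
pairs = [(c[:-1], c[1:]) for c in chunks]
for input_text, target_text in pairs:
    print(repr(input_text), '->', repr(target_text))
```

The extra character in each chunk is what lets both the input and the shifted target have the full `seq_length`. The trailing `'d'` is dropped because it does not fill a whole chunk.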


### Create training batches

We have used `tf.data` to split the text into manageable sequences. Before using these data to train the model, we need to shuffle the data and pack it into batches. Remember from class that batching, AKA mini-batching, is a method of processing a group of input-output pairs together. This facilitates parallelization, and the gradient noise that comes from small batches can also have a mild regularizing effect.

In [31]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset:
# TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements.
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 80), dtype=tf.int64, name=None), TensorSpec(shape=(64, 80), dtype=tf.int64, name=None))>

In [32]:
#
# Exercise 10.10: Examine the documentation for shuffle, batch, and prefetch.
# Try starting your exploration here: https://www.tensorflow.org/guide/data_performance
# Write a one line comment explaining each concept. Make sure you mention
# what AUTOTUNE is.


# shuffle: Randomly shuffles the elements in a dataset to introduce randomness
# batch: Groups elements in a dataset into batches
# prefetch: Overlaps data preprocessing and model execution
# AUTOTUNE: A special constant that allows TensorFlow to dynamically adjust the level of parallelism during data processing
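
To make the buffer idea concrete, here is a simplified plain-Python sketch of a buffered shuffle. The function name `buffered_shuffle` is hypothetical, and the real `Dataset.shuffle` implementation differs in its details. The sketch only illustrates why a fixed-size buffer can shuffle a stream without holding all of it in memory.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in shuffled order using a fixed-size buffer (buffer_size >= 1).

    A simplified sketch of the idea behind Dataset.shuffle(buffer_size).
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
print(sorted(shuffled) == list(range(10)))
```

In this lab the text yields 411576 // 81 = 5081 sequences, so a `BUFFER_SIZE` of 10000 holds the whole dataset and the shuffle covers all of it.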

## Build The Model

This section defines the model as a `keras.Model` subclass (For details see [Making new Layers and Models via subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models)).

This model has three layers:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that will map each character-ID to a vector with `embedding_dim` dimensions;
* `tf.keras.layers.SimpleRNN`: A type of RNN with size `units=rnn_units` (you could also use a GRU or LSTM layer here);
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs. It outputs one logit for each character in the vocabulary. These logits are unnormalized log-probabilities: applying a softmax to them gives the model's probability for each character.

In [33]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256 # 256 is kind of a generic choice that should work well
# for various-sized character vocabularies. There is a danger of overfitting
# when the size of the embedding layer exceeds the size of the vocabulary. This is essentially the dimension of x

# Number of RNN units
rnn_units = 1024 # This is tunable. Remember that each RNN unit has a little
# bit of memory for what came before. Here 1024 provides four nodes for every
# node in the embedding layer. Later, you might want to experiment with half
# as many or twice as many. This is essentially the dimension of h

vocab_size, embedding_dim, rnn_units

(94, 256, 1024)

In [34]:
# This builds a custom class for instantiating the Keras model
#
# Exercise 10.11: Add comments on the appropriate lines of code to
# document each layer of the model. https://www.tensorflow.org/api_docs/python/tf/keras/layers/RNN
#
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()

    # What's this layer?     Embedding Layer
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    # What's this layer? SimpleRNN Layer
    self.rnn = tf.keras.layers.SimpleRNN(rnn_units,
                                   return_sequences=True,
                                   return_state=True)

    # What's this layer? Dense Layer
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.rnn.get_initial_state(x)
    x, states = self.rnn(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

We could have used a `keras.Sequential` model here, as this architecture is quite simple. However, to  generate text later we will need to manage the RNN's internal state. It's simpler to include the state input and output options upfront, than it is to rearrange the model architecture later. For more details see the [Keras RNN guide](https://www.tensorflow.org/guide/keras/rnn#rnn_state_reuse).

In [35]:
# Now instantiate the class defined above.

model = MyModel(
    # Be sure the vocabulary size matches the `StringLookup` layers.
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

For each character the model looks up the embedding, runs the RNN one timestep with the embedding as input, and applies the dense layer to generate logits, one per vocabulary entry, that score each candidate for the next character. Logits are unnormalized log-probabilities: exponentiating them and dividing by their sum (the softmax function) gives a probability for each character. Working with logits is convenient because they are unbounded real numbers that a linear layer can produce directly.
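
To make the relationship between logits and probabilities concrete, here is a small plain-Python softmax. The logits are hypothetical values for a four-character vocabulary, not outputs of the lab's model.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-character vocabulary.
toy_logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(toy_logits)
print([round(p, 3) for p in probs])
print(sum(probs))
```

Larger logits map to larger probabilities, and the probabilities always sum to 1 regardless of the scale of the logits.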

## Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:

In [36]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 80, 95) # (batch_size, sequence_length, vocab_size)


In the above example the sequence length of the input is `80` but the model can be run on inputs of any length:

In [37]:
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  24320     
                                                                 
 simple_rnn (SimpleRNN)      multiple                  1311744   
                                                                 
 dense (Dense)               multiple                  97375     
                                                                 
Total params: 1433439 (5.47 MB)
Trainable params: 1433439 (5.47 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
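
The parameter counts in the summary can be checked by hand. The sketch below uses the sizes from this lab (95 vocabulary entries, 256 embedding dimensions, 1024 RNN units) and the usual weight-plus-bias formulas for each layer type. The variable names are illustrative.

```python
# Sizes taken from this lab: 95 vocabulary entries (94 characters plus '[UNK]').
n_vocab, n_embed, n_units = 95, 256, 1024

# Embedding: one n_embed-sized vector per vocabulary entry.
embedding_params = n_vocab * n_embed

# SimpleRNN: input weights + recurrent weights + one bias per unit.
rnn_params = n_embed * n_units + n_units * n_units + n_units

# Dense: one weight per (unit, output) pair plus one bias per output.
dense_params = n_units * n_vocab + n_vocab

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# 24320 1311744 97375 1433439
```

Most of the parameters are the recurrent weights (1024 × 1024), which is why changing `rnn_units` has the largest effect on model size.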


## Train the model

At this point the problem can be treated as a standard classification problem. Given the previous RNN state and the input at this time step, predict the class of the next character.

### Attach an optimizer, and a loss function

The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because your model returns logits, you need to set the `from_logits` flag.


In [38]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

In [39]:
example_batch_loss = loss(target_example_batch, example_batch_predictions)
mean_loss = example_batch_loss.numpy().mean()
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", mean_loss)

Prediction shape:  (64, 80, 95)  # (batch_size, sequence_length, vocab_size)
Mean loss:         4.562402


A newly initialized model shouldn't be too sure of itself; the output logits should all have similar magnitudes. To confirm this, you can check that the exponential of the mean loss is approximately equal to the vocabulary size. A much higher loss means the model is sure of its wrong answers and is badly initialized:

In [40]:
tf.exp(mean_loss).numpy()

95.813324
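
The reason this check works can be shown with a little arithmetic: a model that spreads its probability evenly over the vocabulary has a cross-entropy of ln(vocabulary size). A minimal sketch, assuming the 95-entry vocabulary and the mean loss printed earlier in this lab:

```python
import math

n_vocab = 95  # size of the StringLookup vocabulary in this lab

# A model that assigns equal probability 1/n_vocab to every character
# has cross-entropy -ln(1/n_vocab) = ln(n_vocab) on every example.
uniform_loss = math.log(n_vocab)
print(uniform_loss)

# Exponentiating the mean loss gives the "effective" number of equally
# likely choices, which can be compared with n_vocab.
observed_mean_loss = 4.562402  # value printed earlier in the lab
effective_choices = math.exp(observed_mean_loss)
print(effective_choices)
```

The untrained model's loss is only slightly above ln(95), which is what near-uniform predictions should produce.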

In [63]:
#
# Exercise 10.12: Add some code that compares the exponentiated mean loss with
# the size of the vocabulary (look in earlier cells for this value). If the
# exp(mean_loss) is more than 10% larger than the vocab size, print a warning.
#

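# The model's output size is the StringLookup vocabulary (including '[UNK]').
model_vocab_size = len(ids_from_chars.get_vocabulary())
exp_mean_loss = tf.exp(mean_loss).numpy()

if exp_mean_loss > 1.1 * model_vocab_size:
  print(f"Warning: exp(mean_loss) = {exp_mean_loss:.2f} is more than 10% larger "
        f"than the vocabulary size ({model_vocab_size}).")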

Configure the training procedure using the `tf.keras.Model.compile` method. Use `tf.keras.optimizers.Adam` with default arguments and the loss function.

In [42]:
model.compile(optimizer='adam', loss=loss)

### Configure checkpoints

Use a `tf.keras.callbacks.ModelCheckpoint` to ensure that checkpoints are saved during training:

In [43]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

#https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
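
The `{epoch}` placeholder in the file name is a Python format field. The sketch below assumes the callback fills it in with `str.format`, as the Keras documentation for `filepath` describes, and previews the resulting names for a hypothetical three-epoch run. The files on disk may carry additional suffixes.

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')

# Fill in the {epoch} placeholder for a hypothetical 3-epoch run.
# Epoch numbers in the file names start at 1.
paths = [checkpoint_prefix.format(epoch=e) for e in range(1, 4)]
for p in paths:
    print(p)
```

Because the epoch number is part of the name, each epoch's weights are kept instead of being overwritten by the next save.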

### Execute the training

To keep training time reasonable, use 20 epochs to train the model. In Colab, set the runtime to GPU for faster training. Note that with all of the starter defaults in this notebook, each epoch takes about 10 seconds, so only about 3 minutes to train this model.

In [44]:
EPOCHS = 20

In [45]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [46]:
#
# Exercise 10.13: Click on the file folder in the left hand control bar.
# Open up the training_checkpoints folder. How many training checkpoints
# do you see? Add a comment saying why there are that many checkpoints.
# 20 checkpoints: ModelCheckpoint saves once at the end of every epoch by
# default, and the model was trained for 20 epochs.


## Generate text

The simplest way to generate text with this model is to run it in a loop, and keep track of the model's internal state as it runs. Each time we call the model we pass in a slice of text and an internal state.

The model returns a prediction for the next character as well as its new state. Pass the prediction and state back in to continue generating text. The class defined below accomplishes one step in this chain of model runs. When the generate_one_step() bound method is called, it makes a single step prediction.  

In [47]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature   #The temperature to use for scaling the logits. https://theailearner.com/tag/temperature-scaling/
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits, states = self.model(inputs=input_ids, states=states,
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return the characters and model state.
    return predicted_chars, states
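
The temperature scaling, masking, and sampling steps in `generate_one_step` can be tried out in isolation. The following plain-Python sketch uses hypothetical logits and a hypothetical helper named `sample_next_id`. It stands in for the TensorFlow operations and is not the lab's implementation.

```python
import math
import random

def sample_next_id(logits, temperature=1.0, banned_ids=(), rng=random):
    """Sample one ID from temperature-scaled, masked logits.

    Plain-Python sketch of the steps in generate_one_step; the function
    and argument names here are illustrative, not part of the lab.
    """
    scaled = [z / temperature for z in logits]
    # Setting a logit to -inf gives that ID probability zero.
    for i in banned_ids:
        scaled[i] = -math.inf
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

toy_logits = [0.0, 2.0, 1.0, 0.5]  # index 0 plays the role of '[UNK]'
rng = random.Random(0)
for temperature in (0.2, 1.0, 5.0):
    draws = [sample_next_id(toy_logits, temperature, banned_ids=(0,), rng=rng)
             for _ in range(1000)]
    counts = [draws.count(i) for i in range(len(toy_logits))]
    print(temperature, counts)
```

Low temperatures concentrate the draws on the highest logit, while high temperatures spread them more evenly. The masked index is never drawn at any temperature. Because each character is sampled rather than chosen deterministically, repeated runs from the same seed text can diverge.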

In [48]:
# Now we can instantiate the class. Take note of the arguments we are
# passing in.
one_step_model = OneStep(model, chars_from_ids, ids_from_chars)

In [49]:
#
# Exercise 10.14: Display the type() of each of the arguments passed into
# the class initializer in the previous cell. Explain why each of the
# three arguments is needed to initialize the OneStep class.
#

Run it in a loop to generate some text. Looking at the generated text, you'll see that the model knows when to capitalize, makes paragraphs, and imitates a Chekhov-like vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.

In [50]:
start = time.time()
states = None
next_char = tf.constant(['NATALYA STEPANOVNA.'])
result = [next_char]


for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)


NATALYA STEPANOVNA. Denely? [Tells his opinian _acturely! [Pause] Yes, for everybady fawes, Verya. He’s do sitting the or frobodacians homes a Greating me the wind] Chargh of MARAA]

LUBOV Khar] Midis she dascilied, litter has bey Kistrabiear. Such a bair, and itn’t it a minfivissible love phenines roaning were squaybliste; the way so
mine.]

VARYA. [Apprices to-day. [Coulail GAEVS Putemank and elgey dogning reading] If you haven’t by
or Monoin and copyeng--If home] my wife! . In nothing vight, Bat doesn’t. You gellencluted to treirsed about this
causa; cages quiet him to-morrow. I’ll crysely, lock is sted everything’s love of Correction round in always a must and you can all even doingsible piterly sehmoning a traif it? I want for you are even sharectroan.... Marie...
nothing concest to his wishev, but on the
prace] The what something ears, the stride] Wrate, was a minute.]

Curtain.

SHIPUCHIN. How letty to
man,
and doesn’t bel, feoule of the man, just its-you... I’ve 

The easiest thing you can do to improve the results is to train it for longer (try `EPOCHS = 30`).

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.

If you want the model to generate text faster, the easiest thing you can do is batch the text generation. In the example below the model generates three texts in about the same time it took to generate just one above.

In [51]:
start = time.time()
states = None
next_char = tf.constant(['NATALYA STEPANOVNA.', 'NATALYA STEPANOVNA.', 'NATALYA STEPANOVNA.'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result, '\n\n' + '_'*80)
print('\nRun time:', end - start)

tf.Tensor(
[b'NATALYA STEPANOVNA. Once is very one manvice for aftet you ask my on jester it.\r\nTentingincelf and without forthail Masha, I knaw, pleep and excuse me, your both?\r\n\r\nSMIRNOV. Nipes, werrithors, and the maid adowing and YAAS\r\nDEECTERS\r\n\r\n\r\nACO\xe2\x80\x99S\r\nANDREY,\r\nscrunh things which it something under little firthy is it, and a blaid\xe2\x80\x99s\r\npersins.... I shall be selves. I\xe2\x80\x99ll have\r\nyou. Yes, Beriv. [In your\r\nshall life as aur younge, Va!]\r\n\r\nCHUBUKOV. [Innibles interestinasing a fillisiin is in bediea, Ever little flower your wife.... I can\xe2\x80\x99t bly fleece, or other like Louss ever\r\npresented] Undrilvisment, why are all diedsicagly more.]\r\n\r\nLUBOV. I was something it, your Lidels bying to distant this tirent falling\r\nexcellency, My year pereeps a brive\r\noal and\r\nmorring usfully-sixcellency,\r\nincluan\r\nit.\r\nIt fieles no TUZENBACH, angry. The\r\npusitingly! gaver.\r\n\r\nOLGA. I can go anyted to-day af

In [52]:
#
# Exercise 10.15: Add a comment describing why the model produces different text
# even though you have provided the same seed three times.
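# Because generate_one_step() samples each next character at random from the
# predicted distribution (tf.random.categorical) rather than always picking
# the most likely one, each of the three rows follows its own random path.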

**Improving the Model**

There are several strategies that might improve the performance of the model. Try them in the following order:

* Reduce the temperature in the initialization of the OneStep class to reduce the randomness in generation of new characters (https://cs.stackexchange.com/questions/79241/what-is-temperature-in-lstm-and-neural-networks-generally)
* Increase the training epochs by 50% or more to get the model loss to be lower
* Increase the number of nodes in the existing SimpleRNN layer to give the model more "intelligence"
* Add an additional layer of RNN after the first layer to improve the model's "memory"

Depending upon the amount of time left in the lab, try one or more of these techniques. For the moment we don't have a way of documenting model quality other than your own read of the generated text to see whether it is creating real words and sensible sentences. But that's a fine criterion for now. Make sure to add comments documenting what you find out.