# Automated Text Generation

Feedforward neural nets are generally great for classification and regression problems. CNNs are great for complex image classification. But, activations in feedforward nets and CNNs flow in only one direction, from the input layer to the output layer. Since signals flow in only one direction, feedforward and convolutional nets are not ideal if patterns in data change over time. So, we need a different network architecture to work with data impacted by time.

A **recurrent neural network** (RNN) looks a lot like a feedforward neural network, but it also has connections pointing backward. That is, output from one layer can act as input back into another layer earlier in the network. The capability of one layer to inform another layer earlier in the network means that an RNN has a built-in feedback loop mechanism that allows it to act as a forecasting engine. So, RNNs are great as forecasters because they naturally work well as data changes with time.

An RNN remembers the past and its decisions are influenced by what it has learned from the past. Feedforward and convolutional networks only remember what they learn during training. For example, a feedforward image classifier learns what an image looks like during training and then uses that knowledge to classify other images in production. While RNNs learn similarly during training, they also remember what they learned so they can make good decisions as data changes.

# Natural Language Processing

A fascinating advancement in machine learning is the ability to teach a machine how to understand human communication. The area of machine learning that concentrates on understanding how humans communicate is natural language processing. Formally, **natural language processing** (NLP) is a field in machine learning concentrating on the ability of a computer to understand, analyze, manipulate, and potentially generate human language. RNNs are a very important variant of neural networks used heavily for NLP.

RNNs are great for NLP because their standard input is a word instead of the entire sample taken as standard input by sequential nets like feedforward and convolutional networks. So, RNNs have the flexibility to work with varying lengths of sentences, which cannot be achieved by sequential neural networks because of their fixed structure. RNNs can also share features learned across different positions of text because of their flexible structure.

The feedback loop ability of an RNN allows it to parse each word of a sentence and run an activation on it. The activation value from the word can then be fed back to the layer that is parsing the sentence. So, the activation value informs the layer of what was learned from each word! And, the cycle continues for each word until the network has processed the whole sentence. In machine learning speak, an RNN treats each word of a sentence as a separate input occurring at a particular time 't' and uses the activation value from the previous time step 't-1' as additional input when processing the word at time 't'.
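The feedback idea can be illustrated with a toy recurrent step in plain Python. This is only a minimal sketch: the scalar weights `w_x`, `w_h`, and `b` are made-up values chosen for illustration, and a real RNN layer uses learned weight matrices instead.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: mix the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs):
    """Feed inputs one time step at a time, carrying the state forward."""
    h = 0.0  # initial state before any input is seen
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h)  # the state from step t-1 feeds into step t
        states.append(h)
    return states

# same values in a different order produce different final states
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 1.0])
print(states_a)
print(states_b)
```

In the first run the state stays nonzero after the input drops to zero, which is the "memory" of the earlier input carried through the feedback connection.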

Enable the GPU (if not already enabled):
1.	click **Runtime** in the top left menu
2.	click **Change runtime type** from the drop-down menu
3.	choose **GPU** from the *Hardware accelerator* drop-down menu
4.	click **SAVE**

Test if GPU is active:

In [1]:
import tensorflow as tf

# display tf version and test if GPU is active

tf.__version__, tf.test.gpu_device_name()

('2.3.0', '/device:GPU:0')

Import the tensorflow library. If '/device:GPU:0' is displayed, the GPU is active. If '' (an empty string) is displayed, the regular CPU is active.

# Generating Text with a Character-Level RNN Model

As noted, RNNs are commonly used for natural language tasks. Typically, we can model natural language tasks by character or word. We begin by building a **character-level** model that generates text. In the next chapter, we build a word-level model that predicts sentiment.

## The Text File

We are going to work with **A Tale of Two Cities** by Charles Dickens. For convenience, we’ve already downloaded the Plain Text UTF-8 and **processed** it so you don’t have to.

To get the processed text file, just follow these simple steps:

1. go to the GitHub URL for this book: https://github.com/paperd/tensorflow
2. locate the file: click **chapter8**, click **data**, click **two_cities.txt**
3. click the **Raw** button
4. copy the text (**Ctrl** + **a**, then **Ctrl** + **c**)
5. paste it into **Notepad** or another basic text editor (**Ctrl** + **v**)
6. save it on your computer as **two_cities.txt**
7. drag and drop file to your Google Drive **Colab Notebooks** folder

## Mount Google Drive

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Click on the URL, choose a Google account, click **Allow**, copy the authorization code and paste it into Colab, and press the **Enter** key on your keyboard.

Be sure that you have the file in the **Colab Notebooks** directory on your Google Drive!

## Read the Corpus into Memory

In NLP, a text document is often referred to as a corpus. A **corpus** is a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject.

Read data into memory:

In [3]:
two_cities = 'drive/My Drive/Colab Notebooks/two_cities.txt'

with open(two_cities) as f:
  corpus = f.read()

## Verify Corpus

Display some text from the beginning of the corpus:

In [4]:
print (corpus[:74])

A TALE OF TWO CITIES

A STORY OF THE FRENCH REVOLUTION

By Charles Dickens


Since we start from the beginning, it is pretty easy to verify. But, verifying the end takes a bit more work.

Get the length of the corpus:

In [5]:
len(corpus)

757247

Now, we know where the corpus ends.

With a bit of trial and error, we can display the famous quote from the end:

In [6]:
print (corpus[757116:])

“It is a far, far better thing that I do, than I have ever done; it is a
far, far better rest that I go to than I have ever known.”


If you want to explore other online books for NLP, a great place to start is Project Gutenberg at the following URL: https://www.gutenberg.org/

## Create Vocabulary

Since our goal is to generate text with a character-level model, we train the model to predict the next character in a sequence. We can then repeatedly call the model to generate longer sequences of text. We begin by creating a vocabulary of unique characters contained in the corpus and store it in **vocab**.

Create a vocabulary of unique characters contained in the corpus:

In [7]:
# unique characters in the corpus

vocab = sorted(set(corpus))
print ('{} unique characters'.format(len(vocab)))

74 unique characters


So, we have a vocabulary of 74 unique characters in our corpus.

## Vectorize the Text

Algorithms process numbers, not text. So, we must devise a numerical representation of the corpus. An easy solution is to vectorize text. **Text vectorization** is the process of converting text into a numerical representation.

Let's start by creating dictionary **int_map** to hold integer mappings of unique characters. Next, create numpy array **char_map** to hold character mappings of each integer representation. The numpy array allows us to translate encoded integer mappings back into their character representations. Once we vectorize the corpus, we can build the input pipeline for TensorFlow consumption.

## Create Integer Mappings 

We use dictionary comprehension to create **int_map**, which holds integer mappings for the corpus. **Dictionary comprehension** is a method for creating dictionaries using simple expressions. A dictionary comprehension takes the form **{key: value for (key, value) in iterable}**. In our case, the key is a unique character in the corpus and the value is the integer mapping of the unique character.

In [8]:
# create a dictionary with integer representations of characters

int_map = {key : value for value, key in enumerate(vocab)}
int_map['a']

45

So, integer '45' represents the letter 'a'. Let's validate that this is the case:

In [9]:
# create numpy array to hold character mappings of integers

import numpy as np

char_map = np.array(vocab)
char_map[45]

'a'

It looks like our mappings work. Let's try it on a sequence.

In [10]:
# create variable to hold line break
br = '\n'

# simple sequence
sequence = 'hello world'
print ('original sequence:', sequence, br)

# map to integer representations
maps = np.array([int_map[c] for c in sequence])
print ('integer mappings:', maps, br)

# map integer representations back into characters
s = [char_map[i] for i in maps]

# create string from list of characters
s = ''.join(s)
print ('translation:', s)

original sequence: hello world 

integer mappings: [52 49 56 56 59  1 67 59 62 56 48] 

translation: hello world


## Vectorize the Corpus

Now, we are ready to vectorize the corpus:

In [11]:
# vectorize the corpus
encoded = np.array([int_map[c] for c in corpus])
encoded[:20], char_map[encoded[:20]]

(array([19,  1, 38, 19, 30, 23,  1, 33, 24,  1, 38, 41, 33,  1, 21, 27, 38,
        27, 23, 37]),
 array(['A', ' ', 'T', 'A', 'L', 'E', ' ', 'O', 'F', ' ', 'T', 'W', 'O',
        ' ', 'C', 'I', 'T', 'I', 'E', 'S'], dtype='<U1'))

Numpy array **encoded** holds the vectorized corpus. We display twenty integer mappings and their decodings to verify.

## Predict the Next Character

At each time step during training, our goal is to predict the next probable character given a character or a sequence of characters. So, input to the model must be a sequence of characters. To reach our goal, we must feed the model proper training data.

## Create Training Input Sequences

To create training instances, we divide the corpus into input sequences. Each input sequence contains *seq_length* characters from the corpus. The **seq_length** is the maximum length, in characters, that we want for a single input sequence. We break the corpus into equal length sequences for better performance.

For each input sequence, the sample contains the text and the corresponding target contains the text shifted one character to the right. So, we break the text into chunks of *seq_length + 1*.
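As a rough illustration of the chunk-and-shift idea, here is a plain-Python sketch that works on a short string rather than the encoded corpus. The helper `make_pairs` is a hypothetical name introduced only for this example; the chapter itself uses the TensorFlow dataset methods.

```python
def make_pairs(text, seq_length):
    """Split text into chunks of seq_length + 1 and shift each by one."""
    chunk = seq_length + 1
    pairs = []
    # step through the text one full chunk at a time, dropping any remainder
    for start in range(0, len(text) - chunk + 1, chunk):
        piece = text[start:start + chunk]
        # sample drops the last character, target drops the first
        pairs.append((piece[:-1], piece[1:]))
    return pairs

pairs = make_pairs('hello world!', seq_length=5)
for sample, target in pairs:
    print(repr(sample), '->', repr(target))
```

Each sample and target has exactly *seq_length* characters, and the target always starts one character later than the sample.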

Let's begin by converting the encoded corpus to tensors.

In [12]:
# initialize maximum length sequence for a single input in characters

seq_length = 100

# create training dataset

ds = tf.data.Dataset.from_tensor_slices(encoded)
ds

<TensorSliceDataset shapes: (), types: tf.int64>

Display some samples from the TensorFlow dataset. We convert the integer representations of each character back to character state with the *char_map* array created earlier.

In [13]:
for i in ds.take(6):
  print (i.numpy(), ':', char_map[i])

19 : A
1 :  
38 : T
19 : A
30 : L
23 : E


## Batch Sequences

The batch method lets us easily convert individual characters to sequences of the desired size.

In [14]:
sequences = ds.batch(seq_length + 1, drop_remainder=True)

for i in sequences.take(1):
  print (char_map[i], br)
  print ('batch size:', len(i))

['A' ' ' 'T' 'A' 'L' 'E' ' ' 'O' 'F' ' ' 'T' 'W' 'O' ' ' 'C' 'I' 'T' 'I'
 'E' 'S' '\n' '\n' 'A' ' ' 'S' 'T' 'O' 'R' 'Y' ' ' 'O' 'F' ' ' 'T' 'H' 'E'
 ' ' 'F' 'R' 'E' 'N' 'C' 'H' ' ' 'R' 'E' 'V' 'O' 'L' 'U' 'T' 'I' 'O' 'N'
 '\n' '\n' 'B' 'y' ' ' 'C' 'h' 'a' 'r' 'l' 'e' 's' ' ' 'D' 'i' 'c' 'k' 'e'
 'n' 's' '\n' '\n' '\n' 'C' 'O' 'N' 'T' 'E' 'N' 'T' 'S' '\n' '\n' '\n' ' '
 ' ' ' ' ' ' ' ' 'B' 'o' 'o' 'k' ' ' 't' 'h' 'e'] 

batch size: 101


Each sequence holds 101 characters (displayed above as the batch size) to account for the shift of 1 character to the right for the target.
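A quick piece of arithmetic shows how many sequences this batching should yield. The sketch below assumes the corpus length of 757247 reported earlier and relies on *drop_remainder=True* discarding the final partial chunk.

```python
corpus_length = 757247   # len(corpus) reported earlier
seq_length = 100
chunk = seq_length + 1   # each sequence carries one extra character

# divmod gives the number of full chunks and the leftover characters
num_sequences, dropped_chars = divmod(corpus_length, chunk)
print('sequences:', num_sequences)
print('characters dropped:', dropped_chars)
```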

## Create Samples and Targets

For each input sequence, use the **map** method to apply the *create_sample_target* function. The function shifts an input sequence by *1* to form the sample and target texts for each batch.

In [15]:
def create_sample_target(piece):
  sample = piece[:-1]
  target = piece[1:]
  return sample, target

dataset = sequences.map(create_sample_target)

Display the first split input sequence.

In [16]:
for sample, target in  dataset.take(1):
  print ('sample:', char_map[sample], br)
  print ('target:', char_map[target])

sample: ['A' ' ' 'T' 'A' 'L' 'E' ' ' 'O' 'F' ' ' 'T' 'W' 'O' ' ' 'C' 'I' 'T' 'I'
 'E' 'S' '\n' '\n' 'A' ' ' 'S' 'T' 'O' 'R' 'Y' ' ' 'O' 'F' ' ' 'T' 'H' 'E'
 ' ' 'F' 'R' 'E' 'N' 'C' 'H' ' ' 'R' 'E' 'V' 'O' 'L' 'U' 'T' 'I' 'O' 'N'
 '\n' '\n' 'B' 'y' ' ' 'C' 'h' 'a' 'r' 'l' 'e' 's' ' ' 'D' 'i' 'c' 'k' 'e'
 'n' 's' '\n' '\n' '\n' 'C' 'O' 'N' 'T' 'E' 'N' 'T' 'S' '\n' '\n' '\n' ' '
 ' ' ' ' ' ' ' ' 'B' 'o' 'o' 'k' ' ' 't' 'h'] 

target: [' ' 'T' 'A' 'L' 'E' ' ' 'O' 'F' ' ' 'T' 'W' 'O' ' ' 'C' 'I' 'T' 'I' 'E'
 'S' '\n' '\n' 'A' ' ' 'S' 'T' 'O' 'R' 'Y' ' ' 'O' 'F' ' ' 'T' 'H' 'E' ' '
 'F' 'R' 'E' 'N' 'C' 'H' ' ' 'R' 'E' 'V' 'O' 'L' 'U' 'T' 'I' 'O' 'N' '\n'
 '\n' 'B' 'y' ' ' 'C' 'h' 'a' 'r' 'l' 'e' 's' ' ' 'D' 'i' 'c' 'k' 'e' 'n'
 's' '\n' '\n' '\n' 'C' 'O' 'N' 'T' 'E' 'N' 'T' 'S' '\n' '\n' '\n' ' ' ' '
 ' ' ' ' ' ' 'B' 'o' 'o' 'k' ' ' 't' 'h' 'e']


Notice that the target is one character ahead of the sample. We do this so the algorithm can learn from the target how to predict the next character.

## Time Step Prediction

Each index of the sample and target vectors is processed as a single time step. That is, each character processed in a sample and target is a time step. So, for the input at time step 0, the model receives the index (input_idx) for **A** and tries to predict the index for **a space** as the next character. At the next time step, the model repeats the same process. But, the RNN model considers the context of the previous step in addition to the current input character. The output verifies that the sample and target sets were created properly.

Let's look at the first 5 timesteps:

In [17]:
for i, (input_idx, target_idx) in enumerate(
    zip(sample[:5], target[:5])):
  print('Step:', i)
  print(' input:', input_idx.numpy(),
        char_map[input_idx])
  print(' expected output:', target_idx.numpy(),
        char_map[target_idx])
  if i < 4: print()

Step: 0
 input: 19 A
 expected output: 1  

Step: 1
 input: 1  
 expected output: 38 T

Step: 2
 input: 38 T
 expected output: 19 A

Step: 3
 input: 19 A
 expected output: 30 L

Step: 4
 input: 30 L
 expected output: 23 E


As we can see, the sample and target data were created properly.

## Create Training Batches

We already split the text into manageable sequences. But, before feeding the data into the model we need to shuffle it and pack it into batches.

In [18]:
# batch size
BATCH_SIZE = 64

# buffer size
BUFFER_SIZE = 10000

corpus_ds = (dataset
  .shuffle(BUFFER_SIZE)
  .batch(BATCH_SIZE, drop_remainder=True)
  .cache().prefetch(1))

corpus_ds

<PrefetchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

TensorFlow data is designed to work with infinite sequences. So, it doesn't attempt to shuffle the entire sequence in memory. Instead, it maintains a buffer where it shuffles elements. We set *BUFFER_SIZE=10000* to give TensorFlow a fairly large buffer, but not so large that we cause memory issues. We can see that our dataset contains training samples and targets with batch sizes of 64 and sequence lengths of 100.
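To make the buffer idea concrete, here is a plain-Python sketch of a buffered shuffle. It is an illustration of the general technique under simple assumptions, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)       # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]             # emit a random buffered element
        buffer[idx] = item            # replace it with the incoming one
    rng.shuffle(buffer)               # drain whatever is left at the end
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
print(shuffled)
```

Only *buffer_size* elements are held in memory at any time, so an element can never be emitted much earlier than its original position. A larger buffer gives a more thorough shuffle at the cost of more memory.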

# Build the Model

Begin by initializing some important variables. We set *vocab_size* to the number of unique characters in the corpus. We set *embedding_dim* to 256. **Word embedding** is a learning technique in NLP where words or phrases from the vocabulary are mapped to vectors of real numbers. Since we model characters, each character in our vocabulary is mapped to such a vector. In practice, we use word embedding vectors with dimensions between 50 and 500. We use *256*, which we believe is a nice compromise between processing time and performance. The higher the embedding dimension, the more performance we can squeeze out of our model. But, higher embedding dimensions are computationally expensive. We set *rnn_units* to 1024, which represents the number of neurons in the recurrent layer.

In [19]:
# length of the vocabulary in chars
vocab_size = len(vocab)

# the embedding dimension
embedding_dim = 256

# number of RNN units
rnn_units = 1024

## Generate Seed and Import Libraries

In [20]:
# plant random seeds for reproducibility
tf.random.set_seed(0)
np.random.seed(0)

# clear any previous models
tf.keras.backend.clear_session()

# import libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense,\
Embedding
from tensorflow.keras import losses

## Create Layers

The first layer is an embedding layer with the vocabulary size, embedding dimensions, and the input shape of the batch as inputs. The output from the embedding layer feeds into the second layer, which is a GRU layer with 1024 neurons (identified by the *rnn_units* variable). We set *return_sequences=True* so the layer returns an output for every time step, and *stateful=True* so the final state from each batch is carried over as the initial state for the next batch. We also set *recurrent_initializer='glorot_uniform'* so the recurrent weights are initialized with samples from a Glorot uniform distribution. The output from the GRU layer feeds into the final Dense layer, which has one output unit per character in the vocabulary.

In [21]:
# create the model

model = Sequential([
  Embedding(vocab_size, embedding_dim,
            batch_input_shape=[BATCH_SIZE, None]),
  GRU(rnn_units, return_sequences=True,
      stateful=True, recurrent_initializer='glorot_uniform'),
  Dense(vocab_size)
])

## Display Model Summary

In [22]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           18944     
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3938304   
_________________________________________________________________
dense (Dense)                (64, None, 74)            75850     
Total params: 4,033,098
Trainable params: 4,033,098
Non-trainable params: 0
_________________________________________________________________


The first layer is an embedding. So, calculate the number of learnable parameters by multiplying vocabulary size of 74 by embedding dimension of 256 for a total of 18,944.

The second layer is a GRU. The number of learnable parameters is based on the formula **3 x (n<sup>2</sup> + mn + 2n)** where *m* is the input dimension and *n* is the output dimension. Multiply by *3* because there are three sets of operations requiring weight matrices of these sizes. Multiply n by *2* because the default Keras GRU keeps two bias vectors per set of operations, one for the input connections and one for the feedback connections. So, we get 3,938,304 learnable parameters. Here's how we break down the result:
* 3 x (1024<sup>2</sup> + 1024 x 256 + 2 x 1024)
* 3 x (1048576 + 262144 + 2048)
* 3 x 1312768
* 3,938,304

Calculating learnable parameters for the second layer is pretty complex. Let's break it down logically. A GRU layer is a feedforward layer with feedback loops. Learnable parameters for a feedforward layer are calculated by multiplying output from the previous layer (256 neurons) with neurons at the current layer (1024 neurons). With a feedforward layer, we also have to account for a bias for each of the 1024 neurons at this layer. But, we multiply the 1024 biases by 2 because the layer keeps one bias vector for the input connections and another for the feedback connections. Finally, the current layer's 1024 neurons are fed back resulting in 1024<sup>2</sup> learnable parameters. A GRU uses three sets of operations (hidden state, reset gate, and update gate) requiring weight matrices, so we multiply the learnable parameters by 3.

The third layer is dense. So, calculate the number of learnable parameters by multiplying output dimension of 74 by input dimension of 1,024 and adding 74 to account for the number of neurons at this layer for a total of 75,850.
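The three parameter counts can be double-checked with a few lines of arithmetic. This sketch simply encodes the formulas described above; the helper function names are made up for illustration.

```python
def embedding_params(vocab_size, embedding_dim):
    # one embedding vector per vocabulary entry
    return vocab_size * embedding_dim

def gru_params(m, n):
    # three weight sets, each with recurrent weights (n*n),
    # input weights (m*n), and 2n bias terms
    return 3 * (n * n + m * n + 2 * n)

def dense_params(inputs, outputs):
    # one weight per input-output pair plus one bias per output
    return inputs * outputs + outputs

counts = [
    embedding_params(74, 256),
    gru_params(256, 1024),
    dense_params(1024, 74),
]
print(counts, sum(counts))
```

The three values and their sum should match the *Param #* column and the total reported by the model summary.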

For a deep discussion of GRUs, consult the following URL: https://arxiv.org/ftp/arxiv/papers/1701/1701.05923.pdf

## Check Output Shape

Display the shape of the first batch in the dataset.

In [23]:
for sample, target in corpus_ds.take(1):
  example_batch_predictions = model(sample)
  
example_batch_predictions.shape

TensorShape([64, 100, 74])

So, the first batch has *batch_size* of 64, *sequence_length* of 100, and *vocab_size* of 74 as expected. Notice that the output shape from displaying model.summary() is (64, None, 74). The sequence length is not included because the model can be run on inputs of any length.

## Calculate Loss

We sample from the output distribution to predict character indices. The output distribution is defined by the logits over our character vocabulary. A **logit** is a raw, unnormalized prediction score that can range from negative infinity to infinity. Simply, a logit is a prediction. The name comes from the logit function, which is the inverse of the sigmoid function: it maps a probability between 0 and 1 to a value anywhere on the real line. Applying softmax to the logits converts them into probabilities. Since our model returns logits, we need to set the **from_logits** flag to calculate loss.
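The relationship between logits and probabilities can be sketched in plain Python. The functions below are standard textbook definitions written out for illustration, and the example values are arbitrary.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of sigmoid: map a probability to the real line."""
    return math.log(p / (1.0 - p))

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

p = 0.8
z = logit(p)
probs = softmax([2.0, 0.5, -1.0])
print(z, sigmoid(z))
print(probs, sum(probs))
```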

In [24]:
def loss(labels, logits):
  return losses.sparse_categorical_crossentropy(
      labels, logits, from_logits=True)

The model outputs a 3D tensor consisting of batch size, sequence length, and vocabulary size. So, let's test that our loss function works as expected.

In [25]:
pre_trained_loss = loss(target, example_batch_predictions)

print('pred shape: ', example_batch_predictions.shape)
print('scalar_loss: ', pre_trained_loss.numpy().mean())

pred shape:  (64, 100, 74)
scalar_loss:  4.302785


Great! We can see that the prediction shape has batch size of 64, sequence length of 100, and vocabulary size of 74. We also display the average loss from the model before training.
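As a sanity check on that number, a model that knows nothing should spread probability roughly evenly over all 74 characters, which gives a cross-entropy of about ln(74). The sketch below computes this baseline in plain Python; `sparse_crossentropy_from_logits` is a hypothetical helper written for this example, not the Keras function itself.

```python
import math

def sparse_crossentropy_from_logits(label, logits):
    """Cross-entropy for one integer label given raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

vocab_size = 74
uniform_logits = [0.0] * vocab_size  # every character equally likely
baseline = sparse_crossentropy_from_logits(45, uniform_logits)
print(baseline, math.log(vocab_size))
```

The baseline is close to the loss displayed above, which suggests the untrained model starts out with near-uniform predictions.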

## Compile the Model

In [26]:
model.compile(loss=loss,
              optimizer='adam')

## Configure Checkpoints

With an RNN, we want to save what the model learned during training. One way to do this is to save checkpoints with a callback method, which here writes the model weights at the end of each epoch. **Checkpoints** capture the exact value of all TensorFlow parameters used by a model.

In [27]:
import os

# directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'

# name of the checkpoint files
checkpoint_files = os.path.join(checkpoint_dir,
                                'ckpt_{epoch}')

# callback method
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_files,
    save_weights_only=True)
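The *{epoch}* placeholder in the file name is filled in with the epoch number when each checkpoint is written. The following sketch mimics that substitution with ordinary string formatting to show the names we can expect; it does not call TensorFlow.

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_files = os.path.join(checkpoint_dir, 'ckpt_{epoch}')

# illustrate how the {epoch} placeholder expands for the first three epochs
paths = [checkpoint_files.format(epoch=e) for e in range(1, 4)]
print(paths)
```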

## Train the Model

Let's train the model for 10 epochs. You can add epochs to improve performance. We tell the model to save checkpoints with **checkpoint_callback**.

In [28]:
EPOCHS = 10

history = model.fit(corpus_ds, epochs=EPOCHS,
                    callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Rebuild Model for Text Creation

### Restore Weights from Checkpoints

First, restore the weights from the checkpoints we established during training. We need the latest checkpoint to obtain what the RNN learned during training.

In [29]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_10'

### Rebuild with Batch Size of 1

Second, to keep prediction simple we use a batch size of 1. Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built. So, let's rebuild it with batch size of 1 (instead of 64).

In [30]:
# generate seed for reproducibility
tf.random.set_seed(0)
np.random.seed(0)

# clear any previous models
tf.keras.backend.clear_session()

# set batch size to 1
BATCH_SIZE = 1

# Rebuild model
model = Sequential([
  Embedding(vocab_size, embedding_dim,
            batch_input_shape=[BATCH_SIZE, None]),
  GRU(rnn_units, return_sequences=True,
      stateful=True, recurrent_initializer='glorot_uniform'),
  Dense(vocab_size)
])

### Load Weights and Reshape

Third, load the weights and reshape the model to ensure that tensors have batch size of 1.

In [31]:
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

In [32]:
# good idea to view model at this point

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (1, None, 256)            18944     
_________________________________________________________________
gru (GRU)                    (1, None, 1024)           3938304   
_________________________________________________________________
dense (Dense)                (1, None, 74)             75850     
Total params: 4,033,098
Trainable params: 4,033,098
Non-trainable params: 0
_________________________________________________________________


All is well!

## Create New Text

To create new text, create a function and initialize a set of variables to feed to the function. To prepare the starting string for TensorFlow consumption, vectorize and reshape it before passing it to the function. Let's begin with the function.

### Create the Function

The function accepts the model, vectorized starting string, temperature, and the original starting string. It begins by initializing a list to hold the new text created and resetting the states of the model. The function continues by iterating **n** times (the number of characters we wish to create).

**Temperature** is a hyperparameter of neural networks used to control the randomness of predictions by scaling the logits before applying softmax. A bit more simply, temperature represents how much to divide the logits by before computing softmax.
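To see the effect of temperature, the sketch below applies softmax to the same made-up logits at three temperatures. It is a plain-Python illustration with arbitrary values, separate from the TensorFlow code used in the chapter.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.3)
neutral = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 3.0)

for name, probs in [('0.3', cold), ('1.0', neutral), ('3.0', hot)]:
    print(name, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the highest-scoring choice, while a high temperature flattens the distribution so less likely choices are sampled more often.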

During iteration, the function models the encoded starting string and places the result in **predictions**. It then removes the extra '1' dimension so it can divide the contents of 'predictions' by the temperature. The next task of the function is to use a categorical distribution to predict the next character returned by the model. The function needs to add the '1' dimension back so that it can pass the predicted character as the next input to the model along with the previous hidden state. The process repeats until the loop ends.

In [33]:
def create_text(model, input_eval, temperature, start_string):

  # Empty list to store our results
  new_text = []

  # Here batch size == 1
  model.reset_states()

  for i in range(n):
    # model encoded input
    predictions = model(input_eval)

    # remove batch dimension so we can manipulate predictions
    predictions = tf.squeeze(predictions, 0)

    # divide predictions by temperature
    predictions = predictions / temperature

    # use a categorical distribution to predict character
    # returned by model
    predicted_id = tf.random.categorical(
        predictions, num_samples=1)[-1,0].numpy()

    # pass predicted character as next input to model
    # with previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)

    # append generated characters to text
    new_text.append(char_map[predicted_id])

  return (start_string + ''.join(new_text))

### Initialize Variables

Now that we have the function, let's initialize. We begin by setting n to the number of characters we wish to create. We continue by setting the temperature and the starting string. Low temperatures result in more predictable text, while higher temperatures result in more surprising text. You can experiment to find the best setting.

In [34]:
n = 500
temp = 0.3
start_string = 'Tale'

We can try different start strings, but we chose 'Tale' because we know that the corpus contains this name.

### Vectorize and Reshape Starting String

We need to vectorize the starting string because the model only recognizes numbers. We need to reshape the vectorized starting string for TensorFlow consumption. We display the shapes to verify that all is well.

In [35]:
# vectorize starting string
input_vectorized = [int_map[s] for s in start_string]
print ('original shape:', end=' ')
print (str(np.array(input_vectorized).ndim) + 'D', br)

# reshape string for TensorFlow model consumption
input_vectorized = tf.expand_dims(input_vectorized, 0)
print ('new shape:', input_vectorized.shape)

original shape: 1D 

new shape: (1, 4)


### Create New Text with Function Invocation

We are now ready to create new text. Plant random seeds for reproducibility. Next, call the function.

In [36]:
tf.random.set_seed(0)
np.random.seed(0)

print (create_text(model, input_vectorized, temp, start_string))

Tale the ways of the chair that was not my patriot with a daughter, and that they were near the manner of the paper than he came to the chair that the traveller was the way of the paper than the days when the chateau was closed to himself. “He had been the case while I can't say that the paper than the chair to the case which he came to the case of the name of the way of the chair was the way of the case when they were left them that the way to the chateau with the carriage of the chair than the cha


Wow! Although the sentences are nonsensical, the model creates actual sentences.