# Deep Learning for Text Generation 
> A Practitioner's Guide: Part II

+ Project Gutenberg : [The Adventures of Sherlock Holmes](https://www.gutenberg.org/ebooks/1661)
+ [Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
+ [Freecode](https://www.freecodecamp.org/news/applied-introduction-to-lstms-for-text-generation-380158b29fb3/)

Temperature is a scaling factor applied to the outputs (logits) of our dense layer before the softmax activation function: the logits are divided by the temperature. In a nutshell, it controls how conservative or "creative" the model's guesses are for the next character in a sequence. Lower values of temperature (e.g., 0.2) sharpen the distribution and generate "safe" guesses, whereas values above 1.0 flatten it and start to generate "riskier" guesses. Think of it as the amount of surprise you'd have at seeing an English word start with "st" versus "sg". When temperature is low, we may get lots of "the"s and "and"s; when temperature is high, things get more unpredictable.
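
The effect of temperature on the output distribution can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the function name `softmax_with_temperature` and the three logit values are assumptions made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next characters
example_logits = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print("temperature", t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability mass lands on the highest logit; at a higher temperature the mass spreads across the alternatives.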

## Training a Text Generator from Scratch

In [1]:
import tensorflow as tf

import numpy as np
import os
import time

In [2]:
datafile_path = r'data/the_adventures_of_sherlock_holmes_1661-0.txt'

In [3]:
# Load the text file (read as bytes so the original line endings are preserved)
with open(datafile_path, 'rb') as f:
    text = f.read().decode(encoding='utf-8')

# Get the number of characters
print('Length of text: {} characters'.format(len(text)))

Length of text: 594197 characters


In [4]:
# Sample text
print(text[1300:1500])

I. A SCANDAL IN BOHEMIA


I.

To Sherlock Holmes she is always _the_ woman. I have seldom heard him
mention her under any other name. In his eyes she eclipses and
predominates the whole of her 


In [5]:
# Drop the first 1300 characters, which contain the Project Gutenberg header
text = text[1300:]

In [6]:
# Get quick details on unique characters
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

96 unique characters


## Prepare Text

In [7]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

In [8]:
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '\n':   0,
  '\r':   1,
  ' ' :   2,
  '!' :   3,
  '"' :   4,
  '$' :   5,
  '%' :   6,
  '&' :   7,
  "'" :   8,
  '(' :   9,
  ')' :  10,
  '*' :  11,
  ',' :  12,
  '-' :  13,
  '.' :  14,
  '/' :  15,
  '0' :  16,
  '1' :  17,
  '2' :  18,
  '3' :  19,
  ...
}


In [9]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'I. A SCANDAL ' ---- characters mapped to int ---- > [38 14  2 30  2 48 32 30 43 33 30 41  2]
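
The character-to-index mapping above is reversible, which is what lets us turn model output back into text later. A small self-contained sketch of the round trip follows; the helper names `build_vocab`, `encode`, and `decode` are illustrative assumptions and do not appear in the notebook.

```python
def build_vocab(text):
    """Return the sorted unique characters and a char -> index lookup."""
    vocab = sorted(set(text))
    char2idx = {ch: i for i, ch in enumerate(vocab)}
    return vocab, char2idx

def encode(text, char2idx):
    """Map each character to its integer index."""
    return [char2idx[ch] for ch in text]

def decode(ids, vocab):
    """Map integer indices back to characters and join them."""
    return ''.join(vocab[i] for i in ids)

# Toy corpus: the vocabulary here is far smaller than the 96 characters above,
# so the indices differ from the notebook output.
sample = "I. A SCANDAL "
sample_vocab, sample_char2idx = build_vocab(sample)
sample_ids = encode(sample, sample_char2idx)

print(sample_ids)
print(repr(decode(sample_ids, sample_vocab)))
```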


### Prepare Input -> Target Dataset

In [10]:
# Maximum length, in characters, of a single input sequence
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(10):
    print(idx2char[i.numpy()])

I
.
 
A
 
S
C
A
N
D


### Prepare Batch

In [11]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(10):
    print(repr(''.join(idx2char[item.numpy()])))

'I. A SCANDAL IN BOHEMIA\r\n\r\n\r\nI.\r\n\r\nTo Sherlock Holmes she is always _the_ woman. I have seldom heard '
'him\r\nmention her under any other name. In his eyes she eclipses and\r\npredominates the whole of her se'
'x. It was not that he felt any emotion\r\nakin to love for Irene Adler. All emotions, and that one part'
'icularly,\r\nwere abhorrent to his cold, precise but admirably balanced mind. He\r\nwas, I take it, the m'
'ost perfect reasoning and observing machine that\r\nthe world has seen, but as a lover he would have pl'
'aced himself in a\r\nfalse position. He never spoke of the softer passions, save with a gibe\r\nand a sne'
'er. They were admirable things for the observer—excellent for\r\ndrawing the veil from men’s motives an'
'd actions. But for the trained\r\nreasoner to admit such intrusions into his own delicate and finely\r\na'
'djusted temperament was to introduce a distracting factor which might\r\nthrow a doubt upon all his men'
'tal results. Grit in a sensit

In [12]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

In [13]:
for input_example, target_example in dataset.take(1):
    print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'I. A SCANDAL IN BOHEMIA\r\n\r\n\r\nI.\r\n\r\nTo Sherlock Holmes she is always _the_ woman. I have seldom heard'
Target data: '. A SCANDAL IN BOHEMIA\r\n\r\n\r\nI.\r\n\r\nTo Sherlock Holmes she is always _the_ woman. I have seldom heard '


In [14]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 38 ('I')
  expected output: 14 ('.')
Step    1
  input: 14 ('.')
  expected output: 2 (' ')
Step    2
  input: 2 (' ')
  expected output: 30 ('A')
Step    3
  input: 30 ('A')
  expected output: 2 (' ')
Step    4
  input: 2 (' ')
  expected output: 48 ('S')


### Prepare Training Batch

In [15]:
# Batch size
BATCH_SIZE = 64

In [16]:
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
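
The buffered shuffle described in the comments above can be modeled in a few lines of plain Python. This is a simplified sketch of the idea (keep a fixed-size buffer, emit a random element, refill from the stream), not TensorFlow's actual implementation; the function name `buffered_shuffle` and its `seed` parameter are assumptions for the example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

One consequence of this design is that an element can only move forward by a bounded amount: the k-th element emitted always comes from the first `k + buffer_size` inputs. A buffer at least as large as the dataset gives a full shuffle, while a buffer of size 1 leaves the order unchanged.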

## Model

In [18]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [19]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

In [20]:
model = build_model(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

In [21]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           24576     
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3938304   
_________________________________________________________________
dense (Dense)                (64, None, 96)            98400     
Total params: 4,061,280
Trainable params: 4,061,280
Non-trainable params: 0
_________________________________________________________________
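
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the GRU layer uses two bias vectors per gate (the `reset_after=True` variant), which is the assumption under which the arithmetic matches the 3,938,304 reported above.

```python
vocab_size = 96
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim-sized vector per vocabulary entry
embedding_params = vocab_size * embedding_dim

# GRU: 3 gates (update, reset, candidate), each with an input kernel,
# a recurrent kernel, and two bias vectors (assumed reset_after=True)
gru_params = 3 * (embedding_dim * rnn_units
                  + rnn_units * rnn_units
                  + 2 * rnn_units)

# Dense: weight matrix plus one bias per output unit
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```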


In [22]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
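
For a single time step, sparse categorical cross-entropy on logits reduces to the negative log-probability of the true character. The plain-Python sketch below illustrates that formula; the function name `sparse_ce_from_logits` is an assumption for the example and is not part of the Keras API.

```python
import math

def sparse_ce_from_logits(label, logits):
    """Cross-entropy for one example: log-sum-exp of the logits minus the true-class logit."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]

# A model that spreads probability evenly over 96 characters
uniform_logits = [0.0] * 96
baseline_loss = sparse_ce_from_logits(0, uniform_logits)
print(round(baseline_loss, 2))
```

This gives a useful sanity check: a model with no preference among the 96 characters scores ln(96), about 4.56, so a training loss should drop below that value once the model starts learning.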

In [23]:
model.compile(optimizer='adam', loss=loss)

In [24]:
# Directory where the checkpoints will be saved
checkpoint_dir = r'data/training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

### Train

In [25]:
EPOCHS = 12

In [26]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


## Generate Text

In [27]:
tf.train.latest_checkpoint(checkpoint_dir)

'data/training_checkpoints/ckpt_12'

In [28]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [29]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            24576     
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_1 (Dense)              (1, None, 96)             98400     
Total params: 4,061,280
Trainable params: 4,061,280
Non-trainable params: 0
_________________________________________________________________


In [30]:
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)

    # Number of characters to generate
    num_generate = 1000

    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty string to store our results
    text_generated = []

    # Low temperatures result in more predictable text.
    # Higher temperatures result in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0

    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Sample the next character ID from the categorical distribution
        # defined by the temperature-scaled logits
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # We pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))
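
The sampling step inside the loop (scale the logits by the temperature, then draw one ID from the resulting distribution) can be illustrated without TensorFlow. This is a rough stand-in for what the loop does at each step, not a replacement for `tf.random.categorical`; `sample_next_id` and the demo logits are hypothetical.

```python
import math
import random

def sample_next_id(logits, temperature=1.0, rng=random):
    """Draw one index with probability proportional to softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

demo_rng = random.Random(0)      # seeded so the demo is repeatable
demo_logits = [2.0, 1.0, 0.1]    # hypothetical scores for three characters

cold = [sample_next_id(demo_logits, 0.01, demo_rng) for _ in range(100)]
warm = [sample_next_id(demo_logits, 5.0, demo_rng) for _ in range(100)]
print(sorted(set(cold)), sorted(set(warm)))
```

With a very low temperature the draw is effectively an argmax, while a high temperature lets lower-scoring characters through, which is where the misspellings in generated text tend to come from.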

In [31]:
print(generate_text(model, start_string=u"Watson "))

Watson very much
more. I am about soming seven slyes, lazed’s
places, poosons, all over I will upon the flow. Is sourch is to ask I am arrested.

“‘Ar0 what you can as we sat a goose incicually vowict.’

“We must spend the beg. Sid Rose a headt and salary, and wrstand
aside all by him. There is a pair of my compan side, and he had no maited face been there very
means, I should impossible to hat fig murderstanding in the bird. We am afraid that I was
less Lestrade on a cabar clean?”

“Well, it is. She leaves a tere of a dargers, for there had caught it fermuding.

“But by a Geangal important I wno got looking and bytell curton luscing
is beside that he well walked solved. “It is twenty, Holmes, that you
 explained was every hadfeder chance.”

“Well, you say, I would suppose, and what I was a death, and I will indeed a new circumstants? How
curree had occurred with his eye of the wood
well?”

“But within a wooden clothes, and to bo you to ask with the ran.
“There wa

## Decoding Strategies

## Challenges