In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.2.4'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas in practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600901



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.
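
To make the windowing concrete, here is a minimal sketch on a toy string. The names `toy_text`, `toy_maxlen` and `toy_step` are illustrative assumptions and not part of the original notebook; the logic mirrors the extraction and one-hot encoding described above.

```python
import numpy as np

# Illustrative toy values (assumptions, not taken from the notebook)
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Start offsets of each window; the last window must leave room for a target
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
toy_windows = [toy_text[i: i + toy_maxlen] for i in starts]
toy_targets = [toy_text[i + toy_maxlen] for i in starts]
print(toy_windows)  # ['hell', 'lo w', 'worl']
print(toy_targets)  # ['o', 'o', 'd']

# Map each unique character to a column index
toy_chars = sorted(set(toy_text))
toy_index = {c: i for i, c in enumerate(toy_chars)}

# One-hot encode windows (3D) and targets (2D)
toy_x = np.zeros((len(toy_windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(toy_windows):
    for t, char in enumerate(window):
        toy_x[i, t, toy_index[char]] = True
    toy_y[i, toy_index[toy_targets[i]]] = True

print(toy_x.shape, toy_y.shape)  # (3, 4, 8) (3, 8)
```

Consecutive windows share `toy_maxlen - toy_step` characters (one character here), which is what "partially overlapping" means in practice.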

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200281
Unique characters: 59
Vectorization...


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. But let us note that 
recurrent neural networks are not the only way to do sequence data generation; 1D convnets also have proven extremely successful at it in 
recent times.
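
As a rough sanity check on model size, the sketch below counts trainable parameters by hand. It assumes the standard Keras `LSTM` layer with bias terms (four gates, each with input weights, recurrent weights and a bias), 128 units, and the 59 unique characters reported above; the variable names are illustrative. Comparing the result against `model.summary()` is a quick way to confirm the assumption on your installation.

```python
# Assumed sizes: 128 LSTM units, 59 unique characters (from the output above)
units = 128
n_chars = 59

# Each of the 4 LSTM gates has input weights, recurrent weights and a bias
lstm_params = 4 * (units * (n_chars + units) + units)

# Dense softmax layer: one weight per (unit, character) pair plus a bias per character
dense_params = units * n_chars + n_chars

print(lstm_params)                 # 96256
print(dense_params)                # 7611
print(lstm_params + dense_params)  # 103867
```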

In [4]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))






Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:
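
For intuition, here is a minimal NumPy sketch of what this loss computes for a single sample. The four-class vectors are made-up illustrative values; Keras additionally clips predictions for numerical stability, which this sketch omits.

```python
import numpy as np

# Made-up example: the true next character is class 2 out of 4
y_true = np.array([0., 0., 1., 0.])
y_pred = np.array([0.1, 0.2, 0.6, 0.1])  # softmax output, sums to 1

# Categorical crossentropy: minus the sum of target * log(prediction)
loss = -np.sum(y_true * np.log(y_pred))

# With a one-hot target, this reduces to -log of the probability
# assigned to the correct class
print(loss, -np.log(y_pred[2]))
```

The loss is zero only when the model puts all probability mass on the correct character, and grows as that probability shrinks.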

In [5]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)





## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [6]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


Finally, this is the loop where we repeatedly train and generate text. We start generating text using a range of different temperatures 
after every epoch. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature in the sampling strategy.
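
To see the effect of temperature in isolation, the sketch below applies the same reweighting as `sample` to a small made-up distribution, without the random draw. The helper name `reweight` and the values in `toy_probs` are illustrative assumptions.

```python
import numpy as np

def reweight(probs, temperature):
    """Apply the same log / temperature / renormalize steps as `sample`."""
    log_probs = np.log(np.asarray(probs, dtype='float64')) / temperature
    exp_probs = np.exp(log_probs)
    return exp_probs / np.sum(exp_probs)

# Made-up distribution over four characters
toy_probs = [0.5, 0.3, 0.15, 0.05]

for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(toy_probs, temperature), 3))
```

Temperatures below 1 concentrate probability on the most likely character, a temperature of exactly 1 leaves the distribution unchanged, and temperatures above 1 flatten it toward uniform.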

In [None]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
--- Generating with seed: " physical man."[19] this
dictum, grown hard and cutting bene"
------ temperature: 0.2
 physical man."[19] this
dictum, grown hard and cutting beness of the consciences of the soul of the consciences of the moral the consciences of the consciences of the soul of the soul of the soul of the in the soul of the still and and and the still the greater the consciences of the man and the own and and and the consciences of the consciences of the some has of the consciences of the soul and and one and and the soul of the and and the still the deligh
------ temperature: 0.5
and one and and the soul of the and and the still the delight to be apon of the desired" and sould profornd, the enture of the moral the or good
the conduries is all the expresion, of the religions and opposient of the gowne, the whole they a deligure of such and all be a proficence and a

 in an endurish and stronger of he and create compared, where manking, st themselves manows to mags as litenioned of philosophicing of miscrong
complient fow phred imperway re rous command times--bading to theil one is "faund must as the posit toward, are the highest vistous elequence of the evilw man serenic right is complaik, in out. that even reabons greatest sup
faws, possible, him. to be
poster, is defanges he wishous becomes of once of pute or a gett
------ temperature: 1.2
er, is defanges he wishous becomes of once of pute or a gettest gree pusion of them newis .. wishoutsorially ma
tafter and that,
make anything myst
ideas,
than it?--  ", and enstine wo deselvas,d rga
 .


 
this hand difinons--it
not
a besein conmpontenwler, such inconquraquted it, we misuxflinical disexe sare areiably in
misunderstand the refsenting most belose without theur--for his inbicy," howe--causs: wat does nothieg, inalchious, sensual.--for exhpis
epoch 5
Epoch 1/1
--- Generating with seed: "en constr

esires and decided and and the sense of the problem of the saint which the comment and commenance of the ease the greatese man we consideration, and soul and the position and the difficult to please of the ethics which is not as the same the present that as a the father and former because of the world of the reason the inclusions the strength with the heart be and the commendance, and has been excertion is instinct, the complien personal consideration the 
------ temperature: 1.0
ertion is instinct, the complien personal consideration the noble-drem of only the from here imincless ever a clised in this.wjessenes for individad," that wrets relaon, one may mabanter, as mention, appronces loves doett (commany dong in as the rema jay the world, and
fore, and power, the that lang!--but a special guese itself himself,
and in averinction, the wholen,
but what sang pand of
science, amginate,
it may see,
there is a langshis. society of the 
------ temperature: 1.2
, amginate,
it may see,
there 

praises not become who has been the free spirits, and the sense of the common and the spirit and the said the common and the problem of the common the more intellect and in the spirit of the conscience of the common the act of the spirit without the said the continuent the desires of the same deceive the feelings the religious the faith of the desire to say the said as the said the master in the same deceive the
------ temperature: 0.5
 say the said as the said the master in the same deceive the good of the free must our for in the whole of his comperis in the master of the said to his most we last desereness, as stand the intellect such perhaps a man is it is a should a most life is the the person of the child may be which every learn stand of the basis of the sacrifice and is long to be succession of the sense of life of condition, who has the act weaker extent, no longer contrasting t
------ temperature: 1.0
tion, who has the act weaker extent, no longer contrasting toy yen slake fo

subsecroment. here, "
revoestologi worldele--what
as passions wis
epoch 16
Epoch 1/1
--- Generating with seed: " to be approved of; he passes the judgment: "what is
injurio"
------ temperature: 0.2
 to be approved of; he passes the judgment: "what is
injurion of the saint of the soul the saint of the standard of the superficialisis of the standard of the saint of the consequence, and the special the saint of the sameness of the subject of the saint of the strength of the sameness of the saint of the sameness of the spiritual to the standard and the states of the saint of the sameness, and the samenes himself to the saints in the standard of the spiri
------ temperature: 0.5
e samenes himself to the saints in the standard of the spirit of the strengthing and religious in the influence, and dispropers of his promised
in the sronge the sameness, and who will in the strength. in the distrust of the house in the antical intellect in the saints and remained and science of himself that he dec


he spirit of the spirit of the spiri
------ temperature: 0.5
hest the consequent of the spirit of the spirit of the spirit of the bad a there he we consist of the profound still to be interesting to an attentation and spring that which is the spirit of the hands in the subjects and the contemptation of the world be standing of the case the spring to men at once of sense of the decided to sentiments of the new and spirits of which the probable the dangerous
spirit of personality and always from the spirits, where als
------ temperature: 1.0
spirit of personality and always from the spirits, where also a spirit enlict and nevery ourselves in the logical fine the fintnes"" to trotts condicion! strong-intellectuancy of the
looking
il talk for the untranotic, in one's parsiosing and to the recare civque of one enter do won a side. the wear, in moral course, and question of -who is present instingh is without
high and abstanday
in letpe of the teres; power reself to thew above at rise compre

insight, sympathy, and solitude. the senses of the subject to the same dispart of the same dispart of the suppose it is allowed and common and subject to be so the common a still in the same to the suffering that it is the suppose it is the suppose it is the same to the spiritual the command, and the suppose it is the same success and single and supality of the present does not be such a personality of the suffering the subject o
------ temperature: 0.5
oes not be such a personality of the suffering the subject of a man is has hitherto viridiant, in everything possible be of the higher in the whole person what is the virbus the religious and consequently and suspecimate the truth is a feeling who is now the inferior of the most experience, and wished to the supposition. in the greates from the spirit of the confess a companing with a man seems to the attained to be the world of the pupperism a place of th
------ temperature: 1.0
 the attained to be the world of the pupperism a place of


As you can see, a low temperature results in extremely repetitive and predictable text, but one whose local structure is highly realistic: in 
particular, nearly all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). At the highest temperatures, the local structure starts breaking down and most words look like semi-random strings 
of characters. In this specific setup, 0.5 arguably gives the most interesting generated text. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model, longer, on more data, you can achieve generated samples that will look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To 
illustrate this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.
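
The thought experiment can be loosely illustrated with a general-purpose compressor. The sketch below builds pseudo-text from a small made-up vocabulary (an assumption, not the Nietzsche corpus), compresses it with `zlib`, and compares the entropy of the byte frequencies before and after. Byte-level entropy is only a crude proxy for "statistical structure", so treat this as an illustration rather than a proof.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data):
    """Shannon entropy (bits per byte) of the byte frequency distribution."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up vocabulary, sampled with a fixed seed for reproducibility
rng = random.Random(0)
vocab = ['the', 'soul', 'of', 'man', 'and', 'spirit', 'is', 'a',
         'moral', 'sense', 'in', 'conscience', 'which', 'to', 'truth']
raw = ' '.join(rng.choice(vocab) for _ in range(20000)).encode('ascii')

# Compress at the highest zlib level
packed = zlib.compress(raw, 9)

print('raw bytes:', len(raw), 'entropy:', round(byte_entropy(raw), 2))
print('packed bytes:', len(packed), 'entropy:', round(byte_entropy(packed), 2))
```

The compressed stream carries the same message in fewer bytes, and its byte frequencies are much closer to uniform, leaving far less regularity for a next-byte predictor to exploit.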


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.