In [1]:
from tensorflow import keras
keras.__version__

'2.2.4-tf'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [3]:
from tensorflow import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Corpus length: 600893



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [7]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: index for index, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the builtin `bool`: the `np.bool` alias was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
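
To make the array shapes concrete, here is a minimal sketch of the same windowing and one-hot encoding applied to a toy string. The names `toy_text`, `toy_maxlen`, `toy_step` and the tiny values are assumptions chosen for illustration; they are not part of the original notebook.

```python
import numpy as np

# Toy corpus and window settings (hypothetical values, chosen to keep the arrays small)
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Extract overlapping windows and the character that follows each one
windows = []
targets = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])
    targets.append(toy_text[i + toy_maxlen])

toy_chars = sorted(set(toy_text))
toy_indices = {char: index for index, char in enumerate(toy_chars)}

# toy_x has shape (sequences, maxlen, unique_characters);
# toy_y has shape (sequences, unique_characters)
toy_x = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        toy_x[i, t, toy_indices[char]] = True
    toy_y[i, toy_indices[targets[i]]] = True

print(windows, targets)
print(toy_x.shape, toy_y.shape)

# Taking the argmax over the last axis recovers the original window
decoded = ''.join(toy_chars[j] for j in toy_x[0].argmax(axis=1))
print(decoded)
```

Each time step of `toy_x` contains exactly one `True` entry, which is what lets the argmax decoding at the end reverse the encoding.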


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. Note, however, that 
recurrent neural networks are not the only way to generate sequence data; 1D convnets have also proven extremely successful at it in 
recent times.

In [9]:
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_1 (Dense)              (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
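
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM parameterization (four gate blocks, each with an input kernel, a recurrent kernel and a bias); the variable names are introduced here for illustration only.

```python
# Values taken from the model definition above
input_dim = 57   # number of unique characters in the corpus
units = 128      # number of LSTM units

# Each of the four LSTM blocks (input, forget, cell candidate, output) has
# an input kernel, a recurrent kernel and a bias vector.
lstm_params = 4 * (input_dim * units + units * units + units)

# The Dense layer maps the 128-dimensional LSTM output to 57 character scores
dense_params = units * input_dim + input_dim

print(lstm_params, dense_params, lstm_params + dense_params)
```

The three printed numbers should match the `Param #` column and the `Total params` line of the summary.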


Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [10]:
# `learning_rate` replaces the deprecated `lr` argument
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
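
As a rough illustration of what this loss measures, the sketch below computes categorical cross-entropy for a single one-hot target in plain Python. The helper name and the toy probabilities are assumptions made for this example; the Keras implementation additionally clips probabilities and averages over the batch.

```python
import math

def single_example_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy for one example: -sum(target * log(prediction))."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs)
                if t > 0)

# Hypothetical 3-character vocabulary; the true next character is index 1
target = [0.0, 1.0, 0.0]
confident = [0.05, 0.90, 0.05]   # puts most mass on the correct character
unsure = [0.40, 0.30, 0.30]      # spreads mass across all characters

loss_confident = single_example_crossentropy(target, confident)
loss_unsure = single_example_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

With a one-hot target the sum collapses to the negative log-probability assigned to the correct character, so the confident prediction receives the smaller loss.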

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [11]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
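
To see what the reweighting step does on its own, here is a small sketch that applies the same log, divide, exponentiate and renormalize steps to a toy distribution, without the final random draw. The `reweight` helper and `toy_preds` values are assumptions introduced for this example.

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Apply the same reweighting as `sample`, but return the distribution."""
    preds = np.asarray(preds).astype('float64')
    scaled = np.log(preds) / temperature
    exp_scaled = np.exp(scaled)
    return exp_scaled / np.sum(exp_scaled)

# Hypothetical next-character distribution over a 4-character vocabulary
toy_preds = [0.5, 0.3, 0.15, 0.05]

for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(toy_preds, temperature), 3))
```

Temperatures below 1 concentrate probability on the most likely character, a temperature of 1 leaves the distribution unchanged, and temperatures above 1 flatten it.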


Finally, this is the loop where we repeatedly train the model and generate text. After every epoch, we generate text using a range of 
different temperatures. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature on the sampling strategy.

In [12]:
import random
import sys

# Train the model for 60 epochs (epochs 1 through 60)
for epoch in range(1, 61):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Train on 200278 samples
--- Generating with seed: ",
which wants to appreciate and act according to motives, ac"
------ temperature: 0.2
,
which wants to appreciate and act according to motives, according to the conscent to an a soul of the says and the says and the science to the says of the says to the says of the says and the soul the such and the says and the conscience of the says and be the conscience of the conscience of the self-the says and the same the self-the something to the says and the sayn and the self-and the comet to the says and the decessity of the counture to the self-t
------ temperature: 0.5
 to the says and the decessity of the counture to the self-the conticuations and the the to lave to anather and the sanction and art for seems and that sayst to the sould and every may he stall a superitual and present to the are the self-knowledge the
destrustouss of such ans from the recrustions, from the
have to the more to are was faint in the says afforments and 

f gained hyid. how without appair--the enchippent deluse the "time tistrantinut"--thatthers and
theney
nlaod
his pureition,
at their ampetions and that
kant which what

corricactial litureormir to loxive. greek, justmes, time ton may?--notses: when
algays estims--n"stetions of -idbution. n-ingutais had heather themselvcs. not presims, was schilouded introcise deepess on; mi: it that the gader, far which everybed"). inno men why he must illess and
tustance 
epoch 5
Train on 200278 samples
--- Generating with seed: "he
specialized, minutest departments of science are dealt wi"
------ temperature: 0.2
he
specialized, minutest departments of science are dealt with a some the desire of the subtlety is a such a man is the being and conscience, the same to be such a power to the fastions of the same and conscience, and in the same to perhaps the bad the same to be such a conscience, and such a periodation of the subtlety of the same the same to the same and in the same to be such a such a man

is an attemping every broman takes belief in giles in the disisselations, therefore-masaky of hint effect, to mani
matience
the light requising sortance instebst, from all inacitorory fo, an incrius alti. just which he
must has gubse at conseptimently has been individfmous becomes"--and on oneger, but the promonational leftcess in light and ormanizes, as the beluseories and pain, 
------ temperature: 1.2
ftcess in light and ormanizes, as the beluseories and pain, about in
tenk strongpithest of getter.sr--a ardoneful
asman
comvules onewing times, and
if something bereers all
opinesse and
geofica, thinkever neced quited really that of which only f?o fear, belongs?nts
and
wosti. oftenerson
ascement, dimd that praver. near, intentiou,
of which that should also who are abssact, and deai that venturment is. the prosestions, there feln. here syn adposite glebsly,
epoch 9
Train on 200278 samples
--- Generating with seed: "in eye and mind) of relationship and
equality, a calm confid"
------ tem

 of the conscience of the spirit of the world of the states and moral power of every are nature with the dosicism of the power in such the society the father more for in the masters and specially and angunds and self-sense of general world and with herest be its religious stronger and spirit and state of promises of powers and object for it is also much the development and national sense of religious power of one its and that is the order and task of ocili
------ temperature: 1.0
ous power of one its and that is the order and task of ociling only the countrance which not to be an
abmutly from riscolous notuom of the
ethorning and bile his general rifferents to themselves, in every no longer you close--is become one has
unity last regarding which he would like eny readly are with at in effect acmulous entire
is her properifital ob"ised, prives to litt termss, present acts of life and sure of really more
exprasity,i wi, like sour, so
------ temperature: 1.2
f life and sure of really more

but just that of our impulses--for thinking is only a relations of the strength of the strength of the superiority of the strength of the strength of the superstition of the strength of the strength of the strength of the strength of the strength of the superstition of the sense of the most constinction of the sense of the states of the soul, and the strength of the sense of the more of the strength of the strength and something in the states of the sens
------ temperature: 0.5
ngth of the strength and something in the states of the sense the origin to the suffers of the properse who has hitherto been ever to be another of the hand and the god and the most desimated and for the constitute to say itself and in the same into the strength of entered in order of the strength and distrust and any one
more persons a more of religious entent and more suffers the more something it has a masters and the properse, and they are to set of th
------ temperature: 1.0
it has a masters and the propers

soopition, or "rungly being century
in men as a badn an ventent of thmer one than it is be, with purposess systheding see  as european from ?wchith must make giving capacb
epoch 20
Train on 200278 samples
--- Generating with seed: "fications of human life as they shine here and
there: those "
------ temperature: 0.2
fications of human life as they shine here and
there: those present distinguished in the same the same the spirities and the more spirits, and the more in the same the promise and the philosophers of the will to a spiritical and the more strong and spiritual and the more and the self-and the same time thereby be an art and souls the property and strong of the same the same the property of the same there is the same the philosophers and spirits of the same 
------ temperature: 0.5
 there is the same the philosophers and spirits of the same our unity disposinent, and not an expection of the strength of comed of worth, in the same to the preachers
of assertions and ever entire


ke, as it is the
suffacience is the my spoul than judgment of the experiences 
one.
they are even aurted by attain
------ temperature: 1.2
ment of the experiences 
one.
they are even aurted by attaing" upon coiked in its four and noolged never werm and fored newagquily, su most manifolde everything inhu
remiessijusip of religion in thatter of faralful
ultimately in the
eviltaity and apsrarter, invention
bepted and
close, a bunding ttained them; every exensice,
predect to beautibuted. in.
under to(? agen . and it is, something enstrance of idealing is "self-grepllate that good to
make the bemi
epoch 22
Train on 200278 samples
--- Generating with seed: "ere
is a drop of cruelty.

230. perhaps what i have said her"
------ temperature: 0.2
ere
is a drop of cruelty.

230. perhaps what i have said here the sense of the same to the strength of the same to the same themselves and the speciality of the sense of the same to the spirit which the state of the same to a solitude of the spirit of th

laition of the experience and state of the german mankind of the same the earth of the lines and conscience from self-conscience is the conceals of religious with the other them. there is a humaning--and the dangerous with the intellectual men that the most german states, the deeply regard to single of the cause of the earth to the sign of the same timess the most conscience is the fartaind of the conscience of deathing in the greatest in the stronger in w
------ temperature: 1.0
 conscience of deathing in the greatest in the stronger in whimab, his own de impolents. man wet of lotes
that free spirit, when they untact, then the god belie away individualy is not by that prexises wound
him himself and but
on the scause and self-mors himself, and godais
same nesterffuluated beful and takes to all them now the periodd decessernly and it says: then stronger me, which cause of the individual
developount--thou pardous are, that the spirit
------ temperature: 1.2
he individual
developount--tho

self, and not at all by us?--we men desire that woman should be the delicate the superiorion is the conception of the conception of the problems of the superiorions to the will to the problems of the common forms of the will and such an expreciation of the strength and the best of the superiorive to the common fine and soul the commander and such an art and such an art of the strength and such an external and proves the superiorions and superiorials its so
------ temperature: 0.5
external and proves the superiorions and superiorials its something is hence the superiorive the discapacity. the acts and in the prevail of the strength of the superiorically the individual thing the strength strengths is preacher forms and responsibility is regard all the problems, and hence is the constituted in every discipline the presentive of a traditional degree and science, is great world in which is a misunderstanity and philosophy has a comprehe
------ temperature: 1.0
 in which is a misunderstanity

verods youthfulness that ay-disiour very themselves? but why has beet the one natural utilita.itienats, wicks friends! arise what fast aimitactys! in any,y phermitualest, petely them drings--where perhapspent, and precisely is formfors "for the same awect the
personat scourgly inexialdty, gastly no muemotoly turnal f
epoch 33
Train on 200278 samples
--- Generating with seed: "s too
many interpretations, and consequently hardly any mean"
------ temperature: 0.2
s too
many interpretations, and consequently hardly any meaning of the strength and present in the secretly and the present his spirit is a soul and the present with the sense of the sense of the same thing the sense of the sense of the strive to the present in the sense of the sense of the sense of the same time is the strength and the strength and the most present and the sense of the sense of the sense of the same time the constituted the present would 
------ temperature: 0.5
he sense of the same time the constituted the pres

at the other
later tough" of the servitues of christians curturrigntion same earthment of the man, in the taged youth pain,
fingered by volrimant ill appearant such, skepticistic thing thee, jesuit wands a
claim is to parthe
------ temperature: 1.2
, skepticistic thing thee, jesuit wands a
claim is to parthe would not at callly; why maintanistic without ruling,
deakhy knowere (and brought view i may partolusici. knotion, with
ferciles, then conditions. it is troud may "fa-ibbous conversions,
historicalmil-modemaall
morating trhate petous, ands throughout senses
it even a signifgues--the most eyes, and the deared,
and these,
is mutument world different our
concess friendl". emplyolomias and anysoibili
epoch 37
Train on 200278 samples
--- Generating with seed: "ently these
very instincts will be most branded and defamed."
------ temperature: 0.2
ently these
very instincts will be most branded and defamed. the superficbiation of the strong and the spirit and individual and in the strong a

e strength and the conscience and in the same most and all the new philosopher, in the assume spirit of the doctrine and respectite the morality of the enorder and spirit, as the word all the species of the depart and paradenoution, and also the dehing, and so that which the place that it would be the morality of the experiences and more that "into the constitute and being himself of a powerfked, the conscience as a man now the spirit and the subject that 
------ temperature: 1.0
the conscience as a man now the spirit and the subject that the presegg for voluntarily him, and there condition and to himself that they
has yet jesonstive for whoch near in his ages, some is one gribeled in order this already development, and no rialing himself to his need think a intellect is a bodisor, the roiloticallid of the scretion
of allow only arise who is no onrwined
as an must can me that general experienced, as a dreammonce daring for teear--t
------ temperature: 1.2
hat general experienced, as a 

wallow, even in books, and something which is also and so that the subject the spirit of the most delicate which is the whole and interesting and such a saints of the whole and the sense of the most action of the spirit of the most decisive or the man who has always in the most deeper the problem of the same things are the problem of the man who has and suffering and subject of the same with the same which is a persons o
------ temperature: 0.5
g and subject of the same with the same which is a persons of the higher and will the entire
problem of interesting of the most good of the action of the granded to readily believes the excitical of the
sense that it is not this commaning and as the interesting consequences, and has experiences who the belief and hitherto existed the profounding in the present we have arving and constantable of many man and evil desire, the whole and historical facuos of t
------ temperature: 1.0
ny man and evil desire, the whole and historical facuos of the hpi

of in order to knows why is alposent, who powerfulance crodifan-? but which stoin. beheally, he nothist peromet is tempor called evenne an
epoch 48
Train on 200278 samples
--- Generating with seed: "y of it: from such a
wrong inference does schopenhauer first"
------ temperature: 0.2
y of it: from such a
wrong inference does schopenhauer first and interpretation of the presence and and distinction of a self-content with the ancient man and the same work of the same time is also before the sense of the sense of a strong and and have been results on the same things and the presence of the presence of the presence of the presence and and and and and in the presence of the same time is a superiority and present and happent the present and 
------ temperature: 0.5
me is a superiority and present and happent the present and the and progress, the fact is a strong of a thought to the strines happiness and had and most word of life and interpretation and consequently as a say, and in the intell

reveal and naturocal of his dutie, there were man believh--laik. no oor the orizao, and
wond that our wa
------ temperature: 1.2
e man believh--laik. no oor the orizao, and
wond that our way, plart is conteulns
of regard every disposionity to be thrythinainge, to mutual wait"--the more
"new uniformence this continued: is dappined nable
in place serional. theer mory with a false,
impulsishes love alboge!"-. of vulgior blra
responsibe gives from
what is odeewing dear for aon anow betoor else-degraction uncannbeclate
incarrlinates againstsrquence--a newove-cluss
andouss. des must regar
epoch 52
Train on 200278 samples
--- Generating with seed: "_idea of nature_ which we form (nature = world, as notion, t"
------ temperature: 0.2
_idea of nature_ which we form (nature = world, as notion, the soul which is the profound the consequently the standard of the standard of the soul which is the misunderstanding of the standard of the strength and interest in the contrary as a conscience of the st

world and so and the the spirit has remain be disciplighting the scientifical, and that must be such a de
------ temperature: 1.0
 disciplighting the scientifical, and that must be such a deals, boundness, hence suriour and in
powerful, and care to
limitar in the german (andly to stooliey, a
shame upon fail now
sown in
the will as the emotion and of the earth-greath.

23. how becklutions in the lattem as any suffering and
eormor, eny of plow for "nature, in
the special man
it is persons,
such sberaling must be attain on the strong, and a long reason in effect, they paradox, and agese
------ temperature: 1.2
strong, and a long reason in effect, they paradox, and ageser, it is, it
is, runhing
of men is evil. if the
rulouge in
that noble tasted by the refugicily beloble fellow, the signels, to arrange overagent at realys it
wast; full daring traditible influenceion where essentials command man againstmitianess of faith and
in bookment etime--the
emotionally, which is you hards: they have 

time the disregard of individual philosophers, which has always an exception of the fact that the strength and suffering the spirit has a sin to the same time an art of the farthers of the same things are the man and possible to the same things to the individual. the same things to the same time of the same things of the sense of the state of the same things to the sense of the sense of the soul and the strength of the strength of the spirit of the s
------ temperature: 0.5
soul and the strength of the strength of the spirit of the strength and the fastionary experience of the same man, and we cannot even of the feelings of his own storience there is a poirty of the sense of the desire to himself to presspirities of the similar one were shouth this same thousant to be the same things of the old soul to the most enough the strivist to a man are no longer from the constituted to himself, there is a noble of the most interesting
------ temperature: 1.0
tituted to himself, there is a noble


As you can see, a low temperature results in extremely repetitive and predictable text, but one where local structure is highly realistic: in 
particular, nearly all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). With the highest temperatures, the local structure starts breaking down and most words look like semi-random 
strings of characters. In this specific setup, 0.5 is arguably the most interesting temperature for text generation. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model for longer on more data, you can achieve generated samples that look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about and the statistical structure of the messages in which communications are encoded. To 
illustrate this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.
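
The thought experiment can be loosely illustrated with a general-purpose compressor. The sketch below compares the empirical byte-level entropy of a text before and after `zlib` compression. The stand-in corpus (random draws from a small, hypothetical vocabulary) and the helper name are assumptions made for this example, and order-0 byte entropy is only a crude proxy for "statistical structure".

```python
import math
import random
import zlib
from collections import Counter

def entropy_bits_per_byte(data):
    """Empirical (order-0) entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical stand-in corpus: random draws from a small vocabulary
rng = random.Random(0)
vocab = ["the", "strength", "of", "spirit", "and", "conscience", "same", "soul",
         "is", "a", "man", "world", "sense", "power", "to", "will"]
raw = " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")

# Compress the same message; the content is unchanged, only the encoding differs
compressed = zlib.compress(raw, 9)

raw_entropy = entropy_bits_per_byte(raw)
compressed_entropy = entropy_bits_per_byte(compressed)
print(len(raw), len(compressed))
print(raw_entropy, compressed_entropy)
```

The compressed bytes carry the same message in far less space, but their distribution is close to uniform, leaving much less regularity for a next-symbol model to exploit.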


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires a balance between adhering to what the model judges likely and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.