## 使用LSTM生成文本

RNN不是序列数据生成的唯一方法，一维卷积神经网络也可以成功用于序列数据生成。

### 如何生成序列数据

机器学习模型能够对图像、音乐和故事的统计**潜在空间（latent space）**进行学习，然后从这个空间中**采样（sample）**，创造出与模型在训练数据中所见到的艺术作品具有相似特征的新作品。

RNN已被成功应用于音乐生成、对话生成、图像生成、语音合成和分子设计。

用DL生成序列数据的通用方法，就是使用前面的标记作为输入，训练一个网络（通常是RNN或CNN）来预测序列中接下来的一个或多个标记。标记（token）通常是字符或单词，给定前面的标记，能够对下一个标记的概率进行建模的任何网络都叫做**语言模型（language model）**。**语言模型能够捕捉到语言的潜在空间，即语言的统计结构。**

一旦训练好了一个语言模型，就可以从中**采样（sample，即生成新序列）**。向模型中输入一个初始文本字符串（即**条件数据（conditional data）**），要求语言模型生成下一个字符或下一个单词（甚至可以同时生成多个标记），然后将生成的输出添加到输入数据中，并多次重复这个过程。这个循环可以生成任意长度的序列，这些序列反映了模型训练数据的结构。

生成文本时，如何选择下一个字符至关重要。**采样策略：**
1. 贪婪采样（greedy sampling），就是始终选择可能性最大的下一个字符。但是会得到重复的、可预测的字符串，看起来不像是连贯的语言。
2. 随机采样（stochastic sampling），在采样过程中引入随机性，即从下一个字符的概率分布中进行采样。在这种情况下，根据模型结果，如果下一个字符是e的概率为0.3，那么有30%的概率选择它。

**从模型的`softmax`输出中进行概率采样是一种巧妙的方法。**甚至可以在某些时候采样到不常见的字符，从而生成看起来更加有趣的句子，而且有时会得到训练数据中没有的、听起来像是真实存在的新单词，从而表现出创造性。但这种方法有一个问题，就是它在采样过程中无法控制随机性的大小。

为了在采样过程中控制随机性的大小，引入`softmax temperature`参数，用以表示采样概率分布的熵，即表示所选择的下一个字符会有多么出人意料或多么可预测。更高的temperature得到的是更大的采样分布，会生成更加出人意料、更加无结构的生成数据，更低的temperature对应更小的随机性，以及更加可预测的生成数据。

### 实现字符级的LSTM文本生成

用一个`LSTM`层，向其输入从文本预料中提取的N个字符组成的字符串，然后训练模型来生成第N+1个字符。模型的输出是对所有可能的字符做`softmax`，得到下一个字符的概率分布。这个LSTM叫做**字符级的神经语言模型（character-level neural language model）**。

本例采用尼采的一些作品，要学习的语言模型将是针对于尼采的写作风格和主题的模型，而不是关于英语的通用模型。

In [1]:
import os
import numpy
import keras

Using TensorFlow backend.


In [2]:
# 下载并解析初始文本数据
path = keras.utils.get_file(
    'nietzsche.txt',
    origin='http://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()

Downloading data from http://s3.amazonaws.com/text-datasets/nietzsche.txt


In [4]:
path

'/home/bingli/.keras/datasets/nietzsche.txt'

In [9]:
type(text)

str

In [6]:
len(text)

600893

In [7]:
text[:100]

'preface\n\n\nsupposing that truth is a woman--what then? is there not ground\nfor suspecting that all ph'

提取长度为`maxlen`的序列（这些序列之间存在部分重叠），对他们进行`one-hot`编码，然后将其打包成形状为`(sequences, maxlen, unique_characters)`的三维NumPy数组。准备一个数组`y`，其中包含对应的目标，即在每一个所提取的序列之后的出现的字符（已进行`one-hot`编码）。

In [12]:
# 将字符序列向量化
maxlen = 60      # 提取60个字符组成的序列
step = 3         # 每3个字符采样一个新序列
sentences = []   # 保存提取的序列
next_chars = []  # 保存目标（即下一个字符）

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

In [13]:
len(sentences)

200278

In [15]:
# 语料中唯一字符组成的列表
chars = sorted(list(set(text)))
len(chars)

57

In [16]:
chars

['\n',
 ' ',
 '!',
 '"',
 "'",
 '(',
 ')',
 ',',
 '-',
 '.',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 '=',
 '?',
 '[',
 ']',
 '_',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 'ä',
 'æ',
 'é',
 'ë']

In [18]:
# dict:将唯一字符映射为它在列表chars中的索引
char_indices = dict((char, chars.index(char)) for char in chars)

In [19]:
char_indices

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 "'": 4,
 '(': 5,
 ')': 6,
 ',': 7,
 '-': 8,
 '.': 9,
 '0': 10,
 '1': 11,
 '2': 12,
 '3': 13,
 '4': 14,
 '5': 15,
 '6': 16,
 '7': 17,
 '8': 18,
 '9': 19,
 ':': 20,
 ';': 21,
 '=': 22,
 '?': 23,
 '[': 24,
 ']': 25,
 '_': 26,
 'a': 27,
 'b': 28,
 'c': 29,
 'd': 30,
 'e': 31,
 'f': 32,
 'g': 33,
 'h': 34,
 'i': 35,
 'j': 36,
 'k': 37,
 'l': 38,
 'm': 39,
 'n': 40,
 'o': 41,
 'p': 42,
 'q': 43,
 'r': 44,
 's': 45,
 't': 46,
 'u': 47,
 'v': 48,
 'w': 49,
 'x': 50,
 'y': 51,
 'z': 52,
 'ä': 53,
 'æ': 54,
 'é': 55,
 'ë': 56}

In [22]:
# 将字符one-hot编码为二进制数组
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
        y[i, char_indices[next_chars[i]]] = 1

In [23]:
x.shape

(200278, 60, 57)

In [24]:
y.shape

(200278, 57)

这个网络是一个单层`LSTM`，然后是一个`Dense`分类器和对所有可能字符的`softmax`。

In [26]:
# 构建网络
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

# targets是经过ont-hot编码的
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(optimizer=optimizer, loss='categorical_crossentropy')

In [27]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_1 (Dense)              (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________


**训练语言模型并从中采样**

给定一个训练好的模型和一个种子文本片段，可以通过重复一下操作来生成新的文本：
1. 给定目前已生成的文本，从模型中得到下一个字符的概率分布
2. 根据某个`temperature`对分布进行重新加权
3. 根据重新加权后的分布对下一个字符进行随机采样
4. 将新字符添加到文本末尾。

In [31]:
# 给定模型预测，采样下一个字符的函数
def sample(preds, temperature=1.0):
    """对模型得到的原始概率分布进行重新加权，并从中抽取一个字符索引"""
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # sample
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

下面这个循环反复训练并生成文本。在每轮过后都使用一系列不同的`temperature`来生成文本。这样可以看到，随着模型收敛，生成的文本如何变化，以及`temperature`对采样策略的影响。

In [35]:
# 文本生成循环
import sys
import random

for epoch in range(1, 60):
    print('epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1)  # 模型在数据集上拟合一次
    # 随机选择一个文本种子
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('---Generating with seed: "' + generated_text + '"')
    for temperature in [0.2, 0.5, 1., 1.2]:
        print('------temperature:', temperature)
        sys.stdout.write(generated_text)
        
        # 从种子文本开始，生成400个字符
        for i in range(400):
            # 对目前生成的字符进行one-hot编码
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.
            
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]
            
            generated_text += next_char
            generated_text = generated_text[1:]
            
            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
---Generating with seed: " by their analysis and vivisection, which he
recommended so "
------temperature: 0.2
 by their analysis and vivisection, which he
recommended so the conscience of the religious and life to the same thing of the source of the same thing of the same thing of the superiorical and world of the most and the same thing of the science of the same thing of the same as a standing and the present the subjuction of the same thing of the same assention of the present the art of the same assention of the same time to the same thing of the science of th
------temperature: 0.5
tion of the same time to the same thing of the science of the conscience of this free present extrarpence of the frage of all the causance it have a deristing whre was a precisely still the externalists and philosophy of a cource of a science of superiors of the cases of society and suffering of being that the end of stronger and the depreces a things it is domain of morality of the o



ry sord, plesthing-flight-our of the does and cruelty
acound recognizeful
explains to learned most and life for the fastion of exiltiblicalsy b appeet to all can so or thoughlid that the dee with. never say darg time and quality thus of semulist" and make that fass orly cryinesa! which everyted the helpless and dight, its, fa
epoch 5
Epoch 1/1
---Generating with seed: "tion of enjoyments, according to which an inferior, higher,
"
------temperature: 0.2
tion of enjoyments, according to which an inferior, higher,
and the present and consequence of the most and the way and the man of the superiority of the superiority of the entire and sense of the senses and the sense of the present and the conscience of the same things of the sensition of the present and the comparise of the superiority of the same as the faith of the sensition of the most man of the consequence of the entireness of the same things of the
------temperature: 0.5
 the consequence of the entireness of the same things of th

would bedratine!


  . teach is has in
know it is plu, the seeming strobedias.
even, and juch, all germans of object the causa; and as to batter about bestra
------temperature: 1.2
l germans of object the causa; and as to batter about bestraitify memoral different
and detards dackuats, that
is perfect, to
phasis and clanment, no pheyerpmachiale sclent bounder of values are, holdhing error. when "pasche.

in
tuneks judgmint.
whose art
longest the become choises the
runce 
i foor in unlet
correates himself: in
 "himart: he cowles, the a
nery, of their
"ugmortusted ssphetume inconsistening
no
dack a unnotidate develoslen. and prosess re
epoch 9
Epoch 1/1
---Generating with seed: "touchstone with respect to "great" and
"small": perhaps he w"
------temperature: 0.2
touchstone with respect to "great" and
"small": perhaps he was and great the sense of the senses to the sense of the scientific to the sense of the sense of the sense of the sense of the sense of the same time and the contrary th

nd disciplies the senses of the same to the end a man, the delices of the rational,
a lets himself dealts than some eitnecs of which me that which a place,
unconsidertion
of the delusions
of the philosopher. but the superficial certald to
moraling not therefore, the diave-oupwary their controur through oul ahbel ought them for which an exceptions provesateds.
the world! consequences with the line attentions badly religion
force of truth--the sa-posite, ost
------temperature: 1.2
attentions badly religion
force of truth--the sa-posite, ostricad trinchgrased; as gernal waysy?, and according tof, which a point comongs.=--lack it
sort quation.


217

=who to
cutted and predention--greanis; natures. certain ovence greemence (philosopher, but can be ; especially explained until it, thin arily has wornd ourgety because ouchersmaked
attekm as a selfg with
sympathy is firely
not as harder with them--sone, even all
the dangerogd under to lo
epoch 13
Epoch 1/1
---Generating with seed: "s
towards 

ious to the problem of the strong the superious and the sense to have as a man as a consequently serptions of the last more we be prover for the such and the course is inmret of the provirs of the possession of the future--the same things should not been almost the supersionace of the cased and all the history will the promise of the and the
sense of the sense of the higher proporture
to values of the morality; here as one of a sure of the derists and asse
------temperature: 1.0
 the morality; here as one of a sure of the derists and assentation. what? i should granting it--this learned varieve should inderpteme of
a consequently, as likews a
world, every his charm
of the most artances a more has once remains of an evipnes as
notive oppomensed the distimble world arrange, the bolthing elubloon to he properate to but conjurement and common tradbation of the laws make stridged in those a realarion has the entime to discrenting of pa
------temperature: 1.2
ged in those a realarion has the

factors of the soul, and in the power and suffering of the part of the suffering and the power of the same things that the suffering of the suffering and sense of the morality of the suffering and the most and spirit of the sense in the part of the suffering in the same things and more and the suffering of the spirit of the surprefact and sufficiently believe the same thing to the superiority of the suffe
------temperature: 0.5
ently believe the same thing to the superiority of the suffering and all the spirit and soul or to far and even the same men in the most existence of
distrement of the perhaps the only the power--the man and intermitude, the nature and dimeans of the morals, because all the power man has and the interest and sentence of the morality in the sense of the moral and sympathy of the same at a worthing and the north the german condition of a readorated the moral
------temperature: 1.0
and the north the german condition of a readorated the morality.


114

=also the sc

pleathhth by ridicism has diterling just be dunguit. amotimis, in
epoch 24
Epoch 1/1
---Generating with seed: "ngs--which no longer _concern_ him.


5

a step further in r"
------temperature: 0.2
ngs--which no longer _concern_ him.


5

a step further in respection of the sense of the sense of the same thinker of the sense of the sentiment of the sense of the sense of the sense of the sense of the sense of the sense of the sense of the sense of the sense of the most consists the senses of the present allow the sense of the estimulation of the experience of the senses of the sentiment of the superficially sense of the sentiment of the same thing of 
------temperature: 0.5
e superficially sense of the sentiment of the same thing of the that the schopenhauer's power of the experience and freedom of the man is virtues, in the every significance in europe of this thing has the stronger and that the middless and "precisely in the subjuction of the philosophy of this intradical from one is a 

there are sswnequality should formity originivaring appeal is unperjoy,
is an emetom onch as a decirinks.=--their essentially "
bhate difference simple one shall precise to lookjumfies--"thinker and homey. let us can
flattedo
fath revealed jucting to logic instinct, doits chant virtuifice than
will
hero man a ruble reflect dimacr) and boughts it may be doubidity;
and another or to competact"
th
epoch 28
Epoch 1/1
---Generating with seed: "werful, the superior, the
original state founder, who subjug"
------temperature: 0.2
werful, the superior, the
original state founder, who subjugge of the spirit of the spirit of the spirit of the spirit of the spirit of the subjuse of the spirit of the spirit of the spirit of the spirit of the spirit of the spirit and sense of the spirit and strong the spirit and such a problem of the sense of the spirit and senses and strength of the spirit of the stand and sense of the spirit of the spirit of the same time them in the sense in the probl
------tempe

commander, he may be in his determined cane now belongs
he must quitely bothtings if everywhere, as the merely difference shut inunity
rugh is if, it pressions underspe he enwersing
the time he come castem.--that think command patience, 
------temperature: 1.2
sing
the time he come castem.--that think command patience, so.

rék
oriable farces rulement weveloknedh himself the
bruzvfidity
and littleated cashonge chrise ug--altoz surmands! pathnerzesto doing religiouss his insistedness seems, as
gelod and executo than
nejosruptarity and educations!
lon which is you knot thrands,
he is hear re-doesnentod. accumanelg whoe--the only
among revengation nont
witherstand
have not its leavess with the cillatured
imbign lin
epoch 32
Epoch 1/1
---Generating with seed: " to the same phenomena, just
the tyrannically inconsiderate "
------temperature: 0.2
 to the same phenomena, just
the tyrannically inconsiderate the supposing of the supposing and soul and the interpretion of the spirit, and the mora

in the sense of the experiences of the slavent as a shortness of the intellectual self strength grateful, and the ears and the others among the strength and "i
------temperature: 1.0
teful, and the ears and the others among the strength and "i be founder with a means matter it though to profoundly has enough spirit and becomes at the ears as remares its applatable of
dreaged thereway fronh traces of being from the sharlness of cires does humanity from obligation hithhere and has down the explatay strength reeww producted solituage, which self every does two more as together fulthiest, they beings emplay to a radical against atagism of 
------temperature: 1.2
lthiest, they beings emplay to a radical against atagism of ties, at clisely misflicted kance, perhaps abalt: the "free squesteration, optrony heap: "he deceived! and omus. no beast rightly to the weakes? by a coacaus of uniffigfoum womonly: if i hamord deight. to thinhers
hardlyer is first adapted from curse optens now dogmastific

served to the present men of the superiorists of the science with a man with the existence from the reserved of the science and spirits of the most will the demands with regard to many indeed the man who had be an old and not a discoverect to the commanding of the smord in the earth--as feels himself them something and the age of the soul and superiorists of the same aristopals, and he is will them since subtle society, so the reason of the man with the pr
------temperature: 1.0
m since subtle society, so the reason of the man with the presentems.

1uee grest and
ex ahrou".--the becomeant, and also diseloom that this lacking of made goism and inhively now do to long power in geture, regard to inderlaristesable
estables, is the defelly heady religion and moin
and suffer aope, promise themselves spire lave to line, to be gearn may suffice outw
womansk
of all account--vownd of oke aumh themselves to soul who peace of will to do withi
------temperature: 1.2
of oke aumh themselves to soul w

nd most painful in this immense and almost new domain of danger the sense of the fact that the present in the same and the conscience of the action of the superiority of the superiority of the subjection of the sense of the sense of the man and self-classicate and the spiritually and the artists and the success of the art of the fact that the subjection of the same time and the sense of the subjection of the sense of the superiority of the spirit and the f
------temperature: 0.5
tion of the sense of the superiority of the spirit and the fundamental interpretation of a man is allowa, as in the only strong propersician and spirit regard to anything in the scient it is present everything in the fact and new recognism, it is not like the sense of maniforation of the spirit: and this wait and a properting the through the perhaps the precisely the hardenance of the same sacrifice and spark and an accountericular of the only the most und
------temperature: 1.0
ce and spark and an accountericu

differently hencefulness as by therely, at as viminal keep, poot of nature a truth something, breakian thirns and gradfuted their irre-century watn unpervolencly"--man life is not do our developed: later to circeovening, with the strong.--you neighbr of philosophy will noodh on bed
accordange of
exertomed
epoch 47
Epoch 1/1
---Generating with seed: " emotions to an innocent
mean at which they may be satisfied"
------temperature: 0.2
 emotions to an innocent
mean at which they may be satisfied of the experience and the soul of the same that the philosophers of the soul of the superiority of the same soul and will to be still and intermination of the proble the sense of the power of the power of the problem and the soul and all the more and the state and the soul and the one may be been the same as the act on the same all the philosophy and still and self-desire the conception of the ph
------temperature: 0.5
hilosophy and still and self-desire the conception of the philosophy
souls of t

alwer and distance, or at the god". is not themselves the skian logald. butij;ony.--there is influence which do , in under spirit cendefully is inclokes, of "feeling) would let us see findinary, where long and germaning height! or unremata wad thus impusie, the favour" dring the oberade, the faither
domine if i mankind to mistakenin
------temperature: 1.2
g the oberade, the faither
domine if i mankind to mistakening frouca, and europe, action abvetures: age af
inventined. they lookedsuoy, with      or"f-estement, according to supe" but does
lafd indignes, the question". in a womansce, and coffj(j? it regulence and new thinker wat almost is kingem of etching,
it there is interpritable, taken to habit of foreers, "adarate. allyingly metunj(ly feels, error ea enpuccount, is their
wagners pentibigism--the yarbi
epoch 51
Epoch 1/1
---Generating with seed: "ite immaterial; in any event it
had no reference to good and"
------temperature: 0.2
ite immaterial; in any event it
had no reference to

nds of the fact to the stood and the protected to the sense and the senses among the bad and believe of the german worn at once well the most man of reason and attain the lose the distinguished the best, and when i a the higher and comparison in the compares to the value of the sense specers such a woman--the most problem which distore into the present the mimsted the german the community and respect the stound and the sense, the value of the state and man
------temperature: 1.0
ect the stound and the sense, the value of the state and man sign it actual beearly: utmeic, orden godelfomant. inkfuln functions of the would on which in under the ancient higher orgarrable should thinks to a course. the worn only an a state may ashorly iver lest divine stemply and man" was exploition, and perhaps the our prolumists at the last male the farre and to the variation
leavent impossess the presented and extent -every could manifests," man reli
------temperature: 1.2
s the presented and extent -ever

developed to be and so that the strange, and the sense of the sense of the sense of the sense is a sort of the sense of the sense of the sense of the sense of the sentiment of the sense of the sense where is not to be so far as a depths, the same precisely the sense of the same and the soul that the sense of the sense of the sense of the sense of the strength of the same and the stranges and the sentime
------temperature: 0.5
of the strength of the same and the stranges and the sentiment to be so farsto the defined the same closes the will on the soul of the incertaing and more a purition to the deated to culture to the master with this desire to steerness with the feelings of the morality of the worth subtleties, as the religion has consequences and very dechsises and the intellectuality, the sense of the are original, believe in the most stranges to the rest of consciences in
------temperature: 1.0
, believe in the most stranges to the rest of consciences in all the upwatno assubmlis