### Text Generation With LSTM

We’ll explore how recurrent neural networks can be used to generate sequence data. We’ll use text generation as an example, but the exact same techniques can be generalized to any kind of sequence data: we could apply it to sequences of musical notes in order to generate new music, to timeseries of brushstroke data (for example, recorded while an artist paints on an iPad) to generate paintings stroke by stroke, and so on.

Sequence data generation is in no way limited to artistic content generation. It has been successfully applied to speech synthesis and to dialogue generation for chatbots.

### How to Generate Sequence Data?
The universal way to generate sequence data in deep learning is to train a network (usually an RNN or a convnet) to predict the next token or next few tokens in a sequence, using the previous tokens as input. For instance, given the input “the cat is on the ma,” the network is trained to predict the target t, the next character. As usual when working with text data, tokens are typically words or characters, and any network that can model the probability of the next token given the previous ones is called a language model. A language model captures the latent space of language: its statistical structure.

Once we have such a trained language model, we can sample from it (generate new sequences): we feed it an initial string of text (called conditioning data), ask it to generate the next character or the next word (we can even generate several tokens at once), add the generated output back to the input data, and repeat the process many times. This loop allows us to generate sequences of arbitrary length that reflect the structure of the data on which the model was trained: sequences that look almost like human-written sentences.

In the example we present in this section, we’ll take an LSTM layer, feed it strings of N characters extracted from a text corpus, and train it to predict character N + 1. The output of the model will be a softmax over all possible characters: a probability distribution for the next character. This LSTM is called a **character-level neural language model**.

![capture](https://user-images.githubusercontent.com/13174586/51816249-b02e6400-22eb-11e9-9d09-6907e73d7082.JPG)
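
To make the sampling loop concrete before introducing the LSTM, here is a minimal sketch that swaps the neural network for a much simpler stand-in: a table of character-pair counts. The toy corpus and the names `next_char_probs` and `generate` are assumptions made for illustration and are not part of the model built later in this section.

```python
import numpy as np

# Toy corpus, chosen only for illustration.
corpus = "the cat is on the mat. the cat sat on the mat. "
chars = sorted(set(corpus))
char_to_idx = {c: i for i, c in enumerate(chars)}

# Count how often each character follows each other character.
# Counts start at 1 (add-one smoothing) so no transition has zero probability.
counts = np.ones((len(chars), len(chars)))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[char_to_idx[prev], char_to_idx[nxt]] += 1

# Normalize each row into a probability distribution over the next character.
next_char_probs = counts / counts.sum(axis=1, keepdims=True)

def generate(seed, length, rng):
    """Repeatedly sample a next character and append it to the text."""
    text = seed
    for _ in range(length):
        probs = next_char_probs[char_to_idx[text[-1]]]
        text += chars[rng.choice(len(chars), p=probs)]
    return text

rng = np.random.default_rng(0)
print(generate("the c", 40, rng))
```

This stand-in conditions only on the single previous character, whereas the LSTM used below conditions on the previous N characters. The loop itself (predict a distribution, sample, append, repeat) has the same shape.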

### The Importance of The Sampling Strategy

When generating text, the way we choose the next character is crucially important. A naive approach is ***greedy sampling***, consisting of always choosing the most likely next character. But such an approach results in repetitive, predictable strings that don’t look like coherent language. A more interesting approach makes slightly more surprising choices: it introduces randomness in the sampling process, by sampling from the probability distribution for the next character. This is called ***stochastic sampling*** (recall that stochasticity is what we call randomness in this field). In such a setup, if `e` has a probability of 0.3 of being the next character, according to the model, we’ll choose it 30% of the time. Note that greedy sampling can also be cast as sampling from a probability distribution: one where a certain character has probability 1 and all others have probability 0.

Sampling probabilistically from the softmax output of the model is neat: it allows even unlikely characters to be sampled some of the time, generating more interesting looking sentences and sometimes showing creativity by coming up with new, realistic sounding words that didn’t occur in the training data. But there’s one issue with this strategy: it doesn’t offer a way to *control the amount of randomness* in the sampling process.
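
The difference between the two strategies can be sketched in a few lines of NumPy. The four-character alphabet and the probability values below are hypothetical, standing in for one softmax output of the model.

```python
import numpy as np

chars = ["a", "e", "t", "z"]
probs = np.array([0.25, 0.30, 0.40, 0.05])  # hypothetical model output

# Greedy sampling: always pick the most likely character.
greedy_choice = chars[int(np.argmax(probs))]

# Stochastic sampling: draw according to the predicted probabilities.
rng = np.random.default_rng(0)
draws = rng.choice(len(chars), size=10_000, p=probs)
frequencies = np.bincount(draws, minlength=len(chars)) / len(draws)

print("greedy:", greedy_choice)
print("empirical frequencies:",
      {c: round(float(f), 3) for c, f in zip(chars, frequencies)})
```

Over many draws, the empirical frequency of `e` should land close to 0.3, while greedy sampling returns `t` every time.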

Why would we want more or less randomness? Consider an extreme case: pure random sampling, where we draw the next character from a uniform probability distribution, and every character is equally likely. This scheme has maximum randomness; in other words, this probability distribution has maximum entropy. Naturally, it won’t produce anything interesting. At the other extreme, greedy sampling doesn’t produce anything interesting, either, and has no randomness: the corresponding probability distribution has minimum entropy. Sampling from the “real” probability distribution—the distribution that is output by the model’s softmax function—constitutes an intermediate point between these two extremes. But there are many other intermediate points of higher or lower entropy that we may want to explore. Less entropy will give the generated sequences a more predictable structure (and thus they will potentially be more realistic looking), whereas more entropy will result in more surprising and creative sequences. When sampling from generative models, it’s always good to explore different amounts of randomness in the generation process. Because we—humans—are the ultimate judges of how interesting the generated data is, interestingness is highly subjective, and there’s no telling in advance where the point of optimal entropy lies.
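
To put numbers on the two extremes and the intermediate point, the following sketch computes the Shannon entropy of three small distributions. The helper name `entropy_bits` and the example values are assumptions for illustration.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype="float64")
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return float(abs(entropy))  # abs() only turns a possible -0.0 into 0.0

uniform = np.full(4, 0.25)                      # pure random sampling
model_like = np.array([0.25, 0.30, 0.40, 0.05])  # hypothetical softmax output
greedy = np.array([0.0, 0.0, 1.0, 0.0])          # greedy sampling as a distribution

for name, dist in [("uniform", uniform), ("model", model_like), ("greedy", greedy)]:
    print(f"{name}: {entropy_bits(dist):.3f} bits")
```

With four possible characters, the uniform distribution reaches the maximum of 2 bits, the one-hot (greedy) distribution has 0 bits, and the model-like distribution falls in between.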

In order to control the amount of stochasticity in the sampling process, we’ll introduce a parameter called the ***softmax temperature*** that characterizes the entropy of the probability distribution used for sampling: it characterizes how surprising or predictable the choice of the next character will be. Given a temperature value, a new probability distribution is computed from the original one (the softmax output of the model) by reweighting it in the following way.
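
As a formula, the reweighting can be written as follows. The notation is introduced here for illustration: $p_i$ is the probability the model assigns to character $i$, and $T$ is the temperature.

$$
p_i^{(T)} = \frac{\exp\left(\log(p_i) / T\right)}{\sum_j \exp\left(\log(p_j) / T\right)} = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}
$$

With $T = 1$ the distribution is unchanged. As $T$ approaches 0, the probability mass concentrates on the most likely character (greedy sampling), and as $T$ grows large, the distribution approaches the uniform one.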

### Reweighting a Probability Distribution to a Different Temperature

In [1]:
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution is a 1D NumPy array of probability values that must sum to 1.
    # temperature is a factor quantifying the entropy of the output distribution.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)

    # Return a reweighted version of the original distribution. The sum of the
    # exponentiated values may no longer be 1, so we divide by the sum to
    # obtain the new distribution.
    return distribution / np.sum(distribution)

Higher temperatures result in sampling distributions of higher entropy that will generate more surprising and unstructured generated data, whereas a lower temperature will result in less randomness and much more predictable generated data.

![capture](https://user-images.githubusercontent.com/13174586/51817340-49f81000-22f0-11e9-8979-d2a20ec861f2.JPG)
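
The effect can also be checked numerically. The sketch below repeats the reweighting computation under the shorter name `reweight` so that it runs on its own; the example distribution and the `entropy_bits` helper are assumptions for illustration.

```python
import numpy as np

def reweight(p, temperature):
    # Same computation as reweight_distribution above, repeated so this runs standalone.
    scaled = np.exp(np.log(p) / temperature)
    return scaled / scaled.sum()

def entropy_bits(p):
    # Shannon entropy in bits (assumes every probability is strictly positive).
    return float(-np.sum(p * np.log2(p)))

original = np.array([0.25, 0.30, 0.40, 0.05])  # hypothetical softmax output
for t in [0.2, 0.5, 1.0, 1.2, 5.0]:
    new = reweight(original, t)
    print(f"T={t}: {np.round(new, 3)}  entropy={entropy_bits(new):.3f} bits")
```

The printed entropy should increase with the temperature, and the row for `T=1.0` should reproduce the original distribution.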

### Implement Character-Level LSTM Text Generation

Let’s put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a language model. We can use any sufficiently large text file or set of text files, such as Wikipedia or The Lord of the Rings. In this example, we’ll use some of the writings of Nietzsche, the late-nineteenth-century German philosopher (translated into English). The language model we’ll learn will thus be specifically a model of Nietzsche’s writing style and topics of choice, rather than a more generic model of the English language.

#### PREPARING THE DATA
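Let’s start by loading the corpus and converting it to lowercase. The code below assumes the text has already been downloaded and saved as `nietzsche.txt` in the working directory.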

### Download and Parse The Initial Text File

In [2]:
import keras
import numpy as np
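
# Path to the corpus, assumed to be in the current working directory
path = 'nietzsche.txt'

# Read the whole file, close it when done, and convert the text to lowercase
with open(path) as f:
    text = f.read().lower()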
print('Corpus Length:', len(text))

Using TensorFlow backend.


Corpus Length: 600901


Next, we’ll extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them in a 3D NumPy array `x` of shape (`sequences`, `maxlen`, `unique_characters`). Simultaneously, we’ll prepare an array `y` containing the corresponding targets: the one-hot-encoded characters that come after each extracted sequence.

### Vectorize Sequences of Characters

In [13]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# The built-in `bool` works as a dtype across NumPy versions,
# unlike the `np.bool` alias, which was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200281
Unique characters: 59
Vectorization...
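
The same windowing and one-hot encoding can be traced by hand on a tiny input. The sketch below uses a hypothetical toy string and the names `toy_maxlen` and `toy_step`, which are not part of the original code, so that the resulting shapes are small enough to inspect.

```python
import numpy as np

toy_text = "hello world"
toy_maxlen, toy_step = 4, 3

# Sliding windows and the character that follows each window.
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
windows = [toy_text[i:i + toy_maxlen] for i in starts]
targets = [toy_text[i + toy_maxlen] for i in starts]

toy_chars = sorted(set(toy_text))
toy_indices = {c: i for i, c in enumerate(toy_chars)}

# One-hot encode: x is (sequences, maxlen, unique_characters), y is (sequences, unique_characters).
x = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
y = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        x[i, t, toy_indices[char]] = True
    y[i, toy_indices[targets[i]]] = True

# Decoding the first encoded window recovers the original characters.
decoded = "".join(toy_chars[j] for j in x[0].argmax(axis=1))
print(windows, targets)
print(x.shape, y.shape, decoded)
```

Each position in `x` has exactly one `True` entry, which is what makes the encoding reversible with `argmax`.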


#### BUILDING THE NETWORK

This network is a single LSTM layer followed by a Dense classifier and softmax over all possible characters. But note that recurrent neural networks aren’t the only way to do sequence data generation; 1D convnets also have proven extremely successful at this task in recent times.

### Single-Layer LSTM Model for Next-Character Prediction

In [14]:
from keras import layers

model= keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 128)               96256     
_________________________________________________________________
dense_3 (Dense)              (None, 59)                7611      
Total params: 103,867
Trainable params: 103,867
Non-trainable params: 0
_________________________________________________________________


### Model Compilation Configuration

In [15]:
# Recent Keras releases name this argument `learning_rate`; older ones call it `lr`
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss= 'categorical_crossentropy', optimizer=optimizer)
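
Because the targets are one-hot encoded, categorical crossentropy reduces to the negative log of the probability the model assigns to the true next character. The NumPy sketch below illustrates that computation; it is not Keras's own implementation, and the `eps` clipping value is an assumption made here to avoid taking the log of zero.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0, 0, 1, 0]], dtype="float64")   # true next character is index 2
confident = np.array([[0.05, 0.05, 0.85, 0.05]])     # hypothetical predictions
unsure = np.array([[0.25, 0.25, 0.25, 0.25]])

print(categorical_crossentropy(y_true, confident))  # low loss
print(categorical_crossentropy(y_true, unsure))     # higher loss
```

A confident, correct prediction gives a loss near 0, while a uniform guess over four characters gives log(4).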

#### TRAINING THE LANGUAGE MODEL AND SAMPLING FROM IT

Given a trained model and a seed text snippet, we can generate new text by doing the following repeatedly:
 - Draw from the model a probability distribution for the next character, given the generated text available so far.
 - Reweight the distribution to a certain temperature.
 - Sample the next character at random according to the reweighted distribution.
 - Add the new character at the end of the available text.

This is the code we use to reweight the original probability distribution coming out of the model and draw a character index from it (the *sampling function*).

### Function to Sample The Next Character Given The Model’s Predictions

In [16]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
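
The last two lines of `sample` may look indirect, so here is a small sketch of what they do in isolation. The probability values are hypothetical.

```python
import numpy as np

np.random.seed(0)  # seeded only to make the demonstration repeatable
preds = np.array([0.1, 0.2, 0.7], dtype="float64")

# One experiment (third argument) consisting of one trial (first argument):
# the result has shape (1, 3) and contains a single 1 at the drawn index.
one_hot_draw = np.random.multinomial(1, preds, 1)

# argmax turns that one-hot row back into a plain character index.
index = int(np.argmax(one_hot_draw))

print(one_hot_draw.shape, one_hot_draw.sum(), index)
```

In other words, `np.random.multinomial` performs the random draw and `np.argmax` only locates the position of the single 1 in the result.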

Finally, the following loop repeatedly trains and generates text. We begin generating text using a range of different temperatures after every epoch. This allows us to see how the generated text evolves as the model begins to converge, as well as the impact of temperature in the sampling strategy.

### Text-Generation Loop

In [17]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y, batch_size=128, epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

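    # Note: `generated_text` is not reset between temperatures, so each
    # temperature continues from the last `maxlen` characters generated at
    # the previous one.
    for temperature in [0.2, 0.5, 1.0, 1.2]: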
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
--- Generating with seed: "dangerous sign of the lack thereof. it is not the works,
but"
------ temperature: 0.2
dangerous sign of the lack thereof. it is not the works,
but and sandical an an a man an an an an an an an an an an an an an an an an an a man and such a still an an an an a still the such all the still the world and such a still the interponsible the the preally so man and so in the so what so a man and self the constilus of the world an an an a which all so a still as a still the sould and such the man and sakes in the so and such the sanged and such the
------ temperature: 0.5
the man and sakes in the so and such the sanged and such the soul, to the world bestory in the beaver of the detingual, the ten
as a man as an an the will conscience, hand of stall when crute and for with the according of which is not is the life and pole with it was so but the self it all as so hang and the extrant and such so to he well an the mode it will an
enough is the is one

9: f ip less be has upits ar out of be naibidialisically doy who
dability
gease have sicthamens by this moralitudg instinct
which whatever will goedfeerss of diugnengous ap: de old momant merels these authorminving, which sepore revirduness. foor at religy difficul
on
seec1idly not us they be
cave: hover intiminance a m
epoch 5
Epoch 1/1
--- Generating with seed: "ent from the conditions under which, climatically and
heredi"
------ temperature: 0.2
ent from the conditions under which, climatically and
heredication of the perhaps and in the compared, and the sense of the comparison of the conception of the compared to the desire of the world and its life and condition of the speakance of the comparison of the subject and desire of the compared, which the more in the conception of the present and the the persons of the respect and and has also the command the special part that the conception of the spi
------ temperature: 0.5
 the command the special part that the conception of the spiri

artay of trued in the storie pureev are one, order there is also man aloon philosophers something
for his surder assrence; with his vigiances alook! that which has king wishes plane time
which
------ temperature: 1.2
vigiances alook! that which has king wishes plane time
which
"but maned what, and placed
her-vesctony to couroned
to way adsuld lastity ladge
but a new-fathers
the cromentfusly mosther forgen enews of moral youngly recold, to exonot ulivity is the fund: if hellness wigh load that and
politiclly he doelk anti. the
wisd become falsion,
that it is religious adself-cusier-bording ourselves. and were, out
acquired orn. that "noin which has grolly nreagous means
epoch 9
Epoch 1/1
--- Generating with seed: " who was favourably
inclined to the jews; and however decide"
------ temperature: 0.2
 who was favourably
inclined to the jews; and however decided to an acts of the strength of the strengther of the state of the strength of the stands of the stands to the contrary in the stat

the philosophy attempous without all that the christian who bad to be a person and successfulness, the sense of the science strifice compossible, and above the sublement of the sto be something more courses that which he senses the new place and in all the success, of the str
------ temperature: 1.0
h he senses the new place and in all the success, of the strengths one be
tendon, propoul in that imagany
spriin
the a contemted in she as
egbili, and more
conscience of commonful desires out, and god: is most symbin homected in and sears there overly for
"nothing
ropting vire without willniimple, in thought prejention and sooulod
for a periodable, and lame man following, and closes" in religious: that prinolles us mobinable, in
this obligious love
become 
------ temperature: 1.2
 that prinolles us mobinable, in
this obligious love
become ngblicious possess"--not undellisuh, hencefredde: fanificaged to
thatres, pain in oppositule early yet badces understands those greatevltous strange
such 

increasing detachment from the condition of the spirit and a morality, and the sense of the interpretices of the sense of the power of the sense of the everything of the artists and something which is a man is a man is a man is a philosophers of the sense of the man is a man is a consequently and social the standards of the such a false and contrary and the spirit of the spirit of the spirit of the demands and a man will be are and
------ temperature: 0.5
pirit of the spirit of the demands and a man will be are and general truth and contrary
belong to a have seemser, he was the misunder the most exception of things of the experience of the perhaps in the same to the fearty of the world of a contrary still constraint to perhaps the contrary a fact that we contrary
the world and there is a still in the standards, and decident of
the cases itself in every explained the spirit to propens of the essentially inst
------ temperature: 1.0
very explained the spirit to propens of the essentially

antimoronazen and beed feeling, asportinat, resulting namerness relect co autonses of like by a nediety perfeslebk; he takeant men to guilm dowful supposer magjuce hopen
individue to
contemptings the
assonce was toge he stoo, backfuled? and
i
epoch 20
Epoch 1/1
--- Generating with seed: "ve man is an instrument,
a costly, easily injured, easily ta"
------ temperature: 0.2
ve man is an instrument,
a costly, easily injured, easily taste and supposition and moral interprets of the look and something which has a desire of the same time a spiritual and souls of the expression of the sense of the sense of the spirit and something of the spiritual consequently as the spiritual propous and souls of the subject of the speciation of the same propous of the special contradiction and such a special spiritual spiritual propous to the pr
------ temperature: 0.5
ion and such a special spiritual spiritual propous to the present of sense of all interpretion in the dogman command of
society of the trans

had a man of reverence and wickest as the feeling
him than af mirtalityly of
called writmen else lho, in
------ temperature: 1.2
eling
him than af mirtalityly of
called writmen else lho, involves no certain, that our closely: he savin, founds as they 
consavel men-permpety lough
tangentlus than the
logins than history of example, and disparable, my formiunds, never without instinct there is not-man
mustall ye ands, that the bey perish kinds have forgetly merely confeeline
wars when eagle deceistion as silence--as art to the antitheses, of wish
enfean to recongling understif fastogia
epoch 24
Epoch 1/1
--- Generating with seed: "wn good qualities. formerly
they were your masters: but they"
------ temperature: 0.2
wn good qualities. formerly
they were your masters: but they are soul and desire of the standard of the most standards of the spirits and something which is the standards and the sense of the strength of the standard of the standards and the standard of the standard to the spir

hum of which thekes it is trongles homatids of
feelius
truth necessariunouw and reed sticnes generally
works
be
giveu-delact fell injury itself, till doubt us at napal, as that circsarfor, the into right is monuac sinsed acf
know,
n
epoch 27
Epoch 1/1
--- Generating with seed: "igations: that is a typical sign of
shallow-mindedness; and "
------ temperature: 0.2
igations: that is a typical sign of
shallow-mindedness; and it is also presented the person is all the more in the strength is also the sense of the still in the contrast one is a constitutes of the subject of the sense of the same man is a still person are strive in the spirit and the promises of the state, and with the strength of the spirit and and sense of the superstituous and sense of the spirit and sense of the spirit of the spirit of the spirit and
------ temperature: 0.5
irit and sense of the spirit of the spirit of the spirit and man is a nature of man is a person of the moral inventure with his fundamental extradice 

curesing this assome, more thing" of call the evil? any than
sileitism of philo
------ temperature: 1.2
e, more thing" of call the evil? any than
sileitism of philosophers of
religions: hono are nest no,
anchoiser believe in deceives are at miss so hardation. there are reverled i
spirits
are begare to be destaid gremations of reptkes which call stamble
to happrition, os faget of musid under rede you, a master.


170 ooten. "optin-evie more ha
suph
his "consciertic, and the conquestic notination finged to it, heuwistim of men, those action more first freelkh_
epoch 31
Epoch 1/1
--- Generating with seed: "f respect. the way in which, on
the whole, the reverence for"
------ temperature: 0.2
f respect. the way in which, on
the whole, the reverence for the spirit of the scientific period of the strength of the strength of the same time and still and self say and survearing the standard: in the spirit is a sure the subject the experience the fearfulness and subject and the subject and perven

to love and believed to the stands and rest the spirit of the bighy of this runga, the asiatical indeed, he
who cannot
merely grow (descousd.--in from the evil,
asiving over
hors."" its enemproces in life, degesiles as themefle inwisholdue weak obliganced, not for their self-thin indiany an id ours the strength becomes law an inflict light, thas regard to asselting forring of obscundomary aston feits prolonced. why had a cestable iatting
singan, this
man n
------ temperature: 1.2
its prolonced. why had a cestable iatting
singan, this
man not freeld by
self-listery ruons, and sundati sresited, dream at all is, goodsve," no question" indeences:--whowis
means pelied enjoys
aboutgutal indsquitement, difficult the hed is bewive, and benefor.=--it wefiertos may necessity,
believed to men would not as much, every inderctuate this tri fentijeaned i procastions. or
these
homed, nothing
timally principde, at halmuse: passestire thrife, a les
epoch 35
Epoch 1/1
--- Generating with seed: " an exha

scious the standard of the persons and according to the same sickness in the participal institution for the feeling of delight of the gradually also, and it is present be man something in the barblie the innoc indifect of the unsoct of the words that the saint of every person the do still much has been according to the action and very justife in the philosopher, and hate the and for the act prompted and want an artists the particular prove of the surpects 
------ temperature: 1.0
ed and want an artists the particular prove of the surpects that it is invective, still
part of eye of the motive
which we have, not now folly sin ign
grumity of
naver; abyer
casurable and chieks, to dengeration and finally, they were hes granting and amby, but neward onednest
of a hitherto become conscience langualis of usefulon to genered to present that the arrive. light, speal that does not
bearity" over somes above the basis of every preservation for 
------ temperature: 1.2
arity" over somes above the ba

KeyboardInterrupt: 

As we can see, a low temperature value results in extremely repetitive and predictable text, but local structure is highly realistic: in particular, most words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text becomes more interesting, surprising, even creative; it sometimes invents completely new words that sound somewhat plausible. At the highest temperatures, the local structure starts to break down, and most words look like semi-random strings of characters. In this specific setup, 0.5 is arguably the most interesting temperature for text generation. Always experiment with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.
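
In the output above, each temperature block begins with the tail of the previous block, because the generation loop keeps extending the same `generated_text`. If a side-by-side comparison from an identical seed is preferred, one possible variant is sketched below. The function `generate_text` and its `predict_fn` parameter are hypothetical names introduced for illustration; with the Keras model, `predict_fn` would one-hot encode the window and call `model.predict`. A fixed stand-in distribution is used here so the sketch runs without a trained model.

```python
import numpy as np

def generate_text(predict_fn, seed_text, chars, length, temperature, rng):
    """Generate `length` characters, always starting from `seed_text`."""
    window = seed_text
    generated = ""
    for _ in range(length):
        preds = np.asarray(predict_fn(window), dtype="float64")
        # Temperature reweighting, as in `sample` above.
        scaled = np.exp(np.log(preds) / temperature)
        probs = scaled / scaled.sum()
        next_char = chars[rng.choice(len(chars), p=probs)]
        generated += next_char
        window = window[1:] + next_char  # slide the window, keeping its length fixed
    return generated

# Stand-in model for demonstration only: ignores the window entirely.
demo_chars = ["a", "b", "c"]

def dummy_predict(window):
    return [0.7, 0.2, 0.1]

seed = "abcabc"
for temperature in [0.2, 0.5, 1.0, 1.2]:
    rng = np.random.default_rng(0)
    print(temperature, generate_text(dummy_predict, seed, demo_chars, 20, temperature, rng))
```

Because the seed is passed in rather than mutated, every temperature starts from the same conditioning text.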

Note that by training a bigger model, longer, on more data, we can achieve generated samples that look much more coherent and realistic than these. But, of course, we can't expect to ever generate any meaningful text, other than by random chance: all we’re doing is sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there’s a distinction between what communications are about and the statistical structure of the messages in which communications are encoded. To evidence this distinction, here’s a thought experiment: what if human language did a better job of compressing communications, much like computers do with most digital communications? Language would be no less meaningful, but it would lack any intrinsic statistical structure, thus making it impossible to learn a language model as we just did.
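
A rough way to see the point of this thought experiment is to compare byte statistics before and after compression. The sketch below builds an artificial text from a small, hypothetical vocabulary, compresses it with `zlib`, and measures the empirical entropy of the byte frequencies. This captures only one narrow aspect of statistical structure (single-byte frequencies), so it is an illustration rather than a proof.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy_bits(data):
    """Empirical entropy of the byte frequencies, in bits per byte (at most 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

random.seed(0)
# Hypothetical vocabulary, used only to produce a long, language-like byte stream.
vocabulary = ["the", "spirit", "of", "man", "is", "a", "sense", "and",
              "world", "strength", "morality", "soul"]
text = " ".join(random.choice(vocabulary) for _ in range(20_000)).encode("ascii")
compressed = zlib.compress(text, 9)

print("raw:       ", len(text), "bytes,", round(byte_entropy_bits(text), 2), "bits/byte")
print("compressed:", len(compressed), "bytes,", round(byte_entropy_bits(compressed), 2), "bits/byte")
```

The raw text uses a handful of byte values very unevenly, which is the kind of regularity a character-level model can learn. The compressed stream carries the same information, but its bytes are distributed far more evenly, leaving much less for such a model to exploit.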