Let us use an LSTM to generate sequence data (text, in our case) one character at a time, using stochastic sampling. First, we read in the corpus of Nietzsche's writings.

In [3]:
import keras 
import numpy as np 

path = keras.utils.get_file('nietzsche.txt',origin = 'https://s3.amazonaws.com/text-datasets/nietzsche.txt')

text = open(path).read().lower() 
print(len(text))

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
600893


In [4]:
maxlen = 60 
step = 3
sentences = [] 
next_chars = [] 

for i in range(0,len(text)-maxlen,step): 
    sentences.append(text[i:i+maxlen]) 
    next_chars.append(text[i+maxlen])  
print(len(sentences)) 

chars = sorted(list(set(text))) 
print("Unique characters = {}".format(len(chars))) 

# map each character to its index in chars
char_indices = {char: i for i, char in enumerate(chars)}

print("vectorization...")

# one-hot encode the input windows (x) and the character that follows each window (y)
# (the builtin bool replaces np.bool, which newer NumPy releases deprecated)
x = np.zeros((len(sentences),maxlen,len(chars)), dtype = bool) 
y = np.zeros((len(sentences),len(chars)), dtype = bool) 
for i,sentence in enumerate(sentences): 
    for t,char in enumerate(sentence):  
        x[i,t,char_indices[char]] = 1 
    y[i,char_indices[next_chars[i]]] = 1  

200278
Unique characters = 57
vectorization...
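
To make the vectorization step concrete, here is a minimal sketch of the same windowing and one-hot encoding on a tiny string. The toy text, window length and step (`toy_text`, `toy_maxlen`, `toy_step`) and the `decode` helper are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Toy stand-ins for the corpus and window settings (assumed values, chosen for readability).
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}

# Slide a window over the text; the target is the character right after each window.
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
windows = [toy_text[i:i + toy_maxlen] for i in starts]
targets = [toy_text[i + toy_maxlen] for i in starts]

# One-hot encode: x_toy[i, t, k] is True when window i has character k at position t.
x_toy = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
y_toy = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, c in enumerate(window):
        x_toy[i, t, toy_char_indices[c]] = True
    y_toy[i, toy_char_indices[targets[i]]] = True


def decode(one_hot_rows):
    """Hypothetical helper: turn one-hot rows back into a string."""
    return "".join(toy_chars[j] for j in one_hot_rows.argmax(axis=-1))


print(x_toy.shape, y_toy.shape)
for i in range(len(windows)):
    print(repr(decode(x_toy[i])), "->", repr(toy_chars[y_toy[i].argmax()]))
```

Each window has exactly one active entry per time step, so decoding the array with `argmax` recovers the original substring.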


In [5]:
from keras import layers 

model = keras.models.Sequential() 
model.add(layers.LSTM(128,input_shape=(maxlen,len(chars)))) 
model.add(layers.Dense(len(chars),activation='softmax')) 
# recent Keras releases name this argument learning_rate (older ones used lr)
model.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.RMSprop(learning_rate=0.01))
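
As a rough sanity check on the size of this model, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and one bias vector) and uses the 57-character vocabulary reported above; the variable names are illustrative.

```python
# Values taken from the cells above: 57 unique characters, 128 LSTM units.
num_chars = 57
units = 128

# Each of the four LSTM gates has an input kernel (num_chars x units),
# a recurrent kernel (units x units) and a bias vector (units).
lstm_params = 4 * (num_chars * units + units * units + units)

# The softmax layer maps the 128-dimensional LSTM output to one score per character.
dense_params = units * num_chars + num_chars

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)  # 95232 7353 102585
```

If the layer configuration matches these assumptions, `model.summary()` should report the same totals.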

In [6]:
# for stochastic sampling 
def sample(preds, temperature = 1.0): 
    preds = np.asarray(preds).astype('float64') 
    preds = np.log(preds)/temperature
    exp_preds = np.exp(preds) 
    preds = exp_preds/np.sum(exp_preds) 
    probas = np.random.multinomial(1,preds,1) 
    return np.argmax(probas) 
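
To see what the temperature does before any sampling happens, the sketch below applies the same reweighting to a small made-up distribution. The `reweight` helper and `toy_preds` are assumptions for illustration; unlike `sample` above, it subtracts the largest log-probability before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import numpy as np


def reweight(preds, temperature=1.0):
    """Return the temperature-adjusted distribution (no sampling)."""
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(preds) / temperature
    # Shifting by the max does not change the normalized result.
    exp_logits = np.exp(logits - logits.max())
    return exp_logits / exp_logits.sum()


# A made-up next-character distribution over four characters.
toy_preds = [0.5, 0.3, 0.15, 0.05]

for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(toy_preds, temperature), 3))
```

Temperatures below 1 concentrate probability on the most likely character, a temperature of 1 leaves the distribution unchanged, and temperatures above 1 flatten it, which is why high-temperature samples contain more unusual character sequences.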

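We now train the model one epoch at a time. After each epoch, we generate 400 characters from a fixed seed text at several temperatures.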

In [8]:
import random 
import sys

random.seed(42) 
start_index = random.randint(0,len(text)-maxlen-1) 
for epoch in range(1,60): 
    print("Epoch",epoch)
    model.fit(x,y,batch_size=128,epochs=1) 
    seed_text = text[start_index:start_index+maxlen] 
    print("seed text: " + seed_text) 
    for temperature in [0.2,0.5,1.0,1.2]: 
        print("temperature ",temperature) 
        generated_text = seed_text 
        sys.stdout.write(generated_text) 
        for i in range(400): 
            sampled = np.zeros((1,maxlen,len(chars))) 
            for t,char in enumerate(generated_text): 
                sampled[0,t,char_indices[char]] = 1 
            preds = model.predict(sampled,verbose=0)[0] 
            next_index = sample(preds,temperature) 
            next_char = chars[next_index] 
            
            generated_text += next_char 
            generated_text = generated_text[1:] 
            
            sys.stdout.write(next_char) 
            sys.stdout.flush() 
        print() 

Epoch 1
Epoch 1/1
seed text: the slowly ascending ranks and classes, in which,
through fo
temperature  0.2
the slowly ascending ranks and classes, in which,
through for the sentiment and the sensess of the sense of the sense of the same the sense of the sense of the conderting the conderting to the the sense of the sense of the still the deepones of the contempting and the sense of the condertance of the sense of the sense of the sense of the superitate sense and the sense of the and sense of the conderting in the sense and sense in the condecting the sense of 
temperature  0.5
the slowly ascending ranks and classes, in which,
through for as a distrustard so interpont the condecting which has his sone its of the sense of the that the deverourded soul and the sense, the convince and sense and in it is at the sempling in the soursesting and forment and forment in a mankertan the means and reated the his earth and even it is the intermant to the discorient of the the still in the selfreat


mently thinquman
shoulr; dvice the physsed of the incjurs moral ofd a delling that efulsings. wisning. in's
affining nawness for
who, no mustidered to
us abavin, regard it!

    lest a, the hesure of domabition of life, of ecervance is rightuverly pow
sidful own immory ords
doctourm. ever detshord moved more, for is and immoralis
Epoch 20
Epoch 1/1
seed text: the slowly ascending ranks and classes, in which,
through fo
temperature  0.2
the slowly ascending ranks and classes, in which,
through for the sense of the most self-phicant in the sense of the senses of the superficial self-condition of the contrary of the sense of the spirit of the senses of the sense of the strength same that the sense of the person and the sense of the same that the spirit and the sense of the sense of the desires and in the sense of the past of the desires and the sense of the sense of the senses of the senses
temperature  0.5
the slowly ascending ranks and classes, in which,
through for the most hard and ar

As we can observe from the generated text, lower temperatures yield rather predictable and repetitive output (note the recurring "the sense of the sense of"), although its local structure stays close to that of the original. Most of the words generated at a low temperature are actual English words, and by epoch 20 nearly all of them are, even though a few invented ones such as "conderting" appear in the first epoch. At high temperatures, we get more creative and experimental words (some of which do not exist in English), and the overall coherence of the text degrades. Perhaps the best temperature for sampling here is 0.5.