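To control the amount of stochasticity in the sampling process, we’ll introduce a parameter called the _softmax temperature_ that characterizes the entropy of the probability distribution used for sampling: it determines how surprising or predictable the choice of the next character will be. Given a temperature value, a new probability distribution is computed from the original one (the softmax output of the model) by reweighting it as shown in the following function. Higher temperatures yield higher-entropy distributions and more surprising samples; lower temperatures yield more predictable ones.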

In [1]:
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)
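
To make the effect of the temperature concrete, the sketch below applies `reweight_distribution` to a small made-up distribution and reports the Shannon entropy at each temperature. The `example` array and the `entropy` helper are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(distribution):
    # Shannon entropy in nats; assumes every probability is strictly positive
    return float(-np.sum(distribution * np.log(distribution)))

# hypothetical next-character distribution over four characters
example = np.array([0.6, 0.25, 0.1, 0.05])

entropies = {}
for temperature in [0.2, 0.5, 1.0, 1.2]:
    reweighted = reweight_distribution(example, temperature)
    entropies[temperature] = entropy(reweighted)
    print(temperature, np.round(reweighted, 3), round(entropies[temperature], 3))
```

A temperature of 1.0 leaves the distribution unchanged, values below 1.0 concentrate the mass on the most likely character, and values above 1.0 flatten the distribution.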

We will use some of the writings of Nietzsche (translated into English) as our training corpus. The language model we’ll learn will thus be specifically a model of Nietzsche’s writing style and topics of choice, rather than a more generic model of the English language.
Let's first download the corpus and convert it to lower case.

In [2]:
import keras
import numpy as np

path = keras.utils.get_file('nietzsche.txt',
                            origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
# read the whole corpus and close the file when done
with open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('Corpus length: ', len(text))

Using TensorFlow backend.


Corpus length:  600893


To train the model, we now need to extract partially overlapping sequences of length `maxlen`. We one-hot encode them and pack them into a Numpy array `x` of shape `(sequences, maxlen, unique_characters)`. The corresponding `y` is an array containing the targets: the one-hot-encoded characters that come after each extracted sequence. 

In [3]:
# extract sequences of 60 characters
maxlen = 60 
# we sample a new sequence every 3 characters
step = 3
# array to hold the extracted sequences
sentences = []
# array to hold the follow-up characters
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
    
print('Number of sequences: ', len(sentences))

Number of sequences:  200278


In [4]:
# list of unique characters in the corpus (to use as dictionary)
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))

Unique characters: 57


In [5]:
# create dictionary mapping each character to its index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}
print('Vectorisation...')
# we need to one-hot encode the characters into binary arrays
# (the builtin `bool` works across NumPy versions, unlike the `np.bool` alias)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Vectorisation...
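
To see what the extraction and one-hot encoding produce, here is a minimal sketch that runs the same steps on a tiny string. The `toy_*` names and the values `toy_maxlen = 4` and `toy_step = 3` are assumptions chosen only to keep the arrays small enough to inspect.

```python
import numpy as np

toy_text = "hello world"
toy_maxlen = 4   # window length (the notebook uses 60)
toy_step = 3     # stride between windows

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: i for i, char in enumerate(toy_chars)}

# one row per window, one one-hot vector per character position
toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = 1
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = 1

print(toy_sentences)             # ['hell', 'lo w', 'worl']
print(toy_next_chars)            # ['o', 'o', 'd']
print(toy_x.shape, toy_y.shape)  # (3, 4, 8) (3, 8)
```

Consecutive windows overlap by `toy_maxlen - toy_step` characters, and every character position holds exactly one `True` entry.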


Let's build the network. It consists of a single LSTM layer followed by a Dense classifier with a softmax over all possible characters. Note, however, that recurrent neural networks aren’t the only way to generate sequence data: 1D convnets have also proven very successful at this task.

In [6]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

# targets are one-hot encoded, so we use categorical crossentropy as loss function
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
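
As a rough check on the size of this network, the arithmetic below estimates the number of trainable parameters. It assumes the standard LSTM formulation with a bias term (four gates, each with an input kernel, a recurrent kernel and a bias) and the 57 unique characters reported earlier; calling `model.summary()` should give the authoritative figure.

```python
units = 128      # size of the LSTM layer
input_dim = 57   # number of unique characters in the corpus

# four gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * (units * (input_dim + units) + units)
# dense layer: one weight per (unit, character) pair plus one bias per character
dense_params = units * input_dim + input_dim
total_params = lstm_params + dense_params

print(lstm_params, dense_params, total_params)  # 95232 7353 102585
```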

Given a trained model and a seed text snippet, we can generate new text by doing the following repeatedly:
1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.
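
The four steps above can be sketched without a trained network. In the example below, `toy_predict` is a hypothetical stand-in for the model's prediction call, and `alphabet`, `rng` and `generate` are illustrative names; only the structure of the loop is meant to carry over.

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet = list("abc ")  # hypothetical tiny vocabulary

def toy_predict(context):
    # hypothetical stand-in for a trained model: favours repeating the last character
    probs = np.full(len(alphabet), 0.1)
    probs[alphabet.index(context[-1])] = 0.7
    return probs

def generate(seed, length, temperature):
    window = seed
    generated = ""
    for _ in range(length):
        probs = toy_predict(window)                            # step 1: predict
        scaled = np.exp(np.log(probs) / temperature)           # step 2: reweight
        reweighted = scaled / np.sum(scaled)
        next_index = rng.choice(len(alphabet), p=reweighted)   # step 3: sample
        next_char = alphabet[next_index]
        generated += next_char                                 # step 4: append
        window = window[1:] + next_char                        # keep a fixed-size context
    return generated

print(repr(generate("ab a", 40, temperature=0.5)))
```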

The _sampling function_ in this case can be defined as follows.

In [7]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
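
A quick empirical check of `sample` is to draw many indices at a low and a high temperature and compare how often the most likely index comes up. The `example_preds` values and the fixed random seed below are assumptions made for illustration.

```python
import numpy as np

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

np.random.seed(0)  # fixed seed so repeated runs draw the same samples
example_preds = [0.6, 0.25, 0.1, 0.05]  # hypothetical model output

top_fraction = {}
for temperature in [0.2, 1.2]:
    draws = [sample(example_preds, temperature) for _ in range(2000)]
    # fraction of draws that picked the most likely index (index 0)
    top_fraction[temperature] = draws.count(0) / len(draws)
    print(temperature, top_fraction[temperature])
```

At the low temperature nearly every draw picks index 0, while at the high temperature the less likely indices appear far more often.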

Finally, the following loop repeatedly trains the model and generates text. After every epoch, we generate text at a range of different temperatures. This lets us see how the generated text evolves as the model begins to converge, as well as the impact of temperature on the sampling strategy.

In [8]:
import random
import sys

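for epoch in range(1, 5):
    print('epoch', epoch)
    # train for one more epoch on the full dataset
    model.fit(x, y, batch_size=128, epochs=1)
    # pick a random seed snippet of `maxlen` characters from the corpus
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    # note: the seed is not reset between temperatures, so each run
    # continues from the last `maxlen` characters of the previous one
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature: ', temperature)
        sys.stdout.write(generated_text)

        # generate 200 characters
        for i in range(200):
            # one-hot encode the current window of text
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            # predict the next-character distribution and sample from it
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            # slide the window forward by one character
            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)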

epoch 1
Epoch 1/1
--- Generating with seed: "t, it loves
error, because, as living itself, it loves life!"
------ temperature:  0.2
t, it loves
error, because, as living itself, it loves life! and the such as in the grow the sense and and the serves and the presention of the more and art of the serves the serves and and the serves to have will the more and disting the such and and the mora------ temperature:  0.5
 to have will the more and disting the such and and the moral every the very
free in the into sorves and meticiant of the grown in the serves into the agn the sapser of the seeve of truth posses and the such an into the serfect of the something of the man--in ------ temperature:  1.0
he such an into the serfect of the something of the man--in dight or
mution preparry--as persition ow is an ty
cantery
heads who very does gos sogiantant for timution opperomoy with to
truth to to ammansoutvoliral
deamarifo--the one their prounts is being acro------ temperature:  1.2
ansoutvoliral

KeyboardInterrupt: 