<h1>Generative LSTM on Nietzsche</h1>
<h3>By Joseph J. Bautista</h3>
<p>I ran this notebook on Paperspace using their cloud-based GPU; you can get started with a $10 freebie by clicking <a href="https://www.paperspace.com/&R=W949K8P">here</a>. In this notebook, a character-level LSTM model is trained on Nietzsche's writings for only 20 epochs.</p>

In [2]:
import sys

import numpy as np
import matplotlib.pyplot as plt

from keras.utils import get_file
from keras.models import Sequential, model_from_json
from keras.layers import Dense, LSTM
from keras.callbacks import Callback

plt.style.use('fivethirtyeight')

In [3]:
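def sample(preds, temp=1.0):
    """Draw a character index from `preds` after rescaling by temperature `temp`."""
    # float64 reduces rounding error that can make np.random.multinomial reject the input.
    preds = np.asarray(preds).astype('float64')
    # Divide the log-probabilities by the temperature, then renormalise (a softmax).
    preds = np.log(preds) / temp
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample and return the index of the chosen character.
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)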
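`sample` performs temperature sampling: dividing the log-probabilities by `temp` and renormalising sharpens the distribution when `temp < 1` and flattens it when `temp > 1`. The sketch below reproduces only the reweighting step in plain Python so the effect of `temp` can be inspected without NumPy. The name `reweight` is hypothetical, and subtracting the maximum logit is an added numerical-stability step that does not change the result.

```python
import math

def reweight(probs, temp=1.0):
    """Apply temperature to a probability distribution.

    Computes exp(log(p) / temp) and renormalises, mirroring the
    reweighting inside sample(). Assumes every p is strictly positive
    (math.log(0) would raise, whereas np.log(0) returns -inf).
    """
    logits = [math.log(p) / temp for p in probs]
    # Subtract the max before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.5, 0.3, 0.2]
for temp in (0.5, 1.0, 2.0):
    print(temp, [round(p, 3) for p in reweight(probs, temp)])
```

At `temp=0.5`, the value the generation callback uses, the most likely character's probability rises from 0.5 to about 0.66 (0.25 / 0.38), which is why low temperatures give more repetitive but more word-like output.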

In [4]:
path = get_file("nietzsche.txt", origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path, encoding='utf-8').read().lower()
print("Corpus length: {}".format(len(text)))

Corpus length: 600893


In [5]:
maxlen = 60
step = 3

sentences = []
next_chars = []
print("Creating sentence and next_chars arrays...")
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print("Number of sequences: {}".format(len(sentences)))

chars = sorted(list(set(text)))
print("Number of unique characters: {}".format(len(chars)))
print("Creating sentence and next_chars arrays...\n")

char_indices = {char: i for i, char in enumerate(chars)}

print("Vectorization...")
# Plain `bool`: the `np.bool` alias was removed in NumPy 1.24.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
print("Finished.")

# x.shape = (len(sentences), maxlen, len(chars))

Creating sentence and next_chars arrays...
Number of sequences: 200278
Number of unique characters: 57
Creating sentence and next_chars arrays...

Vectorization...
Finished.
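The cell above slices the corpus into overlapping windows of `maxlen` characters, advancing `step` characters each time, and records the character that follows each window as its target. That is why 600,893 characters yield 200,278 sequences: `range(0, 600893 - 60, 3)` has ceil(600833 / 3) = 200,278 elements. The toy sketch below repeats the windowing and one-hot encoding on a short string with plain lists; `make_windows`, `one_hot` and the `toy_` variables are hypothetical names chosen so they do not overwrite the notebook's `text`, `chars` or `char_indices`.

```python
def make_windows(text, maxlen, step):
    """Return (inputs, targets): each input is a maxlen-character slice,
    each target is the character that immediately follows it."""
    inputs, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        inputs.append(text[i:i + maxlen])
        targets.append(text[i + maxlen])
    return inputs, targets

def one_hot(window, indices):
    """Encode a string as one-hot rows of shape (len(window), vocab size)."""
    rows = []
    for char in window:
        row = [0] * len(indices)
        row[indices[char]] = 1
        rows.append(row)
    return rows

toy_text = "the will to power"
toy_chars = sorted(set(toy_text))
toy_indices = {char: i for i, char in enumerate(toy_chars)}

toy_inputs, toy_targets = make_windows(toy_text, maxlen=5, step=3)
print(toy_inputs, toy_targets)
print(one_hot(toy_inputs[0], toy_indices)[0])

# Sequence count for the real corpus (length 600893, maxlen 60, step 3).
print(len(range(0, 600893 - 60, 3)))  # 200278
```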


In [6]:
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
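As a rough check on model size: a Keras `LSTM` layer with the default `use_bias=True` has four gates, each with an input kernel, a recurrent kernel and a bias, giving 4 × (units × (input_dim + units) + units) weights, and the softmax `Dense` layer adds units × vocab + vocab. The sketch below evaluates these formulas for this notebook's sizes (128 units, 57 characters); the helper names are hypothetical, and `model.summary()` should report the same totals.

```python
def lstm_params(input_dim, units):
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

vocab_size = 57  # "Number of unique characters" printed above
lstm_units = 128

n_lstm = lstm_params(vocab_size, lstm_units)
n_dense = dense_params(lstm_units, vocab_size)
print(n_lstm, n_dense, n_lstm + n_dense)  # 95232 7353 102585
```

Roughly 100k parameters is small for a 600k-character corpus, which is consistent with the partly garbled samples after only a few epochs.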

In [9]:
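class GenerateText(Callback):
    """Every 4 epochs, print 400 characters sampled from the model."""
    def on_epoch_end(self, epoch, logs=None):
        if epoch % 4 == 0:
            # The seed must be exactly maxlen (60) characters long.
            generated_text = "old scandinavian saga: it is thus rightly expressed from the"
            print("--- Generating with seed: '" + generated_text + "'")
            for i in range(400):
                # One-hot encode the current 60-character window.
                sampled = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(generated_text):
                    sampled[0, t, char_indices[char]] = 1

                preds = self.model.predict(sampled, verbose=0)[0]
                next_index = sample(preds, 0.5)
                next_char = chars[next_index]

                # Slide the window: append the new character, drop the oldest.
                generated_text = (generated_text + next_char)[1:]

                sys.stdout.write(next_char)
                sys.stdout.flush()
        print("")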
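Character-level generation of this kind is autoregressive: each sampled character is appended to the input window and the oldest character is dropped, so the next prediction conditions on the text generated so far while the window length stays fixed. The framework-free sketch below isolates that loop. `generate` is a hypothetical name, and `predict_next` stands in for `model.predict` followed by `sample`; the toy predictor simply emits the next letter of the alphabet so the loop can run without a trained model.

```python
def generate(seed, n_chars, predict_next):
    """Autoregressively extend `seed` by n_chars characters.

    predict_next(window) must return the next character for a window
    of len(seed) characters; in the notebook that role is played by
    model.predict followed by sample().
    """
    window = seed
    output = []
    for _ in range(n_chars):
        next_char = predict_next(window)
        output.append(next_char)
        # Slide the window: keep its length fixed at len(seed).
        window = (window + next_char)[1:]
    return "".join(output)

# Toy stand-in for the model: emit the letter after the window's last one.
def toy_predict(window):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return alphabet[(alphabet.index(window[-1]) + 1) % len(alphabet)]

print(generate("abc", 5, toy_predict))  # defgh
```

If the window were not updated, every step would see the same seed and the model would keep sampling from one unchanged distribution.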

In [10]:
generate_text = GenerateText()
model.fit(x, y, batch_size=128, epochs=20, callbacks=[generate_text])

Epoch 1/20
--- Generating with seed: 'old scandinavian saga: it is thus rightly expressed from the'
 everimatien not deatidity and the more of the suth unding and the somed the callong that the samuth and but the store that that the sill of this store mand conjution and soult and the sall the evest and reance of the caltion of the sangher as of the mure and there far atting not the gore of
contrenal and monality of the comalie. in most proust be the ract pichire couther the there hat singer bela
Epoch 2/20

Epoch 3/20

Epoch 4/20

Epoch 5/20
--- Generating with seed: 'old scandinavian saga: it is thus rightly expressed from the'
 man a word in inteasty and may pass of the self, what the fact, and every intally and
deente and destine and such a man extent and connection of the signest and
sone the pristion of the strenct of the suppession of the reluge and sometters there of the self and even the sensed in the spirit of the most to present of the lay an any connece of the says and for t

<keras.callbacks.History at 0x7f25d817cda0>

In [15]:
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

In [16]:
model.save_weights("model.h5")
print("Saved model to disk")

Saved model to disk
