# LSTM Text Generation

Tutorial from: https://keras.io/examples/generative/lstm_character_level_text_generation/

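Generates text character by character using an LSTM.
Given a seed phrase, the model predicts a probability distribution over the characters in the corpus, samples the next character from it, and repeats until the generated text reaches a set length.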

## Basic architecture
Input -> LSTM -> Dense -> Softmax

Input: One-hot encoded sentences

## Softmax temperature
The novelty of the generated sequence can be controlled with the *temperature* of the softmax. If we treat the softmax output as a probability distribution, we can tweak this distribution to make samples more or less surprising.

By reweighting the distribution we can control how novel the next character is:

- Low temperature -> more deterministic
- High temperature -> more stochastic
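Concretely, with temperature $T$ each probability is rescaled as $p_i' = \exp(\log p_i / T) / \sum_j \exp(\log p_j / T)$, which is the same as raising every $p_i$ to the power $1/T$ and renormalizing. A minimal sketch on a made-up three-character distribution (the numbers are illustrative, not model output):

```python
import numpy as np

def reweight(probs, temperature):
    """Rescale a probability distribution with a softmax temperature.

    Equivalent to raising each probability to the power 1/temperature
    and renormalizing so the result sums to 1.
    """
    probs = np.asarray(probs, dtype="float64")
    logits = np.log(probs) / temperature
    # Subtract the max before exponentiating for numerical stability.
    exp_logits = np.exp(logits - np.max(logits))
    return exp_logits / np.sum(exp_logits)

# Illustrative next-character distribution (not taken from the model).
toy_probs = np.array([0.5, 0.3, 0.2])

for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, np.round(reweight(toy_probs, t), 3))
```

At `T = 0.2` almost all of the mass moves to the most likely character, `T = 1.0` leaves the distribution unchanged, and `T = 1.2` flattens it slightly.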

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

import numpy as np
import random
import io

In [2]:
# Load data
path = keras.utils.get_file("nietzsche.txt", origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
with io.open(path, encoding="utf-8") as f:
    text = f.read().lower()
text = text.replace("\n", " ")
print("Corpus length: {}".format(len(text)))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


In [13]:
max_length = 60
step = 3  # Start a new sequence every 3 characters
sentences = []
next_chars = [] # Holds the targets


for i in range(0, len(text) - max_length, step):
    sentences.append(text[i: i + max_length])
    next_chars.append(text[i + max_length])

print("Number of sequences: {}".format(len(sentences)))

chars = sorted(list(set(text)))
print("Total number of characters in corpus: {}".format(len(chars)))

char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))


Number of sequences: 200278
Total number of characters in corpus: 56
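The sequence count follows from the loop bounds: `range(0, len(text) - max_length, step)` yields ceil((len(text) - max_length) / step) start indices. With the corpus length of 600893 printed earlier, that is ceil(600833 / 3) = 200278, matching the output above. A small check, plus the same windowing on a short toy string (the toy string and helper names are illustrative):

```python
def count_windows(text_length, max_length, step):
    """Number of start indices produced by range(0, text_length - max_length, step)."""
    # Integer ceiling division avoids floating-point rounding.
    return (text_length - max_length + step - 1) // step

def make_windows(text, max_length, step):
    """Split text into overlapping input windows and next-character targets."""
    sentences, next_chars = [], []
    for i in range(0, len(text) - max_length, step):
        sentences.append(text[i : i + max_length])
        next_chars.append(text[i + max_length])
    return sentences, next_chars

# Corpus figures printed above.
print(count_windows(600893, 60, 3))  # 200278

# Toy example: windows of 4 characters, starting every 3 characters.
toy_sentences, toy_targets = make_windows("the cat sat", max_length=4, step=3)
print(list(zip(toy_sentences, toy_targets)))
```

Each window overlaps the previous one by `max_length - step` characters, so a smaller `step` gives more (and more redundant) training examples.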


In [4]:
sentences[5]

'sing that truth is a woman--what then? is there not ground f'

In [5]:
# Vectorization

# One-hot encode characters into a 3D array.
# Dim 1: Sentence number i
# Dim 2: Character t from the i-th sentence
# Dim 3: One-hot 1D array which encodes the t-th character of the i-th sentence. Length: len(chars)
x = np.zeros((len(sentences), max_length, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
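For the full corpus, `x` holds 200278 × 60 × 56 = 672,934,080 booleans; NumPy stores each `bool` in one byte, so the array takes roughly 673 MB (`x.nbytes` reports the exact figure). The encoding can be checked by inverting it with `argmax` over the last axis. A minimal sketch on a toy vocabulary; the `vectorize` and `decode` helpers and the `toy_` variables are hypothetical names chosen so they do not overwrite the notebook's `x`, `y` and `chars`:

```python
import numpy as np

def vectorize(sentences, next_chars, char_indices, max_length):
    """One-hot encode input windows (x) and next-character targets (y)."""
    n_chars = len(char_indices)
    x = np.zeros((len(sentences), max_length, n_chars), dtype=bool)
    y = np.zeros((len(sentences), n_chars), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True
        y[i, char_indices[next_chars[i]]] = True
    return x, y

def decode(one_hot, indices_char):
    """Invert the encoding: argmax over the last axis, then map indices to characters."""
    return "".join(indices_char[int(k)] for k in np.argmax(one_hot, axis=-1))

# Toy vocabulary and windows (illustrative only, not the Nietzsche corpus).
toy_chars = sorted(set("the cat sat"))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

toy_x, toy_y = vectorize(["the ", " cat"], ["c", " "], toy_char_indices, max_length=4)
print(toy_x.shape, toy_y.shape)                  # (2, 4, 7) (2, 7)
print(repr(decode(toy_x[1], toy_indices_char)))  # ' cat'
print(repr(decode(toy_y, toy_indices_char)))     # 'c '
```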

In [7]:
# Build model
model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(max_length, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
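As a sanity check on the model size: a standard LSTM layer learns an input kernel, a recurrent kernel and a bias for each of its four gates, giving 4 × (units × (input_dim + units) + units) weights, and a `Dense` layer has inputs × outputs + outputs. The arithmetic below assumes the 56-character vocabulary printed earlier and the default biases; `model.summary()` should report matching counts.

```python
def lstm_params(input_dim, units):
    """Trainable weights of a standard LSTM layer (4 gates, with biases)."""
    kernel = input_dim * units * 4        # input -> gates
    recurrent_kernel = units * units * 4  # previous hidden state -> gates
    bias = units * 4
    return kernel + recurrent_kernel + bias

def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units

n_chars = 56  # vocabulary size printed above
units = 128   # LSTM width used in the model
lstm_total = lstm_params(n_chars, units)
dense_total = dense_params(units, n_chars)
print(lstm_total, dense_total, lstm_total + dense_total)  # 94720 7224 101944
```

Most of the LSTM's weights sit in the recurrent kernel (65,536 of 94,720), so widening the layer grows the model roughly quadratically.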

In [14]:
# Sample from the text
random.seed(1843)  # Seeds Python's `random` (used to pick seed phrases); NumPy's sampling is not seeded

def sample(preds, temperature=1.0):
    # Reweight the predicted distribution with the given temperature
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)  # Renormalize so the probabilities sum to 1
    probas = np.random.multinomial(1, preds, 1)  # Draw one sample (one-hot over characters)
    return np.argmax(probas)  # Index of the sampled character
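`random.seed(1843)` seeds only Python's `random` module, which picks the seed phrases; `sample` draws from NumPy's global generator, so the generated text differs between runs. `np.log` also emits a divide-by-zero warning if a predicted probability underflows to exactly 0. The following is an alternative sketch, not the tutorial's code: `sample_with_rng` is a hypothetical helper that clips probabilities before the log and samples with an explicit `numpy.random.Generator` (`rng.choice`) instead of `np.random.multinomial`.

```python
import numpy as np

def sample_with_rng(preds, temperature=1.0, rng=None):
    """Temperature sampling that draws from an explicit NumPy Generator.

    Zero probabilities are clipped to a tiny value before taking the log,
    so they become negligible (but not exactly zero) after reweighting.
    """
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(np.clip(preds, 1e-12, None)) / temperature
    probs = np.exp(logits - np.max(logits))  # Stable softmax of the reweighted logits
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(1843)
toy_preds = np.array([0.1, 0.0, 0.6, 0.3])  # Illustrative distribution with a zero entry
print([sample_with_rng(toy_preds, 0.2, rng) for _ in range(10)])
```

Passing the same seeded `rng` through the whole generation loop makes a run repeatable.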

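The generation loop in the training cell below rebuilds the one-hot window for every new character, feeds it to the model, samples one character and slides the window forward by one. The sketch factors that logic into a helper; `generate`, `stub_predict` and `greedy` are hypothetical names introduced here, and the stub predictor stands in for the trained model so the loop can be checked without TensorFlow.

```python
import numpy as np

def generate(predict_fn, seed, n_chars, char_indices, indices_char,
             sample_fn, temperature=1.0):
    """Character-by-character generation with a sliding input window.

    predict_fn: maps a (1, window, vocab) one-hot array to a (vocab,)
        probability vector (hypothetical interface).
    sample_fn: callable (probs, temperature) -> character index.
    """
    window, vocab = len(seed), len(char_indices)
    sentence, generated = seed, ""
    for _ in range(n_chars):
        x_pred = np.zeros((1, window, vocab))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.0
        next_char = indices_char[sample_fn(predict_fn(x_pred), temperature)]
        sentence = sentence[1:] + next_char  # Slide the window by one character
        generated += next_char
    return generated

# Toy three-character vocabulary (illustrative only).
abc_chars = ["a", "b", "c"]
abc_ci = {c: i for i, c in enumerate(abc_chars)}
abc_ic = {i: c for i, c in enumerate(abc_chars)}

def stub_predict(x_pred):
    """Puts all probability on the character after the window's last character."""
    last = int(np.argmax(x_pred[0, -1]))
    probs = np.zeros(len(abc_chars))
    probs[(last + 1) % len(abc_chars)] = 1.0
    return probs

def greedy(probs, temperature):
    """Always pick the most likely character (ignores temperature)."""
    return int(np.argmax(probs))

print(generate(stub_predict, "ab", 5, abc_ci, abc_ic, greedy))  # prints cabca
```

With the trained model, `predict_fn` could be `lambda a: model.predict(a, verbose=0)[0]` and `sample_fn` the `sample` function defined above.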
In [None]:
# Train the model
epochs = 60
batch_size = 128

for epoch in range(epochs):
    model.fit(x, y, batch_size=batch_size, epochs=1)
    print()
    print("Generating text after epoch: %d" % epoch)
    start_index = random.randint(0, len(text) - max_length - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:  # "diversity" is the sampling temperature
        print("...Diversity:", diversity)

        generated = ""
        sentence = text[start_index : start_index + max_length]
        print("...Generating with seed: " + sentence)

        for i in range(400):  # Generate 400 characters
            # One-hot encode the current 60-character window
            x_pred = np.zeros((1, max_length, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.0
            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]
            sentence = sentence[1:] + next_char  # Slide the window forward by one character
            generated += next_char

        print("...Generated: ", generated)
        print()


Generating text after epoch: 0
...Diversity: 0.2
...Generating with seed: bly it even flattered the heart and taste of kant: let us ca
...Generated:  se of the best and the sense of the seems to the seems to the servical still the seems of the seems to the sense of the seems to the seems to be the seems to be the seems to the seems to the seems of the seems to the consequence of the seems to be the strong and and the seems of the seems to the such a such a such a comparestorow and the seems to the seems to be the sense of the seems to his still

...Diversity: 0.5
...Generating with seed: bly it even flattered the heart and taste of kant: let us ca
...Generated:  n be the such a rackure us an accurage is only his present of the case of the example of the sense of the way and in its ventures and considered also be have its restruct in an accession of comparestian that any highest standard and the extred to the seems of the profound, and being of extent of the conscience of the beauty in

...Generated:  lable judge in moral sense of the conditions and first man shalt and also it because the command a stand in the bad and specule that it would be same art of the characterister is also the sense of the profounder and the will be spose of the suffering of the spirits of the same position of life, and for "every degree of moral for the own believe would be some our entarilates to believe that it had 

...Diversity: 1.0
...Generating with seed: ovels--moreover, from life: buona femmina e mala femmina vuo
...Generated:  lable numrorigred his prission paculus--he like corducional furnish of an amought in his talredations sacrifying upon itself: no listerous and perviltiagers. allowed to precisely to feeling jow asceticir from himself simple condoush.   16  =not to "finger that alsogarity now doubt romain roffect casing that it were so truelatest happiness, to better away witherich, place worst. this ame democratio

...Diversity: 1.2
...Generating with seed: ovels--moreover, fr