<a href="https://colab.research.google.com/github/ernestomancebo/DeepLearningInPractice/blob/main/generative_dl/text_generation_lstm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Text Generation with LSTM

In this notebook we train an LSTM network on a corpus of Nietzsche's writings. We then give it a seed snippet, let it predict the next character, feed that output back in for several iterations, and see what comes out.

Getting the Corpus.

In [1]:
import numpy as np
from keras import utils

path = utils.get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()

print(f'Corpus length: {len(text)}')

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


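Now we cut the corpus into overlapping sequences of `maxlen` characters, starting a new sequence every `step` characters.

Each entry in `sentences` is paired with the character that follows it in `next_chars`, much like an n-gram and its continuation.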

In [2]:
maxlen = 60
step = 3

sentences = []
next_chars = []

# Slide a window of maxlen characters over the text, step characters at a time.
# Stop maxlen characters before the end so that text[i + maxlen] always exists.
for i in range(0, len(text) - maxlen, step):
  sentences.append(text[i : i + maxlen])
  next_chars.append(text[i + maxlen])

print(f'Number of sequences: {len(sentences)}')

chars = sorted(list(set(text)))
print(f'Unique chars count: {len(chars)}')

# Map each character to its index in chars
char_indices = {c: i for i, c in enumerate(chars)}

Number of sequences: 200278
Unique chars count: 57
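
The windowing above is easier to follow on a tiny input. The sketch below repeats the same loop on a made-up string; `toy_text`, `toy_maxlen` and `toy_step` are illustrative names and values, not part of the notebook. The last line checks the sequence count reported above using the corpus length printed earlier.

```python
# Toy illustration of the sliding window (all toy_* names are hypothetical).
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i : i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

for s, n in zip(toy_sentences, toy_next_chars):
    print(repr(s), '->', repr(n))
# 'hell' -> 'o'
# 'lo w' -> 'o'
# 'worl' -> 'd'

# Same range arithmetic with the real corpus length, maxlen=60, step=3.
expected_sequences = len(range(0, 600893 - 60, 3))
print(expected_sequences)  # 200278
```

Because `step` is smaller than `maxlen`, consecutive windows overlap, so most characters appear in several training samples.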


##Vectorization

Here we one-hot encode the sequences so the LSTM network can consume them.


In [3]:
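# Each sequence is one sample of maxlen timesteps (every sequence
# has exactly maxlen characters, so no padding is needed).
# The vocabulary size is the number of possible characters: it is
# both the one-hot width of the input and the number of target classes.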
sequences_count = len(sentences)
vocabulary_size = len(chars)

x = np.zeros((sequences_count, maxlen, vocabulary_size), dtype=bool)
y = np.zeros((sequences_count, vocabulary_size), dtype=bool)

for i, sentence in enumerate(sentences):
  for j, c in enumerate(sentence):
    # Build a one-hot tensor of shape
    # (samples, timesteps, vocabulary): the position along the
    # vocabulary axis marks which character occurs.
    x[i, j, char_indices[c]] = 1
  y[i, char_indices[next_chars[i]]] = 1
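
To make the encoding concrete, here is a minimal sketch on a three-character vocabulary. The names `toy_chars`, `toy_indices`, `toy_sentence` and `toy_x` are made up for this example. It also shows that taking the argmax along the vocabulary axis recovers the original characters.

```python
import numpy as np

# Hypothetical toy vocabulary, built the same way as chars above.
toy_chars = sorted(set("abca"))  # ['a', 'b', 'c']
toy_indices = {c: i for i, c in enumerate(toy_chars)}

toy_sentence = "cab"
toy_x = np.zeros((1, len(toy_sentence), len(toy_chars)), dtype=bool)
for j, c in enumerate(toy_sentence):
    toy_x[0, j, toy_indices[c]] = True

print(toy_x[0].astype(int))
# [[0 0 1]
#  [1 0 0]
#  [0 1 0]]

# Decoding: the argmax over the vocabulary axis gives back the indices.
decoded = ''.join(toy_chars[k] for k in toy_x[0].argmax(axis=-1))
print(decoded)  # cab
```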

##Building the LSTM model

In [4]:
from keras import layers
from keras.models import Sequential
from keras.optimizers import RMSprop

model = Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, vocabulary_size)))
model.add(layers.Dense(vocabulary_size, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=0.01))

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128)               95232     
_________________________________________________________________
dense (Dense)                (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
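
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout, in which each of the four weight sets (three gates plus the cell candidate) has an input kernel, a recurrent kernel and a bias. The helper names `lstm_param_count` and `dense_param_count` are made up for this example.

```python
def lstm_param_count(units, input_dim):
    # Four weight sets, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    # One weight per input/output pair, plus one bias per output unit.
    return input_dim * units + units

lstm_params = lstm_param_count(units=128, input_dim=57)
dense_params = dense_param_count(units=57, input_dim=128)
print(lstm_params, dense_params, lstm_params + dense_params)
# 95232 7353 102585
```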


###Sampling Function

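The next function reweights the probabilities predicted by the model according to a temperature and then draws a random character index from the reweighted distribution. Lower temperatures make the draw more conservative; higher temperatures make it more surprising.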

In [5]:
def sample(predictions, temperature=1.0):
  preds = np.asarray(predictions).astype('float64')
  preds = np.log(preds) / temperature

  exp_preds = np.exp(preds)
  preds = exp_preds / np.sum(exp_preds)
  probas = np.random.multinomial(1, preds, 1)

  return np.argmax(probas)
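
To see what the temperature does, the sketch below applies the same log/divide/exponentiate/normalize steps to a made-up distribution, without the random draw. `reweight` and `toy_probs` are illustrative names. The computation is equivalent to raising each probability to the power `1 / temperature` and renormalizing.

```python
import numpy as np

def reweight(probs, temperature):
    # Same arithmetic as sample(), minus the multinomial draw.
    logits = np.log(np.asarray(probs, dtype='float64')) / temperature
    exp = np.exp(logits)
    return exp / exp.sum()

toy_probs = [0.5, 0.3, 0.2]  # hypothetical model output
for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, np.round(reweight(toy_probs, t), 3))
```

With a temperature of 1.0 the distribution is unchanged. Below 1.0 the most likely character takes a larger share of the probability mass, and above 1.0 the distribution flattens, so unlikely characters are drawn more often.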

###Text Generation Loop

This loop alternates between training for one epoch and generating text. At each epoch, text is generated at several temperatures. This allows us to see when the model starts to converge and how the temperature affects the generated text.
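
The generation step can be sketched in isolation, without a trained model. In the sketch below, `fake_predict` is a hypothetical stand-in for `model.predict` that always returns the same distribution, and `generate`, `toy_chars` and `toy_maxlen` are made-up names. The draw uses `rng.choice` from NumPy's `Generator` instead of `np.random.multinomial`, purely so the sketch can be seeded. Treat it as an illustration of the encode, predict, sample and slide cycle, not as the notebook's implementation.

```python
import numpy as np

toy_chars = ['a', 'b', 'c']
toy_indices = {c: i for i, c in enumerate(toy_chars)}
toy_maxlen = 4

def fake_predict(encoded_window):
    # Hypothetical stand-in for model.predict: ignores its input and
    # always returns the same distribution over toy_chars.
    return np.array([0.7, 0.2, 0.1])

def generate(seed, n_chars, temperature, rng):
    window = seed
    out = []
    for _ in range(n_chars):
        # One-hot encode the current window.
        encoded = np.zeros((1, toy_maxlen, len(toy_chars)))
        for t, ch in enumerate(window):
            encoded[0, t, toy_indices[ch]] = 1
        probs = fake_predict(encoded)
        # Reweight by temperature, then draw a random index.
        logits = np.log(probs) / temperature
        probs = np.exp(logits) / np.exp(logits).sum()
        next_char = toy_chars[rng.choice(len(toy_chars), p=probs)]
        out.append(next_char)
        # Slide the window: append the new character, drop the oldest.
        window = (window + next_char)[1:]
    return ''.join(out), window

rng = np.random.default_rng(0)
text_out, final_window = generate('abca', 10, 0.5, rng)
print(text_out, final_window)
```

The window always keeps exactly `toy_maxlen` characters, so after enough steps it consists entirely of generated text.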

In [7]:
import random
import sys

# This is to make prints cute.
padder = '-' * 5

for epoch in range(1, 60):
  print(f'\n\nEpoch: {epoch}')

  model.fit(x, y, batch_size=128, epochs=1)
  # Pick a random index
  start_index = random.randint(0, len(text) - maxlen - 1)
  generated_text = text[start_index : start_index + maxlen]
  print(f'\n{padder} Generating with seed: {generated_text} {padder}\n')

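  # generated_text is not reset between temperatures, so each
  # temperature continues from where the previous one stopped.
  for temperature in [0.2, 0.5, 1., 1.2]: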
    print(f'\n{padder} Temperature: {temperature} {padder}\n')
    sys.stdout.write(generated_text)

    for i in range(400):
      sampled = np.zeros((1, maxlen, vocabulary_size))
      # one-hot-encode the seed (generated_text) as an input tensor
      for t, char in enumerate(generated_text):
        sampled[0, t, char_indices[char]] = 1
      
      preds = model.predict(sampled, verbose=0)[0]
      # Sample the next character index from the prediction,
      # reweighted by the current temperature
      next_index = sample(preds, temperature)
      next_char = chars[next_index]

      # Here we slide the seed (generated_text) window (sequence)
      # for the next iteration
      generated_text += next_char
      generated_text = generated_text[1:]

      sys.stdout.write(next_char)



Epoch: 1

----- Generating with seed: was different with felix mendelssohn, that halcyon
master, w -----


----- Temperature: 0.2 -----

was different with felix mendelssohn, that halcyon
master, which is a spiritual presenter to the stronger of the best of the consequently in the subject and consequently so the same them to the greater to the stronger of the souls, and the subject and perhaps the superstition of the stronger of the subject to the subject and the subjective and souls, and the privile to the subject and consequently and souls, and there is to be subject and more in the most 
----- Temperature: 0.5 -----

 and souls, and there is to be subject and more in the most suffering and misunderstanding and at the person for his insight and than has a sure, in the christian to the fals of man who has perhaps be perhaps to be subjective to distingunating, and also have no longer all the subject of the personal spirituality of placition and being posit problem of the privile and 


e rares, meany, his roclekd and our
seloogi

Epoch: 7

----- Generating with seed: xperienced only in the
suffering of another, as in the case  -----


----- Temperature: 0.2 -----

xperienced only in the
suffering of another, as in the case of the spirit of the same states of the strong and the same and more and so the stronger to the same and power and so the best of the strong and spirit in the intellectual and sure, the superficial in the stronger of the spirit to be at the same to be a man and conscious sense of the states of the strong and more and so the best of the sense of the strong and such and so not of the states of the s
----- Temperature: 0.5 -----

nse of the strong and such and so not of the states of the sense of the latter of the sense of the free spirit is a songful enough and is be states of the best of a colled be perceive of the so purpose of the well with it in his spirit and self-desired how place of the bood in the states of a man in the worst which of person 