Loomis Python/Jupyter Notebook example of RNN/LSTM at work.

For reference, I found several helpful examples on GitHub.

This one uses an RNN/LSTM to train a model on Shakespeare and then generate new text in the same style.

(experiments/text_generation_shakespeare_rnn/text_generation_shakespeare_rnn.ipynb)
https://github.com/trekhleb/machine-learning-experiments/blob/master/experiments/text_generation_shakespeare_rnn/text_generation_shakespeare_rnn.ipynb

I tested the example for myself on the following .txt file of Jane Austen. This example will serve as an introduction to RNNs and ML in Python.

First import the following. TensorFlow (TF) and Keras are popular ML libraries.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import platform
import time
import pathlib
import os

# Path to Pride and Prejudice
path = 'C:/Users/loomi/Desktop/Python/PrideAndPrejudice.txt'

# Read the text file; the context manager closes it afterwards.
with open(path, mode='r') as f:
    text = f.read()

print('Length of text: {} characters'.format(len(text)))

: 

In [None]:
print(text[:500])

: 

As we see above, we do in fact have a copy of Jane Austen's "Pride and Prejudice" in a .txt file (saved locally; I will attach it to the email). Next comes a series of setup steps that build the vocabulary, cut the text into chunks (fixed-length runs of characters that are later shuffled), and put everything into arrays (matrix manipulation at its finest).
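Before running these steps on the whole novel, it may help to see them on a toy string. The sketch below is only an illustration: `sample`, `toy_sequence_length`, and the other `toy_` names are made up for this example and do not appear in the notebook, and plain Python lists stand in for the NumPy and TensorFlow objects used in the real cells.

```python
# Toy walk-through of the preprocessing steps on a short string.
# All names here are illustrative stand-ins, not values from the notebook.
sample = "hello world"

# 1. Vocabulary: the sorted set of unique characters.
toy_vocab = sorted(set(sample))

# 2. Lookup tables in both directions.
toy_char2index = {ch: i for i, ch in enumerate(toy_vocab)}
toy_index2char = toy_vocab  # a list already maps index -> character

# 3. Encode the text as integers.
encoded = [toy_char2index[ch] for ch in sample]

# 4. Cut into chunks of sequence_length + 1, dropping any remainder.
toy_sequence_length = 4
chunk_size = toy_sequence_length + 1
chunks = [encoded[i:i + chunk_size]
          for i in range(0, len(encoded) - chunk_size + 1, chunk_size)]

# 5. Each chunk yields an input and a target shifted by one character.
pairs = [(chunk[:-1], chunk[1:]) for chunk in chunks]


def decode(indices):
    """Turn a list of indices back into a string."""
    return ''.join(toy_index2char[i] for i in indices)


for inp, tgt in pairs:
    print(repr(decode(inp)), '->', repr(decode(tgt)))
```

Each target is the input shifted one character to the right, which is exactly the "predict the next character" task the model is trained on.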

Count the unique characters and build a vocabulary from them.

In [None]:
# The unique characters in the text.
vocab = sorted(set(text))

print('{} unique characters'.format(len(vocab)))
print('vocab:', vocab)

: 

In [None]:
# Map characters to their indices in vocabulary.
char2index = {char: index for index, char in enumerate(vocab)}

print('{')
for char, _ in zip(char2index, range(79)):
    print('  {:4s}: {:3d},'.format(repr(char), char2index[char]))
print('  ...\n}')

: 

In [None]:
# Map character indices to characters from vocabulary.
index2char = np.array(vocab)
print(index2char)

: 

In [None]:
# Convert chars in text to indices.
text_as_int = np.array([char2index[char] for char in text])

print('text_as_int length: {}'.format(len(text_as_int)))
print('{} --> {}'.format(repr(text[:15]), repr(text_as_int[:15])))

: 

In [None]:
# The maximum length sentence we want for a single input in characters.
sequence_length = 100
examples_per_epoch = len(text) // (sequence_length + 1)

print('examples_per_epoch:', examples_per_epoch)

: 

In [None]:
# Create training dataset. REQUIRED
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for char in char_dataset.take(5):
    print(index2char[char.numpy()])

: 

In [None]:
# Generate batched sequences out of the char_dataset.
sequences = char_dataset.batch(sequence_length + 1, drop_remainder=True)

# Sequences size is the same as examples_per_epoch.
print('Sequences count: {}'.format(len(list(sequences.as_numpy_iterator()))))
print()

# Sequences examples.
for item in sequences.take(5):
    print(repr(''.join(index2char[item.numpy()])))

: 

In [None]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text


dataset = sequences.map(split_input_target)

# Dataset size is the same as examples_per_epoch.
# But each element of a sequence now has a length of `sequence_length`
# and not `sequence_length + 1`.
print('dataset size: {}'.format(len(list(dataset.as_numpy_iterator()))))



: 

In [None]:
for input_example, target_example in dataset.take(1):
    print('Input sequence size:', repr(len(input_example.numpy())))
    print('Target sequence size:', repr(len(target_example.numpy())))
    print()
    print('Input:', repr(''.join(index2char[input_example.numpy()])))
    print('Target:', repr(''.join(index2char[target_example.numpy()])))

: 

In [None]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print('Step {:2d}'.format(i))
    print('  input: {} ({:s})'.format(input_idx, repr(index2char[input_idx])))
    print('  expected output: {} ({:s})'.format(target_idx, repr(index2char[target_idx])))

: 

In [None]:
# Batch size.
BATCH_SIZE = 64

# Buffer size to shuffle the dataset (TF data is designed to work
# with possibly infinite sequences, so it doesn't attempt to shuffle
# the entire sequence in memory. Instead, it maintains a buffer in
# which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

: 

In [None]:
print('Batched dataset size: {}'.format(len(list(dataset.as_numpy_iterator()))))

: 

In [None]:
for input_text, target_text in dataset.take(1):
    print('1st batch: input_text:', input_text)
    print()
    print('1st batch: target_text:', target_text)

: 

In [None]:
# Length of the vocabulary in chars.
vocab_size = len(vocab)

# The embedding dimension.
embedding_dim = 256

# Number of RNN units.
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.models.Sequential()

    model.add(tf.keras.layers.Embedding(
      input_dim=vocab_size,
      output_dim=embedding_dim,
      batch_input_shape=[batch_size, None]
    ))

    model.add(tf.keras.layers.LSTM(
      units=rnn_units,
      return_sequences=True,
      stateful=True,
      recurrent_initializer=tf.keras.initializers.GlorotNormal()
    ))

    model.add(tf.keras.layers.Dense(vocab_size))
  
    return model

model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)

model.summary()

: 

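In plain terms, the model stacks three layers: an Embedding layer that turns each character index into a vector, an LSTM layer (a type of RNN) that carries context along the sequence, and a Dense (fully connected) layer that outputs one score per vocabulary character. Together they take the tokenized language of Jane Austen and train the computer to predict which character comes next. In this way the model can "learn" how to approximate the author's style.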
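As a cross-check on the parameter counts that model.summary() prints, they can be worked out by hand. This is a rough sketch assuming a standard LSTM with bias terms and a Dense layer with bias; `example_vocab_size = 80` is a made-up value for illustration, since the real vocabulary size depends on the text file.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding vector per vocabulary entry.
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, output_dim):
    # A weight matrix plus one bias per output.
    return (input_dim + 1) * output_dim


example_vocab_size = 80       # hypothetical; use len(vocab) for the real text
example_embedding_dim = 256   # matches embedding_dim in the notebook
example_rnn_units = 1024      # matches rnn_units in the notebook

counts = {
    'embedding': embedding_params(example_vocab_size, example_embedding_dim),
    'lstm': lstm_params(example_embedding_dim, example_rnn_units),
    'dense': dense_params(example_rnn_units, example_vocab_size),
}
total_params = sum(counts.values())

for name, count in counts.items():
    print('{:10s}{:>12,d}'.format(name, count))
print('{:10s}{:>12,d}'.format('total', total_params))
```

Nearly all of the weights sit in the LSTM layer, which is why training time is dominated by it.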

In [None]:
# Loss function: sparse categorical cross-entropy between the labels and the logits.
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(
      y_true=labels,
      y_pred=logits,
      from_logits=True
    )


: 

Based on the current (untrained) model, the loss is fairly high at about 4.6; through training we intend to lower that.
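A starting loss in this range is about what one would expect if the untrained model spreads its guesses almost evenly over the vocabulary, because the cross-entropy of a uniform guess over N characters is ln(N). The snippet below is a small sketch of that arithmetic; the vocabulary sizes in the loop are illustrative values, not measurements from this notebook.

```python
import math


def uniform_guess_loss(num_classes):
    """Cross-entropy of a uniform prediction: -ln(1 / N) = ln(N)."""
    return -math.log(1.0 / num_classes)


# Illustrative vocabulary sizes; substitute len(vocab) for the real text.
for n in (65, 80, 100):
    print('vocab size {:3d}: baseline loss {:.3f}'.format(n, uniform_guess_loss(n)))
```

Any loss well below this baseline means the model has learned something about which characters tend to follow which.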

Adam is a popular optimization algorithm used for training deep learning models, and Keras ships an implementation of it. Adam stands for "Adaptive Moment Estimation" and is an extension of the stochastic gradient descent (SGD) optimization algorithm.

The Adam optimizer combines the concepts of momentum and adaptive learning rates to update the weights efficiently during training. It maintains exponentially decaying averages of past gradients (the first moment) and of past squared gradients (the second moment). By keeping track of these statistics, Adam adapts the step size for each parameter individually.
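To make the two running averages concrete, here is a minimal single-parameter sketch of the Adam update rule in plain Python. It is not the Keras implementation: the function name `adam_minimize`, the toy objective f(x) = (x - 3)^2, and the step count are assumptions chosen for illustration, and the default hyperparameters are just commonly quoted values.

```python
import math


def adam_minimize(grad_fn, x0, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-7, steps=1000):
    """Minimize a one-dimensional function given its gradient, using Adam."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at zero and are too small early on.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled by the square root of the second-moment estimate.
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Toy objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_final = adam_minimize(lambda x: 2 * (x - 3), x0=0.0,
                        learning_rate=0.1, steps=500)
print('x after optimization:', x_final)
```

In a real network the same update runs independently for every weight, so weights with consistently large gradients take relatively smaller steps.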

In Keras, you can use the Adam optimizer through tf.keras.optimizers.Adam, as in the next cell.

In [None]:
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=adam_optimizer,
    loss=loss
)

: 

In [None]:
# Directory where the checkpoints will be saved. This functions as the memory,
# or as the inherent learned state.
checkpoint_dir = 'tmp/checkpoints'
os.makedirs(checkpoint_dir, exist_ok=True)

# Name of the checkpoint files; one is saved per epoch.
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)

: 

Now that we have the model and an optimizer in place, we iterate over the data again and again until the loss is reduced.

On each pass, the weights are adjusted to reduce the loss on the inputs, and the program begins to learn. Let's run 50 "epochs" and see how the model does.
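For a sense of scale, the number of weight updates in a run can be estimated from the text length, the sequence length, and the batch size. The sketch below assumes the same chunking as the earlier cells (chunks of sequence_length + 1 characters, with partial chunks and partial batches dropped); `example_text_length` and `example_epochs` are made-up figures, not values measured from the notebook.

```python
def batches_per_epoch(text_length, sequence_length, batch_size):
    """Number of full batches the dataset yields in one epoch."""
    # Each example consumes sequence_length + 1 characters
    # (the input plus the one-character shift for the target).
    examples = text_length // (sequence_length + 1)
    # drop_remainder=True discards the final partial batch.
    return examples // batch_size


example_text_length = 700_000  # hypothetical character count
example_epochs = 50            # hypothetical number of epochs

steps = batches_per_epoch(example_text_length, sequence_length=100, batch_size=64)
print('weight updates per epoch:', steps)
print('weight updates in total: ', steps * example_epochs)
```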

In [None]:
EPOCHS=50
history = model.fit(
  x=dataset,
  epochs=EPOCHS,
  callbacks=[
    checkpoint_callback
  ]
)

: 

In [None]:
def render_training_history(training_history):
    loss = training_history.history['loss']
    plt.title('Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.plot(loss, label='Training set')
    plt.legend()
    plt.grid(linestyle='--', linewidth=1, alpha=0.5)
    plt.show()
    
render_training_history(history)

: 

Note how the loss progresses through the epochs: a steep early drop followed by a long flattening tail, roughly resembling f(x) = e^(-x). This is possibly due to the inner workings of Adam.

In [None]:
tf.train.latest_checkpoint(checkpoint_dir)

simplified_batch_size = 1

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=simplified_batch_size)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([simplified_batch_size, None]))
model.summary()

: 

In [None]:

# num_generate
# - number of characters to generate.
#
# temperature
# - Low temperatures result in more predictable text.
# - Higher temperatures result in more surprising text.
# - Experiment to find the best setting.
def generate_text(model, start_string, num_generate = 1000, temperature=1.0):
    # Evaluation step (generating text using the learned model)

    # Converting our start string to numbers (vectorizing).
    input_indices = [char2index[s] for s in start_string]
    input_indices = tf.expand_dims(input_indices, 0)

    # Empty string to store our results.
    text_generated = []

    # Here batch size == 1.
    model.reset_states()
    for char_index in range(num_generate):
        predictions = model(input_indices)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Using a categorical distribution to predict the character returned by the model.
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(
            predictions,
            num_samples=1
        )[-1, 0].numpy()

        # We pass the predicted character as the next input to the model
        # along with the previous hidden state.
        input_indices = tf.expand_dims([predicted_id], 0)

        text_generated.append(index2char[predicted_id])

    return (start_string + ''.join(text_generated))

# Generate the text with default temperature (1.0).
print(generate_text(model, start_string=u" "))

: 

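Honestly... not too bad, but I will try again and again. The notes from the original author describe temperature as a control on how adventurous the sampling is: lower values give more predictable text. So let's set it to 0.5. Note that this is a divisor applied to the model's scores rather than a percentage.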
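Since generate_text divides the model's scores by the temperature before sampling, its effect can be seen by converting a few scores to probabilities at different settings. This is a small sketch in plain Python; `softmax_with_temperature` and `example_logits` are hypothetical names and values used only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three characters

for temperature in (2.0, 1.0, 0.5):
    probs = softmax_with_temperature(example_logits, temperature)
    print('temperature {:.1f}: {}'.format(
        temperature, [round(p, 3) for p in probs]))
```

At a lower temperature the highest-scoring character takes a larger share of the probability, so the sampled text follows the model's top choices more closely; at a higher temperature the shares even out and the output becomes more surprising.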

In [None]:
def generate_text(model, start_string, num_generate = 10000, temperature=.50):
    # Evaluation step (generating text using the learned model)

    # Converting our start string to numbers (vectorizing).
    input_indices = [char2index[s] for s in start_string]
    input_indices = tf.expand_dims(input_indices, 0)

    # Empty string to store our results.
    text_generated = []

    # Here batch size == 1.
    model.reset_states()
    for char_index in range(num_generate):
        predictions = model(input_indices)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Using a categorical distribution to predict the character returned by the model.
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(
            predictions,
            num_samples=1
        )[-1, 0].numpy()

        # We pass the predicted character as the next input to the model
        # along with the previous hidden state.
        input_indices = tf.expand_dims([predicted_id], 0)

        text_generated.append(index2char[predicted_id])

    return (start_string + ''.join(text_generated))

# Generate the text with the new, lower default temperature (0.5).
print(generate_text(model, start_string=u" "))

: 

Much better, and, as an avid fan of Ms. Austen's work, very humorous.