<a href="https://colab.research.google.com/github/josmuniz/RNN-LSTM/blob/main/GridSearch_Small_LSTM_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import to_categorical
import numpy as np

In [2]:
from google.colab import files
uploaded = files.upload()

Saving wonderland.txt to wonderland.txt


We are using Alice's Adventures in Wonderland as the training text.

In [3]:
# load the text and convert to lowercase
filename = 'wonderland.txt'
# the with-block closes the file once it has been read
with open(filename, 'r', encoding='utf-8') as f:
    raw_text = f.read()
raw_text = raw_text.lower()

Now that the book is loaded, I must prepare the data for modeling by the neural network. I cannot model the characters directly; instead, I must convert the characters to integers.

I can do this easily by first creating a set of all of the distinct characters in the book, then creating a map of each character to a unique integer.

In [None]:
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
char_to_int

Now that the book has been loaded and the mapping prepared, I can summarize the dataset.

In [5]:
n_chars = len(raw_text)
n_vocab = len(chars)
print ("Total Characters: ", n_chars)
print ("Total Vocab: ", n_vocab)

Total Characters:  163948
Total Vocab:  64


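The book has just under 164,000 characters (163,948), and when converted to lowercase, there are 64 distinct characters in the vocabulary for the network to learn. That is far more than the 26 letters of the alphabet, because the vocabulary also counts punctuation, whitespace, and other symbols.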

I need to define the training data for the network. There is a lot of flexibility in how we choose to break up the text and expose it to the network during training.

I will split the book text up into subsequences with a fixed length of 100 characters, an arbitrary length. You could just as easily split the data by sentences, padding the shorter sequences and truncating the longer ones.

Each training pattern of the network comprises 100 time steps of one character (X) followed by one character output (y). When creating these sequences, you slide this window along the whole book one character at a time, allowing each character a chance to be learned from the 100 characters that preceded it (except the first 100 characters, of course).


In [6]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
 seq_in = raw_text[i:i + seq_length]
 seq_out = raw_text[i + seq_length]
 dataX.append([char_to_int[char] for char in seq_in])
 dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print ("Total Patterns: ", n_patterns)

Total Patterns:  163848


Running the code to this point shows that splitting the dataset this way yields 163,848 training patterns. This makes sense: excluding the first 100 characters, there is one training pattern to predict each of the remaining characters (163,948 - 100 = 163,848).
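To see the pattern count on something small enough to inspect by hand, here is a minimal sketch of the same sliding-window split on a toy string. The names `toy_text`, `toy_len`, and `toy_map` are made up for this illustration and are not part of the notebook.

```python
# Toy illustration of the sliding-window split.
# toy_text and toy_len are illustrative stand-ins for raw_text and seq_length.
toy_text = "hello world"
toy_len = 4

toy_chars = sorted(set(toy_text))
toy_map = {c: i for i, c in enumerate(toy_chars)}

toy_X, toy_y = [], []
for i in range(len(toy_text) - toy_len):
    window = toy_text[i:i + toy_len]   # input characters
    target = toy_text[i + toy_len]     # the character that follows the window
    toy_X.append([toy_map[c] for c in window])
    toy_y.append(toy_map[target])

# One pattern per character after the first toy_len characters
print(len(toy_X), len(toy_text) - toy_len)
print(toy_text[0:toy_len], "->", toy_text[toy_len])
```

The first window is "hell" with target "o", and the number of patterns equals the text length minus the window length, which is the same arithmetic that gives 163,848 patterns for the book.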

In [None]:
dataX[2]

Now that we have prepared our training data, we need to transform it to be suitable for use with Keras.

First, we must transform the list of input sequences into the form [samples, time steps, features] expected by an LSTM network.

Next, we need to rescale the integers to the range 0-to-1 to make the patterns easier to learn by the LSTM network, whose gates use the sigmoid activation function by default.

Finally, we need to convert the output patterns (single characters converted to integers) into a one-hot encoding. This is so that you can configure the network to predict the probability of each of the 64 different characters in the vocabulary (an easier representation) rather than trying to force it to predict precisely the next character. Each y value is converted into a sparse vector with a length of 64, full of zeros, except with a 1 in the column for the letter (integer) that the pattern represents.

---
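For example, if “n” maps to integer value 31, its one-hot encoding has a single 1 at index 31 and looks as follows (shown shortened; the full vector has one entry per vocabulary character):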
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.]
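The three transformations described above can be tried on toy data before running them on the book. This is a minimal sketch in plain NumPy; the toy arrays are invented for illustration, and `np.eye` indexing is used here only as a stand-in that produces the same kind of one-hot rows as `to_categorical`.

```python
import numpy as np

# Hypothetical toy data: 3 patterns of 4 integer-encoded characters, vocabulary of 5
toy_vocab = 5
toy_dataX = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_dataY = [4, 0, 1]

# reshape to [samples, time steps, features]
toy_X = np.reshape(toy_dataX, (len(toy_dataX), 4, 1))
# rescale the integers into the range [0, 1)
toy_X = toy_X / float(toy_vocab)
# one-hot encode: row i of the identity matrix has its single 1 in column i
toy_onehot = np.eye(toy_vocab, dtype="float32")[toy_dataY]

print(toy_X.shape)
print(toy_onehot)
```

Each row of `toy_onehot` sums to 1 and has its 1 in the column given by the corresponding entry of `toy_dataY`.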


In [9]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = to_categorical(dataY)

In [22]:
y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.]], dtype=float32)

We can now define our LSTM model. Here, we define a single hidden LSTM layer with 256 memory units. The network uses dropout with a probability of 20% (a rate of 0.2). The output layer is a Dense layer using the softmax activation function to output a probability prediction between 0 and 1 for each of the 64 characters.

The problem is really a single character classification problem with 64 classes and, as such, is defined as optimizing the log loss (cross entropy) using the ADAM optimization algorithm for speed.

In [10]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
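To make the softmax output and the log loss concrete, here is a small NumPy sketch of both computations. The helper names and the toy scores are illustrative assumptions; this is not how Keras implements them internally, only the same formulas written out.

```python
import numpy as np

def softmax(logits):
    # subtract the maximum for numerical stability; the result is unchanged
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(one_hot_target, probs):
    # log loss for one example: minus the log-probability of the true class
    return -np.sum(one_hot_target * np.log(probs + 1e-12))

toy_logits = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes
toy_target = np.array([1.0, 0.0, 0.0])   # the true class is index 0

toy_probs = softmax(toy_logits)
toy_loss = cross_entropy(toy_target, toy_probs)

# loss of a uniform guess over a 64-character vocabulary
uniform_loss = -np.log(1.0 / 64)

print(toy_probs, toy_probs.sum())
print(toy_loss, uniform_loss)
```

The uniform-guess value, roughly 4.16, is a useful reference point when reading training losses for the 64-character vocabulary reported earlier: a loss below it suggests the model is doing better than guessing every character with equal probability.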

note1: There is no test dataset. We are modeling the entire training dataset to learn the probability of each character in a sequence.

We are not interested in the most accurate (classification accuracy) model of the training dataset. This would be a model that predicts each character in the training dataset perfectly. Instead, we are interested in a generalization of the dataset that minimizes the chosen loss function. We are seeking a balance between generalization and overfitting that stops short of memorization.

---



In [11]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
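The placeholders in the checkpoint file name are ordinary Python format fields. As a rough sketch of how one file name comes about, the template can be filled in by hand; the epoch and loss values below are taken from one line of the training log, and the assumption is that the callback substitutes the epoch number and the monitored loss in the same way.

```python
# Fill the checkpoint template by hand to see the resulting file name
template = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
example_name = template.format(epoch=2, loss=2.84676)
print(example_name)  # weights-improvement-02-2.8468.hdf5
```

Because `save_best_only=True` and `monitor='loss'` are set, a new file is written only in epochs where the training loss improves.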

In [12]:
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 1: loss improved from inf to 3.02746, saving model to weights-improvement-01-3.0275.hdf5
Epoch 2/20
Epoch 2: loss improved from 3.02746 to 2.84676, saving model to weights-improvement-02-2.8468.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.84676 to 2.76458, saving model to weights-improvement-03-2.7646.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.76458 to 2.69273, saving model to weights-improvement-04-2.6927.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.69273 to 2.63404, saving model to weights-improvement-05-2.6340.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.63404 to 2.57328, saving model to weights-improvement-06-2.5733.hdf5
Epoch 7/20
Epoch 7: loss improved from 2.57328 to 2.51925, saving model to weights-improvement-07-2.5192.hdf5
Epoch 8/20
Epoch 8: loss improved from 2.51925 to 2.47057, saving model to weights-improvement-08-2.4706.hdf5
Epoch 9/20
Epoch 9: loss improved from 2.47057 to 2.42367, saving model to weights-improvement-09-2.4237.hdf5
Epoch 10/20
Epoch 10: loss improved from 2.42367 to 2.38161, saving model to weights-improvement-10-2.3816.hdf5
Epoch 11/20
Epoch 1

<keras.src.callbacks.History at 0x7912f5861270>

Generating Text with an LSTM Network
Generating text using the trained LSTM network is relatively straightforward.

First, we will load the data and define the network in exactly the same way, except the network weights are loaded from a checkpoint file, and the network does not need to be trained.

In [14]:
# load the network weights
filename = "weights-improvement-20-2.0719.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
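Because the loss is embedded in each checkpoint file name, the best file can be picked programmatically instead of typing the name by hand. The helper below is a hypothetical sketch, not part of the notebook, and it assumes the naming template defined for the checkpoint callback.

```python
import re

def best_checkpoint(filenames):
    """Return the file name with the lowest loss encoded in it (hypothetical helper)."""
    pattern = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")
    scored = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            scored.append((float(match.group(2)), name))
    if not scored:
        raise ValueError("no checkpoint files matched the expected pattern")
    # tuples compare by loss first, so min() picks the lowest loss
    return min(scored)[1]

# File names taken from the training log
example_files = [
    "weights-improvement-01-3.0275.hdf5",
    "weights-improvement-02-2.8468.hdf5",
    "weights-improvement-10-2.3816.hdf5",
]
print(best_checkpoint(example_files))
```

In a live session the list could come from `glob.glob("weights-improvement-*.hdf5")`, and the returned name could then be passed to `model.load_weights`.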

When preparing the mapping of unique characters to integers, we must also create a reverse mapping that converts the integers back to characters so that we can read the predictions.

In [15]:
int_to_char = dict((i, c) for i, c in enumerate(chars))
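A quick way to check that the two mappings are inverses of each other is to encode a short string and decode it again. This sketch uses a toy string and toy mapping names chosen for illustration rather than the book's vocabulary.

```python
# Round trip: characters -> integers -> characters, on a toy vocabulary
toy_chars = sorted(set("the cat"))
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

encoded = [toy_char_to_int[c] for c in "the cat"]
decoded = "".join(toy_int_to_char[i] for i in encoded)

print(encoded)
print(decoded)
```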

The simplest way to use the Keras LSTM model to make predictions is to first start with a seed sequence as input, generate the next character, then update the seed sequence to add the generated character on the end and trim off the first character. This process is repeated for as long as you want to predict new characters (e.g., a sequence of 1,000 characters in length).

In [17]:
import sys
# pick a random seed
start = np.random.randint(0, len(dataX)-1)
# copy the pattern so that appending below does not modify dataX
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

# generate characters
for i in range(1000):
    # shape the current window as [samples, time steps, features] and normalize
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy choice: take the most probable next character
    index = int(np.argmax(prediction))
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the new character and drop the oldest
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Seed:
"  hat,” the king said to the hatter.

“it isn’t mine,” said the hatter.

“_stolen!_” the king exclaim "
nd to the jork, and the hook dale ti the sore of the cotro, and the whrt hnt loae toeet oo the tire  the hound tht ao io an ie oasee hn the dad no the tare whet sas ao all oo the tare wooh it tas an iore and the was so tie wire of the harter, and the woide to ae an anl and aro anr of the hadt, and she was not io the dart wo toe kirte the sooe of the carler, and the woide to ae an anl and aro anr of the hadt, and she was not io the dart wo toe kirte the sooe of the carler, and the woide to ae an anl and aro anr of the hadt, and she was not io the dart wo toe kirte the sooe of the carler, and the woide to ae an anl and aro anr of the hadt, and she was not io the dart wo toe kirte the sooe of the carler, and the woide to ae an anl and aro anr of the hadt, and she was not io the dart wo toe kirte the sooe of the carler, and the woide to ae an anl and aro anr of the hadt, and she w
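The window update in the generation loop can be looked at in isolation. In the sketch below, `toy_next_index` is a made-up deterministic rule standing in for `model.predict` followed by `argmax`; it is not the trained LSTM, and the toy window has length 3 instead of 100.

```python
# Stand-in for model.predict + argmax: a made-up rule, NOT the trained LSTM
def toy_next_index(window, vocab_size):
    return (window[-1] + window[-2]) % vocab_size

toy_vocab_size = 5
window = [0, 1, 2]   # seed pattern
generated = []
for _ in range(12):
    nxt = toy_next_index(window, toy_vocab_size)
    generated.append(nxt)
    # drop the oldest value and append the newest: the length stays fixed
    window = window[1:] + [nxt]

print(generated)
print(len(window))
```

With a deterministic choice such as `argmax`, the same window always yields the same next character, so once a window recurs the continuation repeats. This is one plausible reading of the repeated phrase in the generated sample.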