
### Text generation using Project Gutenberg

In this tutorial, we will download a story from Project Gutenberg and use it to feed a text generation model.

About Project Gutenberg:
"Project Gutenberg offers over 56,000 free eBooks: Choose among free epub books, free kindle books, download them or read them online. You will find the world's great literature here, especially older works for which copyright has expired. We digitized and diligently proofread them with the help of thousands of volunteers."

Navigate to https://www.gutenberg.org/ and pick a book with a plain text file of around 200 KB (Alice in Wonderland is a good option: https://www.gutenberg.org/ebooks/11). Download the text as the basis for your model.

### Develop a Small LSTM Recurrent Neural Network
In this section we will develop a simple LSTM network to learn sequences of characters from the downloaded book (this notebook uses *The Strange Case of Dr Jekyll and Mr Hyde*, saved as `data/dr_hyde_easy.txt`). In the next section we will use this model to generate new sequences of characters.

Let’s start off by importing the classes and functions we intend to use to train our model.

In [1]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

Using TensorFlow backend.


Next, we need to load the text of the book into memory, convert all of the characters to lowercase, and replace punctuation and line breaks with spaces. This reduces the vocabulary that the network must learn.

In [2]:
# load the text (read as UTF-8) and convert to lowercase
import re
filename = "data/dr_hyde_easy.txt"
with open(filename, encoding='utf-8') as f:
    raw_text = f.read()
raw_text = raw_text.lower()
# replace punctuation (anything that is not a word character or whitespace) with a space
raw_text = re.sub(r'[^\w\s]', ' ', raw_text)

# get rid of line breaks and collapse runs of spaces into a single space
raw_text = re.sub(r'[\n]', ' ', raw_text)
raw_text = re.sub(r' +', ' ', raw_text)
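To see what the cleaning does, here is a minimal sketch that wraps the same three regular expressions in a hypothetical helper, `clean_text`, and applies it to a short made-up string. Note two side effects: contractions such as *it's* become two tokens (*it s*), and a trailing space can survive when the text ends with punctuation or a newline.

```python
import re


def clean_text(text):
    """Apply the same cleaning steps as the cell above to a string."""
    text = text.lower()
    # punctuation -> space (letters, digits, underscore and whitespace are kept)
    text = re.sub(r'[^\w\s]', ' ', text)
    # line breaks -> space, then collapse repeated spaces
    text = re.sub(r'[\n]', ' ', text)
    text = re.sub(r' +', ' ', text)
    return text


print(repr(clean_text("Hello, World!\nIt's 2 A.M.")))  # 'hello world it s 2 a m '
```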

Now that the book is loaded, we must prepare the data for modeling by the neural network. We cannot model the characters directly; instead, we must convert the characters to integers.

We can do this easily by first creating a set of all of the distinct characters in the book, then creating a map of each character to a unique integer.

In [3]:
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
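A small worked example of the same idea on a toy string, including the reverse mapping that is needed later to turn predictions back into text. The helper name `build_char_mappings` and the `demo_` variables are illustrative only; they avoid overwriting `chars` and `char_to_int` from the notebook. Because the characters are sorted, the space character always gets index 0.

```python
def build_char_mappings(text):
    """Return (char_to_int, int_to_char) for the distinct characters in text."""
    vocab = sorted(set(text))
    to_int = {c: i for i, c in enumerate(vocab)}
    to_char = {i: c for i, c in enumerate(vocab)}
    return to_int, to_char


demo_to_int, demo_to_char = build_char_mappings("the lawyer ")
demo_encoded = [demo_to_int[c] for c in "the lawyer "]
demo_decoded = ''.join(demo_to_char[i] for i in demo_encoded)
print(demo_encoded)  # [6, 3, 2, 0, 4, 1, 7, 8, 2, 5, 0]
```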

If you print `chars`, you may see some characters that could be removed to further clean up the dataset; this would reduce the vocabulary and may improve the modeling process.

Now that the book has been loaded and the mapping prepared, we can summarize the dataset.

In [4]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  151293
Total Vocab:  37


We now need to define the training data for the network. There is a lot of flexibility in how you choose to break up the text and expose it to the network during training.

In this tutorial we will split the book text up into subsequences with a fixed length of 100 characters, an arbitrary length. We could just as easily split the data up by sentences and pad the shorter sequences and truncate the longer ones.

Each training pattern consists of 100 time steps of one character each (X) followed by a one-character output (y). When creating these sequences, we slide this window along the whole book one character at a time, so each character gets a chance to be learned from the 100 characters that precede it (except, of course, the first 100 characters).

In [5]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  151193
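The sliding window can be checked on a tiny input. The sketch below repeats the loop from the cell above inside a hypothetical helper, `make_windows`, and runs it on a four-character sequence with a window length of 2. A sequence of length n always yields n - seq_length pairs, which matches the count printed above: 151293 - 100 = 151193.

```python
def make_windows(encoded, seq_length):
    """Split a sequence into (input window, next item) training pairs."""
    inputs, targets = [], []
    for i in range(0, len(encoded) - seq_length, 1):
        inputs.append(encoded[i:i + seq_length])
        targets.append(encoded[i + seq_length])
    return inputs, targets


demo_in, demo_out = make_windows(list("hyde"), 2)
print(demo_in)   # [['h', 'y'], ['y', 'd']]
print(demo_out)  # ['d', 'e']
```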


Now that we have prepared our training data we need to transform it so that it is suitable for use with Keras.

First we must transform the list of input sequences into the form [samples, time steps, features] expected by an LSTM network.

Next, we rescale the integers to the range 0 to 1. This makes the patterns easier for the LSTM network to learn, since its default activations (tanh, with sigmoid-type gates) saturate for large input values.

Finally, we need to convert the output patterns (single characters converted to integers) into a one-hot encoding. This lets us configure the network to predict the probability of each character in the vocabulary (an easier representation) rather than forcing it to predict the next character's integer value precisely. Each y value becomes a vector of zeros with a single 1 in the column for the character (integer) that the pattern represents.


In [6]:
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
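The three transformations are easier to follow on a toy dataset. This sketch uses two made-up patterns of three time steps over a four-symbol vocabulary (all `demo_` names are illustrative). It builds the one-hot targets with `numpy.eye`, which gives the same result as `np_utils.to_categorical(demo_dataY, demo_vocab)`. Note that `to_categorical` called without `num_classes`, as in the cell above, infers the number of classes as the largest label plus one.

```python
import numpy

demo_dataX = [[0, 1, 2], [1, 2, 3]]  # 2 patterns, 3 time steps each
demo_dataY = [3, 0]                  # next symbol for each pattern
demo_vocab = 4

# [samples, time steps, features], then scale integers into [0, 1)
demo_X = numpy.reshape(demo_dataX, (len(demo_dataX), 3, 1)) / float(demo_vocab)
# one-hot rows: row k of the identity matrix encodes class k
demo_y = numpy.eye(demo_vocab)[demo_dataY]

print(demo_X.shape)       # (2, 3, 1)
print(demo_X[0, :, 0])    # [0.   0.25 0.5 ]
print(demo_y)
```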

We can now define our LSTM model. Here we define a single hidden LSTM layer with 256 memory units. The network uses dropout with a rate of 0.2 (20%). The output layer is a Dense layer that uses the softmax activation function to output a probability between 0 and 1 for each character in the vocabulary.

The problem is really a single-character classification problem with `n_vocab` (here 37) classes, so we optimize the log loss (categorical cross-entropy), using the Adam optimization algorithm for speed.


In [7]:
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
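As a reference point for the loss values printed during training, the sketch below computes categorical cross-entropy by hand (natural log, as Keras uses) for a model that spreads its probability evenly over all 37 characters. That uniform guess scores ln(37), about 3.61, so the first-epoch loss of 2.79 in the log below is already better than chance. This is an illustration of the formula, not Keras's own implementation; the clipping value is an assumption.

```python
import math

import numpy


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability assigned to the true class (one-hot y_true)."""
    y_pred = numpy.clip(y_pred, eps, 1.0)
    return float(numpy.mean(-numpy.sum(y_true * numpy.log(y_pred), axis=1)))


demo_uniform = numpy.full((1, 37), 1.0 / 37)  # equal probability for every character
demo_target = numpy.zeros((1, 37))
demo_target[0, 5] = 1.0                       # any true class gives the same result
demo_loss = categorical_crossentropy(demo_target, demo_uniform)
print(round(demo_loss, 2), round(math.log(37), 2))  # 3.61 3.61
```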

There is no test dataset. We are modeling the entire training dataset to learn the probability of each character in a sequence.

We are not interested in the most accurate (classification accuracy) model of the training dataset. This would be a model that predicts each character in the training dataset perfectly. Instead we are interested in a generalization of the dataset that minimizes the chosen loss function. We are seeking a balance between generalization and overfitting but short of memorization.


In [8]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 00001: loss improved from inf to 2.79100, saving model to weights-improvement-01-2.7910.hdf5
Epoch 2/20
Epoch 00002: loss improved from 2.79100 to 2.63793, saving model to weights-improvement-02-2.6379.hdf5
Epoch 3/20
Epoch 00003: loss improved from 2.63793 to 2.56246, saving model to weights-improvement-03-2.5625.hdf5
Epoch 4/20
Epoch 00004: loss improved from 2.56246 to 2.50216, saving model to weights-improvement-04-2.5022.hdf5
Epoch 5/20
Epoch 00005: loss improved from 2.50216 to 2.44435, saving model to weights-improvement-05-2.4444.hdf5
Epoch 6/20
Epoch 00006: loss improved from 2.44435 to 2.39395, saving model to weights-improvement-06-2.3940.hdf5
Epoch 7/20
Epoch 00007: loss improved from 2.39395 to 2.34958, saving model to weights-improvement-07-2.3496.hdf5
Epoch 8/20
Epoch 00008: loss improved from 2.34958 to 2.30721, saving model to weights-improvement-08-2.3072.hdf5
Epoch 9/20
Epoch 00009: loss improved from 2.30721 to 2.26834, saving model to weights-impro

<keras.callbacks.History at 0x1aa89163668>

After running the example, you should have a number of weight checkpoint files in the local directory.

You can delete them all except the one with the smallest loss value, e.g. `weights-improvement-20-1.9344.hdf5`, which is the file loaded in the next section.
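Because the `ModelCheckpoint` file path above writes the epoch and loss into each file name, the best checkpoint can be found from the names alone. This is a minimal sketch with a hypothetical helper, `best_checkpoint`, assuming the `<prefix><epoch>-<loss>.hdf5` pattern used in this notebook; it only selects a file and deletes nothing.

```python
import os
import re


def best_checkpoint(filenames, prefix="weights-improvement-"):
    """Return the file whose name encodes the lowest loss, or None if none match.

    Assumes names look like '<prefix><epoch>-<loss>.hdf5', as produced by the
    ModelCheckpoint filepath used in this notebook.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)-(\d+\.\d+)\.hdf5")
    best_name, best_loss = None, float("inf")
    for name in filenames:
        match = pattern.fullmatch(os.path.basename(name))
        if match and float(match.group(2)) < best_loss:
            best_name, best_loss = name, float(match.group(2))
    return best_name
```

In the notebook you might call it as `best_checkpoint(glob.glob("weights-improvement-*.hdf5"))`, or pass `prefix="weights-improvement-mod2-"` for the second model; the prefix keeps the two models' files apart.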

### Generating Text with an LSTM Network
Generating text using the trained LSTM network is relatively straightforward.

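In a fresh session, we would load the data and define the network in exactly the same way as above, except that the network weights are loaded from a checkpoint file and the network does not need to be trained. In this notebook the model is still defined, so we only need to load the weights.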


In [13]:
# load the network weights
filename = "weights-improvement-20-1.9344.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

When preparing the mapping of unique characters to integers, we must also create a reverse mapping that converts integers back to characters so that we can read the predictions.


In [14]:
int_to_char = dict((i, c) for i, c in enumerate(chars))

The simplest way to make predictions with the Keras LSTM model is to start with a seed sequence as input and generate the next character, then update the seed sequence by appending the generated character to the end and trimming off the first character. This process is repeated for as long as we want to predict new characters (e.g. a sequence of 1,000 characters).

We can pick a random input pattern as our seed sequence, then print generated characters as we generate them.
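The seed-and-slide procedure can be separated from Keras to make its logic visible. In this sketch, `generate` takes any function that maps the current window to a probability vector; `toy_predict` is a hypothetical stand-in for `model.predict` that always favours the next integer. The window is copied first, so the caller's seed (in the notebook, an entry of `dataX`) is never modified.

```python
import numpy


def generate(seed, predict_next, n_steps):
    """Greedy generation with a fixed-length sliding window of integer codes."""
    window = list(seed)  # copy, so the caller's seed list is not modified
    generated = []
    for _ in range(n_steps):
        probs = predict_next(window)
        index = int(numpy.argmax(probs))  # greedy: most probable next item
        generated.append(index)
        window = window[1:] + [index]     # drop the oldest item, append the newest
    return generated


def toy_predict(window, vocab_size=5):
    """Hypothetical stand-in for model.predict: favours (last item + 1) mod 5."""
    probs = numpy.zeros(vocab_size)
    probs[(window[-1] + 1) % vocab_size] = 1.0
    return probs


demo_seed = [0, 1, 2]
print(generate(demo_seed, toy_predict, 4))  # [3, 4, 0, 1]
```

With the real model, `predict_next` would reshape the window to `(1, len(window), 1)`, divide by `n_vocab`, and return `model.predict(x, verbose=0)[0]`.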


In [15]:
import sys
# pick a random seed and copy it so that appending below does not modify dataX in place
start = numpy.random.randint(0, len(dataX)-1)
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters one at a time
for i in range(1000):
    # shape the current window as [samples, time steps, features] and normalize
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: take the most probable next character
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the prediction and drop the oldest character
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

Seed:
" ion and must flee before daylight from a house that was no longer mine and hurrying back to my cabin "
et io the mawe of the sare thi fane oo the fand of the saae whth a streng of toeer and the lawter sooe a creat dare oo the sereo of the fouse oh the saalen the lawter whsh a stidn and the lawyer saed the lawyer i sae io whet ie a mort an wou cane a auaat foro the coor of the sale the sawe oo the coor of the saalen the lawyer saed the lawyer i sae io whet ie a mort an wou cane a auaat foro the coor of the sale the sawe oo the coor of the saalen the lawyer saed the lawyer i sae io whet ie a mort an wou cane a auaat foro the coor of the sale the sawe oo the coor of the saalen the lawyer saed the lawyer i sae io whet ie a mort an wou cane a auaat foro the coor of the sale the sawe oo the coor of the saalen the lawyer saed the lawyer i sae io whet ie a mort an wou cane a auaat foro the coor of the sale the sawe oo the coor of the saalen the lawyer saed the lawyer i sae io whet ie a

What in these results shows good language generation? What is done poorly?

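At the beginning, the model produced something resembling words; afterwards, however, it got stuck repeating the same pattern over and over. This may be because that pattern is common throughout the book, or because the model cannot distinguish other continuations after those letters. The decoding method also plays a role: `numpy.argmax` always picks the single most likely character, so generation is deterministic, and once a 100-character window recurs the output cycles forever.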
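One common way to break such loops, not used in this tutorial, is to sample the next character from the predicted distribution instead of always taking the argmax, optionally sharpened or flattened by a temperature. The sketch below is one such implementation; the function name and the example temperature are assumptions, and it requires NumPy 1.17 or later for `numpy.random.default_rng`. Temperatures below 1 move the choice toward the argmax, while temperatures above 1 make it more random.

```python
import numpy


def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index from a probability vector after temperature scaling."""
    rng = numpy.random.default_rng() if rng is None else rng
    probs = numpy.asarray(probs, dtype=float)
    # rescale log-probabilities, then re-normalize with a stable softmax
    logits = numpy.log(numpy.clip(probs, 1e-12, 1.0)) / temperature
    scaled = numpy.exp(logits - numpy.max(logits))
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))
```

In the generation loop, `index = numpy.argmax(prediction)` could be replaced with `index = sample_with_temperature(prediction[0], temperature=0.5)`, since `prediction` has shape `(1, n_vocab)`.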

### Exercise 1: Try a bigger model

We will keep the number of memory units at 256 but add a second LSTM layer (this version also omits the dropout layer).
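To see how much bigger the stacked model is, we can count trainable weights. A standard Keras LSTM layer has four gates, each with input weights, recurrent weights and one bias, giving 4 × (units × (input_dim + units) + units) parameters. The sketch below assumes that formula and an output size of 37 (the vocabulary size printed earlier); the totals should agree with `model.summary()` and `model2.summary()` under those assumptions.

```python
def lstm_params(units, input_dim):
    """Trainable weights in a standard LSTM layer: 4 gates, one bias vector each."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Weight matrix plus bias vector of a fully connected layer."""
    return units * input_dim + units


demo_classes = 37  # vocabulary size printed earlier
demo_single = lstm_params(256, 1) + dense_params(demo_classes, 256)
demo_stacked = lstm_params(256, 1) + lstm_params(256, 256) + dense_params(demo_classes, 256)
print(demo_single, demo_stacked)  # 273701 799013
```

The second LSTM layer alone (525,312 weights) roughly triples the parameter count, because its input is the 256-dimensional output sequence of the first layer rather than a single feature.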

In [16]:
model2 = Sequential()
model2.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), 
                return_sequences=True))
model2.add(LSTM(256))
model2.add(Dense(y.shape[1], activation='softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath="weights-improvement-mod2-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model2.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 00001: loss improved from inf to 2.68936, saving model to weights-improvement-mod2-01-2.6894.hdf5
Epoch 2/20
Epoch 00002: loss improved from 2.68936 to 2.42657, saving model to weights-improvement-mod2-02-2.4266.hdf5
Epoch 3/20
Epoch 00003: loss improved from 2.42657 to 2.22155, saving model to weights-improvement-mod2-03-2.2215.hdf5
Epoch 4/20
Epoch 00004: loss improved from 2.22155 to 2.06995, saving model to weights-improvement-mod2-04-2.0699.hdf5
Epoch 5/20
Epoch 00005: loss improved from 2.06995 to 1.96525, saving model to weights-improvement-mod2-05-1.9652.hdf5
Epoch 6/20
Epoch 00006: loss improved from 1.96525 to 1.88113, saving model to weights-improvement-mod2-06-1.8811.hdf5
Epoch 7/20
Epoch 00007: loss improved from 1.88113 to 1.81244, saving model to weights-improvement-mod2-07-1.8124.hdf5
Epoch 8/20
Epoch 00008: loss improved from 1.81244 to 1.75216, saving model to weights-improvement-mod2-08-1.7522.hdf5
Epoch 9/20
Epoch 00009: loss improved from 1.75216 t

<keras.callbacks.History at 0x1aba1f0b978>

In [18]:
# load the network weights
filename = "weights-improvement-mod2-20-1.2442.hdf5"
model2.load_weights(filename)
model2.compile(loss='categorical_crossentropy', optimizer='adam')

In [19]:
# pick a random seed and copy it so that appending below does not modify dataX in place
start = numpy.random.randint(0, len(dataX)-1)
pattern = list(dataX[start])
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model2.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the prediction and drop the oldest character
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

Seed:
" th which i listened to the civilities of my unhappy victim i declare at least before god no man mora "
ly plt reterted the lawyer i shought it was not tn me a changed renlsee to the project gutenberg tm trademark and distribution of the project gutenberg tm micense and distribution or any pther with the perains of the fyl tere of the fyn tere aut the ward of the lawy moment i was still and i cegind the servant metter i shank you sile that i cannot sead the lawyer i have have his friends were the strange cloter was not my hiad to my consenoent for my sesvrn i dould have teen and i have have eone to my heart that i cannot sead the lawyer i have have his friends were the strange cloter was not my hiad to my consenoent for my sesvrn i dould have teen and i have have eone to my heart that i cannot sead the lawyer i have have his friends were the strange cloter was not my hiad to my consenoent for my sesvrn i dould have teen and i have have eone to my heart that i cannot sead the law

Again, what in these results shows good language generation? What is done poorly?

This time the model predicts some words correctly, such as **lawyer** and **friends**. However, it seems to substitute some letters, such as **s** for **r** and **s** for **t**, producing *sead* instead of *read* and *shank* instead of *thank*. There is also still repetition in what it predicts, and it reproduces Project Gutenberg license text (*project gutenberg tm trademark*), which suggests that the license boilerplate was left in the training file.

### Exercise 2: Create a word based model

Instead of generating language character by character, let's build a word-generating model.

In [22]:
word_text = raw_text.split(' ')

# create mapping of unique words to integers
words = sorted(list(set(word_text)))
word_to_int = dict((c, i) for i, c in enumerate(words))

n_words = len(word_text)
n_word_vocab = len(words)
print("Total Words: ", n_words)
print("Total Vocab: ", n_word_vocab)

# prepare the dataset of input to output pairs encoded as integers
word_seq_length = 100
dataX_word = []
dataY_word = []
for i in range(0, n_words - word_seq_length, 1):
    seq_in = word_text[i:i + word_seq_length]
    seq_out = word_text[i + word_seq_length]
    dataX_word.append([word_to_int[word] for word in seq_in])
    dataY_word.append(word_to_int[seq_out])
n_word_patterns = len(dataX_word)
print("Total Patterns: ", n_word_patterns)

# reshape X to be [samples, time steps, features]
X_word = numpy.reshape(dataX_word, (n_word_patterns, word_seq_length, 1))
# normalize
X_word = X_word / float(n_word_vocab)
# one hot encode the output variable
y_word = np_utils.to_categorical(dataY_word)

Total Words:  29027
Total Vocab:  4327
Total Patterns:  28927
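Splitting on a single space, as above, is a quick way to tokenize, but it can produce an empty-string "word" wherever the cleaned text starts or ends with a space (the cleaning step can leave a trailing space, for example). A minimal sketch comparing `str.split(' ')` with `str.split()`, which splits on runs of whitespace and drops empty strings; `tokenize` is a hypothetical helper, and switching to it would change the counts printed above.

```python
def tokenize(text):
    """Split cleaned text into words, dropping empty strings."""
    return text.split()


demo_text = " it s 2 a m "
print(demo_text.split(' '))  # ['', 'it', 's', '2', 'a', 'm', '']
print(tokenize(demo_text))   # ['it', 's', '2', 'a', 'm']
```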


In [23]:
model3 = Sequential()
model3.add(LSTM(100, input_shape=(X_word.shape[1], X_word.shape[2]), 
                return_sequences=True))
model3.add(LSTM(100, return_sequences=True))
model3.add(LSTM(100))
model3.add(Dense(y_word.shape[1], activation='softmax'))
model3.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath="weights-improvement-wordmod-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model3.fit(X_word, y_word, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 00001: loss improved from inf to 6.75371, saving model to weights-improvement-wordmod-01-6.7537.hdf5
Epoch 2/20
Epoch 00002: loss improved from 6.75371 to 6.44474, saving model to weights-improvement-wordmod-02-6.4447.hdf5
Epoch 3/20
Epoch 00003: loss improved from 6.44474 to 6.43840, saving model to weights-improvement-wordmod-03-6.4384.hdf5
Epoch 4/20
Epoch 00004: loss improved from 6.43840 to 6.43824, saving model to weights-improvement-wordmod-04-6.4382.hdf5
Epoch 5/20
Epoch 00005: loss improved from 6.43824 to 6.43629, saving model to weights-improvement-wordmod-05-6.4363.hdf5
Epoch 6/20
Epoch 00006: loss did not improve
Epoch 7/20
Epoch 00007: loss did not improve
Epoch 8/20
Epoch 00008: loss did not improve
Epoch 9/20
Epoch 00009: loss did not improve
Epoch 10/20
Epoch 00010: loss did not improve
Epoch 11/20
Epoch 00011: loss did not improve
Epoch 12/20
Epoch 00012: loss did not improve
Epoch 13/20
Epoch 00013: loss did not improve
Epoch 14/20
Epoch 00014: loss 

<keras.callbacks.History at 0x1aba5d10d68>

In [24]:
# load the network weights
filename = "weights-improvement-wordmod-17-6.4350.hdf5"
model3.load_weights(filename)
model3.compile(loss='categorical_crossentropy', optimizer='adam')

In [25]:
int_to_word = dict((i, c) for i, c in enumerate(words))

In [34]:
# pick a random seed and copy it so that appending below does not modify dataX_word in place
start = numpy.random.randint(0, len(dataX_word)-1)
pattern = list(dataX_word[start])
print("Seed:")
print("\"", ' '.join([int_to_word[value] for value in pattern]), "\"")
# generate words
for i in range(100):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_word_vocab)
    prediction = model3.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_word[index]
    sys.stdout.write(result + " ")
    # slide the window: append the predicted word and drop the oldest one
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

Seed:
" my life my honour my reason are all at your mercy if you fail me to night i am lost you might suppose after this preface that i am going to ask you for something dishonourable to grant judge for yourself i want you to postpone all other engagements for to night ay even if you were summoned to the bedside of an emperor to take a cab unless your carriage should be actually at the door and with this letter in your hand for consultation to drive straight to my house poole my butler has his orders you will "
the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the 
Done.
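One plausible reading of the repeated single word is that the network has learned little beyond how often each word occurs, so the argmax is the same frequent word at every step. A way to check this is to compare the training loss, which plateaus around 6.4 in the log above, with the entropy of the word-frequency (unigram) distribution: a model that only knows frequencies cannot do better than that entropy. The sketch below computes both the most frequent token and that entropy in nats; it is shown on a made-up token list, and `unigram_stats(word_text)` would apply it to the book.

```python
import math
from collections import Counter


def unigram_stats(tokens):
    """Return (most frequent token, unigram entropy in nats) for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    most_common = counts.most_common(1)[0][0]
    return most_common, entropy


demo_top, demo_entropy = unigram_stats("the man and the lawyer and the door".split())
print(demo_top, round(demo_entropy, 3))
```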


In [35]:
model4 = Sequential()
model4.add(LSTM(50, input_shape=(X_word.shape[1], X_word.shape[2]), 
                return_sequences=True))
model4.add(LSTM(50, return_sequences=True))
model4.add(LSTM(50, return_sequences=True))
model4.add(LSTM(50, return_sequences=True))
model4.add(LSTM(50))
model4.add(Dense(y_word.shape[1], activation='softmax'))
model4.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath="weights-improvement-wordmod2-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model4.fit(X_word, y_word, epochs=10, batch_size=128, callbacks=callbacks_list)

Epoch 1/10
Epoch 00001: loss improved from inf to 6.77577, saving model to weights-improvement-wordmod2-01-6.7758.hdf5
Epoch 2/10
Epoch 00002: loss improved from 6.77577 to 6.38962, saving model to weights-improvement-wordmod2-02-6.3896.hdf5
Epoch 3/10
Epoch 00003: loss improved from 6.38962 to 6.38053, saving model to weights-improvement-wordmod2-03-6.3805.hdf5
Epoch 4/10
Epoch 00004: loss improved from 6.38053 to 6.37867, saving model to weights-improvement-wordmod2-04-6.3787.hdf5
Epoch 5/10
Epoch 00005: loss improved from 6.37867 to 6.37842, saving model to weights-improvement-wordmod2-05-6.3784.hdf5
Epoch 6/10
Epoch 00006: loss did not improve
Epoch 7/10
Epoch 00007: loss improved from 6.37842 to 6.37810, saving model to weights-improvement-wordmod2-07-6.3781.hdf5
Epoch 8/10
Epoch 00008: loss did not improve
Epoch 9/10
Epoch 00009: loss did not improve
Epoch 10/10
Epoch 00010: loss did not improve


<keras.callbacks.History at 0x1abb0cae0b8>

In [37]:
# load the network weights
filename = "weights-improvement-wordmod2-07-6.3781.hdf5"
model4.load_weights(filename)
model4.compile(loss='categorical_crossentropy', optimizer='adam')

# reverse mapping (already defined above; repeated so this cell runs on its own)
int_to_word = dict((i, c) for i, c in enumerate(words))

# pick a random seed and copy it so that appending below does not modify dataX_word in place
start = numpy.random.randint(0, len(dataX_word)-1)
pattern = list(dataX_word[start])
print("Seed:")
print("\"", ' '.join([int_to_word[value] for value in pattern]), "\"")
# generate words
for i in range(100):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_word_vocab)
    prediction = model4.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_word[index]
    sys.stdout.write(result + " ")
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

Seed:
" of hyde it took on this occasion a double dose to recall me to myself and alas six hours after as i sat looking sadly in the fire the pangs returned and the drug had to be re administered in short from that day forth it seemed only by a great effort as of gymnastics and only under the immediate stimulation of the drug that i was able to wear the countenance of jekyll at all hours of the day and night i would be taken with the premonitory shudder above all if i slept or even dozed for a "
the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the 
Done.


In [36]:
model5 = Sequential()
model5.add(LSTM(300, input_shape=(X_word.shape[1], X_word.shape[2]), 
                return_sequences=True))
model5.add(LSTM(300, return_sequences=True))
model5.add(LSTM(300))
model5.add(Dense(y_word.shape[1], activation='softmax'))
model5.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath="weights-improvement-wordmod3-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

model5.fit(X_word, y_word, epochs=10, batch_size=128, callbacks=callbacks_list)

Epoch 1/10
Epoch 00001: loss improved from inf to 6.87377, saving model to weights-improvement-wordmod3-01-6.8738.hdf5
Epoch 2/10
Epoch 00002: loss improved from 6.87377 to 6.59828, saving model to weights-improvement-wordmod3-02-6.5983.hdf5
Epoch 3/10
Epoch 00003: loss did not improve
Epoch 4/10
Epoch 00004: loss did not improve
Epoch 5/10
Epoch 00005: loss did not improve
Epoch 6/10
Epoch 00006: loss did not improve
Epoch 7/10
Epoch 00007: loss did not improve
Epoch 8/10
Epoch 00008: loss did not improve
Epoch 9/10
Epoch 00009: loss did not improve
Epoch 10/10
Epoch 00010: loss did not improve


<keras.callbacks.History at 0x1abba7fbf98>

In [38]:
# load the network weights
filename = "weights-improvement-wordmod3-02-6.5983.hdf5"
model5.load_weights(filename)
model5.compile(loss='categorical_crossentropy', optimizer='adam')

# reverse mapping (already defined above; repeated so this cell runs on its own)
int_to_word = dict((i, c) for i, c in enumerate(words))

# pick a random seed and copy it so that appending below does not modify dataX_word in place
start = numpy.random.randint(0, len(dataX_word)-1)
pattern = list(dataX_word[start])
print("Seed:")
print("\"", ' '.join([int_to_word[value] for value in pattern]), "\"")
# generate words
for i in range(100):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_word_vocab)
    prediction = model5.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_word[index]
    sys.stdout.write(result + " ")
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")

Seed:
" copyright holder the work can be copied and distributed to anyone in the united states without paying any fees or charges if you are redistributing or providing access to a work with the phrase project gutenberg associated with or appearing on the work you must comply either with the requirements of paragraphs 1 e 1 through 1 e 7 or obtain permission for the use of the work and the project gutenberg tm trademark as set forth in paragraphs 1 e 8 or 1 e 9 1 e 3 if an individual project gutenberg tm electronic work is posted with "
and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and 
Done.


### Exercise 3: Use a different source corpus for training and compare results

Pick a very different source corpus, such as the King James Bible, or anything else that differs greatly from the book you initially chose.
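Two of the outputs above contain Project Gutenberg license text (one generated passage and one seed), so it may help to strip the header and footer from the new corpus before training. This sketch assumes the usual `*** START OF ...` and `*** END OF ...` marker lines found in Gutenberg plain-text files; their exact wording varies between files, so the hypothetical helper returns the text unchanged when it cannot find both markers.

```python
def strip_gutenberg_boilerplate(text):
    """Keep only the lines between the '*** START OF' and '*** END OF' markers.

    Assumes the common Project Gutenberg marker lines; if either is missing,
    the text is returned unchanged.
    """
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines)
                  if line.strip().startswith("*** START OF")), None)
    end = next((i for i, line in enumerate(lines)
                if line.strip().startswith("*** END OF")), None)
    if start is None or end is None or end <= start:
        return text
    return "\n".join(lines[start + 1:end])
```

Apply it to the raw file contents before the lowercasing and punctuation cleanup, since the cleanup removes the asterisks that the markers rely on.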