# T81-558: Applications of Deep Neural Networks
**Module 10: Time Series in Keras**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 10 Material

* Part 10.1: Time Series Data Encoding for Deep Learning [[Video]]() [[Notebook]](t81_558_class_10_1_timeseries.ipynb)
* Part 10.2: Programming LSTM with Keras and TensorFlow [[Video]]() [[Notebook]](t81_558_class_10_2_lstm.ipynb)
* **Part 10.3: Text Generation with Keras and TensorFlow** [[Video]]() [[Notebook]](t81_558_class_10_3_text_generation.ipynb)
* Part 10.4: Image Captioning with Keras and TensorFlow [[Video]]() [[Notebook]](t81_558_class_10_4_captioning.ipynb)
* Part 10.5: Temporal CNN in Keras and TensorFlow [[Video]]() [[Notebook]](t81_558_class_10_5_temporal_cnn.ipynb)

# Part 10.3: Text Generation with LSTM

Recurrent neural networks are also known for their ability to generate text.  This can allow the output of the neural network to be free-form text.  In this part we will see how an LSTM can be trained on a textual document, such as classic literature, and learn to output new text that appears to be of the same form as the training material.  If you train your LSTM on [Shakespeare](https://en.wikipedia.org/wiki/William_Shakespeare), then it will learn to crank out new prose that is similar to what Shakespeare had written. 

Don't get your hopes up.  You're not going to teach your deep neural network to write the next winner of the [Pulitzer Prize for Fiction](https://en.wikipedia.org/wiki/Pulitzer_Prize_for_Fiction).  The prose generated by your neural network will be nonsensical.  However, it will usually be nearly grammatical and of a similar style to the source training documents. 

A neural network generating nonsensical text based on literature may not seem terribly useful at first glance.  However, this technology gets so much interest because it forms the foundation for many more advanced technologies.  The fact that the LSTM will typically learn human grammar from the source document opens a wide range of possibilities.  Similar technology can be used to complete sentences when a user is entering text.  In the next part, we will make use of this technique to create a neural network that can write captions for images to describe what is going on in the image. 

### Additional Information

The following are some of the articles that I found useful in putting this section together.

* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
* [Text Generation With LSTM Recurrent Neural Networks in Python with Keras](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)
* [How to Develop a Word-Level Neural Language Model and Use it to Generate Text](https://machinelearningmastery.com/how-to-develop-a-word-level-neural-language-model-in-keras/)

### Character-Level Text Generation

There are a number of different approaches to teaching a neural network to output free-form text.  The most basic question is whether you wish the neural network to learn at the word or character level.  In many ways, learning at the character level is the more interesting of the two.  The LSTM is learning to construct its own words without even being shown what a word is.  We will begin with character-level text generation.  In the next part, we will see how we can use nearly the same technique to operate at the word level.  The automatic captioning that will be implemented in the next part is at the word level.

We begin by importing the needed Python packages and defining the sequence length, named **CHAR_SEQ_LEN**.  Time-series neural networks always accept their input as a fixed-length array.  Because not every sequence fills the whole array, it is common to fill the extra elements with zeros.  The text will be divided into sequences of this length, and the neural network will be trained to predict what comes after each sequence.

In [5]:
import sys
import os
import numpy as np
import pandas as pd
import requests
import re
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM

CHAR_SEQ_LEN = 25
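
The zero-filling mentioned above can be sketched as follows.  This is only an illustration under stated assumptions: `pad_sequence` is a hypothetical helper that is not part of the notebook, and the notebook itself never needs padding because every window it builds is exactly **CHAR_SEQ_LEN** characters long.

```python
import numpy as np

CHAR_SEQ_LEN = 25  # same value as the notebook's constant

def pad_sequence(ids, length=CHAR_SEQ_LEN):
    """Left-pad a list of integer IDs with zeros (hypothetical helper)."""
    ids = list(ids)[-length:]              # keep at most the last `length` IDs
    padded = np.zeros(length, dtype=int)   # start with all zeros
    if ids:
        padded[length - len(ids):] = ids   # real values go at the end
    return padded

short = [5, 9, 2]
padded = pad_sequence(short)
print(padded.shape)   # (25,)
print(padded[-3:])    # the three real IDs sit at the end of the array
```

Note that in this notebook ID 0 is later assigned to the newline character, so a real implementation might prefer to reserve a separate ID for padding.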

For this simple example we will train the neural network on the classic children's book [Treasure Island](https://en.wikipedia.org/wiki/Treasure_Island).  We begin by loading this text into a Python string and displaying the first 1,000 characters.

In [8]:
r = requests.get("https://data.heatonresearch.com/data/t81-558/text/treasure_island.txt")
raw_text = r.text
print(raw_text[0:1000])


The Project Gutenberg EBook of Treasure Island, by Robert Louis Stevenson

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: Treasure Island

Author: Robert Louis Stevenson

Illustrator: Milo Winter

Release Date: January 12, 2009 [EBook #27780]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK TREASURE ISLAND ***




Produced by Juliet Sutherland, Stephen Blundell and the
Online Distributed Proofreading Team at http://www.pgdp.net









 THE ILLUSTRATED CHILDREN'S LIBRARY


         _Treasure Island_

       Robert Louis Stevenson

          _Illustrated by_
            Milo Winter


           [Illustration]


           GRAMERCY BOOKS
              NEW YORK




 Foreword copyright © 1986 by Random House V

We will extract all unique characters from the text and sort them.  This allows us to assign a unique ID to each character.  Because the characters are sorted, the IDs are the same every time the code runs on the same text.  If new characters were added to the original text, then the IDs would change.  We build up two dictionaries.  The first, **char2idx**, converts a character into its ID.  The second, **idx2char**, converts an ID back into its character.

In [10]:
# Lowercase the text and strip any non-ASCII characters.
processed_text = raw_text.lower()
processed_text = re.sub(r'[^\x00-\x7f]', '', processed_text)
# Sorted list of unique characters; a character's position is its ID.
char_array = sorted(set(processed_text))
char2idx = {ch: idx for idx, ch in enumerate(char_array)}
idx2char = {idx: ch for idx, ch in enumerate(char_array)}
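
To make the two lookup tables concrete, here is a minimal sketch that encodes a short string into IDs and decodes it back.  The toy text and the `toy_` names are assumptions chosen for illustration so that they do not overwrite the notebook's own variables.

```python
toy_text = "yo ho ho"   # toy text, not taken from the training data

# Sorted unique characters: [' ', 'h', 'o', 'y']
toy_chars = sorted(set(toy_text))
toy_char2idx = {ch: idx for idx, ch in enumerate(toy_chars)}
toy_idx2char = {idx: ch for idx, ch in enumerate(toy_chars)}

# Encode each character to its ID, then decode the IDs back to text.
encoded = [toy_char2idx[ch] for ch in toy_text]
decoded = ''.join(toy_idx2char[idx] for idx in encoded)

print(encoded)   # [3, 2, 0, 1, 2, 0, 1, 2]
print(decoded)   # yo ho ho
```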

The complete set of characters in *Treasure Island* is presented here:

In [11]:
'|'.join(char_array)

'\n|\r| |!|"|#|$|%|&|\'|(|)|*|,|-|.|/|0|1|2|3|4|5|6|7|8|9|:|;|?|@|[|]|_|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z'

The stats on this text are shown here:

In [12]:
print(f"Total Characters: {len(raw_text)}")
print(f"Total Unique Used Characters: {len(char_array)}")

Total Characters: 397400
Total Unique Used Characters: 60


The complete lookup table for all the characters and IDs is shown here:

In [13]:
char2idx

{'\n': 0,
 '\r': 1,
 ' ': 2,
 '!': 3,
 '"': 4,
 '#': 5,
 '$': 6,
 '%': 7,
 '&': 8,
 "'": 9,
 '(': 10,
 ')': 11,
 '*': 12,
 ',': 13,
 '-': 14,
 '.': 15,
 '/': 16,
 '0': 17,
 '1': 18,
 '2': 19,
 '3': 20,
 '4': 21,
 '5': 22,
 '6': 23,
 '7': 24,
 '8': 25,
 '9': 26,
 ':': 27,
 ';': 28,
 '?': 29,
 '@': 30,
 '[': 31,
 ']': 32,
 '_': 33,
 'a': 34,
 'b': 35,
 'c': 36,
 'd': 37,
 'e': 38,
 'f': 39,
 'g': 40,
 'h': 41,
 'i': 42,
 'j': 43,
 'k': 44,
 'l': 45,
 'm': 46,
 'n': 47,
 'o': 48,
 'p': 49,
 'q': 50,
 'r': 51,
 's': 52,
 't': 53,
 'u': 54,
 'v': 55,
 'w': 56,
 'x': 57,
 'y': 58,
 'z': 59}

We are now ready to build the actual sequences.  Just like previous neural networks, there will be an $x$ and $y$.  For this LSTM, each $x$ is a sequence of **CHAR_SEQ_LEN** character IDs, and the corresponding $y$ is the single character that is expected to follow that sequence.  The following code slides a window over the text one character at a time to generate every such sequence.

In [25]:
raw_x = []
raw_y = []

for i in range(0, len(processed_text) - CHAR_SEQ_LEN, 1):
    seq_input = processed_text[i:i + CHAR_SEQ_LEN]
    seq_expected = processed_text[i + CHAR_SEQ_LEN]
    raw_x.append([char2idx[ch] for ch in seq_input])
    raw_y.append(char2idx[seq_expected])

print("Total Sequences: ", len(raw_x))

Total Sequences:  397375
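
The same sliding-window idea is easier to see on a tiny example.  The sketch below uses a toy string and a hypothetical window length of 3 (neither comes from the notebook) and keeps the characters as text rather than converting them to IDs.

```python
toy_text = "treasure"
SEQ_LEN = 3   # hypothetical short window, standing in for CHAR_SEQ_LEN

pairs = []
for i in range(len(toy_text) - SEQ_LEN):
    window = toy_text[i:i + SEQ_LEN]   # the input sequence
    target = toy_text[i + SEQ_LEN]     # the character that follows it
    pairs.append((window, target))

for window, target in pairs:
    print(f"{window!r} -> {target!r}")
# 'tre' -> 'a'
# 'rea' -> 's'
# 'eas' -> 'u'
# 'asu' -> 'r'
# 'sur' -> 'e'
```

The number of pairs is the text length minus the window length, which matches the total reported above: 397400 - 25 = 397375.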


In [27]:
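# Reshape to (samples, time steps, features), the layout the LSTM layer expects.
x = np.reshape(raw_x, (len(raw_x), CHAR_SEQ_LEN, 1))
# Scale the character IDs into the range [0, 1).
x = x / float(len(char_array))
# One-hot encode the expected next character.
y = pd.get_dummies(raw_y)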

In [28]:
x.shape

(397375, 25, 1)

In [29]:
y.shape

(397375, 60)
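
The reshaping, scaling, and one-hot encoding steps can be checked on a toy example.  This is a sketch under stated assumptions: the `toy_` data is invented for illustration, and `np.eye` is used for the one-hot step instead of the notebook's `pd.get_dummies`.

```python
import numpy as np

toy_x = [[0, 1, 2], [1, 2, 3]]   # two windows of three character IDs
toy_y = [3, 0]                   # ID of the character following each window
vocab_size = 4                   # distinct characters in this toy example

# Shape (samples, time steps, features), with IDs scaled into [0, 1).
x_in = np.reshape(toy_x, (len(toy_x), 3, 1)) / float(vocab_size)

# One-hot encode the targets: row i has a 1 in column toy_y[i].
y_onehot = np.eye(vocab_size, dtype=int)[toy_y]

print(x_in.shape)      # (2, 3, 1)
print(x_in[0, :, 0])   # first window after scaling
print(y_onehot)
```

One difference worth keeping in mind is that `pd.get_dummies` only creates columns for the IDs that actually occur in the data, whereas `np.eye` creates one column for every ID.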

The dummy variables for $y$ are shown below.

In [9]:
y[0:10]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,54,55,56,57,58,59,60,61,62,65
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


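Next, we create the neural network.  The primary feature of this neural network is its two stacked LSTM layers, which process the input sequence one character at a time.  The first LSTM layer returns its full output sequence so that the second LSTM layer can consume it.  A dropout layer helps reduce overfitting, and the final dense layer uses a softmax activation to output a probability for each possible next character.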

In [35]:
model = Sequential()
model.add(LSTM(256, input_shape=(x.shape[1], x.shape[2]), return_sequences=True))
model.add(LSTM(256))
model.add(Dropout(0.5))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [36]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 25, 256)           264192    
_________________________________________________________________
lstm_4 (LSTM)                (None, 256)               525312    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 60)                15420     
Total params: 804,924
Trainable params: 804,924
Non-trainable params: 0
_________________________________________________________________
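
The parameter counts in the summary can be reproduced by hand.  The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; `lstm_params` and `dense_params` are hypothetical helper names, not Keras functions.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units + 1) * units)

def dense_params(input_dim, units):
    """One weight per input for each unit, plus one bias per unit."""
    return (input_dim + 1) * units

lstm_1 = lstm_params(1, 256)     # first LSTM sees 1 feature per time step
lstm_2 = lstm_params(256, 256)   # second LSTM sees the 256 outputs of the first
dense = dense_params(256, 60)    # one output per character

print(lstm_1, lstm_2, dense, lstm_1 + lstm_2 + dense)
# 264192 525312 15420 804924
```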


Before we train the neural network, we check whether a saved copy of the network already exists from a previous training run.  Training can take up to several hours, depending on how fast your computer is.  If you have a GPU available, please make sure to use it.

In [None]:
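model_filename = os.path.join('.','dnn','generate_text_char_network.hdf5')

if not os.path.exists(model_filename):
    # No saved network yet: train, then save for future runs.
    model.fit(x, y, epochs=1, batch_size=64)
    # Make sure the target directory exists before saving.
    os.makedirs(os.path.dirname(model_filename), exist_ok=True)
    model.save(model_filename)
else:
    # Reuse the weights from the previous training run.
    model.load_weights(model_filename)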

Epoch 1/1
  2624/397375 [..............................] - ETA: 18:37 - loss: 3.0942

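Now that the neural network is trained (or loaded from disk), we are ready to generate text.  We pick a random training sequence as the seed, ask the network to predict the next character, append that character to the sequence, drop the oldest character, and repeat.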

In [12]:
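NUM_ATTEMPTS = 2      # assumed; the output below shows attempts #0 and #1
GENERATE_LEN = 500    # characters to generate per attempt

for attempt in range(NUM_ATTEMPTS):
    # Pick a random training sequence as the seed; copy it so raw_x is not modified.
    start = np.random.randint(0, len(raw_x)-1)
    current = list(raw_x[start])
    print("*" * 30)
    print(f"Attempt #{attempt}, starting point:")
    print(''.join(idx2char[v] for v in current))

    print()
    print("Generating text (character by character)...")
    output = ""

    for _ in range(GENERATE_LEN):
        # Encode the current window exactly as the training data was encoded.
        x_in = np.reshape(current, (1, len(current), 1)) / float(len(char_array))
        prediction = model.predict(x_in, verbose=0)
        idx = int(np.argmax(prediction))   # always take the most likely character
        output += idx2char[idx]
        # Slide the window: append the new character and drop the oldest one.
        current = current[1:] + [idx]

    print(output)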

******************************
Attempt #0, starting point:
ers divided
between them

Generating text (character by character)...
 the soote of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the so
******************************
Attempt #1, starting point:
s agreement, the agreemen

Generating text (character by character)...
 of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe of the sooe
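
The repeated phrases in the output above are a common result of always picking the single most probable character with `np.argmax`.  One widely used alternative, which is not part of the original notebook, is to sample the next character from the predicted probabilities, optionally reshaped by a "temperature" value.  The following is a minimal sketch of that idea; `sample_index` is a hypothetical helper, and the example probabilities are made up.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Sample an index from `probs` after temperature scaling (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    # Lower temperatures sharpen the distribution; higher ones flatten it.
    logits = np.log(probs + 1e-12) / temperature
    scaled = np.exp(logits - np.max(logits))
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

probs = [0.6, 0.3, 0.1]            # made-up probabilities for three characters
rng = np.random.default_rng(42)
draws = [sample_index(probs, temperature=1.0, rng=rng) for _ in range(1000)]
counts = np.bincount(draws, minlength=3)
print(counts / 1000)               # roughly follows the input probabilities
```

In the generation loop, this would replace the `np.argmax(prediction)` call with something like `sample_index(prediction[0], temperature=0.5)`.  A very low temperature behaves almost like `np.argmax`, while a higher temperature produces more varied, and more error-prone, text.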