# Objective 01 - Describe Neural Networks Used for Modeling Sequences
## Overview
We have reached the last sprint of the core data science curriculum! In this unit so far, we have created and trained feed-forward neural networks. While we can do a lot with this type of neural network, some types of data work better with different architectures.

This module will explore recurrent neural networks (RNN), and a type of RNN called a long short-term memory (LSTM) network. These architectures are well suited for processing sequences and using them for many natural language processing tasks.

### Sequence
A sequence is a collection of objects (integers, floats, characters, tokens, and other data types) where you can repeat the order of matter and objects. A Python list is an example, as well as NumPy arrays. Many of the data structures we use are built on basic sequences.

### Time Series
A time series is a data where you have not just the order but some actual continuous marker for where the points lie “in time” - this could be a date, a timestamp, Unix time, or something else. Of course, all time series are also sequences, and for some techniques, you might consider the order of the sequence and not the separation (in time) of the entries.

### Recursion
In mathematics, recursion is defining objects based on previously defined other objects of the same type. In other words, recursion is something that happens when a thing calls itself one or more times.

For example, a recursive function calls itself and uses its previous terms to define subsequent terms. Pascal’s Triangle https://en.wikipedia.org/wiki/Pascal%27s_triangle is an example of using previous terms to calculate subsequent terms: each number is the sum of the two numbers directly above it.

In computer science, a recursive function calls itself from within its code.

### Recurrent Neural Networks (RNN)
Remember that a feed-forward neural network has an input layer and then some number of hidden layers. The output from each layer is fed into the next layer without any feedback. In contrast, with a recurrent neural network, there is a layer where the output from the nodes feeds back into itself. This layer is called the recurrent layer.

Simple RNNs have a weakness called the vanishing gradient problem: the recursive aspect sometimes results in the back-propagation gradients either exploding or becoming very small (vanishing). So what can we do?

### Long short-term memory (LSTM) network
To prevent the vanishing gradient problem, we can create a memory state within the network that adds to the gradients; this prevents them from becoming too small. You can learn more about the structure of the LSTM network in this article. https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/ For now, we’ll focus on how to implement them and what types of problems they are suitable for.

## Follow Along
In this section, we’ll first look at the option in Keras for creating a simple neural network with a recurrent layer. The keras.layers.SimpleRNN is a fully connected RNN where the output from the previous time step is fed to the next time step.

In [2]:
##### USE THIS TO HIDE DEBUGGING LOGS
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
# 0 = all messages are logged (default behavior)
# 1 = INFO messages are not printed
# 2 = INFO and WARNING messages are not printed
# 3 = INFO, WARNING, and ERROR messages are not printed

In [3]:
# Example: https://keras.io/guides/working_with_rnns/

# Imports
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Instantiate the model
model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# The output of SimpleRNN will be a 2D tensor of shape (batch_size, 128)
model.add(layers.SimpleRNN(128))

# Add an additional hidden layer
model.add(layers.Dense(10))

# View the architecture
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 64)          64000     
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, 128)               24704     
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
Total params: 89,994
Trainable params: 89,994
Non-trainable params: 0
_________________________________________________________________


In [None]:
### Output before using os.environ command
"""
2022-04-12 20:50:16.112964: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-04-12 20:50:16.113099: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 64)          64000     
                                                                 
 simple_rnn (SimpleRNN)      (None, 128)               24704     
                                                                 
 dense (Dense)               (None, 10)                1290      
                                                                 
=================================================================
Total params: 89,994
Trainable params: 89,994
Non-trainable params: 0
_________________________________________________________________
2022-04-12 20:50:19.510435: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-04-12 20:50:19.510488: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-04-12 20:50:19.510529: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (Opti17): /proc/driver/nvidia/version does not exist
2022-04-12 20:50:19.511434: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags."""

Next, we can also create a network with a LSTM layer.

In [4]:
# Example: https://keras.io/guides/working_with_rnns/

# LSTM network example
model = keras.Sequential()
# Add an Embedding layer expecting input vocab of size 1000, and
# output embedding dimension of size 64.
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# Add a LSTM layer with 128 internal units.
model.add(layers.LSTM(128))

# Add a Dense layer with 10 units.
model.add(layers.Dense(10))

model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, None, 64)          64000     
                                                                 
 lstm (LSTM)                 (None, 128)               98816     
                                                                 
 dense_2 (Dense)             (None, 10)                1290      
                                                                 
Total params: 164,106
Trainable params: 164,106
Non-trainable params: 0
_________________________________________________________________


## Challenge  
Before class time, it would be good to review the Keras: Working with RNNs documentation. Ensure you know how to add a recurrent layer and the difference between a simple RNN and LSTM.

## Additional Resources  
Keras: Working with RNNs  https://keras.io/guides/working_with_rnns/  
Recurrent Neural Networks: LSTM Tutorial https://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/

# Objective 02 - Apply an LSTM to a Text Generation Problem Using Keras
## Overview
In the first part of this module, we generally learned why recurrent neural networks are a good choice for working with sequential data, such as text. Now, we will implement a specific type of RNN called a long short-term memory network (LSTM) to make text predictions.

LSTM networks are suitable for text prediction and generation because they can remember long sequences of data. So, let’s test out how to implement an LSTM network with text prediction.

## Follow Along
We’ll use text from Project Gutenberg https://www.gutenberg.org/ and use a portion of it to train the neural network. The novel is the Adventures of Sherlock Holmes by Arthur Conan Doyle; the shortened text used in the following analysis is also available here https://raw.githubusercontent.com/bloominstituteoftechnology/data-science-practice-datasets/main/unit_4/sherlock.txt

In [5]:
# Load the text
import requests

url = "https://raw.githubusercontent.com/bloominstituteoftechnology/data-science-practice-datasets/main/unit_4/sherlock.txt"
response = requests.get(url)
text = response.text

# Strip the \r\n characters
text = text.replace('\r\n', ' ')

We now have a single string of text. However, the neural network input needs to be numeric, so we must convert or encode the text as characters. We can create two look-up tables: character to integer and integer to character (to make predictions after training).

In [6]:
# Encode Data as Chars

# Find the unique characters
chars = list(set(text))

# Lookup tables
char_int = {c:i for i, c in enumerate(chars)} 
int_char = {i:c for i, c in enumerate(chars)}

print('The number of unique characters in the text:', len(chars))

The number of unique characters in the text: 91


Now we need to create sequences of the characters to train on.

In [7]:
# Create the sequence data
maxlen = 40
step = 5

# Encode the characters using the lookup tables
encoded = [char_int[c] for c in text]

# Initialize empty lists to hold the sequences
sequences = [] # Each element is 40 chars long
next_char = [] # One element for each sequence

# Loop through the entire text
for i in range(0, len(encoded) - maxlen, step): 
    sequences.append(encoded[i : i + maxlen])
    next_char.append(encoded[i + maxlen])

print('sequences: ', len(sequences))

sequences:  54974


And now that the text is processed, we can build our model! We’ll use a Keras utility to pad our sequences, so they are all the same length up to the maximum we specify. Then, we’ll create our feature and target arrays:

In [10]:
import tensorflow as tf
from tensorflow.keras.preprocessing import sequence

# Pad sequences so all are equal
seq = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=40)

# Create x & y
import numpy as np

# Create arrays of zeros (False)
x = np.zeros((len(sequences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sequences), len(chars)), dtype=bool)

# Turn on the location (set to True) when the character is present
for i, sequence in enumerate(sequences):
    for t, char in enumerate(sequence):
        x[i,t,char] = 1

    y[i, next_char[i]] = 1

The model we will use has an input layer equal to the number of characters in our text, a hidden layer of 64 nodes, an LSTM layer of 64 nodes, and an output layer equal to the character set’s size. We are predicting one of the characters, so we need to reflect that in the output.

In [11]:
# Build the model: a single LSTM
from keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.layers import Bidirectional, Embedding

model = Sequential()
model.add(Embedding(output_dim=64, input_dim=len(chars)))
model.add(Bidirectional(LSTM(64)))
model.add(Dense(len(chars), activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

Finally, let’s fit the model! We will choose a lower number of epochs for this text run because neural networks usually take some time to train. We can adjust the epochs later to see how our results change.

In [12]:
# Fit the model
model.fit(seq, y, batch_size=32,
          epochs=5, verbose=2)

Epoch 1/5
1718/1718 - 53s - loss: 2.5809 - 53s/epoch - 31ms/step
Epoch 2/5
1718/1718 - 50s - loss: 2.2156 - 50s/epoch - 29ms/step
Epoch 3/5
1718/1718 - 48s - loss: 2.0920 - 48s/epoch - 28ms/step
Epoch 4/5
1718/1718 - 49s - loss: 2.0058 - 49s/epoch - 28ms/step
Epoch 5/5
1718/1718 - 49s - loss: 1.9405 - 49s/epoch - 28ms/step


<keras.callbacks.History at 0x7f15d8abfa60>

Once we fit the model, we need to convert the numeric predictions back into characters, so that we can read it. We'll create a function to do this.

In [13]:
# Predict and convert text back into characters
def generate_text(model, seed, length):

  encoded = [char_int[c] for c in seed]

  generated = ''
  generated += seed
  model.reset_states()

  start_index = 0 

  for _ in range(length):

      sample = encoded[start_index:start_index+10]      
      sample = np.array(sample)
      sample = np.expand_dims(sample,0)

      pred = model.predict(sample)
      pred = tf.squeeze(pred, 0)
      next_char = np.argmax(pred)
      encoded.append(next_char)
      generated += int_char[next_char]

      start_index += 1

  return generated

In [14]:
# Set the seed text which the model will use to generate the predicted text
seed_text = "I have no data yet it is a capital mistake to theorise before one has data insensibly one begins to twist facts to suit theories"

generate_text(model, seed_text, 400)

'I have no data yet it is a capital mistake to theorise before one has data insensibly one begins to twist facts to suit theoriesmorhrtoa tn tn tnsosenenlaen ene ah the r ne teeore af  tes aorhrtn ert nee af  te eng ah thet etoree th thrdethe r nd ere yernahethetheeu   d yddtantdtneahe sotoathr t  an thhe hnn  y otneetot t  thhesntetnethe  hnhr  daheoheee  e sepootne toaeeded ed e dnsasotoooerdeeeeddedaou rr  ythhsndthee saootsodfhend  ethhhhee  eer er erareoaeosore n dddaansoeonk edth rdd d tnotncooen n unn   dddd  d nnnse'

Well, that is interesting! We have something resembling language, but the words don’t make any sense - I don’t know what an “aootttd” is, but it could be exciting! There also isn’t any punctuation or other structure in the text. But, we only trained the network for five epochs, which isn’t very many.

Let's increase that to 100 epochs and compare the output, using the same seed text.

In [19]:
# 100 TAKES 80 MINS! CANT DO THIS!
# Train with more epochs
model.fit(seq, y, batch_size=32,
          epochs=2, verbose=1)  # VERBOSE SET TO 0 IS VERY BAD!!!

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f157447dd30>

In [None]:
seed_text = "I have no data yet it is a capital mistake to theorise before one has data insensibly one begins to twist facts to suit theories"

generate_text(model, seed_text, 400)

Now we can see that the text is starting to develop some structure, with punctuation and even a few words that seem more like words?

We kept this example simple so that you could see how to set up an LSTM for generating text. Usually, you would use more layers to capture the structure of the text better.

## Challenge  
Now it is up to you! Using the exact text and code above, add additional layers to the network and see if you can improve the text prediction.

You can even take it a step further and source a new text, load it, and process it in the same way, and see what your network can generate.

## Additional Resources  
Understanding LSTMs https://colah.github.io/posts/2015-08-Understanding-LSTMs/  
Text Generation Using Python https://www.analyticsvidhya.com/blog/2018/03/text-generation-using-python-nlp/  
Text Generation With LSTM Recurrent Neural Networks in Python with Keras 
https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/