<a href="https://colab.research.google.com/github/nisaac21/TensorFlow/blob/main/Recurrent_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##**Natural Language Processing**

Natural language processing (NLP) is the branch of computing concerned with understanding human language. Examples include translation, autocomplete, and spellcheck.

##Recurrent Neural Networks

In this tutorial we introduce a kind of neural network that is much better suited to sequential data such as text or characters: the **recurrent neural network** (RNN for short).

They work by using *sequential memory*. When we learn the alphabet, we learn it as a sequence, so reciting it forwards is easy but reciting it backwards is very tough. Moreover, if we start at an arbitrary letter, the next few letters are hard to recall, but the rest follow quickly once we pick up the pattern.

##**Encoding Textual Data**

##Bag of Words

In this algorithm, we create a vocabulary (a list of words) and assign each word a unique number. For a large data set, this can mean a very large vocabulary.
  * We only track which words we have seen and how frequently they occur. We lose the order of the words but keep the frequency. It's like throwing all the words that appear into a bag.

```I thought the movie was going to be bad, but it was actually amazing!```

```I thought the movie was going to be amazing, but it was actually bad!```

These two sentences contain exactly the same words with the same frequencies, but have very different meanings. We lose that meaning with bag of words because both sentences are encoded identically.
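
To make this concrete, here is a small sketch using Python's `collections.Counter`. The helper name `to_bag` and the punctuation stripping are assumptions made for this illustration; they are not part of the notebook's own code.

```python
from collections import Counter
import string

def to_bag(sentence):
    """Count word frequencies, ignoring case, punctuation, and word order."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

a = to_bag("I thought the movie was going to be bad, but it was actually amazing!")
b = to_bag("I thought the movie was going to be amazing, but it was actually bad!")

print(a == b)    # True: the two bags are indistinguishable
print(a["was"])  # 2
```

Because the two bags compare equal, any model that only sees the bag cannot tell the reviews apart.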

In [None]:
vocab = {}  # maps word to integer representing it
word_encoding = 1
def bag_of_words(text):
  global word_encoding

  words = text.lower().split(" ")  # list of all the words in the text; we'll assume the text has no punctuation for this example
  bag = {}  # stores all of the encodings and their frequency

  for word in words:
    if word in vocab:
      encoding = vocab[word]  # get encoding from vocab
    else:
      vocab[word] = word_encoding
      encoding = word_encoding
      word_encoding += 1
    
    if encoding in bag:
      bag[encoding] += 1
    else:
      bag[encoding] = 1
  
  return bag

text = "this is a test to see if this test will work is is test a a"
bag = bag_of_words(text)
print(bag)
print(vocab)

{1: 2, 2: 3, 3: 3, 4: 3, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
{'this': 1, 'is': 2, 'a': 3, 'test': 4, 'to': 5, 'see': 6, 'if': 7, 'will': 8, 'work': 9}


##Integer Encoding

Let's say we want to keep the order. What if we turned the text into a list of integers, with each integer representing a word? That way, we keep the position of each word.

In [None]:
vocab = {}  # maps word to the integer representing it
word_encoding = 1
def integer_encoding(text):
  global word_encoding

  words = text.lower().split(" ")
  encoding = []  # encoded words, in their original order

  for word in words:
    if word in vocab:
      code = vocab[word]
      encoding.append(code)
    else:
      vocab[word] = word_encoding
      encoding.append(word_encoding)
      word_encoding += 1

  return encoding

text = "this is a test to see if this test will work is is test a a"
encoding = integer_encoding(text)
print(encoding)
print(vocab)

[1, 2, 3, 4, 5, 6, 7, 1, 4, 8, 9, 2, 2, 4, 3, 3]
{'this': 1, 'is': 2, 'a': 3, 'test': 4, 'to': 5, 'see': 6, 'if': 7, 'will': 8, 'work': 9}


However, suppose we have a large vocabulary of 100,000 words, with `happy` encoded as 1 and `good` encoded as 100,000. When we pass in sentences containing `happy` and `good`, the model will struggle to realize the two words are similar, because their encodings are numerically so far apart.

The number we choose for each word matters.

##Word Embeddings

This encoding tries to give similar words similar representations. Each word is assigned a vector of numbers, and words that are used in similar ways end up with vectors that point in similar directions.

This would be a layer in our model, so our model would actually learn how to embed the vocabulary. 
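
One common way to compare two word vectors is cosine similarity. The sketch below uses hand-picked 3-dimensional vectors that are purely illustrative; a real embedding layer would learn its values during training, and the function name is an assumption for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors for illustration only (not learned values).
embeddings = {
    "happy": [0.9, 0.8, 0.1],
    "good":  [0.8, 0.9, 0.2],
    "bad":   [-0.9, -0.7, 0.1],
}

sim_happy_good = cosine_similarity(embeddings["happy"], embeddings["good"])
sim_happy_bad = cosine_similarity(embeddings["happy"], embeddings["bad"])
print(sim_happy_good > sim_happy_bad)  # True
```

Unlike integer codes, the distance between these vectors can carry meaning: `happy` and `good` sit close together even if their vocabulary indices are far apart.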

##**Recurrent Neural Networks (RNNs)**

The big difference between an RNN and a densely connected or convolutional neural network is that an RNN contains a loop within its layers.

This gives the RNN an 'internal memory' of what it has already seen.

![alt text](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)

What these variables stand for:

* h<sub>t</sub>: output at time t
* x<sub>t</sub>: input at time t
* A: recurrent layer (loop)

We process our first word to generate some output. We then use that output in conjunction with our second input/word to produce a better understanding of the two words together, and so on through the sequence.

What we've just looked at is called a **simple RNN layer**. It can be effective at processing shorter sequences of text for simple problems, but it has significant drawbacks. For longer inputs, the RNN struggles to remember earlier words, and they 'fade out' of memory. There are some methods to combat this.
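
The loop can be sketched in a few lines of NumPy. This is a minimal illustration of the usual simple-RNN update, h<sub>t</sub> = tanh(W<sub>x</sub> x<sub>t</sub> + W<sub>h</sub> h<sub>t-1</sub> + b); the weight names, sizes, and random values are assumptions for the example, standing in for weights a real layer would learn.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3

# Randomly initialised weights stand in for learned ones.
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous output."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

inputs = rng.normal(size=(5, input_dim))  # 5 time steps of toy input
h = np.zeros(hidden_dim)                  # no memory before the first step
outputs = []
for x_t in inputs:
    h = rnn_step(x_t, h)  # the same weights are reused at every step
    outputs.append(h)

outputs = np.stack(outputs)
print(outputs.shape)  # (5, 3)
```

Each new `h` is computed from the previous one, so information about early inputs has to survive many repeated updates, which is one way to see why it tends to fade on long sequences.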

##Long Short-Term Memory (LSTM)

What we do here is add a second component that tracks an internal state. With a simple RNN, all we do is use the previous output (all the previous words and the latest one mushed together) to generate a new output. With an LSTM, this internal state lets the layer look back at what it saw at the beginning (that is, remember the earlier parts of the sequence). This allows us to make more useful predictions.

It chooses which words/information are most important to remember and keeps them in the internal state. So at each step we pass in not only the output from the previous step but also that internal state, and combine the two with the new input.
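
As a rough sketch, one LSTM step in its common textbook form can be written in NumPy as follows. The gate ordering and the names `W`, `U`, and `b` are choices made for this example and are not claimed to match how Keras lays out its weights internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the four gates stacked together."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # shape: (4 * hidden,)
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate: what to write
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate: what to keep
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values to write
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: what to expose
    c_t = f * c_prev + i * g               # updated internal state
    h_t = o * np.tanh(c_t)                 # output for this step
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden = 4, 3
W = rng.normal(scale=0.5, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.5, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)  # (3,) (3,)
```

The internal state `c` is only ever scaled by the forget gate and added to, which gives earlier information a more direct path through time than in the simple RNN.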


##**Building Sentiment Analysis Model**



In [None]:
# The next line is a Colab-only magic; it is not required outside a notebook
%tensorflow_version 2.x
from keras.datasets import imdb
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

VOCAB_SIZE = 88584 # number of words we will include

MAXLEN = 250 # every review is padded or trimmed to this many words
BATCH_SIZE = 64

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = VOCAB_SIZE)


TensorFlow 2.x selected.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


##Preprocessing

Our reviews come in different lengths, but the model expects every input to be the same length (250 words here). To fix this we will

* Trim off extra words if a review is longer than 250 words
* Add the necessary 0's if a review is shorter than 250 words (*padding*)


In [None]:
train_data = sequence.pad_sequences(train_data, MAXLEN)
test_data = sequence.pad_sequences(test_data, MAXLEN)
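
As a plain-Python illustration of what happens to a single review: the helper below (a hypothetical name, not a Keras function) pads and trims at the front, which is my understanding of the `pad_sequences` defaults and is consistent with the leading zeros in the output that follows.

```python
def pad_or_truncate(tokens, maxlen, pad_value=0):
    """Keep the last `maxlen` tokens, or left-pad with `pad_value`."""
    if len(tokens) >= maxlen:
        return tokens[-maxlen:]
    return [pad_value] * (maxlen - len(tokens)) + tokens

print(pad_or_truncate([5, 6, 7], 5))           # [0, 0, 5, 6, 7]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```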

In [None]:
train_data[1] # now array length is 250

array([    0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     1,   194,
        1153,   194,  8255,    78,   228,     5,     6,  1463,  4369,
        5012,   134,    26,     4,   715,     8,   118,  1634,    14,
         394,    20,    13,   119,   954,   189,   102,     5,   207,
         110,  3103,    21,    14,    69,   188,     8,    30,    23,
           7,     4,   249,   126,    93,     4,   114,     9,  2300,
        1523,     5,   647,     4,   116,     9,    35,  8163,     4,
         229,     9,   340,  1322,     4,   118,     9,     4,   130,
        4901,    19,

## Creating Model

In [None]:
model = tf.keras.Sequential([
  tf.keras.layers.Embedding(VOCAB_SIZE, 32), # maps each word index to a 32-dimensional vector
  tf.keras.layers.LSTM(32), # LSTM layer with 32 units that reads the sequence of word vectors
  tf.keras.layers.Dense(1, activation='sigmoid')
  # sigmoid squashes the output into the range 0-1, so values near 0 can
  # represent a negative review and values near 1 a positive one
])

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          2834688   
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 2,843,041
Trainable params: 2,843,041
Non-trainable params: 0
_________________________________________________________________
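
The parameter counts in this summary can be reproduced by hand. The sketch below uses the standard counting formulas for these three layer types; the helper names are made up for this example.

```python
def embedding_params(vocab_size, embedding_dim):
    # one vector of length embedding_dim per word in the vocabulary
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input per unit, plus one bias per unit
    return input_dim * units + units

total = embedding_params(88584, 32) + lstm_params(32, 32) + dense_params(32, 1)
print(embedding_params(88584, 32))  # 2834688
print(lstm_params(32, 32))          # 8320
print(dense_params(32, 1))          # 33
print(total)                        # 2843041
```

Almost all of the parameters sit in the embedding layer, because it stores a 32-dimensional vector for each of the 88,584 words.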


##Training

In [None]:
model.compile(loss="binary_crossentropy",optimizer="rmsprop",metrics=['acc'])

history = model.fit(train_data, train_labels, epochs=10, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model.save('sentiment_analysis_FCC.h5') # .h5 is the HDF5 file format that Keras can save models in

Here we can see that after the first epoch of training, the validation accuracy stays roughly constant and doesn't improve. To do better, we would need to change the model itself.

In [None]:
results = model.evaluate(test_data, test_labels)
print(results)

[0.45747655630111694, 0.8535599708557129]


##Making Predictions

In [None]:
word_index = imdb.get_word_index() # shows us index of each word

# for preprocessing a line of text
def encode(text):
  # given some text, split it into a list of lowercase word tokens
  tokens = keras.preprocessing.text.text_to_word_sequence(text)
  # if a token is in the word index, replace it with its index,
  # else use 0
  tokens = [word_index[word] if word in word_index else 0 for word in tokens]
  # pad_sequences expects a list of sequences, so wrap the tokens in a list
  # and take the first (only) padded result
  return sequence.pad_sequences([tokens], MAXLEN)[0]

text = 'that movie was just amazing, so amazing'
encoded = encode(text)
print(encoded)



Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0 

We can also make a decoder...

In [None]:
reverse_word_index = {value : key for (key, value) in word_index.items()}

def decode(integers):
  PAD = 0
  text = ""
  for num in integers:
    if num != PAD:
      text += reverse_word_index[num] + " "
  return text[:-1] # don't include the trailing space

print(decode(encoded))

that movie was just amazing so amazing


Now let's make a prediction

In [None]:
def predict(text):
  encoded_text = encode(text)
  pred = np.zeros((1, MAXLEN)) # a batch containing one review, the shape the model expects
  pred[0] = encoded_text
  result = model.predict(pred)
  print(result[0])

positive_review = "That movie was awesome! really loved it and would great watch it again because it was amazingly great"
predict(positive_review)

negative_review = "that movie sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched"
predict(negative_review)


[0.8437104]
[0.35995132]
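
To turn these scores into labels we need a cut-off. The sketch below assumes a threshold of 0.5, which is a common choice but not something the model itself dictates; the function name is hypothetical.

```python
def label_from_score(score, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a sentiment label."""
    return "positive" if score >= threshold else "negative"

print(label_from_score(0.8437104))   # positive
print(label_from_score(0.35995132))  # negative
```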


For the positive review, let's remove the word "awesome" and see how that changes the prediction.

In [None]:
positive_review = "That movie was! really loved it and would great watch it again because it was amazingly great"
predict(positive_review)

[0.9321127]


Removing a positive word actually raises the score! This could be an indication of where our model is starting to fail.

##**RNN Play Generator**

This model predicts the most likely next character. We will train it on a large number of sequences from Shakespeare's plays, so that it learns how to predict the next character.

To generate a play, we repeatedly feed the model's output back in as its next input, and it keeps predicting the next character until a play has been created.

##Dataset

Getting the data from keras

In [None]:
## Could use any play
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


Let's open the file and look at the contents

In [None]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters


In [None]:
print(text[:250]) # let's look at the formatting

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



##Preprocessing/Encoding

In [None]:
# We are simply encoding each character, not each word
# This makes our lives easier as there are a finite amount of characters

vocab = sorted(set(text)) # sorted list of all unique characters

char2idx = {u:i for i, u in enumerate(vocab)} # maps each character to its index
# first char  -> 0
# second char -> 1 ...
idx2char = np.array(vocab) # maps each index back to its character (same order)

def text_to_int(text):
  return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)

Let's see how the first 14 characters get encoded

In [None]:
# lets look at how part of our text is encoded
print("Text:", text[:14])
print("Encoded:", text_to_int(text[:14]))

Text: First Citizen:
Encoded: [18 47 56 57 58  1 15 47 58 47 64 43 52 10]


In [None]:
def int_to_text(ints):
  try:
    ints = ints.numpy()  # convert a tensor to a NumPy array
  except AttributeError:
    pass  # already a NumPy array or list
  return ''.join(idx2char[ints])

print(int_to_text(text_as_int[:14]))

First Citizen:
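
The same round trip can be checked on a tiny string in plain Python. The variable names here are illustrative and kept separate from the notebook's `char2idx` and `idx2char`.

```python
sample = "hello world"

chars = sorted(set(sample))                       # unique characters, sorted
to_index = {ch: i for i, ch in enumerate(chars)}  # character -> integer
to_char = chars                                   # integer -> character (list lookup)

encoded = [to_index[ch] for ch in sample]
decoded = "".join(to_char[i] for i in encoded)

print(chars)    # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(encoded)  # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print(decoded)  # hello world
```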


##Creating Training Examples

We have to feed the model short sequences from the play, not the play as a whole. This means we need to split our text data into shorter sequences.

Each training example uses a sequence of length *seq_length* as input and a sequence of length *seq_length* as output, where the output is the input shifted one character to the right. For example:

```input: Hell | output: ello```




In [None]:
seq_length = 100
examples_per_epoch = len(text) // (seq_length+1)
# each input is 100 chars and each output is 100 chars shifted by one,
# so every example needs 101 chars in total

# let's create the dataset - converts the array of character indices
# into a stream of individual characters
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

In [None]:
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
# batch the character stream into chunks of seq_length + 1 and drop any leftover characters

Now we need to take these sequences and split them into input and output

In [None]:
def split_input_target(chunk):  # for the example: hello
    input_text = chunk[:-1]  # hell
    target_text = chunk[1:]  # ello
    return input_text, target_text  # hell, ello

dataset = sequences.map(split_input_target)  # we use map to apply the above function to every entry
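
The chunk-then-split idea can be tried on a short string without TensorFlow. This is a plain-Python sketch; `make_examples` is a hypothetical helper that imitates batching with `drop_remainder=True` followed by the split.

```python
def make_examples(text, seq_length):
    """Cut text into chunks of seq_length + 1 and split each into (input, target)."""
    chunk_size = seq_length + 1
    examples = []
    # Stop before any final chunk that would be too short (like drop_remainder=True).
    for start in range(0, len(text) - chunk_size + 1, chunk_size):
        chunk = text[start:start + chunk_size]
        examples.append((chunk[:-1], chunk[1:]))
    return examples

for inp, target in make_examples("hello world!", seq_length=4):
    print(repr(inp), "->", repr(target))
# 'hell' -> 'ello'
# ' wor' -> 'worl'
```

The trailing `d!` is too short to form a full chunk, so it is dropped.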

Let's make the training batches

In [None]:
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)  # vocab is number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

##Building the Model

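We are going to create a function that builds the model. We do this so that we can build the same architecture with different batch sizes: 64 for training, and later 1 for generating text from a single starting string. We will later see how different starting strings change the play the model writes.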

In [None]:
def build_model(vocab_size, embedding_dim,
                rnn_units, batch_size):
  model = tf.keras.Sequential(
      [
       tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                 batch_input_shape=[batch_size, None]),
       tf.keras.layers.LSTM(rnn_units,
                              return_sequences=True, # returns each step
                              stateful=True,
                              recurrent_initializer='glorot_uniform'),
       tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (64, None, 256)           16640     
                                                                 
 lstm_1 (LSTM)               (64, None, 1024)          5246976   
                                                                 
 dense_1 (Dense)             (64, None, 65)            66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________


So every time we pass the model an input, we are giving it ```BATCH_SIZE``` (in this case 64) examples. Each example is *seq_length* (in this case 100) characters long.

##Creating a Loss Function

Let's first better understand what outputs our model will create



In [None]:
for input_example_batch, target_example_batch in data.take(1):
  example_batch_predictions = model(input_example_batch)  # ask our model for a prediction on our first batch of training data (64 entries)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")  # print out the output shape

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [None]:
# we can see that the prediction is an array of 64 arrays, one for each entry in the batch
print(len(example_batch_predictions))
print(example_batch_predictions)

64
tf.Tensor(
[[[ 2.37315334e-03  1.82198198e-03  3.33774509e-03 ... -2.33500893e-03
    2.69794627e-03 -1.27459422e-03]
  [-1.81131903e-03  4.91416408e-03  4.56461636e-03 ...  6.58889953e-03
    1.08547322e-03 -1.08275074e-03]
  [-3.19248601e-03  3.11992393e-04  4.96673537e-03 ...  6.92240894e-03
   -1.29812455e-04 -3.64137604e-03]
  ...
  [ 1.09894527e-02 -1.44951773e-04 -3.06584145e-04 ...  2.06695660e-03
    1.41441193e-03 -6.38324209e-03]
  [ 8.73680972e-03 -3.89432476e-04 -7.09367450e-05 ...  4.30803606e-03
    8.56750179e-03 -1.08958241e-02]
  [ 8.17521755e-03 -1.17851396e-05  1.82223608e-04 ...  5.59174456e-03
    1.34385852e-02 -1.46059860e-02]]

 [[ 1.96748297e-03  4.46033711e-03 -4.84104035e-03 ... -3.74602107e-03
    3.01892683e-03  1.90614932e-03]
  [ 3.02096759e-03 -2.93492200e-03 -2.39786645e-03 ... -7.61793368e-03
   -6.82042155e-04  2.64850468e-03]
  [ 6.30768389e-03 -4.92296042e-03 -1.98721024e-03 ... -1.05367443e-02
    1.37845008e-03  4.04957682e-03]
  ...
  [ 6.398

In [None]:
# lets examine one prediction
pred = example_batch_predictions[0]
print(len(pred))
print(pred)
# notice this is a 2d array of length 100, where each 
# interior array is the prediction for the next character at each time step

100
tf.Tensor(
[[ 2.3731533e-03  1.8219820e-03  3.3377451e-03 ... -2.3350089e-03
   2.6979463e-03 -1.2745942e-03]
 [-1.8113190e-03  4.9141641e-03  4.5646164e-03 ...  6.5888995e-03
   1.0854732e-03 -1.0827507e-03]
 [-3.1924860e-03  3.1199239e-04  4.9667354e-03 ...  6.9224089e-03
  -1.2981246e-04 -3.6413760e-03]
 ...
 [ 1.0989453e-02 -1.4495177e-04 -3.0658414e-04 ...  2.0669566e-03
   1.4144119e-03 -6.3832421e-03]
 [ 8.7368097e-03 -3.8943248e-04 -7.0936745e-05 ...  4.3080361e-03
   8.5675018e-03 -1.0895824e-02]
 [ 8.1752175e-03 -1.1785140e-05  1.8222361e-04 ...  5.5917446e-03
   1.3438585e-02 -1.4605986e-02]], shape=(100, 65), dtype=float32)


In [None]:
# and finally we'll look at a prediction at the first time step
time_pred = pred[0]
print(len(time_pred))
print(time_pred)
# and of course it's 65 values: one score (logit) per character, where a
# higher score means that character is more likely to come next

65
tf.Tensor(
[ 2.3731533e-03  1.8219820e-03  3.3377451e-03  2.1350312e-03
  3.9269557e-04 -4.8913374e-03 -1.0782480e-03  2.8815230e-03
  3.5782913e-03 -1.7917999e-03 -2.8477262e-03 -1.7907147e-03
 -5.6958352e-03 -1.9960057e-03  3.0832691e-03  6.6348707e-04
  8.6161662e-03 -2.8482198e-03  4.8803799e-03  1.3197982e-03
  2.0636069e-03 -1.6747762e-03 -2.8662696e-03  1.7697064e-03
 -1.2123330e-03 -1.3534589e-03 -3.8454062e-03  3.0904014e-03
 -1.9760297e-03  2.5335930e-03 -1.3763786e-03 -8.2892813e-03
 -4.0954938e-03 -3.2084566e-04  4.3786964e-03  2.8965986e-04
  4.6115010e-03 -3.7874305e-03 -9.2757848e-04  6.1617964e-03
  1.8520345e-03 -2.6874247e-03  2.7117007e-03  1.6301543e-04
  8.3347550e-04 -8.3342100e-05 -2.3495283e-03 -8.9189614e-04
 -1.4239028e-03 -1.0999684e-04  1.0424952e-03  2.9350801e-03
 -3.5442787e-03  5.3198184e-03 -4.4141044e-03  3.6278425e-03
  4.7752494e-03  1.4177580e-04  4.7610453e-03  3.9290949e-03
 -4.7006039e-04 -9.4811909e-04 -2.3350089e-03  2.6979463e-03
 -1.274594

In [None]:
# If we want to determine the predicted character we need to
# sample the output distribution (pick a value based on probability)
sampled_indices = tf.random.categorical(pred, num_samples=1)

# now we can reshape that array and convert all
# the integers to characters to see the actual text
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
# we are sampling the distribution, instead of just picking the most likely character
predicted_chars = int_to_text(sampled_indices)

predicted_chars
# and this is what the (still untrained) model predicted for training sequence 1

"mXwV'dImIprLhiqmkA gR!yeICxf$FARxNh?ovyT'rwgXu!Y?XskSbtD-Jc:NfGqupYtAhYBgdOXv ZsKq EjXphjdyg MHXM?oS"

Our model outputs raw scores (logits) with shape (batch_size, sequence_length, vocab_size), so we define our own loss function that knows how to compare them with the expected characters.

In [None]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
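
For a single time step, this loss is the negative log-probability that a softmax over the logits assigns to the correct character. The following is a plain-Python sketch of that calculation, not the TensorFlow implementation, and the function name is an assumption for this example.

```python
import math

def sparse_categorical_crossentropy_from_logits(label, logits):
    """Negative log-probability of the correct class, computed from raw scores."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

uniform = sparse_categorical_crossentropy_from_logits(0, [0.0] * 65)
confident = sparse_categorical_crossentropy_from_logits(2, [0.0, 0.0, 5.0])
print(round(uniform, 4))    # 4.1744
print(confident < uniform)  # True
```

With 65 equally likely characters the loss is ln(65), about 4.17, which is roughly what to expect from a model whose logits are all close to zero, as in the untrained output above.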

##Compiling the Model

In [None]:
model.compile(optimizer='adam', loss=loss)

We can create checkpoints for the model as it trains. This allows us to load the model from a checkpoint and continue training it.

In [None]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

##Training Model

In [None]:
history = model.fit(data, epochs=50, callbacks=[checkpoint_callback])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


##Loading the Model

After we have trained the model, we want to rebuild the model from the latest checkpoint (thereby loading the weights). We rebuild it with a batch size of 1 so that it can generate text from a single input.

In [None]:
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)

In [None]:
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

We can load **any checkpoint** we want by specifying the exact file to load.

In [None]:
checkpoint_num = 10
# load_weights takes the checkpoint path prefix directly
model.load_weights("./training_checkpoints/ckpt_" + str(checkpoint_num))
model.build(tf.TensorShape([1, None]))

##Generating Text

In [None]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 800

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0) # add a batch dimension: shape (1, len(start_string))

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states() # reset the states so the model doesn't carry over
  # memory from any previously generated text
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
    
      predictions = tf.squeeze(predictions, 0) #Removes exterior dimension

      # using a categorical distribution to predict 
      # the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated)) # return the generated play
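
To see what dividing the logits by `temperature` does, here is a plain-Python sketch that turns a toy set of logits into probabilities at a few temperatures. The logits are made up and the function name is an assumption for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy scores for three characters
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature the largest logit takes more of the probability mass, so sampling becomes more predictable; at the higher temperature the distribution flattens and less likely characters are sampled more often.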

In [None]:
inp = input("Type a starting string: ")
print(generate_text(model, inp))