## Reference

Module 2: Introduction to TensorFlow - https://colab.research.google.com/drive/1F_EWVKa8rbMXi3_fG0w7AtcscFq7Hi7B#forceEdit=true&sandboxMode=true
<br>📗 Module 3: Core Learning Algorithms - https://colab.research.google.com/drive/15Cyy2H7nT40sGR7TBN5wBvgTd57mVKay#forceEdit=true&sandboxMode=true
<br>📘 Module 4: Neural Networks with TensorFlow -   
https://colab.research.google.com/drive/1m2cg3D1x3j5vrFc-Cu0gMvc48gWyCOuG#forceEdit=true&sandboxMode=true
<br>📙 Module 5: Deep Computer Vision - https://colab.research.google.com/drive/1ZZXnCjFEOkp_KdNcNabd14yok0BAIuwS#forceEdit=true&sandboxMode=true
<br>📔 Module 6: Natural Language Processing with RNNs -  https://colab.research.google.com/drive/1ysEKrw_LE2jMndo1snrZUh5w87LQsCxk#forceEdit=true&sandboxMode=true
<br>📒 Module 7: Reinforcement Learning - https://colab.research.google.com/drive/1IlrlS3bB8t1Gd5Pogol4MIwUxlAjhWOQ#forceEdit=true&sandboxMode=true


# Natural Language Processing 
Natural Language Processing (or NLP for short) is a discipline in computing that deals with the communication between natural (human) languages and computer languages. A common example of NLP is something like spellcheck or autocomplete. Essentially NLP is the field that focuses on how computers can understand and/or process natural/human languages. 

### Recurrent Neural Networks

In this tutorial we will introduce a new kind of neural network that is much more capable of processing sequential data such as text or characters called a **recurrent neural network** (RNN for short). 

We will learn how to use a recurrent neural network to do the following:
- Sentiment Analysis
- Character Generation 

RNNs are complex and come in many different forms, so in this tutorial we will focus on how they work and the kinds of problems they are best suited for.



### 1. Bag of words
This method stores only how often each word occurs, not the order in which the words appear.
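
As a minimal sketch of why losing word order matters, the snippet below counts words with Python's `collections.Counter`. The helper name `bag_of_words` and the two example sentences are made up for illustration and are not part of the dataset used later.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count how often each lowercase word occurs; word order is discarded."""
    return Counter(sentence.lower().split())

bag_a = bag_of_words("the movie was good not bad")
bag_b = bag_of_words("the movie was bad not good")

print(bag_a)
print(bag_a == bag_b)  # True: the bags are identical although the meanings differ
```

Both sentences contain exactly the same words, so a bag of words cannot tell them apart.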

### 2. Word Embedding
This method keeps the order of words intact and encodes similar words with very similar vectors. It attempts to encode not only the frequency and order of words but also the meaning of those words in the sentence. Each word is encoded as a dense vector that represents its context in the sentence.
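
To make "similar words get similar vectors" concrete, here is a small sketch that compares hand-written vectors with cosine similarity. The three vectors are hypothetical values chosen for illustration; a real embedding layer learns its vectors during training.

```python
import math

# Hypothetical 3-dimensional embeddings chosen by hand for illustration;
# a real embedding layer learns these values during training.
toy_embeddings = {
    "good":  [0.9, 0.8, 0.1],
    "great": [0.8, 0.9, 0.2],
    "bad":   [-0.7, -0.9, 0.1],
}

def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_good_great = cosine_similarity(toy_embeddings["good"], toy_embeddings["great"])
sim_good_bad = cosine_similarity(toy_embeddings["good"], toy_embeddings["bad"])
print(sim_good_great, sim_good_bad)
```

With these made-up values, "good" and "great" point in almost the same direction while "good" and "bad" point in roughly opposite directions.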

## Recurrent Neural Networks (RNNs)
Now that we've learned a little bit about how we can encode text it's time to dive into recurrent neural networks. Up until this point we have been using something called **feed-forward** neural networks. This simply means that all our data is fed forwards (all at once) from left to right through the network. This was fine for the problems we considered before but won't work very well for processing text. After all, even we (humans) don't process text all at once. We read word by word from left to right and keep track of the current meaning of the sentence so we can understand the meaning of the next word. This is exactly what a recurrent neural network is designed to do. When we say recurrent neural network all we really mean is a network that contains a loop. An RNN will process one word at a time while maintaining an internal memory of what it's already seen. This will allow it to treat words differently based on their order in a sentence and to slowly build an understanding of the entire input, one word at a time.

This is why we are treating our text data as a sequence! So that we can pass one word at a time to the RNN.

Let's have a look at what a recurrent layer might look like.

![alt text](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)
*Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/*

Let's define what all these variables stand for before we get into the explanation.

**h<sub>t</sub>** output at time t

**x<sub>t</sub>** input at time t

**A** Recurrent Layer (loop)

What this diagram is trying to illustrate is that a recurrent layer processes words or input one at a time in a combination with the output from the previous iteration. So, as we progress further in the input sequence, we build a more complex understanding of the text as a whole.
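
The loop in the diagram can be sketched in a few lines of plain Python. This is a toy version that works on single numbers: the weights `w_x`, `w_h` and `b` are arbitrary assumptions, whereas a real recurrent layer uses learned weight matrices.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a toy scalar RNN: combine the current input with the previous output."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs):
    """Feed the inputs to the loop one at a time and collect every output h_t."""
    h = 0.0  # the internal memory starts empty
    outputs = []
    for x_t in inputs:
        h = rnn_step(x_t, h)  # h carries information from the earlier steps
        outputs.append(h)
    return outputs

forward = run_rnn([1.0, 2.0, 3.0])
backward = run_rnn([3.0, 2.0, 1.0])
print(forward[-1], backward[-1])  # the final outputs differ because order matters
```

The same three inputs in a different order give a different final output, which is the property a bag of words lacks.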

What we've just looked at is called a **simple RNN layer**. It can be effective at processing shorter sequences of text for simple problems but has several drawbacks. One of them is that as text sequences get longer, it gets increasingly difficult for the network to understand the text properly.



## LSTM
The layer we discussed in depth above is called a *simple RNN*. However, there are other recurrent layers (layers that contain a loop) that work much better than a simple RNN layer. The one we will talk about here is called LSTM (Long Short-Term Memory). This layer works very similarly to the simple RNN layer but adds a way to access inputs from any timestep in the past, whereas in our simple RNN layer input from previous timesteps gradually disappeared as we got further through the input. With an LSTM we have a long-term memory data structure storing all the previously seen inputs as well as when we saw them. This allows us to access any previous value we want at any point in time. This adds to the complexity of our network and allows it to discover more useful relationships between inputs and when they appear.

For the purpose of this course we will refrain from going any further into the math or details behind how these layers work.



## Sentiment Analysis
And now time to see a recurrent neural network in action. For this example, we are going to do something called sentiment analysis.

The formal definition of this term from Wikipedia is as follows:

*the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.*

The example we’ll use here is classifying movie reviews as either positive or negative.

*This guide is based on the following tensorflow tutorial: https://www.tensorflow.org/tutorials/text/text_classification_rnn*



### Movie Review Dataset
We'll start by loading in the IMDB movie review dataset from keras. This dataset contains 25,000 reviews from IMDB where each one is already preprocessed and has a label as either positive or negative. Each review is encoded as a sequence of integers, where each integer represents how common a word is in the entire dataset. For example, a word encoded by the integer 3 means that it is the 3rd most common word in the dataset.
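
To illustrate the idea of encoding words by frequency rank, here is a small sketch on a made-up three-review corpus. The names `toy_reviews` and `toy_word_index` are hypothetical, and the alphabetical tie-breaking is an assumption of this sketch, not something the real dataset is known to do.

```python
from collections import Counter

toy_reviews = [
    "the movie was great",
    "the movie was bad",
    "the plot was great",
]

counts = Counter(word for review in toy_reviews for word in review.split())
# Rank 1 is the most frequent word; ties are broken alphabetically in this sketch.
ranked = sorted(counts, key=lambda w: (-counts[w], w))
toy_word_index = {word: rank for rank, word in enumerate(ranked, start=1)}

encoded_review = [toy_word_index[w] for w in "the movie was great".split()]
print(toy_word_index)
print(encoded_review)  # [1, 4, 2, 3]
```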
 




In [1]:

from keras.datasets import imdb
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

VOCAB_SIZE = 88584

MAXLEN = 250
BATCH_SIZE = 64

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = VOCAB_SIZE)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


In [2]:
len(train_data[1])

189

### More Preprocessing
If we have a look at some of our loaded in reviews, we'll notice that they are different lengths. This is an issue. We cannot pass different length data into our neural network. Therefore, we must make each review the same length. To do this we will follow the procedure below:
- if the review is greater than 250 words then trim off the extra words
- if the review is less than 250 words add the necessary amount of 0's to make it equal to 250.

Luckily for us keras has a function that can do this for us:




In [3]:
train_data=sequence.pad_sequences(train_data,MAXLEN)
test_data=sequence.pad_sequences(test_data,MAXLEN)

In [4]:
len(train_data[1])

250
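
For intuition, the behaviour of `pad_sequences` with its default settings can be sketched in plain Python. The helper `pad_or_trim` below is a hypothetical stand-in for a single review, assuming the defaults (zeros are added at the front, and reviews that are too long lose their earliest words).

```python
def pad_or_trim(tokens, maxlen, pad_value=0):
    """Mimic the default behaviour of pad_sequences for a single list of integers."""
    if len(tokens) >= maxlen:
        return tokens[-maxlen:]  # too long: keep only the last maxlen tokens
    return [pad_value] * (maxlen - len(tokens)) + tokens  # too short: pad at the front

short_review = pad_or_trim([5, 6, 7], 5)
long_review = pad_or_trim([1, 2, 3, 4, 5, 6, 7], 5)
print(short_review)  # [0, 0, 5, 6, 7]
print(long_review)   # [3, 4, 5, 6, 7]
```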

### Creating the Model
Now it's time to create the model. We'll use a word embedding layer as the first layer in our model and add an LSTM layer afterwards that feeds into a dense node to get our predicted sentiment.

32 stands for the output dimension of the vectors generated by the embedding layer. We can change this value if we'd like!

In [5]:
model=tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE,32),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1,activation='sigmoid')
])


In [6]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 32)          2834688   
_________________________________________________________________
lstm (LSTM)                  (None, 32)                8320      
_________________________________________________________________
dense (Dense)                (None, 1)                 33        
Total params: 2,843,041
Trainable params: 2,843,041
Non-trainable params: 0
_________________________________________________________________
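
The parameter counts in the summary can be checked by hand. The sketch below uses the standard counting formulas for embedding, LSTM and dense layers; the helper function names are made up for this example.

```python
def embedding_params(vocab_size, output_dim):
    """One vector of length output_dim per vocabulary entry."""
    return vocab_size * output_dim

def lstm_params(input_dim, units):
    """Four internal transformations, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """One weight per input-unit pair plus one bias per unit."""
    return input_dim * units + units

total_params = embedding_params(88584, 32) + lstm_params(32, 32) + dense_params(32, 1)
print(embedding_params(88584, 32), lstm_params(32, 32), dense_params(32, 1), total_params)
# 2834688 8320 33 2843041
```

Almost all of the parameters sit in the embedding layer, because it stores one 32-value vector for each of the 88,584 words.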


In [7]:
model.compile(loss="binary_crossentropy",optimizer="rmsprop",metrics=['accuracy'])
history=model.fit(train_data,train_labels,epochs=10,validation_split=0.2)

Epoch 1/10




Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [8]:
#model.save("lstm_model")
#or
model.save("lstm.h5")

In [9]:
new_model = tf.keras.models.load_model('lstm.h5')

In [10]:
results=new_model.evaluate(test_data,test_labels)
print(results)

[0.5596256852149963, 0.8424400091171265]


In [11]:
results=model.evaluate(test_data,test_labels)
print(results)

[0.5596256852149963, 0.8424400091171265]


### Making Predictions
Now let’s use our network to make predictions on our own reviews. 

Since our reviews are encoded, we'll need to convert any review that we write into that form so the network can understand it. To do that, we'll load the encodings from the dataset and use them to encode our own data.




In [12]:
word_index=imdb.get_word_index()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


In [13]:
for i in range(10):
    print(list(word_index.keys())[i],':',list(word_index.values())[i])

fawn : 34701
tsukino : 52006
nunnery : 52007
sonja : 16816
vani : 63951
woods : 1408
spiders : 16115
hanging : 2345
woody : 2289
trawling : 52008


In [14]:
def encode_text(text):
    tokens=keras.preprocessing.text.text_to_word_sequence(text)
    tokens=[word_index[word] if word in word_index else 0 for word in tokens]
    return sequence.pad_sequences([tokens],MAXLEN)[0]

In [15]:
text="that movie was amazing, i have to watch it again"
encoded=encode_text(text)
print(encoded)

[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0  12  17  13 477  10  25   

In [16]:
# Decode function that converts integers to text

reverse_word_index={value:key for (key,value) in word_index.items()}

def decode_integers(integers):
    PAD=0
    text=""
    for num in integers:
        if num!=PAD:
            text+=reverse_word_index[num] +" "
            
    return text[:-1]

print(decode_integers(encoded))

that movie was amazing i have to watch it again


In [17]:
def predict(text):
    encoded_text=encode_text(text)
    pred=encoded_text.reshape(1,250) #converting vector to 2d
    result=model.predict(pred)
    print(result[0])

In [18]:
positive_review="That was a good movie, i will definitely watch it again"
predict(positive_review)

negative_review="Don't waste your time watching this movie, so disappointing"
predict(negative_review)

[0.9020452]
[0.3575097]
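
The model outputs a single number between 0 and 1 from its sigmoid node. One simple way to turn that number into a label is to compare it with a cut-off; the value 0.5 and the helper name `label_sentiment` are assumptions made for this sketch.

```python
def label_sentiment(score, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a label; the 0.5 cut-off is an assumption."""
    return "positive" if score > threshold else "negative"

for score in (0.9020452, 0.3575097):  # the two outputs printed above
    print(score, label_sentiment(score))
```

The further a score is from the cut-off, the more confident the model is in that label.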


## RNN Play Generator

Now time for one of the coolest examples we've seen so far. We are going to use an RNN to generate a play. We will simply show the RNN an example of something we want it to recreate and it will learn how to write a version of it on its own. We'll do this using a character predictive model that will take as input a variable length sequence and predict the next character. We can use the model many times in a row, with the output from the last prediction as the input for the next call, to generate a sequence.


*This guide is based on the following: https://www.tensorflow.org/tutorials/text/text_generation*

### Loading Your Own Data
To load your own data, you'll need to upload a file from the dialog below. Then you'll need to follow the steps from above but load in this new file instead.



In [19]:
#from google.colab import files
#path_to_file = list(files.upload().keys())[0]

In [20]:
#Load data from keras
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [21]:
text=open(path_to_file,'rb').read().decode(encoding='utf-8')
print("Length of text : ",len(text))

Length of text :  1115394


In [22]:
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



### Encoding
Since this text isn't encoded yet, we'll need to do that ourselves. We are going to encode each unique character as a different integer.


In [23]:
vocab=sorted(set(text))

#Creating mapping from text to index
char2idx={u:i for i,u in enumerate(vocab)}
idx2char=np.array(vocab)

def text_to_int(text):
    return np.array([char2idx[t] for t in text])

text_as_int=text_to_int(text)

In [24]:
print('Text:',text[0:13])
print('Encoded:',text_to_int(text[:13]))

Text: First Citizen
Encoded: [18 47 56 57 58  1 15 47 58 47 64 43 52]


In [25]:
# Convert int to text
def int_to_text(ints):
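    try:
        ints=ints.numpy()  # tensors are converted to NumPy arrays first
    except AttributeError:
        pass  # already a NumPy array or a plain sequence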
    return ''.join(idx2char[ints])

print(int_to_text(text_to_int(text[:13])))

First Citizen


### Creating Training Examples
Remember our task is to feed the model a sequence and have it return to us the next character. This means we need to split our text data from above into many shorter sequences that we can pass to the model as training examples. 

The training examples we will prepare will use a *seq_length* sequence as input and a *seq_length* sequence as the output, where the output is the original sequence shifted one letter to the right. For example:

```input: Hell | output: ello```

Our first step will be to create a stream of characters from our text data.

In [26]:
seq_length = 100  # length of sequence for a training example
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

In [27]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

In [28]:
def split_input_target(chunk):  # for the example: hello
    input_text = chunk[:-1]  # hell
    target_text = chunk[1:]  # ello
    return input_text, target_text  # hell, ello

dataset = sequences.map(split_input_target)  # we use map to apply the above function to every entry

In [29]:
for x, y in dataset.take(2):
    print("\n\nEXAMPLE\n")
    print("INPUT")
    print(int_to_text(x))
    print("\nOUTPUT")
    print(int_to_text(y))



EXAMPLE

INPUT
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You

OUTPUT
irst Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You 


EXAMPLE

INPUT
are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you 

OUTPUT
re all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you k


In [30]:
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)  # vocab is number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
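
The buffer-based shuffling described in the comment above can be sketched with a small generator. This is a simplified illustration of the idea, not the actual `tf.data` implementation, and `buffered_shuffle` is a hypothetical helper name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a partially shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) == buffer_size:
            # the buffer is full: emit a random element to make room for the new one
            yield buffer.pop(rng.randrange(buffer_size))
        buffer.append(item)
    while buffer:  # the input is exhausted: drain what is left in random order
        yield buffer.pop(rng.randrange(len(buffer)))

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

Every element is emitted exactly once, but an element can only move a limited distance towards the front, which is why a larger buffer gives a more thorough shuffle.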

In [31]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
      tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                batch_input_shape=[batch_size, None]),
      tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
      tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(VOCAB_SIZE,EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (64, None, 256)           16640     
_________________________________________________________________
lstm_1 (LSTM)                (64, None, 1024)          5246976   
_________________________________________________________________
dense_1 (Dense)              (64, None, 65)            66625     
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________


In the output shape (64, None, 256), 64 is the batch size, None is the sequence length (which is not known in advance), and 256 is the number of values in each embedding vector.

### Creating a Loss Function
Now we are going to create our own loss function for this problem. This is because our model will output a (64, sequence_length, 65) shaped tensor that holds, for every sequence in the batch, a score (logit) for each of the 65 characters at each timestep. These scores define a probability distribution over the next character once a softmax is applied.



However, before we do that let's have a look at a sample input and the output from our untrained model. This is so we can understand what the model is giving us.



In [32]:
for input_example_batch, target_example_batch in data.take(1):
    example_batch_predictions = model(input_example_batch)  # ask our model for a prediction on our first batch of training data (64 entries)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")  # print out the output shape

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [33]:
# we can see that the prediction is an array of 64 arrays, one for each entry in the batch

print(example_batch_predictions.shape)
print(len(example_batch_predictions))
print(example_batch_predictions)

(64, 100, 65)
64
tf.Tensor(
[[[-1.12358527e-03 -3.45355854e-03  1.42945966e-03 ... -1.18742906e-03
    2.80821067e-03 -2.70476239e-03]
  [ 7.55981775e-04  1.24701718e-03  4.51116823e-03 ... -3.01754498e-03
    2.00428907e-03 -3.70528735e-03]
  [-2.49436591e-04  1.47628644e-03 -4.98011941e-05 ... -9.40290629e-04
    3.30234016e-03 -2.04246351e-03]
  ...
  [ 2.48998683e-03  2.34583020e-03 -4.11398709e-04 ... -1.13506960e-02
    5.53064654e-03  8.57450999e-03]
  [ 1.84601638e-03  6.17627101e-03 -3.46798589e-03 ... -8.71742796e-03
    5.53322351e-03  5.44113899e-03]
  [ 1.48150418e-03  8.82135704e-03 -6.70037325e-03 ... -7.16813980e-03
    5.65900235e-03  3.62027343e-03]]

 [[ 1.85315963e-03  6.10149000e-04  2.04263348e-03 ... -3.28315771e-04
   -2.23206868e-03  3.16602318e-03]
  [ 4.75203339e-03  2.55833170e-03  1.99522823e-04 ... -2.02545128e-03
   -1.65152200e-03  1.21150841e-03]
  [-4.12548753e-03  3.39484308e-03  5.75516373e-03 ... -3.70519795e-03
   -4.47241170e-03  2.67887837e-03]
 

In [34]:
# let's examine one prediction
pred = example_batch_predictions[0]
print(len(pred))
print(pred)
# notice this is a 2d array of length 100, where each interior array is the prediction for the next character at each time step

100
tf.Tensor(
[[-1.1235853e-03 -3.4535585e-03  1.4294597e-03 ... -1.1874291e-03
   2.8082107e-03 -2.7047624e-03]
 [ 7.5598178e-04  1.2470172e-03  4.5111682e-03 ... -3.0175450e-03
   2.0042891e-03 -3.7052874e-03]
 [-2.4943659e-04  1.4762864e-03 -4.9801194e-05 ... -9.4029063e-04
   3.3023402e-03 -2.0424635e-03]
 ...
 [ 2.4899868e-03  2.3458302e-03 -4.1139871e-04 ... -1.1350696e-02
   5.5306465e-03  8.5745100e-03]
 [ 1.8460164e-03  6.1762710e-03 -3.4679859e-03 ... -8.7174280e-03
   5.5332235e-03  5.4411390e-03]
 [ 1.4815042e-03  8.8213570e-03 -6.7003733e-03 ... -7.1681398e-03
   5.6590023e-03  3.6202734e-03]], shape=(100, 65), dtype=float32)


In [35]:
# and finally we'll look at a prediction at the first timestep
time_pred = pred[0]
print(len(time_pred))
print(time_pred)
# and of course it's 65 values, one score (logit) per character; a higher score means that character is more likely to occur next

65
tf.Tensor(
[-0.00112359 -0.00345356  0.00142946  0.00684959 -0.00137744  0.00638073
  0.00023113 -0.00249013  0.00149533  0.0017729   0.0002724  -0.0043676
 -0.0013104  -0.0067938   0.00112225  0.00284601 -0.00162212 -0.00316528
  0.00158318  0.00307079  0.0013574  -0.00245419 -0.00376285 -0.00384478
 -0.00577022  0.00125079 -0.00043798  0.00064073  0.00261784 -0.00095391
 -0.00143128  0.00193866  0.00103428  0.00312065 -0.00038292  0.00440326
  0.00044238  0.00328891  0.00083899 -0.00053602  0.00224771  0.0037993
 -0.00345692  0.00157511 -0.00553754  0.00318208  0.00219149 -0.00225484
  0.0041493   0.00148823 -0.00683836  0.00069711  0.00282644 -0.00128041
  0.0044215  -0.00142566  0.00201439 -0.001533    0.00263311 -0.00377581
  0.00632846  0.00294136 -0.00118743  0.00280821 -0.00270476], shape=(65,), dtype=float32)


In [36]:
# If we want to determine the predicted character we need to sample the output distribution (pick a value based on probability)
sampled_indices = tf.random.categorical(pred, num_samples=1)

# now we can reshape that array and convert all the integers to characters to see the actual text
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
predicted_chars = int_to_text(sampled_indices)

predicted_chars  # and this is what the model predicted for training sequence 1

"H'Y:.jk$y\nyomIc ;&!G\nfdNzWJh\nxOv$NS?RcNTpNbRBp;Tq.WEe'SFycnAJRl3BDFYNVqYAhVE ERM&3wHTd,GyNW.RRcp-;Tc"

In [37]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
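
For intuition about what this loss measures, here is a plain-Python sketch for a single timestep: apply a softmax to the scores, then take the negative log of the probability given to the correct character. The helper names are made up, and this is an illustration of the formula rather than the TensorFlow implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_crossentropy_from_logits(label, logits):
    """Negative log-probability that softmax(logits) assigns to the correct class index."""
    return -math.log(softmax(logits)[label])

uniform_loss = sparse_crossentropy_from_logits(0, [0.0] * 65)
confident_loss = sparse_crossentropy_from_logits(2, [0.0, 0.0, 5.0])
print(uniform_loss, math.log(65))  # equal scores for 65 characters give log(65)
print(confident_loss)              # a confident, correct prediction gives a small loss
```

Because the untrained model's scores shown earlier are all close to zero, its loss per character should start out near log(65), which is roughly 4.17.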

In [38]:
model.compile(optimizer='adam', loss=loss)

### Creating Checkpoints
Now we are going to set up and configure our model to save checkpoints as it trains. This will allow us to load our model from a checkpoint and continue training it.

In [39]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [40]:
# More epochs will give better results; overfitting is not a concern here, so try e.g. 50 or more
history = model.fit(data, epochs=2, callbacks=[checkpoint_callback])

Epoch 1/2
Epoch 2/2


### Loading the Model
We'll rebuild the model from a checkpoint using a batch_size of 1 so that we can feed one piece of text to the model and have it make a prediction.

In [41]:
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)

Once training has finished, we can find the **latest checkpoint**, which stores the model's weights, using the following line.

In [42]:
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

We can load **any checkpoint** we want by specifying the exact file to load.

In [43]:
#checkpoint_num = 10
#model.load_weights("./training_checkpoints/ckpt_" + str(checkpoint_num))
#model.build(tf.TensorShape([1, None]))

### Generating Text
Now we can use the lovely function provided by tensorflow to generate some text using any starting string we'd like.

In [44]:
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)

    # Number of characters to generate
    num_generate = 800

    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to store our results
    text_generated = []

    # Low temperatures result in more predictable text.
    # Higher temperatures result in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0

    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)

        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # using a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # We pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))
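
To see what the `temperature` setting does to the sampling step, the sketch below divides some made-up scores by different temperatures before applying a softmax. The values in `toy_logits` and the helper name are assumptions for illustration only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale the scores by 1/temperature, then convert them to probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three characters
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

A low temperature concentrates the probability on the highest-scoring character, while a high temperature flattens the distribution so that less likely characters are sampled more often.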

In [45]:
#inp=input('Type starting string')
inp="Romeo said"
print(generate_text(model,inp))

Romeo said.

ANLALUS:

KICG EDWY:
But is, with I ghey birtsel'd sivy if has upooster;
But surpivy a provarid, and peiling will his daunting, and our aran to&k I to alpusatees: to-back, year marnary
He have her benatpet. Ferring!,

MONENSE:
Gwidge's,
A known bifreade in each his beligainst purecan sweat,
Aging no beforl the mysirn; and trie suvil say,
Aut fat in this, in. Will, I deay nocesten:
Where if trengle my weats, an thet dees.
A,
Why, arour, groml gruse. and, and thus with so,
Allow I stink de sheel to my lifo,
Butonigh ot mart?

MastiCK:

Vise: G, weare our UXUENLET:
Mastear, twith with you s use; stan
eveith to prink I youbfet,
Their inse the my cincuctly fais,
Atis plienid, and you Lasty trke a your herpent as with spolan's, I twate yean but fearly
And you his oundrech
And now plechap gryo


The generated words are not perfect; you can improve the output by training for more epochs.