# Natural Language Processing (NLP)

This section is based on the following blog post by Andrej Karpathy:

[The Unreasonable Effectiveness of Recurrent Neural Networks, by Andrej Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

The code used by Karpathy for the article is on Github:

[https://github.com/karpathy/char-rnn](https://github.com/karpathy/char-rnn)

Basically, it is a **character-level language model**; remarkably, the network learns to generate plausible text even though it is trained one character at a time!

The basic idea is shown on the following picture from Karpathy's blog post:

![Karpathy: An example RNN with 4-dimensional input and output layers, and a hidden layer of 3 units (neurons). The vocabulary is `[h,e,l,o]`](pics/charseq_karpathy.jpeg)
*Karpathy: An example RNN with 4-dimensional input and output layers, and a hidden layer of 3 units (neurons). The vocabulary is `[h,e,l,o]`.*

This notebook builds a simplified project, similar to the one described in the article, with the following goal: given a sequence of characters, predict the same sequence shifted by one character, e.g., `[h,e,l,l] (input) -> [e,l,l,o] (prediction)`.

Some points to consider:
- We are going to use the complete works of Shakespeare for training. The reasons are: (1) the text has more than one million characters and (2) it is very well structured. However, any long text could be used; see [gutenberg.org](https://www.gutenberg.org).
- We are going to create a one-hot encoding for the alphabet characters and punctuation; then, we are going to use an embedding to compress those one-hot vectors.

Steps followed:
1. Load text/data; a large dataset with millions of characters is required
2. Text processing and vectorization: integers assigned to letters and symbols (e.g., punctuation)
3. Create batches: create sequences long enough to learn relationships, but not so long that they add noise
4. Create the model: we'll have 3 layers
    - Embedding layer: one-hot encoding vectors are compressed to a smaller space of fixed size (dimensions)
    - GRU layer: a simplified version of LSTM units (i.e., with fewer parameters), which often gives comparable or better results (see RNN folder: `../19_07_Keras_RNN`)
    - Dense layer: probabilities per character
5. Train the model
6. Inference

### Embeddings

A nice description of what embeddings are is given in this video (in Spanish) on the DotCSV YouTube channel:

[INTRO al Natural Language Processing (NLP) #2 - ¿Qué es un EMBEDDING?](https://www.youtube.com/watch?v=RkYuH_K7Fx4) ("Intro to Natural Language Processing (NLP) #2 - What is an EMBEDDING?")

Embeddings are not exclusive to language, but are commonly used in it, thanks to approaches like `word2Vec`, published in

"Efficient Estimation of Word Representations in Vector Space", Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. 2013, Google.

The idea is that, first, **we create a one-hot encoding to represent our vocabulary** in order to start working on the text (**note that a one-hot encoding can also be represented as categorical integers**, i.e., the index of the 1). The size of the one-hot vector is the size of the vocabulary (i.e., the number of words, say 10k). This representation has several problems, such as:
- It is large and sparse.
- Words that are semantically close are the same distance apart as words that should be far apart.
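Here is a quick NumPy check of the second problem on a toy 5-word vocabulary; the size is an arbitrary assumption. Every pair of distinct one-hot vectors is exactly `sqrt(2)` apart, so the encoding itself says nothing about which words are related.

```python
import itertools
import numpy as np

# Toy vocabulary of 5 "words" (arbitrary size, for illustration only)
toy_vocab_size = 5
one_hot = np.eye(toy_vocab_size)  # row i is the one-hot vector of word i

# Euclidean distance between every pair of distinct words
distances = [
    np.linalg.norm(one_hot[i] - one_hot[j])
    for i, j in itertools.combinations(range(toy_vocab_size), 2)
]
# Every distance is the same: sqrt(2)
print(np.allclose(distances, np.sqrt(2)))  # True
```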

To solve those issues, a shallow neural net can be applied to the one-hot vectors to compress them into a space with fewer dimensions (e.g., 300) but continuous values. For example, here we map a 7-dim vocabulary space to a 2D embedding space:

`[0,0,0,1,0,0,0] -> 4 (/7) -> [0.54, 0.01]`
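Multiplying a one-hot vector by a weight matrix simply selects one row of that matrix. That is why an embedding layer can take the integer index directly and do a row lookup. The minimal NumPy sketch below uses a random 7x2 matrix `E` as a stand-in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
toy_vocab, toy_dim = 7, 2
# Embedding matrix: one row per vocabulary entry (random here, learned in practice)
E = rng.normal(size=(toy_vocab, toy_dim))

idx = 3                      # 0-based index of the 4th word
one_hot = np.zeros(toy_vocab)
one_hot[idx] = 1.0

via_matmul = one_hot @ E     # one-hot vector times embedding matrix
via_lookup = E[idx]          # equivalent row lookup using the integer index
print(np.allclose(via_matmul, via_lookup))  # True
```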

The nice thing is that, after training, vectors that are close to each other in the embedding space tend to be semantically related. Thus, we can apply typical algebraic operations to them, such that `V(king) - V(man) + V(woman)` should be close to `V(queen)`. We can also apply dimensionality reduction techniques (e.g., PCA) and visualize the words of the embedding (e.g., in 3D space).
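The analogy arithmetic can be illustrated with hand-made 2D vectors. These toy values are an assumption, chosen so that the example works out exactly. Real word2vec vectors have hundreds of learned dimensions, and the result is only approximately `V(queen)`:

```python
import numpy as np

# Hand-made 2-D "embeddings" (illustrative only, NOT learned):
# axis 0 ~ royalty, axis 1 ~ gender (+1 male, -1 female)
toy_vectors = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = toy_vectors["king"] - toy_vectors["man"] + toy_vectors["woman"]
# Nearest word to the query, excluding the words used to build it
candidates = {w: v for w, v in toy_vectors.items() if w not in ("king", "man", "woman")}
best = max(candidates, key=lambda w: cosine(query, candidates[w]))
print(best)  # queen
```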

One of the issues of embedding spaces in NLP is polysemy: when a word has different meanings and the context matters, the same vector should be split into different vectors. Research is being done to address that.

## 1. Load Text/Data

In [38]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [39]:
# This line is for Google Colab, ignore it if notebook tf version is >= 2
#%tensorflow_version 2.x
import tensorflow as tf

In [40]:
# This line is to check that the TF version is >= 2
# Important for Google Colab
# If not, we can uncomment the magic command before importing tensorflow above
tf.__version__

'2.0.0'

In [41]:
with open('./shakespeare.txt','r') as f:
    #lines = f.readlines() 
    text = f.read()

In [42]:
# The text has symbols in it, such as \n
text[0:100]

"\n                     1\n  From fairest creatures we desire increase,\n  That thereby beauty's rose mi"

In [43]:
# If we print it, the symbols are interpreted
print(text[100100:100500])

houldst not abhor my state.
    If thy unworthiness raised love in me,
    More worthy I to be beloved of thee.


                     151
  Love is too young to know what conscience is,  
  Yet who knows not conscience is born of love?
  Then gentle cheater urge not my amiss,
  Lest guilty of my faults thy sweet self prove.
  For thou betraying me, I do betray
  My nobler part to my gross body's 


## 2. Text Vectorization

In [44]:
# We create a sorted list of the unique characters and symbols in the text
vocab = sorted(set(text))

In [45]:
vocab[0:10]

['\n', ' ', '!', '"', '&', "'", '(', ')', ',', '-']

In [46]:
# Number of characters/symbols we have - important for the final dense layer
len(vocab)

84

In [47]:
# Now, we need to bidirectionally associate each character in the vocabulary
# with a number (related to one-hot encoding):
# character <-> number
# We can do that with enumerate and dictionaries
for pair in enumerate(vocab):
    print(pair)

(0, '\n')
(1, ' ')
(2, '!')
(3, '"')
(4, '&')
(5, "'")
(6, '(')
(7, ')')
(8, ',')
(9, '-')
(10, '.')
(11, '0')
(12, '1')
(13, '2')
(14, '3')
(15, '4')
(16, '5')
(17, '6')
(18, '7')
(19, '8')
(20, '9')
(21, ':')
(22, ';')
(23, '<')
(24, '>')
(25, '?')
(26, 'A')
(27, 'B')
(28, 'C')
(29, 'D')
(30, 'E')
(31, 'F')
(32, 'G')
(33, 'H')
(34, 'I')
(35, 'J')
(36, 'K')
(37, 'L')
(38, 'M')
(39, 'N')
(40, 'O')
(41, 'P')
(42, 'Q')
(43, 'R')
(44, 'S')
(45, 'T')
(46, 'U')
(47, 'V')
(48, 'W')
(49, 'X')
(50, 'Y')
(51, 'Z')
(52, '[')
(53, ']')
(54, '_')
(55, '`')
(56, 'a')
(57, 'b')
(58, 'c')
(59, 'd')
(60, 'e')
(61, 'f')
(62, 'g')
(63, 'h')
(64, 'i')
(65, 'j')
(66, 'k')
(67, 'l')
(68, 'm')
(69, 'n')
(70, 'o')
(71, 'p')
(72, 'q')
(73, 'r')
(74, 's')
(75, 't')
(76, 'u')
(77, 'v')
(78, 'w')
(79, 'x')
(80, 'y')
(81, 'z')
(82, '|')
(83, '}')


In [48]:
# Following that, we create a dictionary with comprehension
char_to_ind = {char:ind for ind,char in enumerate(vocab)}

In [49]:
# Reverse mapping (index -> character); a NumPy array also allows indexing with a whole array of indices
ind_to_char = np.array(vocab)

In [50]:
char_to_ind['A']

26

In [51]:
ind_to_char[26]

'A'

In [52]:
# Now, with these two mappings, we can encode our text!
encoded_text = np.array([char_to_ind[c] for c in text])

In [53]:
# We check that we have several millions of characters (necessary for good enough results)
encoded_text.shape

(5445609,)
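The mapping is lossless: decoding the indices with the reverse lookup recovers the original text. Below is a small self-contained check on a toy string. The `toy_` names are hypothetical, chosen so that the notebook's own `vocab`, `char_to_ind` and `ind_to_char` are not overwritten:

```python
import numpy as np

toy_text = "From fairest creatures"
toy_chars = sorted(set(toy_text))
toy_char_to_ind = {c: i for i, c in enumerate(toy_chars)}
toy_ind_to_char = np.array(toy_chars)

# Encode: one integer per character
toy_encoded = np.array([toy_char_to_ind[c] for c in toy_text])
# Decode: indexing with the whole index array maps every position back to its character
toy_decoded = "".join(toy_ind_to_char[toy_encoded])
print(toy_decoded == toy_text)  # True
```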

In [54]:
# We can compare the regular text and the encoded one
text[500:1000]

"d buriest thy content,\n  And tender churl mak'st waste in niggarding:\n    Pity the world, or else this glutton be,\n    To eat the world's due, by the grave and thee.\n\n\n                     2\n  When forty winters shall besiege thy brow,\n  And dig deep trenches in thy beauty's field,\n  Thy youth's proud livery so gazed on now,\n  Will be a tattered weed of small worth held:  \n  Then being asked, where all thy beauty lies,\n  Where all the treasure of thy lusty days;\n  To say within thine own deep su"

In [55]:
encoded_text[500:1000]

array([59,  1, 57, 76, 73, 64, 60, 74, 75,  1, 75, 63, 80,  1, 58, 70, 69,
       75, 60, 69, 75,  8,  0,  1,  1, 26, 69, 59,  1, 75, 60, 69, 59, 60,
       73,  1, 58, 63, 76, 73, 67,  1, 68, 56, 66,  5, 74, 75,  1, 78, 56,
       74, 75, 60,  1, 64, 69,  1, 69, 64, 62, 62, 56, 73, 59, 64, 69, 62,
       21,  0,  1,  1,  1,  1, 41, 64, 75, 80,  1, 75, 63, 60,  1, 78, 70,
       73, 67, 59,  8,  1, 70, 73,  1, 60, 67, 74, 60,  1, 75, 63, 64, 74,
        1, 62, 67, 76, 75, 75, 70, 69,  1, 57, 60,  8,  0,  1,  1,  1,  1,
       45, 70,  1, 60, 56, 75,  1, 75, 63, 60,  1, 78, 70, 73, 67, 59,  5,
       74,  1, 59, 76, 60,  8,  1, 57, 80,  1, 75, 63, 60,  1, 62, 73, 56,
       77, 60,  1, 56, 69, 59,  1, 75, 63, 60, 60, 10,  0,  0,  0,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, 13,  0,  1,  1, 48, 63, 60, 69,  1, 61, 70, 73, 75, 80,  1,
       78, 64, 69, 75, 60, 73, 74,  1, 74, 63, 56, 67, 67,  1, 57, 60, 74,
       64, 60, 62, 60,  1

## 3. Creating Batches

Some important notes:
- An input sequence needs to contain enough characters to capture the general structure of the text. Shakespeare's lines have around 40 characters, with a rhyme every second line, so we take a sequence length of `3x40 = 120`, i.e., about three lines.
- The idea is that we feed in a sequence, and the target is the same sequence shifted by one character. It drops the first character and appends the next character of the text, which the model should learn to predict as the most probable continuation.

In [56]:
# We analyze which sequence length to take
print(text[:500])


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bu


In [57]:
line = "From fairest creatures we desire increase"

In [58]:
len(line)

41

### Create Training Sequences

The complete text needs to be partitioned into sequences, and each input sequence needs an associated ground-truth output (target) sequence for training:

`in: 'Hello, my name i' -> out: 'ello, my name is'`
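The same idea can be sketched on plain Python strings to show what the `tf.data` pipeline does. The toy text and `toy_seq_len = 4` are assumptions standing in for the real text and `seq_len = 120`. The sketch cuts the text into chunks of `seq_len + 1` characters and drops the incomplete tail. It then splits each chunk into an input and a target shifted by one character:

```python
toy_text = "Hello, my name is Bob."
toy_seq_len = 4                  # stands in for seq_len = 120

# Chunks of toy_seq_len + 1 characters; the incomplete tail is dropped
chunk = toy_seq_len + 1
chunks = [toy_text[i:i + chunk] for i in range(0, len(toy_text) - chunk + 1, chunk)]

# Each chunk yields an (input, target) pair shifted by one character
pairs = [(c[:-1], c[1:]) for c in chunks]
print(pairs[0])  # ('Hell', 'ello')
```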

In [59]:
seq_len = 120 # motivation explained above

In [60]:
total_num_seq = len(text)//(seq_len+1) # we ignore the remainder

In [61]:
total_num_seq

45005
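These numbers can be checked by hand, using the `encoded_text.shape` printed above and the training batch size of 128. Each sequence consumes 121 characters, and dropping incomplete batches leaves 351 batches per epoch:

```python
# Worked arithmetic with the numbers printed in this notebook
num_chars = 5_445_609            # encoded_text.shape[0]
toy_seq_len, toy_batch = 120, 128

num_seq = num_chars // (toy_seq_len + 1)     # each sequence consumes 121 characters
dropped_chars = num_chars % (toy_seq_len + 1)
steps_per_epoch = num_seq // toy_batch       # full batches kept with drop_remainder=True
print(num_seq, dropped_chars, steps_per_epoch)  # 45005 4 351
```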

In [62]:
# We create a TF dataset which we can slice as written above
char_dataset = tf.data.Dataset.from_tensor_slices(encoded_text)

In [63]:
# A TF dataset provides several methods:
# .take(n) yields only the first n elements (here, characters)
# .batch(n) groups consecutive elements into chunks of length n
# For example:
for i in char_dataset.take(50):
    print(ind_to_char[i.numpy()])



 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1


 
 
F
r
o
m
 
f
a
i
r
e
s
t
 
c
r
e
a
t
u
r
e
s
 
w


In [64]:
# .batch(n) groups n consecutive elements (here, characters) into one sequence
# we use seq_len+1 because each chunk must hold both the input and the target text, which are shifted by one character
# drop_remainder=True drops the final characters that do not fill a complete chunk of seq_len+1 = 121
sequences = char_dataset.batch(seq_len+1, drop_remainder=True)

In [65]:
# Now that we have all sequences, we need to generate for each sequence
# the input and output/target text pairs to use during training
# For that, the following function is mapped to the sequences to obtain our dataset
def create_seq_targets(seq):
    input_txt = seq[:-1] # Hell
    target_txt = seq[1:] # ello
    return input_txt, target_txt

In [66]:
# Our dataset is a collection of pairs
# Each pair has the input and output sequences shifted by 1 character
dataset = sequences.map(create_seq_targets)

In [67]:
# We can take a pair and print it
for input_txt,target_txt in dataset.take(1):
    print(input_txt.numpy())
    print(''.join(ind_to_char[input_txt.numpy()]))
    print('\n')
    print(target_txt.numpy())
    print(''.join(ind_to_char[target_txt.numpy()]))

[ 0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 12  0
  1  1 31 73 70 68  1 61 56 64 73 60 74 75  1 58 73 60 56 75 76 73 60 74
  1 78 60  1 59 60 74 64 73 60  1 64 69 58 73 60 56 74 60  8  0  1  1 45
 63 56 75  1 75 63 60 73 60 57 80  1 57 60 56 76 75 80  5 74  1 73 70 74
 60  1 68 64 62 63 75  1 69 60 77 60 73  1 59 64 60  8  0  1  1 27 76 75]

                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But


[ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 12  0  1
  1 31 73 70 68  1 61 56 64 73 60 74 75  1 58 73 60 56 75 76 73 60 74  1
 78 60  1 59 60 74 64 73 60  1 64 69 58 73 60 56 74 60  8  0  1  1 45 63
 56 75  1 75 63 60 73 60 57 80  1 57 60 56 76 75 80  5 74  1 73 70 74 60
  1 68 64 62 63 75  1 69 60 77 60 73  1 59 64 60  8  0  1  1 27 76 75  1]
                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But 


### Shuffling the Batches of Sequences

After creating the input/target sequence pairs, we need to group them into batches.
Additionally, it is good practice to shuffle the pairs before batching, so that the model does not see the text in order and overfit to one section of it.

In [68]:
# Batch size
batch_size = 128
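# Buffer size: tf.data shuffles by sampling from a buffer of this many elements; a buffer at least as large as the dataset gives a full shuffle, a smaller one only a partial (but cheaper) shuffle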
buffer_size = 10000
# Remainder is dropped
dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)

In [69]:
dataset

<BatchDataset shapes: ((128, 120), (128, 120)), types: (tf.int64, tf.int64)>

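`tf.data.Dataset.shuffle` does not shuffle the whole dataset at once. It keeps a buffer of `buffer_size` elements, emits random elements from it, and refills the buffer from the input. Below is a simplified pure-Python model of that idea, not TensorFlow's actual implementation. With 45005 sequences and a buffer of 10000, the shuffle above is therefore only partial:

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Simplified model of a buffer-based shuffle (illustrative, not TF's code)."""
    rng = random.Random(seed)
    it = iter(items)
    # Fill the buffer with the first buffer_size elements
    buffer = [x for _, x in zip(range(buffer_size), it)]
    for x in it:
        # Emit a random element from the buffer and replace it with the next input
        i = rng.randrange(len(buffer))
        yield buffer[i]
        buffer[i] = x
    # Drain whatever is left, in random order
    rng.shuffle(buffer)
    yield from buffer

out = list(buffered_shuffle(range(20), buffer_size=5))
print(sorted(out) == list(range(20)))  # True: a permutation of the input
```

Note that the first emitted element is always one of the first `buffer_size` inputs, which is why a small buffer only mixes nearby elements.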
## 4. Creating the Model

I understand that the model defined in the course is not the one proposed by Karpathy; instead, Portilla seems to use an architecture similar to that of [DeepMoji](https://deepmoji.mit.edu), which is available on [GitHub](https://github.com/bfelbo/DeepMoji).

In our architecture, we have the following layers:
- An **embedding** layer which compresses our vocabulary (84D) into a smaller, dense embedding space (64D); dense embeddings are usually easier for an RNN to learn from than sparse one-hot inputs
- An RNN layer with 1026 **GRU** units (LSTM units would also work)
- A **Dense** layer of the size of the vocabulary

In [70]:
# Length of the vocabulary in chars
vocab_size = len(vocab)
# The embedding dimension
embed_dim = 64
# Number of RNN units
rnn_neurons = 1026

### Loss Function: Sparse Categorical Cross-Entropy

As stated in the following post

[Stackexchange](https://datascience.stackexchange.com/questions/41921/sparse-categorical-crossentropy-vs-categorical-crossentropy-keras-accuracy)

we should "use **sparse categorical crossentropy** when your classes are mutually exclusive (e.g. when each sample belongs exactly to one class) and **categorical crossentropy** when one sample can have multiple classes or labels are soft probabilities". Our targets are integer character indices (not one-hot vectors), so the sparse version is the right choice.
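With hard labels, the two losses compute the same value; they only differ in the format of the targets. Here is a NumPy sketch with made-up probabilities for a 3-class problem:

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],      # predicted distribution, sample 0
                  [0.1, 0.3, 0.6]])     # predicted distribution, sample 1
int_labels = np.array([0, 2])           # "sparse" targets: one class index per sample
one_hot_labels = np.eye(3)[int_labels]  # the same targets, one-hot encoded

# Sparse CE: minus the log-probability of the true class
sparse_ce = -np.log(probs[np.arange(len(int_labels)), int_labels])
# Categorical CE: one-hot targets dotted with the log-probabilities
categorical_ce = -(one_hot_labels * np.log(probs)).sum(axis=1)
print(np.allclose(sparse_ce, categorical_ce))  # True
```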

In [71]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

In [72]:
help(sparse_categorical_crossentropy)

Help on function sparse_categorical_crossentropy in module tensorflow.python.keras.losses:

sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)



In [73]:
# The final Dense layer has no softmax activation, so the model outputs raw logits;
# the default sparse_categorical_crossentropy expects probabilities (from_logits=False),
# so we wrap it in our own loss function
def sparse_cat_loss(y_true,y_pred):
    # from_logits=True: the softmax is applied inside the loss
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
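The NumPy sketch below, with made-up logits, shows what `from_logits=True` means. The loss applies a (log-)softmax to the raw scores itself. That gives the same value as applying softmax first and then using the probability-based loss:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5]])       # raw Dense outputs (no softmax)
label = np.array([0])

# from_logits=True: log-softmax is computed inside the loss
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
loss_from_logits = -log_probs[np.arange(1), label]
# from_logits=False expects probabilities, so softmax must be applied first
loss_from_probs = -np.log(softmax(logits)[np.arange(1), label])
print(np.allclose(loss_from_logits, loss_from_probs))  # True
```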

### Model

In [74]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM,Dense,Embedding,Dropout,GRU

In [75]:
# We create the model with a custom function
def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size,embed_dim,batch_input_shape=[batch_size, None]))
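    # Shift+TAB for documentation
    # Glorot stands for the Xavier initialization, after Xavier Glorot
    # stateful=True: the last state of sample i in a batch is used as the initial state
    # of sample i in the next batch, so the batch size must be fixed (batch_input_shape)
    model.add(GRU(rnn_neurons,return_sequences=True,stateful=True,recurrent_initializer='glorot_uniform'))
    # Final Dense layer: one raw score (logit) per vocabulary character, no activation
    model.add(Dense(vocab_size))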
    # We use our custom loss function
    model.compile(optimizer='adam', loss=sparse_cat_loss) 
    return model

In [76]:
model = create_model(vocab_size = vocab_size,
                     embed_dim=embed_dim,
                     rnn_neurons=rnn_neurons,
                     batch_size=batch_size)

In [77]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (128, None, 64)           5376      
_________________________________________________________________
gru_1 (GRU)                  (128, None, 1026)         3361176   
_________________________________________________________________
dense_1 (Dense)              (128, None, 84)           86268     
Total params: 3,452,820
Trainable params: 3,452,820
Non-trainable params: 0
_________________________________________________________________
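The parameter counts in the summary can be reproduced by hand. This sketch assumes the Keras defaults used here: `Dense` has a bias, and `GRU` uses `reset_after=True` (the TF 2.x default). With that setting, each of the GRU's 3 blocks (update gate, reset gate, candidate state) has separate input and recurrent biases:

```python
# Parameter counts for the three layers, assuming Keras defaults
vocab, emb, units = 84, 64, 1026

embedding_params = vocab * emb                        # one 64-d row per character
# 3 blocks x (input kernel + recurrent kernel + input bias + recurrent bias)
gru_params = 3 * (units * (emb + units) + 2 * units)
dense_params = units * vocab + vocab                  # weights + bias
total = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total)
# 5376 3361176 86268 3452820
```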


## 5. Training the Model

**Important note:** our RNN model has around 3.5M parameters, so training it on a laptop CPU would be very slow; a GPU is strongly recommended. Therefore, **Google Colab** (which provides free GPUs) is used in the course.

Follow these instructions:
- Open [Google Colab](https://colab.research.google.com/) and log in
- On the same page, choose to upload this notebook
- On the Colab notebook: left menu bar: files (folder icon) > upload dataset (`shakespeare.txt`) and any other file necessary (clicking on icon or just drag & drop; for instance the pre-trained model `shakespeare_gen.h5`)

Some notes on Google Colab:
- Note that Markdown images can be rendered if these are in gDrive and have a shareable link
- Have a look at the Colab examples: Welcome, Charts, Mounting Drives, etc.
- Note that the workspace is erased once we exit!
- Note that the workspace is not in our gDrive; apparently a Linux container is started for the session. Only our notebooks are saved on our gDrive when we exit, not the uploaded files!
- Check that we are using TF version >= 2 (see beginning of the notebook)
- We can run all cells as always: Runtime > Run all
- Tools > Settings > Editor: TABs should be 4 spaces!

### Some Preliminary Tests

In [78]:
# We get a single batch, feed it to our model and get the output
# Since the model has not been trained yet, the output should be rubbish
# Recall that input and target contain 128 sequences of length 120, shifted by 1 character
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)

In [79]:
# Predictions are 128 sequences of length 120
# For each of the 120 positions in a sequence, we get 84 scores,
# one per character of the vocabulary.
# These scores are unnormalized logits, not probabilities (the Dense layer has no softmax).
example_batch_predictions.shape

TensorShape([128, 120, 84])

In [80]:
# We take the first of the 128 sequences
# and, for each of its 120 positions, draw one character index (0-83);
# tf.random.categorical treats each row as unnormalized log-probabilities
# and SAMPLES num_samples=1 index from them (it does not take the largest)
sampled_indices = tf.random.categorical(example_batch_predictions[0],num_samples=1)
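The NumPy sketch below contrasts sampling with taking the largest value, using made-up logits for a 3-character vocabulary. `argmax` always returns the same index. Sampling, the analogue of `tf.random.categorical`, returns each index with its softmax probability:

```python
import numpy as np

rng = np.random.default_rng(42)
logits = np.array([2.0, 1.0, 0.1])             # toy logits for a 3-character vocabulary
probs = np.exp(logits) / np.exp(logits).sum()  # softmax

greedy = int(np.argmax(logits))                # always the same index
samples = rng.choice(len(probs), size=10_000, p=probs)  # sampling from the softmax
freq = np.bincount(samples, minlength=len(probs)) / len(samples)
# greedy is 0; the observed frequencies approximate probs
print(greedy, np.round(freq, 2), np.round(probs, 2))
```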

In [81]:
sampled_indices

<tf.Tensor: id=5475, shape=(120, 1), dtype=int64, numpy=
array([[83],
       [25],
       [39],
       [53],
       [ 8],
       [ 6],
       [17],
       [61],
       [54],
       [42],
       [36],
       [28],
       [55],
       [53],
       [13],
       [72],
       [35],
       [10],
       [ 2],
       [27],
       [59],
       [73],
       [27],
       [44],
       [55],
       [46],
       [63],
       [76],
       [41],
       [44],
       [ 3],
       [56],
       [73],
       [ 7],
       [77],
       [58],
       [59],
       [34],
       [41],
       [73],
       [22],
       [ 4],
       [35],
       [45],
       [54],
       [61],
       [27],
       [43],
       [65],
       [33],
       [76],
       [47],
       [56],
       [ 4],
       [ 8],
       [83],
       [58],
       [42],
       [52],
       [ 2],
       [68],
       [68],
       [60],
       [16],
       [54],
       [27],
       [ 0],
       [58],
       [ 2],
       [53],
       [54],
       [18],
       

In [82]:
# We reshape the sampled_indices and convert them to a numpy array
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

In [83]:
sampled_indices

array([83, 25, 39, 53,  8,  6, 17, 61, 54, 42, 36, 28, 55, 53, 13, 72, 35,
       10,  2, 27, 59, 73, 27, 44, 55, 46, 63, 76, 41, 44,  3, 56, 73,  7,
       77, 58, 59, 34, 41, 73, 22,  4, 35, 45, 54, 61, 27, 43, 65, 33, 76,
       47, 56,  4,  8, 83, 58, 42, 52,  2, 68, 68, 60, 16, 54, 27,  0, 58,
        2, 53, 54, 18, 68, 81, 71, 65, 69, 80, 54, 76, 67, 59, 62, 64, 39,
       29,  8, 80, 41, 55, 58, 21,  5, 76, 10, 18,  5,  8, 67, 11, 80, 57,
       67, 67, 61, 79, 46,  2, 43, 78, 69, 22, 19, 23, 49, 45, 29, 50, 70,
       14])

In [84]:
# We convert it to chars
# It is a sequence of random-looking characters, as expected before training
ind_to_char[sampled_indices]

array(['}', '?', 'N', ']', ',', '(', '6', 'f', '_', 'Q', 'K', 'C', '`',
       ']', '2', 'q', 'J', '.', '!', 'B', 'd', 'r', 'B', 'S', '`', 'U',
       'h', 'u', 'P', 'S', '"', 'a', 'r', ')', 'v', 'c', 'd', 'I', 'P',
       'r', ';', '&', 'J', 'T', '_', 'f', 'B', 'R', 'j', 'H', 'u', 'V',
       'a', '&', ',', '}', 'c', 'Q', '[', '!', 'm', 'm', 'e', '5', '_',
       'B', '\n', 'c', '!', ']', '_', '7', 'm', 'z', 'p', 'j', 'n', 'y',
       '_', 'u', 'l', 'd', 'g', 'i', 'N', 'D', ',', 'y', 'P', '`', 'c',
       ':', "'", 'u', '.', '7', "'", ',', 'l', '0', 'y', 'b', 'l', 'l',
       'f', 'x', 'U', '!', 'R', 'w', 'n', ';', '8', '<', 'X', 'T', 'D',
       'Y', 'o', '3'], dtype='<U1')

### Training

In [85]:
# We need at least 30 epochs
# With this model and dataset, training takes approximately 1 min/epoch (on Google Colab)
epochs = 30

In [None]:
model.fit(dataset,epochs=epochs)

In [None]:
# Save our model
model.save('my_shakespeare_model.h5')

## 6. Inference

We can wait until the training finishes, or interrupt it and load the pre-trained model provided in the course (`shakespeare_gen.h5`).

In [87]:
from tensorflow.keras.models import load_model

In [88]:
# Loading the model: first we create the model, then we load the weights
# Note that the batch size is now 1: we feed one sequence, not batches of 128!
# Since the stateful GRU fixes the batch size, we need to rebuild the model
model = create_model(vocab_size = vocab_size,
                     embed_dim=embed_dim,
                     rnn_neurons=rnn_neurons,
                     batch_size=1)

In [91]:
#model.load_weights('./my_shakespeare_model.h5')
model.load_weights('./shakespeare_gen.h5')

In [92]:
# Then, we build it by passing the input shape
# Note that now it is different: we don't pass batches of 128 sequences!
model.build(tf.TensorShape([1,None]))

In [93]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (1, None, 64)             5376      
_________________________________________________________________
gru_2 (GRU)                  (1, None, 1026)           3361176   
_________________________________________________________________
dense_2 (Dense)              (1, None, 84)             86268     
Total params: 3,452,820
Trainable params: 3,452,820
Non-trainable params: 0
_________________________________________________________________


### Text Generator Function

We carry out the inference inside a function `generate_text`, which receives the input text and the expected size of the returned text.

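Note that the instructor uses a variable `temp` to scale the logits of each vocab/char in a prediction; `temp` should regulate the weirdness of the predictions. At first glance this seems useless, since dividing all logits by the same value does not change which one is largest. However, `tf.random.categorical` samples from the softmax of the logits instead of taking the largest one, and the scaling does change that distribution. `temp < 1` makes it sharper (more conservative text), while `temp > 1` makes it flatter (more surprising text).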
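The effect of `temp` on the sampling distribution can be sketched in NumPy with made-up logits. The most likely index never changes. Lower temperatures concentrate the probability on it, and higher temperatures spread it out:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # toy logits
# Same scaling as predictions / temperature in generate_text
dists = {temp: softmax(logits / temp) for temp in (0.5, 1.0, 2.0)}
for temp, p in dists.items():
    print(temp, np.round(p, 3))
```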

**IMPORTANT NOTE:** See the section below (Single Character Predictions) to better understand what is happening in the function; the tensor shapes are not obvious without trying them out.

In [188]:
def generate_text(model,start_seed,gen_size=500,temp=1.0):
    # Expected sequence of characters
    num_generate = gen_size
    # Convert chars to indices:
    # an array consisting of all the chars passed as seed is built
    input_eval = [char_to_ind[s] for s in start_seed]
    # Add a batch dimension: shape (1, len(start_seed))
    input_eval = tf.expand_dims(input_eval,0)
    text_generated = []
    temperature = temp
    # Reset the hidden state of the stateful GRU
    model.reset_states()
    for i in range(num_generate):
        # Forward pass:
        # The model returns logits of shape (1, len(input_eval), vocab_size)
        # input_eval is the whole seed at the beginning,
        # then just a character/symbol
        predictions = model(input_eval)
        # Reverse of expand_dims: shape (len(input_eval), vocab_size)
        predictions = tf.squeeze(predictions,0)
        # Scaling does not change which logit is largest, but since we SAMPLE below
        # it reshapes the distribution: temp < 1 sharper, temp > 1 flatter
        predictions = predictions/temperature
        # Sample one index per position and keep the one of the LAST position;
        # therefore, if len(input_eval)>1, only the last predicted char/symbol is used
        predicted_id = tf.random.categorical(predictions,num_samples=1)[-1,0].numpy()
        # Next input is current output!
        input_eval = tf.expand_dims([predicted_id],0)
        # Concatenate one by one the char/symbol predicted in each step of range(num_generate)
        text_generated.append(ind_to_char[predicted_id])
    # Return the seed followed by the num_generate generated characters
    return (start_seed+"".join(text_generated))

In [189]:
print(generate_text(model,'ROME',gen_size=1000))

ROMEOCTINE]  O sleep, thy slips nor words, untung'd in this
    extravagaing; for the rebels' HERALD green with read

  KING JOHN. My lord-
  AARON. [Wounds] I have forswear my weapon.
  Marget EELEL. I hope, I shall remember so far afflicted  
    your father. Who was it bad she? Without my sword.
  HERMIONE. It stuck me.
  CLOWN. Ay, sir; but, my lords,
    Desire to piecis; nor ed down
    Moves not. Then there lies d before us seeming one anon. Captain Fluelles of fire,
    Allowing him to the Mount; which makes from heaven
    And razs a tear.
  DEMETRIUS. What my men day lie scars, provoke no Roman babe! There were surpris'd
          and Doot I prosperous on his own.
  Leon. Could she beat?
  Vere. Catescand up, my lord, I have no eyes.
    There she's a trick, thy father's top-O. What? Marcellus!
  AUTOLYCUS. Are you commit such a friend? Courage me command thou then defend
    That which loy sounsing you whose judgment sure
    Will cure not first, t a coronets. Fairest men to

### Single Character Predictions

In [179]:
word = 'Hello'

In [180]:
input_eval = [char_to_ind[s] for s in word]
input_eval = tf.expand_dims(input_eval,0)

In [181]:
# We pass the complete word to the model
predictions = model(input_eval)

In [182]:
# The model returns one vector of 84 logits per input character
predictions.shape

TensorShape([1, 5, 84])

In [183]:
# We remove the first dimension
predictions = tf.squeeze(predictions,0)

In [184]:
predictions.shape

TensorShape([5, 84])

In [185]:
# One index is sampled for each of the 5 positions
predicted_id = tf.random.categorical(predictions,num_samples=1)

In [186]:
predicted_id

<tf.Tensor: id=513134, shape=(5, 1), dtype=int64, numpy=
array([[ 1],
       [ 1],
       [80],
       [10],
       [78]])>

In [187]:
# The last character is taken
predicted_id[-1,0].numpy()

78