# Lab 11-1: Sequence Model of RNN
## Exercise: Character-based Text Generation
The explanatory text in this exercise is adapted mostly from a Coursera notebook:<br>
https://github.com/enggen/Deep-Learning-Coursera/

Recurrent Neural Networks (RNN) are very effective for Natural Language Processing and other sequence tasks because they have "memory". They can read inputs $x^{\langle t \rangle}$ (such as words) one at a time, and remember some information/context through the hidden layer activations that get passed from one time-step to the next. This allows a unidirectional RNN to take information from the past to process later inputs. A bidirectional RNN can take context from both the past and the future. 

**Notation**:
- Superscript $[l]$ denotes an object associated with the $l^{th}$ layer. 

- Superscript $(i)$ denotes an object associated with the $i^{th}$ example. 

- Superscript $\langle t \rangle$ denotes an object at the $t^{th}$ time-step. 
    
- **Sub**script $i$ denotes the $i^{th}$ entry of a vector.

Example:  
- $a^{(2)[3]\langle 4 \rangle}_5$ denotes the activation of the 2nd training example $(2)$, 3rd layer $[3]$, 4th time step $\langle 4 \rangle$, and 5th entry in the vector.

### Import libraries & Dataset

In [None]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import time

(ds, ), ds_info = tfds.load(name='tiny_shakespeare', split=['train'], with_info=True)

ds = ds.map(lambda x: tf.strings.unicode_split(x['text'], 'UTF-8'))
for sample in ds.take(1): 
    text_str = sample

print(ds_info)
print('Total number of characters:', len(text_str))

tfds.core.DatasetInfo(
    name='tiny_shakespeare',
    version=1.0.0,
    description='40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/.

To use for e.g. character modelling:

```
d = tfds.load(name='tiny_shakespeare')['train']
d = d.map(lambda x: tf.strings.unicode_split(x['text'], 'UTF-8'))
# train split includes vocabulary for other splits
vocabulary = sorted(set(next(iter(d)).numpy()))
d = d.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]})
d = d.unbatch()
seq_len = 100
batch_size = 2
d = d.batch(seq_len)
d = d.batch(batch_size)
```',
    homepage='https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt',
    features=FeaturesDict(

### Building Vocabulary
The character vocabulary is a dictionary that maps each distinct character in the text to an integer index.

In [None]:
vocabulary = sorted(set(text_str.numpy()))
# train split includes vocabulary for other splits

char_to_index = dict((char, index) for index, char in enumerate(vocabulary))

index_to_char = {}
for key, value in char_to_index.items():
    index_to_char[value] = key

print('The number of character indices:', len(vocabulary))
print('Character to Index Mapping:\n', char_to_index)

The number of character indices: 65
Character to Index Mapping:
 {b'\n': 0, b' ': 1, b'!': 2, b'$': 3, b'&': 4, b"'": 5, b',': 6, b'-': 7, b'.': 8, b'3': 9, b':': 10, b';': 11, b'?': 12, b'A': 13, b'B': 14, b'C': 15, b'D': 16, b'E': 17, b'F': 18, b'G': 19, b'H': 20, b'I': 21, b'J': 22, b'K': 23, b'L': 24, b'M': 25, b'N': 26, b'O': 27, b'P': 28, b'Q': 29, b'R': 30, b'S': 31, b'T': 32, b'U': 33, b'V': 34, b'W': 35, b'X': 36, b'Y': 37, b'Z': 38, b'a': 39, b'b': 40, b'c': 41, b'd': 42, b'e': 43, b'f': 44, b'g': 45, b'h': 46, b'i': 47, b'j': 48, b'k': 49, b'l': 50, b'm': 51, b'n': 52, b'o': 53, b'p': 54, b'q': 55, b'r': 56, b's': 57, b't': 58, b'u': 59, b'v': 60, b'w': 61, b'x': 62, b'y': 63, b'z': 64}
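The same mapping can be tried on a toy string without downloading the dataset. This is a minimal sketch; `toy_text` and the `toy_*` dictionaries are illustrative names, not part of the lab.

```python
# Toy example: build a character vocabulary and both lookup tables.
toy_text = "hello world"  # hypothetical sample text

toy_vocab = sorted(set(toy_text))                               # unique characters, sorted
toy_char_to_index = {ch: i for i, ch in enumerate(toy_vocab)}   # character -> index
toy_index_to_char = {i: ch for ch, i in toy_char_to_index.items()}  # index -> character

encoded = [toy_char_to_index[ch] for ch in toy_text]            # text -> list of indices
decoded = "".join(toy_index_to_char[i] for i in encoded)        # indices -> text

print(toy_vocab)
print(encoded)
print(decoded)
```

Decoding the encoded indices should reproduce the original string, which is a quick sanity check that the two dictionaries are inverses of each other.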


### Building Training Data Pipeline

In [None]:
# Code to show how data is prepared

ids_from_chars = tf.keras.layers.StringLookup(vocabulary=list(vocabulary), mask_token=None)
chars_from_ids = tf.keras.layers.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

text_ids = ids_from_chars(text_str)

n_voca = len(ids_from_chars.get_vocabulary())

seq_len = 128

ds_ids = tf.data.Dataset.from_tensor_slices(text_ids).batch(seq_len+1, drop_remainder=True)

ds_xy = ds_ids.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]})

buff_size = 10000
batch_size = 2
ds_train = ds_xy.shuffle(buff_size).batch(batch_size, drop_remainder=True)

ds_train

<BatchDataset element_spec={'cur_char': TensorSpec(shape=(2, 128), dtype=tf.int64, name=None), 'next_char': TensorSpec(shape=(2, 128), dtype=tf.int64, name=None)}>

In [None]:
def prepare_dataset(sample, n_classes):
    X = tf.keras.utils.to_categorical(sample['cur_char'], num_classes=n_classes)
    y = tf.keras.utils.to_categorical(sample['next_char'], num_classes=n_classes)
    return X, y

def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

# Test Training Dataset Pipeline
for sample in ds_train.take(1):
    X, y = prepare_dataset(sample, n_voca)
    X = np.argmax(X, axis=-1)
    y = np.argmax(y, axis=-1)
    print("Input :", text_from_ids(X[0]).numpy())
    print("Target:", text_from_ids(y[0]).numpy())

Input : b'ious lord, I come but for mine own.\n\nKING RICHARD II:\nYour own is yours, and I am yours, and all.\n\nHENRY BOLINGBROKE:\nSo far be '
Target: b'ous lord, I come but for mine own.\n\nKING RICHARD II:\nYour own is yours, and I am yours, and all.\n\nHENRY BOLINGBROKE:\nSo far be m'
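As the printed pair shows, the target is the input shifted by one character. The chunking and shifting done by the `tf.data` pipeline above can be sketched in plain NumPy; the array `ids` and the small `seq_len` here are stand-ins chosen for illustration.

```python
import numpy as np

ids = np.arange(10)          # stand-in for the encoded text (hypothetical)
seq_len = 4                  # illustrative; the lab uses 128

# Keep only whole chunks of length seq_len + 1 (the effect of drop_remainder=True).
n_chunks = len(ids) // (seq_len + 1)
chunks = ids[: n_chunks * (seq_len + 1)].reshape(n_chunks, seq_len + 1)

cur_char = chunks[:, :-1]    # inputs: every element of a chunk except the last
next_char = chunks[:, 1:]    # targets: the same chunk shifted left by one

print(cur_char)
print(next_char)
```

Each chunk holds `seq_len + 1` ids so that both the input and the target end up with exactly `seq_len` entries.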


### Dimensions of Variables

#### Input with $n_x$ number of units
* For a single input example, $x^{(i)}$ is a one-dimensional input vector.
* Using language as an example, a language with a 5000 word vocabulary could be one-hot encoded into a vector that has 5000 units.  So $x^{(i)}$ would have the shape (5000,).  
* We'll use the notation $n_x$ to denote the number of units in a single training example.

#### Time steps of size $T_{x}$
* A recurrent neural network has multiple time steps, which we'll index with $t$.
* Each training example $x^{(i)}$ is fed to the network over $T_x$ time steps, one input vector per step. For example, if there are 10 time steps, $T_{x} = 10$.

#### Batches of size $b$
* Let's say we have mini-batches, each with 20 training examples.  
* To benefit from vectorization, we'll stack the 20 examples $x^{(i)}$ as rows of a 2D array (a matrix).
* For example, with $n_x = 5000$ this tensor has the shape (20,5000).
* We'll use $b$ to denote the number of training examples in a mini-batch.
* So the shape of a mini-batch is $(b,n_x)$.

#### 3D Tensor of shape $(b,T_{x},n_{x})$
* The 3-dimensional tensor $x$ of shape $(b,T_x,n_x)$ represents the input $x$ that is fed into the RNN.

#### Taking a 2D slice for each time step: $x^{\langle t \rangle}$
* At each time step, we'll use a mini-batch of training examples (not just a single example).
* So, for each time step $t$, we'll use a 2D slice of shape $(b,n_x)$.
* We're referring to this 2D slice as $x^{\langle t \rangle}$.  The variable name in the code is `xt`.
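The shapes described above can be checked directly in NumPy. The sizes below are the ones used in the text (20 examples, 10 time steps, 5000 units); the zero-filled array is only a placeholder for real data.

```python
import numpy as np

b, T_x, n_x = 20, 10, 5000                      # batch size, time steps, input units
x = np.zeros((b, T_x, n_x), dtype=np.float32)   # placeholder input tensor

t = 3
xt = x[:, t, :]      # 2D slice for time step t: one row per example
x_single = x[0]      # one training example across all time steps

print(x.shape)
print(xt.shape)
print(x_single.shape)
```

Slicing on the middle axis keeps the batch and feature axes, which is why `xt` has shape $(b,n_x)$.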

### Definition of hidden state $h$

* The activation $h^{\langle t \rangle}$ that is passed to the RNN from one time step to another is called a "hidden state."

### Dimensions of hidden state $h$

* Similar to the input tensor $x$, the hidden state for a single training example is a vector of length $n_{h}$.
* If we include a mini-batch of $b$ training examples, the shape of a mini-batch is $(b,n_{h})$.
* When we include the time step dimension, the shape of the hidden state is $(b,T_x,n_{h})$
* We will loop through the time steps with index $t$, and work with a 2D slice of the 3D tensor.  
* We'll refer to this 2D slice as $h^{\langle t \rangle}$. 
* In the code, `ht_prev` holds the hidden state from the previous time step, and `ht[:,t]` holds the current one.
* The shape of this 2D slice is $(b,n_{h})$

### Dimensions of prediction $\hat{y}$
* Similar to the inputs and hidden states, $\hat{y}$ is a 3D tensor of shape $(b,T_{y},n_{y})$.
    * $b$: number of examples in a mini-batch.
    * $T_{y}$: number of time steps in the prediction.
    * $n_{y}$: number of units in the vector representing the prediction.
* For a single time step $t$, a 2D slice $\hat{y}^{\langle t \rangle}$ has shape $(b,n_{y})$.
* In the code, the variable names are:
    - `yhat`: $\hat{y}$ 

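Check the training data dimensions. The last dimension is 66 rather than 65 because `StringLookup` adds one out-of-vocabulary token (`[UNK]`) in front of the 65 characters.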

In [None]:
for sample in ds_train.take(1):
    X, y = prepare_dataset(sample, n_voca)

print('Input  Shape:', X.shape)
print('Output Shape:', y.shape)

Input  Shape: (2, 128, 66)
Output Shape: (2, 128, 66)


**Exercise**: Implement the RNN-cell.

**Instructions**:
1. Compute the hidden state with tanh activation: $h^{\langle t \rangle} = \tanh(W_{hh} h^{\langle t-1 \rangle} + W_{hx} x^{\langle t \rangle} + b_h)$.
2. Using your new hidden state $h^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{yh} h^{\langle t \rangle} + b_y)$.
3. Repeat for every time step until the loop reaches $T_y$.
4. Return $h$ and $\hat{y}$.

#### Additional Hints
* [numpy.tanh](https://numpy.org/doc/stable/reference/generated/numpy.tanh.html)
* We've created a `softmax` function that you can use. 
* For matrix multiplication, use [numpy.matmul](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html)
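The two formulas in the instructions can be sketched for a single time step as follows. The sizes and random weights are illustrative only, and the softmax is written inline rather than using the lab's helper. Because examples are stored as rows, each weight matrix is applied as a transposed right-multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
b, n_x, n_h, n_y = 4, 3, 5, 2            # illustrative sizes, not the lab's

Whx = rng.standard_normal((n_h, n_x))    # input-to-hidden weights
Whh = rng.standard_normal((n_h, n_h))    # hidden-to-hidden weights
Wyh = rng.standard_normal((n_y, n_h))    # hidden-to-output weights
bh = np.zeros(n_h)
by = np.zeros(n_y)

xt = rng.standard_normal((b, n_x))       # inputs for one time step
ht_prev = np.zeros((b, n_h))             # previous hidden state

# (b,h) = (b,h) @ (h,h).T + (b,x) @ (h,x).T + (h,)
ht = np.tanh(ht_prev @ Whh.T + xt @ Whx.T + bh)

# (b,y) = (b,h) @ (y,h).T + (y,), then softmax over the last axis
logits = ht @ Wyh.T + by
yhat = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)

print(ht.shape, yhat.shape)
```

Each row of `yhat` sums to 1, and every entry of `ht` lies in $[-1, 1]$ because of the tanh.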



### RNN Class Definition

Define functions for RNN Class

In [None]:
def softmax(arr):
    earr = np.exp(arr)
    esum = np.sum(earr, axis=-1, keepdims=True)
    return earr / esum

def outer(a, b):
    a = np.expand_dims(a,-1)
    b = np.expand_dims(b,-2)
    return np.matmul(a, b)
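`np.exp` overflows for large inputs (roughly above 709 in float64), in which case the `softmax` above would return `nan`. A common remedy, sketched here as `stable_softmax` (a hypothetical helper, not part of the lab), subtracts the row-wise maximum before exponentiating. The result is mathematically unchanged because the common factor cancels in the ratio.

```python
import numpy as np

def stable_softmax(arr):
    # Subtracting the max makes the largest exponent 0, so np.exp cannot overflow.
    shifted = arr - np.max(arr, axis=-1, keepdims=True)
    earr = np.exp(shifted)
    return earr / np.sum(earr, axis=-1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])   # naive exp() would overflow here

print(stable_softmax(small))
print(stable_softmax(large))                   # same probabilities as for `small`
```

Both inputs differ only by a constant shift, so they produce the same probabilities.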

Define RNN Class

In [None]:
class vanilla_rnn:

    def __init__(self, n_out, n_hidden, n_inp):
        # hyper parameters
        self.n_inp = n_inp
        self.n_hidden = n_hidden
        self.n_out = n_out
        # model parameters
        self.Whx = np.zeros((n_hidden, n_inp))
        self.Wyh = np.zeros((n_out, n_hidden))
        self.Whh = np.zeros((n_hidden, n_hidden))  # (h_curr, h_prev) = (h,hp)
        self.bh  = np.zeros((n_hidden, ))
        self.by  = np.zeros((n_out, ))

    def forward(self, xt, hprev, k1):
        T = k1
        n_B, n_T, _ = xt.shape

        # Initialize with zeros; find appropriate dimensions
        ### START CODE HERE ### 

        ht = np.zeros((n_B, T, self.n_hidden))                # initialize ht
        yhat = np.zeros((n_B, T, self.n_out))              # initialize yhat
        ht_prev = hprev           # initialize ht_prev

        ### END CODE HERE ###

        for t in range(T):
            ### START CODE HERE ### 

            # [(b,i) x (h,i).T = (b,h)] + [(b,hp) x (h,hp).T = (b,h)] + (h,)
            ht[:,t] = np.tanh(np.matmul(ht_prev, self.Whh.T) + np.matmul(xt[:,t], self.Whx.T) + self.bh)       # calculate current hidden state
            # [(b,h) x (o,h).T = (b,o)] + (o,)
            yhat[:,t] = softmax(np.matmul(ht[:,t], self.Wyh.T) + self.by)     # calculate output
            ht_prev = ht[:,t]       # save hidden state for next iteration

            ### END CODE HERE ###

        return ht, yhat

    def backward(self, xt, ht, dy, k2):
        T = k2
        n_B, _, _ = xt.shape
        
        # Initialize with zeros; find appropriate dimensions
        ### START CODE HERE ### 

        dWhx = np.zeros((n_B, self.n_hidden, self.n_inp))              # initialize gradient variables 
        dWhh = np.zeros((n_B, self.n_hidden, self.n_hidden))              # initialize gradient variables 
        dWyh = np.zeros((n_B, self.n_out, self.n_hidden))              # initialize gradient variables 
        dbh = np.zeros((n_B, self.n_hidden))               # initialize gradient variables 
        dby = np.zeros((n_B, self.n_out))               # initialize gradient variables 
        dhnext = np.zeros((n_B, self.n_hidden))            # var to pass the gradient for next iter

        ### END CODE HERE ###
    
        for t in reversed(range(T)):
            ### START CODE HERE ### 
            
            dWyh += outer(dy[:,t], ht[:,t])         # outer: (o, h) <= (o,) x (,h)
            dby += dy[:,t]
            # dht has gradients flowed from output and the next cell
            dht = np.matmul(dy[:,t], self.Wyh) + dhnext           # backprop into h from out & hnext
            dtanh = dht * (1 - np.square(ht[:,t]))         # backprop through tanh
            dbh += dtanh
            dWhx += outer(dtanh, xt[:,t])         # (h, i) <= (h,) x (,i)
            # caution: at t = 0, ht[:,t-1] is ht[:,-1] (the last time step), not the initial hidden state
            dWhh += outer(dtanh, ht[:,t-1])         # (h, h) <= (h,) x (,h)
            dhnext = np.matmul(dtanh, self.Whh)        # pass the gradient to next iter

            ### END CODE HERE ###

        # to mitigate gradient explosion, clip the gradients
        for dp in [dWhx, dWhh, dWyh, dbh, dby]:
            np.clip(dp, -10, 10, out=dp)
    
        # average for batch dimension
        dWhx = np.mean(dWhx, axis=0)
        dWhh = np.mean(dWhh, axis=0)
        dWyh = np.mean(dWyh, axis=0)
        dbh = np.mean(dbh, axis=0)
        dby = np.mean(dby, axis=0)

        return dWhx, dWhh, dWyh, dbh, dby
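One detail of the backward loop is worth checking by hand. NumPy treats the index `-1` as "last element", so at `t = 0` the expression `ht[:, t-1]` selects the final time step rather than the initial hidden state that was passed to `forward`. The sketch below demonstrates the indexing behaviour and one possible remedy, which is to prepend the initial state to the stored states. The names `h0` and `h_all` are hypothetical, and adopting this in the class would change the expected values in the tests that follow.

```python
import numpy as np

n_B, T, n_h = 1, 4, 1
# Stored hidden states for steps 0..3 hold the values 1..4.
ht = np.arange(1, T + 1, dtype=float).reshape(n_B, T, n_h)
h0 = np.zeros((n_B, n_h))            # initial hidden state (value 0)

t = 0
wrapped = ht[:, t - 1]               # index -1 wraps around to the last step (value 4)

# Prepend h0 so that h_all[:, t] is the state entering step t.
h_all = np.concatenate([h0[:, None, :], ht], axis=1)   # shape (n_B, T + 1, n_h)
previous = h_all[:, t]               # state entering step 0, i.e. h0

print(wrapped.ravel(), previous.ravel())
```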

Test the forward pass

In [None]:
np.random.seed(1)

rnn_cell = vanilla_rnn(2,5,3)

x_tmp = np.random.randn(10,4,3)     # n_batch=10, n_seq=4, n_input=3
h_prev_tmp = np.random.randn(10,5)  # n_batch=10, n_hidden=5

rnn_cell.Whh = np.random.randn(5,5)
rnn_cell.Whx = np.random.randn(5,3)
rnn_cell.Wyh = np.random.randn(2,5)
rnn_cell.bh = np.random.randn(5)
rnn_cell.by = np.random.randn(2)

h_next_tmp, yt_pred_tmp = rnn_cell.forward(x_tmp, h_prev_tmp, 4)  # n_seq=4

print("h_next[1,t,4] = \n", h_next_tmp[1,:,4])
print("h_next.shape = \n", h_next_tmp.shape)
print("yt_pred[3,t,1] =\n", yt_pred_tmp[3,:,1])
print("yt_pred.shape = \n", yt_pred_tmp.shape)

h_next[1,t,4] = 
 [ 0.99859767  0.72269806 -0.99831123 -0.99998484]
h_next.shape = 
 (10, 4, 5)
yt_pred[3,t,1] =
 [0.08039465 0.81391132 0.07988095 0.02803803]
yt_pred.shape = 
 (10, 4, 2)


**Expected Output**:

```Python
h_next[1,t,4] = 
 [ 0.99859767  0.72269806 -0.99831123 -0.99998484]
h_next.shape = 
 (10, 4, 5)
yt_pred[3,t,1] =
 [0.08039465 0.81391132 0.07988095 0.02803803]
yt_pred.shape = 
 (10, 4, 2)
```

Test the backward pass

In [None]:
np.random.seed(1)

rnn_cell = vanilla_rnn(2,5,3)

x_tmp = np.random.randn(10,4,3)  # n_batch=10, n_seq=4, n_input=3
h0_tmp = np.random.randn(10,5)   # n_batch=10, n_hidden=5

rnn_cell.Whx = np.random.randn(5,3)
rnn_cell.Whh = np.random.randn(5,5)
rnn_cell.Wyh = np.random.randn(2,5)
rnn_cell.bh = np.random.randn(5)
rnn_cell.by = np.random.randn(2)

h_tmp, y_tmp = rnn_cell.forward(x_tmp, h0_tmp, 4)  # n_seq=4

dy_tmp = np.random.randn(10,4,2)

dWhx, dWhh, dWyh, dbh, dby = rnn_cell.backward(x_tmp, h_tmp, dy_tmp, 4)  # n_seq=4

print("dWhx[3][1] =", dWhx[3,1])
print("dWhx.shape =", dWhx.shape)
print("dWhh[1][2] =", dWhh[1,2])
print("dWhh.shape =", dWhh.shape)
print("dbh[4] =", dbh[4])
print("dbh.shape =", dbh.shape)
print("dby[1] =", dby[1])
print("dby.shape =", dby.shape)

dWhx[3][1] = -1.1757864340551145
dWhx.shape = (5, 3)
dWhh[1][2] = -0.07090670752981793
dWhh.shape = (5, 5)
dbh[4] = -0.7891177327315935
dbh.shape = (5,)
dby[1] = -0.24507440509183692
dby.shape = (2,)


**Expected Output**:
```Python
dWhx[3][1] = -1.1757864340551143
dWhx.shape = (5, 3)
dWhh[1][2] = -0.07090670752981783
dWhh.shape = (5, 5)
dbh[4] = -0.7891177327315935
dbh.shape = (5,)
dby[1] = -0.24507440509183692
dby.shape = (2,)
```

Create the RNN and initialize its weights

In [None]:
# Parameter defined before:
#   n_voca = len(ids_from_chars.get_vocabulary()) -> 66
#   seq_len = 128
#   batch_size = 2

n_hidden = 256 # n_embedding

RNN = vanilla_rnn(n_out=n_voca, n_hidden=n_hidden, n_inp=n_voca)

RNN.Whx = np.random.uniform(-np.sqrt(1./n_voca), np.sqrt(1./n_voca), (n_hidden, n_voca))
RNN.Wyh = np.random.uniform(-np.sqrt(1./n_hidden), np.sqrt(1./n_hidden), (n_voca, n_hidden))
RNN.Whh, _ = np.linalg.qr(np.random.uniform(-np.sqrt(1./n_hidden), np.sqrt(1./n_hidden), (n_hidden, n_hidden)))
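The recurrent matrix `Whh` above is taken from the Q factor of a QR decomposition, which makes it orthogonal. A likely motivation (an assumption here, since the lab does not state it) is that an orthogonal matrix preserves vector norms, so repeated multiplication across time steps neither shrinks nor inflates the hidden state by itself. The sketch below checks these properties on a small matrix; the size `n = 8` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                             # illustrative; the lab uses n_hidden = 256
A = rng.uniform(-np.sqrt(1.0 / n), np.sqrt(1.0 / n), (n, n))
Q, _ = np.linalg.qr(A)                            # Q is the orthogonal factor

is_orthogonal = np.allclose(Q.T @ Q, np.eye(n))   # Q^T Q = I up to rounding
singular_values = np.linalg.svd(Q, compute_uv=False)

v = rng.standard_normal(n)
norm_before = np.linalg.norm(v)
norm_after = np.linalg.norm(Q @ v)                # multiplying by Q keeps the norm

print(is_orthogonal)
print(singular_values)
print(norm_before, norm_after)
```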

Test RNN before Training

In [None]:
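X_test = np.zeros((1, 1, n_voca), dtype=float)
ix = np.random.randint(1, n_voca)   # skip index 0, which StringLookup reserves for [UNK]
X_test[0, 0, ix] = 1.0
y_char = text_from_ids(np.array([ix])).numpy()   # decode with the same lookup used for encoding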

print('The RNN is initiated with ', y_char, 'and will generate 100 characters.')

result = ''
t_length = 100
hprev = np.random.randn(1, n_hidden)

for i in range(t_length):
    ht, y_pred = RNN.forward(X_test, hprev, k1=1)

    hprev = ht[:,-1]

    iy = np.argmax(y_pred, -1)
    X_test = np.zeros((1, 1, n_voca))
    X_test[0, 0, iy[0,0]] = 1.0

    pred_ids = np.squeeze(iy, axis=-1)
    result += text_from_ids(pred_ids).numpy().decode('utf-8')

print('Generated text:\n', result)

The RNN is initiated with  b'3' and will generate 100 characters.
Generated text:
 fL&3:bTTv[UNK]kkz;n-R.;[UNK]$HSrOusBxh&XPt.BkzPE-P,G
LbpfccIeKrgB!t;UJ3egf[UNK]g[UNK]IdhKIkWk!mVU;dXZ:Avk b'U!tuXJsA


In [None]:
def update_param(RNN, dWhx, dWhh, dWyh, dbh, dby, alpha=1e-3):
    RNN.Whx -= alpha * dWhx
    RNN.Whh -= alpha * dWhh
    RNN.Wyh -= alpha * dWyh
    RNN.bh  -= alpha * dbh
    RNN.by  -= alpha * dby
    return

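**Training this RNN is slow.** The time steps must be processed one after another, and the implementation is plain NumPy, so each epoch in the run below takes roughly 9 to 10 minutes.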

In [None]:
# tqdm library shows loop progress
import tqdm.notebook

steps = len(ds_train)
print(steps)

3890


In [None]:
n_epochs = 5

# the initial state can be zeros or randoms
hprev = np.random.randn(batch_size, n_hidden)

for epoch in range(n_epochs):
    sample_no = 0
    t_loss = 0.0
    start = time.time()

    pbar = tqdm.notebook.tqdm(ds_train, total=steps)
    pbar.set_description('Epoch:%2d' % (epoch+1))

    for sample in pbar:

        X_train, y_train = prepare_dataset(sample, n_voca)
        
        ht, y_pred = RNN.forward(X_train, hprev, k1=seq_len)
    
        hprev = ht[:,-1]
    
        dy = y_pred - y_train      # gradient of the loss w.r.t. the softmax input, shape (b,T,o)

        dWhx, dWhh, dWyh, dbh, dby = RNN.backward(X_train, ht, dy, k2=seq_len)
    
        update_param(RNN, dWhx, dWhh, dWyh, dbh, dby, alpha=1e-3)
    
        sample_no += 1

        loss_J = -np.mean(np.sum(y_train * np.log(y_pred), axis=-1))
        pbar.set_postfix({'loss' : loss_J})
        t_loss += loss_J

    print('Epoch:%2d/ Samples:%4d, Elapsed_t: %4.2fs,  loss: %10.8f' \
          % (epoch+1, sample_no, time.time() - start, t_loss/steps))

Epoch: 1/ Samples:3890, Elapsed_t: 571.70s,  loss: 2.47084945
Epoch: 2/ Samples:3890, Elapsed_t: 568.89s,  loss: 2.12644655
Epoch: 3/ Samples:3890, Elapsed_t: 563.30s,  loss: 1.99504563
Epoch: 4/ Samples:3890, Elapsed_t: 561.97s,  loss: 1.90736156
Epoch: 5/ Samples:3890, Elapsed_t: 561.96s,  loss: 1.84551385
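The line `dy = y_pred - y_train` in the training loop relies on a standard identity: for softmax outputs with cross-entropy loss, the gradient of the loss with respect to the pre-softmax logits is $\hat{y} - y$. The sketch below checks the identity numerically with central finite differences for a single example. The names `logits`, `ce_loss`, and `eps` are illustrative, and the local `softmax` is a one-dimensional stand-in for the lab's helper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def ce_loss(z, y):
    # Cross-entropy between one-hot target y and softmax(z).
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
logits = rng.standard_normal(5)
y = np.zeros(5)
y[2] = 1.0                                   # one-hot target

analytic = softmax(logits) - y               # the closed-form gradient

eps = 1e-6
numeric = np.zeros_like(logits)
for k in range(len(logits)):
    step = np.zeros_like(logits)
    step[k] = eps
    numeric[k] = (ce_loss(logits + step, y) - ce_loss(logits - step, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))    # should be close to zero
```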


### Test Model with a random character
The trained model generates text that superficially resembles English dialogue.<br>
However, the result is neither grammatical nor meaningful.

In [None]:
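X_test = np.zeros((1, 1, n_voca), dtype=float)
ix = np.random.randint(1, n_voca)   # skip index 0, which StringLookup reserves for [UNK]
X_test[0, 0, ix] = 1.0
y_char = text_from_ids(np.array([ix])).numpy()   # decode with the same lookup used for encoding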

print('The RNN is initiated with ', y_char, 'and will generate 100 characters.')

result = ''
t_length = 100
hprev = np.random.randn(1, n_hidden)

for i in range(t_length):
    ht, y_pred = RNN.forward(X_test, hprev, k1=1)

    hprev = ht[:,-1]

    iy = np.argmax(y_pred, -1)
    X_test = np.zeros((1, 1, n_voca))
    X_test[0, 0, iy[0,0]] = 1.0

    pred_ids = np.squeeze(iy, axis=-1)
    result += text_from_ids(pred_ids).numpy().decode('utf-8')

print('Generated text:\n', result)

The RNN is initiated with  b'H' and will generate 100 characters.
Generated text:
 LAM:
And then he with me the prowned my lord, in the with a fare of his his him and good my life and
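The generation loop above always picks the most likely character with `np.argmax`, which is known as greedy decoding and tends to fall into repetitive phrases. A common alternative, not used in this lab, is to sample the next character from the predicted distribution, optionally reshaped by a temperature. The helper `sample_index` and its `temperature` parameter are hypothetical names for this sketch.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Draw one index from `probs` after temperature scaling."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(probs + 1e-12) / temperature   # temperature < 1 sharpens, > 1 flattens
    logits -= np.max(logits)                       # for numerical stability
    p = np.exp(logits)
    p /= np.sum(p)
    return int(rng.choice(len(p), p=p))

probs = np.array([0.7, 0.2, 0.1])                  # toy predicted distribution
rng = np.random.default_rng(0)

greedy = int(np.argmax(probs))                     # always the same index
samples = [sample_index(probs, temperature=1.0, rng=rng) for _ in range(1000)]
frequencies = np.bincount(samples, minlength=3) / len(samples)

print(greedy)
print(frequencies)
```

In the loop above, this would amount to replacing the `np.argmax` call with a draw from `y_pred[0, 0]`. Lower temperatures behave more like greedy decoding, higher ones produce more varied but noisier text.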


### Save the model for later use

In [None]:
import pickle

def save_object(obj, filename="RNN.pickle"):
    # filename defaults to the name used by load_object below
    try:
        with open(filename, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    except Exception as ex:
        print("Error during pickling object (Possibly unsupported):", ex)

def load_object(filename):
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except Exception as ex:
        print("Error during unpickling object (Possibly unsupported):", ex)

In [None]:
save_object(RNN)

In [None]:
RNN = load_object("RNN.pickle")
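A quick way to gain confidence in the save and load steps is a round trip on a small object. The sketch below uses a dictionary of arrays as a hypothetical stand-in for the model and writes to a temporary directory. Note that unpickling an instance of a custom class such as `vanilla_rnn` requires the class definition to be available in the session that loads it.

```python
import os
import pickle
import tempfile

import numpy as np

params = {"Whh": np.eye(3), "bh": np.zeros(3)}   # hypothetical stand-in for the model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.pickle")
    with open(path, "wb") as f:
        pickle.dump(params, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Every array should survive the round trip unchanged.
same = all(np.array_equal(params[k], restored[k]) for k in params)
print(same)
```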