# Recurrent Neural Network (RNN) - NumPy

A recurrent neural network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. (Source: Wikipedia)

In [1]:
# Import the libraries
import numpy as np

In [2]:
# Softmax function
def softmax(x):
    """
    Compute the softmax activation along axis 0 (one distribution per column).
    The maximum is subtracted first for numerical stability.
    """
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

## 1. Forward propagation for the basic Recurrent Neural Network

### 1.1. RNN cell

A Recurrent neural network can be seen as the repetition of a single cell. You are first going to implement the computations for a single time-step. The following figure describes the operations for a single time-step of an RNN cell.

##### The whole network:
<img src = "./assets/RNN_Cell_1.png">
<p style = "font-size:10px;color:gray;">Picture taken from Udacity</p>
 
##### Zooming in one cell:
<img src = "./assets/RNN_Cell_2.png">
<p style = "font-size:10px;color:gray;">Picture taken from Udacity</p>
 
##### One cell in greater detail:
<img src = "./assets/RNN_Cell_3.png">
<p style = "font-size:10px;color:gray;">Picture taken from Udacity</p>
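
In equation form, the single time-step drawn in the figures can be written as follows. This is a sketch that mirrors the names used in the code of this notebook ($W_x$, $W_s$, $W_y$, $b_s$, $b_y$); the symbol $\hat{Y}_t$ for the prediction is a notational assumption, not taken from the figures.

$$
S_t = \tanh\left(W_x X_t + W_s S_{t-1} + b_s\right), \qquad
\hat{Y}_t = \mathrm{softmax}\left(W_y S_t + b_y\right)
$$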

In [3]:
# Function for a single forward step for RNN cell
def rnn_cell_forward(Xt, St_prev, parameters_dict):
    """
    Implementing a single forward step for the RNN cell.
    
    Arguments
    -------------------------------------------------------------------------
    
    - Xt: Our input data at timestep "t". It should be numpy array of shape (n_x, m).
    
    - St_prev: Hidden state or state at timestep "t-1". It should be numpy array of shape (n_s, m).
    
    - parameters_dict: Python dictionary containing:
                           . Wx: A weight matrix connecting the inputs to the state (or hidden state).
                                 It should be numpy array of shape (n_s, n_x).  
                           . Ws: A weight matrix connecting the state (or hidden state) from previous timestep to the state (or hidden state) in the current timestep.
                                 It should be numpy array of shape (n_s, n_s).
                           . Wy: A weight matrix connecting the state (or hidden state) to the output layer.
                                 It should be numpy array of shape (n_y, n_s).
                           . bs: Bias for state. 
                                 It should be numpy array of shape (n_s, 1).
                           . by: Bias for output layer. 
                                 It should be numpy array of shape (n_y, 1).
    Returns
    -------------------------------------------------------------------------
    - St_next: Next state (or hidden state).
               It should be numpy array of shape (n_s, m).
               
    - Yt: Prediction at timestep "t".
          It should be numpy array of shape (n_y, m).
          
    - cache: Tuple of values needed for the backward pass. Contains (St_next, St_prev, Xt, parameters_dict).
    
    """
    # Retrieve parameters
    Wx = parameters_dict['Wx']
    Ws = parameters_dict['Ws']
    Wy = parameters_dict['Wy']
    bs = parameters_dict['bs']
    by = parameters_dict['by']
    
    # Compute the next activation state
    St_next = np.tanh(np.dot(Wx, Xt) + np.dot(Ws, St_prev) + bs)
    
    # Compute output of current cell
    Yt = softmax(np.dot(Wy, St_next) + by)
    
    # Store variables for backpropagation
    cache = (St_next, St_prev, Xt, parameters_dict)
    
    return St_next, Yt, cache

In [4]:
### Testing

# Random seed
np.random.seed(1)

# Input data
Xt = np.random.randn(3, 10)

# Hidden state at time t-1
St_prev = np.random.randn(5, 10)

# Weight matrices
Ws = np.random.randn(5, 5)
Wx = np.random.randn(5, 3)
Wy = np.random.randn(2, 5)

# Biases
bs = np.random.randn(5, 1)
by = np.random.randn(2, 1)

# Storing all parameters inside a dictionary
parameters = {"Ws": Ws, "Wx": Wx, "Wy": Wy, "bs": bs, "by": by}

# Compute the forward step for the RNN cell.
St_next, Yt, cache = rnn_cell_forward(Xt = Xt, 
                                      St_prev = St_prev, 
                                      parameters_dict = parameters)

# Get the shape of hidden state at time t
print("Shape of hidden state at time t: ", St_next.shape, "\n")

# Get the shape of output
print("Output shape: ", Yt.shape)

Shape of hidden state at time t:  (5, 10) 

Output shape:  (2, 10)
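
Because `Yt` comes out of a softmax taken along axis 0, each of its columns should be a probability distribution over the `n_y` classes. A minimal, self-contained check of that property is sketched below; the random scores are stand-ins for `Wy @ St_next + by`, not values from the notebook.

```python
import numpy as np

def softmax(x):
    # Same column-wise softmax as defined earlier in this notebook
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

rng = np.random.default_rng(0)

# Stand-in scores with shape (n_y, m) = (2, 10)
scores = rng.standard_normal((2, 10))
probs = softmax(scores)

# One sum per example (column); each should be very close to 1
column_sums = probs.sum(axis=0)
print(column_sums)
```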


### 1.2. RNN forward pass

You can see an RNN as the repetition of the cell you've just built. If your input sequence of data is carried over 10 time steps, then you will copy the RNN cell 10 times. Each cell takes as input the hidden state from the previous cell and the current time-step's input data. It outputs a hidden state and a prediction for this time-step.

<img src = "./assets/RNN.png">

In [5]:
# Function for forward pass of RNN
def rnn_forward(X, S_0, parameters_dict):
    """
    Implementing the forward pass of the recurrent neural network (RNN).
    
    Arguments
    -------------------------------------------------------------------------
    - X: Input data for every time-step.
         It should be numpy array of shape (n_x, m, T_x), where T_x is the number of time-steps.
         
    - S_0: Initial hidden state.
           It should be numpy array of shape (n_s, m).
                 
    - parameters_dict: Python dictionary containing:
                           . Wx: A weight matrix connecting the inputs to the state (or hidden state).
                                 It should be numpy array of shape (n_s, n_x).  
                           . Ws: A weight matrix connecting the state (or hidden state) from previous timestep to the state (or hidden state) in the current timestep.
                                 It should be numpy array of shape (n_s, n_s).
                           . Wy: A weight matrix connecting the state (or hidden state) to the output layer.
                                 It should be numpy array of shape (n_y, n_s).
                           . bs: Bias for state. 
                                 It should be numpy array of shape (n_s, 1).
                           . by: Bias for output layer. 
                                 It should be numpy array of shape (n_y, 1).
    
    Returns
    -------------------------------------------------------------------------
    - S: Hidden states for every time-step.
         It should be numpy array of shape (n_s, m, T_x).
         
    - y_pred: Predictions for every time-step.
              It should be numpy array of shape (n_y, m, T_x).
          
    - caches: Tuple of values needed for the backward pass. Contains (list of caches, X).
    
    """
    # Retrieve dimensions
    n_x, m, T_x = X.shape
    n_y, n_s = parameters_dict['Wy'].shape
    
    # Initialize state "S" and output "y" with zeros
    S = np.zeros(shape = (n_s, m, T_x))
    y_pred = np.zeros(shape = (n_y, m, T_x))
    
    # Initialize S_next
    S_next = S_0
    
    # Initialize caches which contains all caches
    caches = []
    
    # Loop over all timesteps
    for t in range(T_x):
        
        # Update the state, predict the output, and get the cache
        S_next, Yt, cache = rnn_cell_forward(Xt = X[:, :, t],
                                              St_prev = S_next,
                                              parameters_dict = parameters_dict)
        
        # Save the next state into hidden state
        S[:, :, t] = S_next
        
        # Save the value of prediction into y
        y_pred[:, :, t] = Yt
        
        # Append "cache" into "caches"
        caches.append(cache)
        
    # Store values needed for backward propagation in cache
    caches = (caches, X)
    
    return S, y_pred, caches

In [6]:
### Testing

# Random seed
np.random.seed(1)

# Input data
X = np.random.randn(3,10,4)

# Initial hidden state
S_0 = np.random.randn(5,10)

# Weight matrices
Ws = np.random.randn(5,5)
Wx = np.random.randn(5,3)
Wy = np.random.randn(2,5)

# Biases
bs = np.random.randn(5,1)
by = np.random.randn(2,1)

# Storing all parameters inside a dictionary
parameters = {"Ws": Ws, "Wx": Wx, "Wy": Wy, "bs": bs, "by": by}

# Compute forward pass of the RNN
S, y_pred, caches = rnn_forward(X = X, 
                                S_0 = S_0, 
                                parameters_dict = parameters)

# Get the shape of hidden state
print("Shape of hidden state:", S.shape, "\n")

# Get the shape of predicted y
print("Shape of predicted y: ", y_pred.shape, "\n")

Shape of hidden state: (5, 10, 4) 

Shape of predicted y:  (2, 10, 4) 
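
The hidden state is what gives the network its memory: an input at an early time-step influences every later state, while a later input cannot affect earlier states. The sketch below illustrates this with a stripped-down unrolling of the same recurrence (the helper `run_states` is hypothetical and omits the softmax output).

```python
import numpy as np

def run_states(X, S_0, Wx, Ws, bs):
    """Unroll S_t = tanh(Wx X_t + Ws S_{t-1} + bs) and return all hidden states."""
    n_s, m = S_0.shape
    T_x = X.shape[2]
    S = np.zeros((n_s, m, T_x))
    S_prev = S_0
    for t in range(T_x):
        S_prev = np.tanh(Wx @ X[:, :, t] + Ws @ S_prev + bs)
        S[:, :, t] = S_prev
    return S

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 10, 4))
S_0 = rng.standard_normal((5, 10))
Wx = rng.standard_normal((5, 3))
Ws = rng.standard_normal((5, 5))
bs = rng.standard_normal((5, 1))

S_base = run_states(X, S_0, Wx, Ws, bs)

# Change only the first time-step's input: the last state should change too
X_first = X.copy()
X_first[:, :, 0] += 1.0
S_first = run_states(X_first, S_0, Wx, Ws, bs)

# Change only the last time-step's input: earlier states should be untouched
X_last = X.copy()
X_last[:, :, -1] += 1.0
S_last = run_states(X_last, S_0, Wx, Ws, bs)

later_states_changed = not np.array_equal(S_base[:, :, -1], S_first[:, :, -1])
earlier_states_unchanged = np.array_equal(S_base[:, :, :-1], S_last[:, :, :-1])
print(later_states_changed, earlier_states_unchanged)
```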



## 2. Backpropagation in recurrent neural networks

In modern deep learning frameworks, we only have to implement the forward pass, and the framework takes care of the backward pass, so most deep learning engineers do not need to bother with the details of the backward pass. 

In a simple (fully connected) neural network, we used backpropagation to compute the derivatives of the cost with respect to the parameters in order to update them. Similarly, in recurrent neural networks we calculate the derivatives of the cost with respect to the parameters in order to update them.

<img style = "width:650px" src = "./assets/BBTT.JPG">

### 2.1. Basic RNN cell backward pass
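
The cell's backward pass follows from applying the chain rule to $S_t = \tanh(W_x X_t + W_s S_{t-1} + b_s)$. As a sketch, write $dS_t$ for the upstream gradient (`ds_next` in the code) and $dZ_t$ for the gradient at the pre-activation (a symbol introduced here for illustration only):

$$
\begin{aligned}
dZ_t &= \left(1 - S_t^{2}\right) \odot dS_t \\
dX_t &= W_x^{\top} dZ_t, \qquad dW_x = dZ_t X_t^{\top} \\
dS_{t-1} &= W_s^{\top} dZ_t, \qquad dW_s = dZ_t S_{t-1}^{\top} \\
db_s &= \sum_{i=1}^{m} dZ_t^{(i)}
\end{aligned}
$$

Here $\odot$ is element-wise multiplication and the sum runs over the $m$ examples in the batch. The output path through $W_y$ and $b_y$ is not covered by these equations.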

In [7]:
# Function for the backward pass of RNN cell
def rnn_cell_backward(ds_next, cache):
    """
    Implement the backward pass for RNN cell (single timestep)
    
    Arguments
    -------------------------------------------------------------------------
    - ds_next: Gradient of loss with respect to next hidden state.
    
    - cache: A tuple containing the cache returned by rnn_cell_forward()
    
    
    Returns
    -------------------------------------------------------------------------
    - gradients: A dictionary which contains the following.
                     . dXt: Gradients of input data.
                            It should be numpy array of shape (n_x, m)
                     . ds_prev: Gradients of previous hidden state.
                                It should be numpy array of shape (n_s, m)
                     . dWx: Gradients of input-to-hidden weights. 
                            It should be numpy array of shape (n_s, n_x).
                     . dWs: Gradients of hidden-to-hidden weights.
                            It should be numpy array of shape (n_s, n_s).
                     . dbs: Gradients of bias vector. 
                            It should be numpy array of shape (n_s, 1)
    
    """
    # Retrieve values from cache
    (S_next, S_prev, Xt, parameters_dict) = cache
    
    # Retrieve values from parameters
    Wx = parameters_dict["Wx"]
    Ws = parameters_dict["Ws"]
    Wy = parameters_dict["Wy"]
    bs = parameters_dict["bs"]
    by = parameters_dict["by"]
    
    # Backpropagate through tanh: gradient of the loss with respect to the pre-activation
    dtanh = (1 - S_next ** 2) * ds_next
    
    # Compute the gradients with respect to Xt and Wx
    dXt = np.dot(Wx.T, dtanh)
    dWx = np.dot(dtanh, Xt.T)
    
    # Compute the gradients with respect to S_prev and Ws
    ds_prev = np.dot(Ws.T, dtanh)
    dWs = np.dot(dtanh, S_prev.T)
    
    # Compute the gradient with respect to bs (summed over the batch)
    dbs = np.sum(dtanh, axis = 1, keepdims = True)
    
    # Store the gradients
    gradients = {"dXt": dXt, "ds_prev": ds_prev, "dWx": dWx, "dWs": dWs, "dbs": dbs}
    
    return gradients

In [8]:
### Testing

# Random seed
np.random.seed(1)

# Input data
Xt = np.random.randn(3,10)

# Previous hidden state
S_prev = np.random.randn(5,10)

# Weight matrices
Wx = np.random.randn(5,3)
Ws = np.random.randn(5,5)
Wy = np.random.randn(2,5)

# Biases
bs = np.random.randn(5,1)
by = np.random.randn(2,1)

# Storing all parameters inside a dictionary
parameters = {"Wx": Wx, "Ws": Ws, "Wy": Wy, "bs": bs, "by": by}

# Compute single forward step for RNN cell
S_next, Yt, cache = rnn_cell_forward(Xt, S_prev, parameters)

# Gradient of loss with respect to next hidden state.
ds_next = np.random.randn(5,10)

# Compute backward pass of the RNN cell
gradients = rnn_cell_backward(ds_next, cache)

# Get the shapes
print("gradients[\"dXt\"].shape =", gradients["dXt"].shape, "\n")
print("gradients[\"ds_prev\"].shape =", gradients["ds_prev"].shape, "\n")
print("gradients[\"dWx\"].shape =", gradients["dWx"].shape, "\n")
print("gradients[\"dWs\"].shape =", gradients["dWs"].shape, "\n")
print("gradients[\"dbs\"].shape =", gradients["dbs"].shape, "\n")

gradients["dXt"].shape = (3, 10) 

gradients["ds_prev"].shape = (5, 10) 

gradients["dWx"].shape = (5, 3) 

gradients["dWs"].shape = (5, 5) 

gradients["dbs"].shape = (5, 1) 
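
Shapes alone do not show that the gradients are correct. One common way to gain confidence in a hand-written backward pass is a finite-difference check. The sketch below does this for `dWx` only, using a surrogate scalar loss `sum(ds_next * S_next)` chosen so that its gradient with respect to `S_next` is exactly `ds_next`. The helper names and the surrogate loss are assumptions made for this illustration.

```python
import numpy as np

def cell_forward(Xt, S_prev, Wx, Ws, bs):
    """Hidden-state update only; the softmax output is not needed for this check."""
    return np.tanh(Wx @ Xt + Ws @ S_prev + bs)

def cell_backward_dWx(ds_next, S_next, Xt):
    """Analytic gradient w.r.t. Wx, same formula as in rnn_cell_backward."""
    dtanh = (1 - S_next ** 2) * ds_next
    return dtanh @ Xt.T

rng = np.random.default_rng(0)
Xt = rng.standard_normal((3, 10))
S_prev = rng.standard_normal((5, 10))
Wx = rng.standard_normal((5, 3))
Ws = rng.standard_normal((5, 5))
bs = rng.standard_normal((5, 1))
ds_next = rng.standard_normal((5, 10))

def surrogate_loss(Wx_candidate):
    # Scalar loss chosen so that dL/dS_next equals ds_next exactly
    return np.sum(ds_next * cell_forward(Xt, S_prev, Wx_candidate, Ws, bs))

analytic = cell_backward_dWx(ds_next, cell_forward(Xt, S_prev, Wx, Ws, bs), Xt)

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(Wx)
for i in range(Wx.shape[0]):
    for j in range(Wx.shape[1]):
        W_plus, W_minus = Wx.copy(), Wx.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric[i, j] = (surrogate_loss(W_plus) - surrogate_loss(W_minus)) / (2 * eps)

max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```

A small `max_abs_error` suggests the analytic formula agrees with the numerical estimate; the same pattern can be repeated for `Ws`, `bs`, `Xt` and `S_prev`.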



### 2.2. Backward pass through the RNN

Computing the gradient of the cost with respect to $S_{t}$ at every time-step $t$ is what lets the gradient flow back to the previous RNN cell. To do so, you iterate through all the time-steps starting from the end, and at each step you add that step's contribution to the overall $db_S$, $dW_{S}$, $dW_{X}$ and store $dX_t$.
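
Written out, the accumulation might look like this. The superscripts are notation introduced here for illustration: $dS_t^{\text{up}}$ is the upstream gradient supplied for step $t$, $dS_t^{\text{rec}}$ is the gradient arriving from step $t+1$ (zero at the last step), and $dW^{(t)}$ is the contribution computed by the cell at step $t$.

$$
dS_t = dS_t^{\text{up}} + dS_t^{\text{rec}}, \qquad
dW_X = \sum_{t=1}^{T_x} dW_X^{(t)}, \qquad
dW_S = \sum_{t=1}^{T_x} dW_S^{(t)}, \qquad
db_S = \sum_{t=1}^{T_x} db_S^{(t)}
$$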

In [9]:
# Function for backward pass of RNN
def rnn_backward(ds, caches):
    """
    Implement the backward pass for a RNN over an entire sequence of input data.
    
    Arguments
    -------------------------------------------------------------------------
    - ds: Upstream gradients of all hidden states.
          It should be numpy array of shape (n_s, m, T_x)
          
    - caches: Tuple containing information from the forward pass or rnn_forward()
    
    
    Returns
    -------------------------------------------------------------------------
    - gradients:  A dictionary which contains the following.
                     . dX: Gradient w.r.t. the input data.
                           It should be numpy array of shape (n_x, m, T_x).
                     . ds_0: Gradient w.r.t. the initial hidden state.
                             It should be numpy array of shape (n_s, m).
                     . dWx: Gradient w.r.t the input's weight matrix.
                            It should be numpy array of shape (n_s, n_x).
                     . dWs: Gradient w.r.t the hidden state's weight matrix.
                            It should be numpy array of shape (n_s, n_s).
                     . dbs: Gradient w.r.t the bias.
                            It should be numpy array of shape (n_s, 1)
    """
    # Retrieve first cache (t=1) inside caches
    (caches, X) = caches
    (S1, S0, X1, parameters) = caches[0]
    
    # Retrieve dimensions
    n_s, m, T_x = ds.shape
    n_x, m = X1.shape
    
    # Initialize the gradients
    dX = np.zeros(shape = (n_x, m, T_x))
    dWx = np.zeros(shape = (n_s, n_x))
    dWs = np.zeros(shape = (n_s, n_s))
    dbs = np.zeros(shape = (n_s, 1))
    ds_0 = np.zeros(shape = (n_s, m))
    ds_prev = np.zeros(shape = (n_s, m))
    

    # Loop over all timesteps
    for t in reversed(range(T_x)):
        
        # Compute gradients at time t
        gradients = rnn_cell_backward(ds_next = ds[:, :, t] + ds_prev,
                                      cache = caches[t])

        # Retrieve derivatives from gradients
        dX_t = gradients['dXt']
        ds_prev = gradients['ds_prev']
        dWx_t = gradients['dWx']
        dWs_t = gradients['dWs']
        dbs_t = gradients['dbs']
        
        # Increment global derivatives w.r.t. parameters by adding their derivative at timestep t
        dX[:, :, t] = dX_t
        dWx += dWx_t
        dWs += dWs_t
        dbs += dbs_t
        
    # Updating ds_0
    ds_0 = ds_prev
    
    # Store the gradients
    gradients = {"dX": dX, "ds_0": ds_0, "dWx": dWx, "dWs": dWs, "dbs": dbs}
    
    return gradients

In [10]:
### Testing

# Random seed
np.random.seed(1)

# Input data
X = np.random.randn(3, 10, 4)

# Initial hidden state
S_0 = np.random.randn(5, 10)

# Weight matrices
Wx = np.random.randn(5, 3)
Ws = np.random.randn(5, 5)
Wy = np.random.randn(2, 5)

# Biases
bs = np.random.randn(5, 1)
by = np.random.randn(2, 1)

# Storing all parameters inside a dictionary
parameters = {"Wx": Wx, "Ws": Ws, "Wy": Wy, "bs": bs, "by": by}

# Compute forward pass of RNN
S, y_pred, caches = rnn_forward(X, S_0, parameters)

# Upstream gradients of all hidden states.
ds = np.random.randn(5, 10, 4)

# Compute backward pass of RNN
gradients = rnn_backward(ds, caches)

# Get the shapes
print("gradients[\"dX\"].shape =", gradients["dX"].shape)
print("gradients[\"ds_0\"].shape =", gradients["ds_0"].shape)
print("gradients[\"dWx\"].shape =", gradients["dWx"].shape)
print("gradients[\"dWs\"].shape =", gradients["dWs"].shape)
print("gradients[\"dbs\"].shape =", gradients["dbs"].shape)

gradients["dX"].shape = (3, 10, 4)
gradients["ds_0"].shape = (5, 10)
gradients["dWx"].shape = (5, 3)
gradients["dWs"].shape = (5, 5)
gradients["dbs"].shape = (5, 1)
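
The notebook stops at computing gradients. As a sketch of how they could be used to update the parameters, the block below assumes plain gradient descent with a hypothetical `learning_rate`; neither the update rule nor the helper name `sgd_update` comes from the notebook, and the arrays are toy values.

```python
import numpy as np

def sgd_update(parameters, gradients, learning_rate=0.01):
    """Return a new parameter dict after one plain gradient-descent step.

    Only Wx, Ws and bs are updated, because rnn_backward returns
    gradients for those three parameters only.
    """
    updated = dict(parameters)
    for name in ("Wx", "Ws", "bs"):
        updated[name] = parameters[name] - learning_rate * gradients["d" + name]
    return updated

# Toy values standing in for real parameters and gradients
parameters = {"Wx": np.ones((5, 3)), "Ws": np.ones((5, 5)), "bs": np.ones((5, 1))}
gradients = {"dWx": np.full((5, 3), 2.0),
             "dWs": np.zeros((5, 5)),
             "dbs": np.full((5, 1), -1.0)}

new_parameters = sgd_update(parameters, gradients, learning_rate=0.1)
print(new_parameters["Wx"][0, 0], new_parameters["bs"][0, 0])
```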


**RESOURCES:**
1. <a href="https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks">Recurrent Neural Networks cheatsheet - Stanford</a>
2. <a href="https://www.analyticsvidhya.com/blog/2017/12/introduction-to-recurrent-neural-networks/">Fundamentals of Deep Learning – Introduction to Recurrent Neural Networks</a>
3. <a href="https://blog.usejournal.com/stock-market-prediction-by-recurrent-neural-network-on-lstm-model-56de700bff68">Stock Market Prediction by Recurrent Neural Network on LSTM Model</a>

<hr>

# Build Your Own Encrypted Language Using RNNs

## 1. Loading the Dataset

In [1]:
# Importing the libraries
import helper

In [2]:
# Loading the dataset
codes = helper.load_data('./dataset/cipher.txt')
plaintext = helper.load_data('./dataset/plaintext.txt')

In [3]:
# Take a look at the first sentence and its encrypted version
print("Sentence: ", plaintext[0])
print("Encrypted: ", codes[0])

Sentence:  THE LIME IS HER LEAST LIKED FRUIT , BUT THE BANANA IS MY LEAST LIKED .
Encrypted:  YMJ QNRJ NX MJW QJFXY QNPJI KWZNY , GZY YMJ GFSFSF NX RD QJFXY QNPJI .
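
Judging only from the sample pair above, the cipher looks like a Caesar shift of five letters (T becomes Y, H becomes M, E becomes J). That is an inference from one sentence, not something confirmed for the whole dataset. A minimal sketch, with a hypothetical helper `caesar_shift`:

```python
import string

def caesar_shift(text, shift):
    """Shift uppercase letters by `shift` positions; leave other characters alone."""
    alphabet = string.ascii_uppercase
    offset = shift % 26
    shifted = alphabet[offset:] + alphabet[:offset]
    return text.translate(str.maketrans(alphabet, shifted))

sentence = "THE LIME IS HER LEAST LIKED FRUIT , BUT THE BANANA IS MY LEAST LIKED ."

# Shift forward by 5 to encode, backward by 5 to decode
encoded = caesar_shift(sentence, 5)
decoded = caesar_shift(encoded, -5)

print(encoded)
print(decoded)
```

If the cipher really is this simple, a lookup table would decode it; the point of the notebook is to let a recurrent model learn the mapping from examples instead.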


## 2. Preprocessing

In [4]:
# Importing the libraries
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

Using TensorFlow backend.


In [5]:
# Preprocessing function
def preprocess_text(list_of_strings):
    
    # Initialize the tokenizer
    char_tokens = Tokenizer(lower = True,
                            split = ' ',
                            char_level = True,
                            filters = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n\'')
    
    # Fitting the tokenizer to our text
    char_tokens.fit_on_texts(list_of_strings)
    
    return char_tokens.texts_to_sequences(list_of_strings), char_tokens

In [51]:
# Applying the preprocessing to our text
codes_preprocessed, codes_tokenizer = preprocess_text(codes)
plaintext_preprocessed, plaintext_tokenizer = preprocess_text(plaintext)

In [55]:
# Take a look at the first sentence and its preprocessed version
print("Sentence: \n", plaintext[0], "\n")
print("Preprocessed Sentence: \n", plaintext_preprocessed[0])

Sentence: 
 THE LIME IS HER LEAST LIKED FRUIT , BUT THE BANANA IS MY LEAST LIKED . 

Preprocessed Sentence: 
 [5, 14, 3, 1, 10, 2, 13, 3, 1, 2, 4, 1, 14, 3, 6, 1, 10, 3, 8, 4, 5, 1, 10, 2, 25, 3, 11, 1, 20, 6, 9, 2, 5, 1, 18, 1, 17, 9, 5, 1, 5, 14, 3, 1, 17, 8, 7, 8, 7, 8, 1, 2, 4, 1, 13, 15, 1, 10, 3, 8, 4, 5, 1, 10, 2, 25, 3, 11, 1, 19]
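
To make the mapping above less opaque, here is a pure-Python sketch that approximates character-level tokenization: characters are lowercased, ranked by frequency, and numbered from 1, leaving 0 free for padding. The helpers `fit_char_index` and `texts_to_sequences` are hypothetical, and the exact indices depend on the corpus, so they will not match the notebook's output, which was fitted on the full dataset.

```python
from collections import Counter

def fit_char_index(texts):
    """Map each character to an integer, most frequent first, starting at 1."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower())
    # Index 0 is left unused so it can serve as the padding value
    return {char: rank for rank, (char, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, char_index):
    """Replace every character by its index."""
    return [[char_index[char] for char in text.lower()] for text in texts]

texts = ["THE LIME IS HER LEAST LIKED FRUIT ."]
char_index = fit_char_index(texts)
sequences = texts_to_sequences(texts, char_index)

print(char_index)
print(sequences[0])
```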


In [8]:
# Padding function
def padding(list_of_strings, length = None):
    
    # If no length is given, use the length of the longest sequence
    if length is None:
        length = max([len(i_text) for i_text in list_of_strings])
        
    # Add padding
    pad = pad_sequences(list_of_strings, maxlen = length, padding = "post")
    
    return pad

In [9]:
# Apply padding to our preprocessed text
codes_preprocessed = padding(codes_preprocessed)
plaintext_preprocessed = padding(plaintext_preprocessed)
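
For readers without Keras at hand, post-padding can be sketched in plain NumPy. The helper `pad_post` is hypothetical; it fills with zeros on the right and, when a sequence is longer than `length`, keeps the trailing elements, which is how `pad_sequences` behaves by default as far as I am aware.

```python
import numpy as np

def pad_post(sequences, length=None, value=0):
    """Right-pad integer sequences with `value` up to a common length."""
    if length is None:
        length = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), length), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        # Keep the last `length` items if the sequence is too long
        trimmed = seq[-length:]
        padded[row, :len(trimmed)] = trimmed
    return padded

example = pad_post([[1, 2, 3], [4]])
print(example)
```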

In [10]:
# Keras's sparse_categorical_crossentropy function requires the labels to be in 3 dimensions
plaintext_preprocessed = plaintext_preprocessed.reshape(*plaintext_preprocessed.shape, 1)
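
The reshape above only appends a trailing axis of size 1; no values change. A small sketch with a toy label array (the shape `(4, 7)` is arbitrary):

```python
import numpy as np

# Toy labels: 4 sequences of 7 integer class ids each
labels = np.arange(28).reshape(4, 7)

# Same operation as in the notebook: append a trailing axis of size 1
labels_3d = labels.reshape(*labels.shape, 1)

# An equivalent spelling using np.newaxis
labels_3d_alt = labels[..., np.newaxis]

print(labels.shape, labels_3d.shape)
```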

## 3. Create a Model

In [11]:
# Importing the libraries
from keras.layers import GRU, Input, Dense, TimeDistributed
from keras.models import Model
from keras.layers import Activation
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy
from keras.callbacks import ModelCheckpoint

In [12]:
# Hyperparameters
learning_rate = 1e-3
gru_units = 64
batch_size = 32
epochs = 5
validation_split = 0.2

In [13]:
# Reshaping the input to work with a basic RNN
tmp_x = padding(list_of_strings = codes_preprocessed, 
                length = plaintext_preprocessed.shape[1])

tmp_x = tmp_x.reshape((-1, plaintext_preprocessed.shape[-2], 1))

In [14]:
# Create your model's architecture
input_model = Input(tmp_x.shape[1:])
rnn_1 = GRU(units = gru_units, return_sequences = True)(input_model)
logits = TimeDistributed(Dense(len(plaintext_tokenizer.word_index) + 1))(rnn_1)
model = Model(input_model, Activation("softmax")(logits))

# Take a look at model's summary
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 101, 1)            0         
_________________________________________________________________
gru_1 (GRU)                  (None, 101, 64)           12672     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 101, 32)           2080      
_________________________________________________________________
activation_1 (Activation)    (None, 101, 32)           0         
Total params: 14,752
Trainable params: 14,752
Non-trainable params: 0
_________________________________________________________________
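
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the GRU variant with three gate blocks, each with an input kernel, a recurrent kernel and a single bias vector; that assumption matches the 12,672 reported here, while GRU configurations with a second bias set would give a different number. The helper names are hypothetical.

```python
def gru_param_count(units, input_dim):
    """Three gate blocks, each with input kernel, recurrent kernel and one bias."""
    return 3 * (units * input_dim + units * units + units)

def dense_param_count(inputs, outputs):
    """Weight matrix plus one bias per output unit."""
    return inputs * outputs + outputs

# Values read from the summary: 1 input feature, 64 GRU units, 32 output classes
gru_params = gru_param_count(units=64, input_dim=1)
dense_params = dense_param_count(inputs=64, outputs=32)
total_params = gru_params + dense_params

print(gru_params, dense_params, total_params)
```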


In [15]:
# Compile the model
model.compile(loss = sparse_categorical_crossentropy,
              optimizer = Adam(learning_rate),
              metrics = ['accuracy'])

In [16]:
# Fit the model
# Checkpoint for saving the model
checkpointer = ModelCheckpoint(filepath='./saved model/weights.best.deciphering.hdf5', 
                               verbose = 1, 
                               save_best_only = True)

# Train the model
model.fit(tmp_x, 
          plaintext_preprocessed,
          batch_size = batch_size,
          epochs = epochs,
          validation_split = validation_split,
          callbacks = [checkpointer], 
          verbose = 1)

Train on 8000 samples, validate on 2001 samples
Epoch 1/5

Epoch 00001: val_loss improved from inf to 0.95687, saving model to ./saved model/weights.best.deciphering.hdf5
Epoch 2/5

Epoch 00002: val_loss improved from 0.95687 to 0.56940, saving model to ./saved model/weights.best.deciphering.hdf5
Epoch 3/5

Epoch 00003: val_loss improved from 0.56940 to 0.36154, saving model to ./saved model/weights.best.deciphering.hdf5
Epoch 4/5

Epoch 00004: val_loss improved from 0.36154 to 0.24660, saving model to ./saved model/weights.best.deciphering.hdf5
Epoch 5/5

Epoch 00005: val_loss improved from 0.24660 to 0.17632, saving model to ./saved model/weights.best.deciphering.hdf5


<keras.callbacks.History at 0x11145b6a0>

## 4. Prediction

In [42]:
# Define a function for converting logits into text
def logits_to_text(logits, tokenizer):
    
    # Map each index back to its token (characters here, since the tokenizer is char-level)
    index2word = {index: word for word, index in tokenizer.word_index.items()}
    
    # Index 0 is reserved for padding
    index2word[0] = '<PAD>'
    
    # Get the text
    text = "".join([index2word[prediction] for prediction in np.argmax(logits, 1)])
    
    return text

In [46]:
# Predict the first item in tmp_x
prediction = model.predict(tmp_x[:1])[0]

# Convert the logits into text
predicted_text = logits_to_text(logits = prediction, 
                                tokenizer = plaintext_tokenizer)
print(predicted_text)

the lime is her least liked fruit , but the banana is my least liked .<PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD><PAD>
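
The trailing `<PAD>` markers come from the positions that were padded to the common sequence length. If only the sentence is wanted, they can be stripped afterwards; the helper `strip_padding` below is a hypothetical convenience, not part of the notebook.

```python
def strip_padding(text, pad_token="<PAD>"):
    """Remove padding markers from a decoded prediction."""
    return text.replace(pad_token, "")

# Shortened stand-in for the prediction printed above
decoded = "the lime is her least liked fruit ." + "<PAD>" * 3
clean = strip_padding(decoded)

print(clean)
```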


**RESOURCES:**
1. <a href="https://machinelearningmastery.com/develop-character-based-neural-language-model-keras/">How to Develop a Character-Based Neural Language Model in Keras</a>
2. <a href="https://eli.thegreenplace.net/2018/understanding-how-to-implement-a-character-based-rnn-language-model/">Understanding how to implement a character-based RNN language model</a>