# Understanding Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are an evolution of Artificial Neural Networks (ANNs) designed for sequences of data, where the prior values are expected to have some effect on the next value in the sequence. [^1] In a sense, they remind me of Hidden Markov Models (HMMs).
Note that RNNs should not be confused with Recursive Neural Networks (RvNNs), which are a different type of neural network. [^2]

As shown in the diagram below, a simple RNN differs from a feed-forward ANN in that the output of a node for the prior element is combined with the input for the next element.
This means that the RNN considers both the current input and the output from the prior input.
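
The recurrence described above can be sketched with a single scalar node. This is a minimal illustration, not a full RNN layer: the `tanh` activation, the function names, and the weights `w_x`, `w_h`, and `b` are all assumptions chosen for the example.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar RNN node: mix the current input with the prior output."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def feed_forward_step(x_t, w_x, b):
    """The same node without the recurrent connection, for comparison."""
    return math.tanh(w_x * x_t + b)

# Illustrative weights (assumed values, not from a trained model)
w_x, w_h, b = 0.5, 0.8, 0.0
sequence = [1.0, 1.0, 1.0]

h = 0.0  # there is no prior output before the first element
rnn_outputs = []
for x_t in sequence:
    h = rnn_step(x_t, h, w_x, w_h, b)
    rnn_outputs.append(h)

ff_outputs = [feed_forward_step(x_t, w_x, b) for x_t in sequence]

print("RNN outputs:         ", rnn_outputs)
print("feed-forward outputs:", ff_outputs)
```

Even though every input is identical, the RNN outputs change from step to step because each one also depends on the previous output, while the feed-forward node returns the same value every time.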

![RNN vs ANN structure comparison](./recurrentVsFeedForwardNNs.png) [^4]

RNN types:
 - 1 to 1
 - 1 to many
 - many to 1
 - many to many

![RNN types](./rnnTypes.png) [^4]
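
As a rough sketch of how these types differ, the same recurrent step can be wrapped in different loops. The scalar cell, its weights, and the function names below are hypothetical; one to one is simply a single call to `step`.

```python
import math

def step(x_t, h_prev, w_x=0.5, w_h=0.8):
    # Hypothetical scalar cell with made-up weights
    return math.tanh(w_x * x_t + w_h * h_prev)

def many_to_many(xs):
    """One output per input element."""
    h, outputs = 0.0, []
    for x_t in xs:
        h = step(x_t, h)
        outputs.append(h)
    return outputs

def many_to_one(xs):
    """Read the whole sequence, keep only the final output."""
    return many_to_many(xs)[-1]

def one_to_many(x, n_steps):
    """A single input seeds the sequence; each output is fed back in as the next input."""
    h, x_t, outputs = 0.0, x, []
    for _ in range(n_steps):
        h = step(x_t, h)
        outputs.append(h)
        x_t = h
    return outputs

xs = [1.0, 0.5, -0.5]
print("many to many:", many_to_many(xs))
print("many to 1:   ", many_to_one(xs))
print("1 to many:   ", one_to_many(1.0, 4))
```

Feeding the output back in as the next input is only one possible way to build a one to many network; it is used here because it needs no extra parameters.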

## Training

Like regular neural networks, RNNs are trained with forward propagation and back propagation. The RNN version of the process is called `Back Propagation Through Time` (BPTT), because the error is also propagated back across the earlier elements of the sequence.

Forward propagation consists of feeding an input into the network and then calculating the "error" between the calculated result and the expected result.
Back propagation takes that error and works back through the network, updating the weights on each node to minimize the total error.
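
To make the "through time" part concrete, here is a small sketch of BPTT on a scalar RNN. It assumes a `tanh` activation, a squared error measured on the final output only, and made-up weights and data; the function names are hypothetical. The analytic gradients are checked against finite differences.

```python
import math

def forward(xs, w_x, w_h):
    """Run the scalar RNN over the sequence; hs[0] is the initial state, hs[t] follows xs[t-1]."""
    hs = [0.0]
    for x_t in xs:
        hs.append(math.tanh(w_x * x_t + w_h * hs[-1]))
    return hs

def loss(xs, target, w_x, w_h):
    """Squared error on the final output only (an assumption of this sketch)."""
    return 0.5 * (forward(xs, w_x, w_h)[-1] - target) ** 2

def bptt(xs, target, w_x, w_h):
    """Gradients of the loss w.r.t. the shared weights, accumulated over all time steps."""
    hs = forward(xs, w_x, w_h)
    grad_w_x = grad_w_h = 0.0
    d_h = hs[-1] - target                  # dLoss/dh at the last step
    for t in range(len(xs), 0, -1):        # walk backwards through time
        d_net = d_h * (1.0 - hs[t] ** 2)   # back through tanh
        grad_w_x += d_net * xs[t - 1]      # the same w_x is reused at every step
        grad_w_h += d_net * hs[t - 1]      # the same w_h is reused at every step
        d_h = d_net * w_h                  # pass the error to the previous step
    return grad_w_x, grad_w_h

# Made-up data and weights
xs, target = [0.5, -0.1, 0.3], 0.2
w_x, w_h, learning_rate = 0.7, 0.4, 0.5

g_x, g_h = bptt(xs, target, w_x, w_h)

# Finite-difference check of the analytic gradients
eps = 1e-6
num_g_x = (loss(xs, target, w_x + eps, w_h) - loss(xs, target, w_x - eps, w_h)) / (2 * eps)
num_g_h = (loss(xs, target, w_x, w_h + eps) - loss(xs, target, w_x, w_h - eps)) / (2 * eps)
print(f"grad w_x: bptt={g_x:.8f} numeric={num_g_x:.8f}")
print(f"grad w_h: bptt={g_h:.8f} numeric={num_g_h:.8f}")

# One gradient descent update
loss_before = loss(xs, target, w_x, w_h)
w_x, w_h = w_x - learning_rate * g_x, w_h - learning_rate * g_h
loss_after = loss(xs, target, w_x, w_h)
print(f"loss before: {loss_before:.8f}, after: {loss_after:.8f}")
```

The key difference from ordinary back propagation is that `w_x` and `w_h` are shared across time steps, so their gradients are sums of the contributions from every step.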


## Bibliography/Resources

[^1]: [Recurrent Neural Networks - Wikipedia](https://en.wikipedia.org/wiki/Recurrent_neural_network)

[^2]: [Recursive Neural Networks - Wikipedia](https://en.wikipedia.org/wiki/Recursive_neural_network)

[^3]: [Illustrated Guide to RNNs - Towards Data Science](https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9)

[^4]: [RNNs and LSTM - BuiltIn.com](https://builtin.com/data-science/recurrent-neural-networks-and-lstm)

[^5]: [RNNs Cheatsheet - Stanford.edu](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks)

[^6]: [RNNs with Keras - TensorFlow.org](https://www.tensorflow.org/guide/keras/rnn)


# ANN from scratch

In [3]:
# imports
import os
import tensorflow as tf
import cProfile

Mucking around with TensorFlow basics

In [4]:
x = [[2]]
m = tf.matmul(x,x)
print(f'm = {m}')

m = [[4]]


Mucking around with back propagation, following this walkthrough: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

![back prop by hand pg 01](./backprop_by_hand/backprop_by_hand_01.jpg)

In [5]:
# initial weights for hidden layer's node 1; the trailing 1 passes the bias through unchanged
weights_h1 = [[0.15, 0.2, 1]]
# the two inputs (0.05, 0.1) and the hidden layer's bias (0.35) as a column vector
inputs = [[0.05], [0.1], [0.35]]
# the matrix multiplication to generate the (total) net input
net_input_h1 = tf.matmul(weights_h1, inputs)
print(f'net_input_h1: {net_input_h1}')

# calculate the output by running the net input through the activation function (in this case the logistic/sigmoid func)
out_h1 = 1 / (1 + tf.math.exp(-1*net_input_h1))
print(f'out_h1: {out_h1}')
print(f'using the built in sigmoid func: {tf.math.sigmoid(net_input_h1)}')

net_input_h1: [[0.3775]]
out_h1: [[0.59327]]
using the built in sigmoid func: [[0.59327]]


![back prop by hand pg 02](./backprop_by_hand/backprop_by_hand_02.jpg)

In [6]:
# repeating the process for node 2 of the hidden layer
weights_h2 = [[0.25, 0.3,1]]
net_input_h2 = tf.matmul(weights_h2, inputs)
print(f'net_input_h2: {net_input_h2}')
out_h2 = tf.math.sigmoid(net_input_h2)
print(f'out_h2: {out_h2}')

net_input_h2: [[0.39249998]]
out_h2: [[0.59688437]]


In [7]:
# repeating the process for node 1 of the output layer
weights_o1 = [[0.4, 0.45, 1]]
# the hidden layer's outputs plus the output layer's bias (0.6) as a column vector
hidden_result = [out_h1[0], out_h2[0], [0.6]]
net_input_o1 = tf.matmul(weights_o1, hidden_result)
print(f'net_input_o1: {net_input_o1}')
out_o1 = tf.math.sigmoid(net_input_o1)
print(f'out_o1: {out_o1}')

net_input_o1: [[1.105906]]
out_o1: [[0.75136507]]


In [8]:
# repeating process for node 2 of the output layer
weights_o2 = [[0.5, 0.55, 1]]
net_input_o2 = tf.matmul(weights_o2, hidden_result)
print(f'net_input_o2:{net_input_o2}')
out_o2 = tf.math.sigmoid(net_input_o2)
print(f'out_o2:{out_o2}')


net_input_o2:[[1.2249215]]
out_o2:[[0.7729285]]


## Calculating the Error


In [9]:
# Then we can calculate the errors

target_o1 = 0.01
error_o1 = 0.5 * tf.math.pow((target_o1 - out_o1),2)
print(f'error_o1: {error_o1}')

target_o2 = 0.99
error_o2 = 0.5 * tf.math.pow((target_o2 - out_o2),2)
print(f'error_o2: {error_o2}')

error_total = error_o1 + error_o2
print(f'error_total: {error_total}')

error_o1: [[0.2748111]]
error_o2: [[0.02356002]]
error_total: [[0.2983711]]


## Executing the backwards pass of back propagation

![back prop by hand pg 03](./backprop_by_hand/backprop_by_hand_03.jpg)

![back prop by hand pg 04](./backprop_by_hand/backprop_by_hand_04.jpg)

![back prop by hand pg 05](./backprop_by_hand/backprop_by_hand_05.jpg)
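
The next cell applies the chain rule to a single weight. Written out with the same quantities the code uses, and assuming $w_5$ names the weight from hidden node 1 to output node 1 (as the variable names in the cell suggest), the gradient is:

$$
\frac{\partial E_{total}}{\partial w_5}
= \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}
= (out_{o1} - target_{o1}) \cdot out_{o1}\,(1 - out_{o1}) \cdot out_{h1}
$$

The first factor comes from differentiating $\frac{1}{2}(target_{o1} - out_{o1})^2$, the second is the derivative of the sigmoid, and the third follows because $w_5$ multiplies $out_{h1}$ in $net_{o1}$. The weight is then updated with a learning rate $\eta$, which is $0.5$ in the cell:

$$
w_5^{new} = w_5 - \eta \, \frac{\partial E_{total}}{\partial w_5}
$$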

In [14]:
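# chain rule for w5: dE_total/dw5 = dE_total/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
d_error_total_p_d_out_o1 = out_o1 - target_o1  # derivative of 0.5 * (target - out)^2
d_out_o1_p_d_net_o1 = out_o1 * (1 - out_o1)  # derivative of the sigmoid
d_net_o1_p_d_w5 = out_h1  # w5 multiplies out_h1 in net_input_o1
d_error_total_p_d_w5 = d_error_total_p_d_out_o1 * d_out_o1_p_d_net_o1 * d_net_o1_p_d_w5
print(f'd_error_total_p_d_w5: {d_error_total_p_d_w5}')
# same chain for w6, which multiplies out_h2 instead
d_net_o1_p_d_w6 = out_h2
d_error_total_p_d_w6 = d_error_total_p_d_out_o1 * d_out_o1_p_d_net_o1 * d_net_o1_p_d_w6
print(f'd_error_total_p_d_w6: {d_error_total_p_d_w6}')

# updated w6: old weight minus the learning rate (0.5) times the gradient
w6n = 0.45 - 0.5 * d_error_total_p_d_w6
print(f'w6n: {w6n}')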

d_error_total_p_d_w5: [[0.08216704]]
d_error_total_p_d_w6: [[0.08266763]]
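
The same update can be extended to all four output-layer weights. The sketch below redoes the forward pass in plain Python so it runs without TensorFlow. The names `w1` to `w4`, `w7`, `w8`, `i1`, `i2`, `b1`, and `b2` are assumptions that follow the `w5`/`w6` naming used above; the numeric values are copied from the earlier cells.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values copied from the cells above
i1, i2, b1, b2 = 0.05, 0.10, 0.35, 0.60
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
target_o1, target_o2 = 0.01, 0.99
learning_rate = 0.5  # the 0.5 factor used for w6n above

# Forward pass
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# dE_total/dout * dout/dnet for each output node; shared by both weights feeding that node
delta_o1 = (out_o1 - target_o1) * out_o1 * (1 - out_o1)
delta_o2 = (out_o2 - target_o2) * out_o2 * (1 - out_o2)

# Each gradient is the node's delta times the hidden output the weight multiplies
new_weights = {
    "w5": w5 - learning_rate * delta_o1 * out_h1,
    "w6": w6 - learning_rate * delta_o1 * out_h2,
    "w7": w7 - learning_rate * delta_o2 * out_h1,
    "w8": w8 - learning_rate * delta_o2 * out_h2,
}
for name, value in new_weights.items():
    print(f"{name}: {value:.8f}")
```

Computing the per-node delta once avoids repeating the first two chain-rule factors for every weight. The hidden-layer weights are not updated here, since their gradients need the error from both output nodes.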
