# Recurrent Neural Networks
- The same weights are shared across all time steps: every "word cell" reuses them (hence "recurrent")
![image.png](img/rnn.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/TyJuk/recurrent-neural-networks)
<br>
<br>

# Typical RNN Tasks
- 1:n - e.g. give a word and get a sentence
- n:1 - e.g. give a sentence and figure out if it's offensive or not (binary classification)
- n:n - e.g. translate a sentence into another language
<br>
<br>

# Why?
![image.png](img/rnn_q_matrix.png)
<br>
<br>

# Simple RNN
![image.png](img/simple_rnn.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/eaLt6/math-in-simple-rnns)
<br>
<br>

# Formulas
![image-2.png](img/rnn_math.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/eaLt6/math-in-simple-rnns)
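
In case the figure does not load, here is a sketch of how the simple RNN step is usually written. It uses the names from the code further down (`w_hh`, `w_hx`, `h_t_prev`, `x_t`). The output weights $W_{yh}$, the biases $b_h$ and $b_y$, and the generic activation $g$ are assumed names and may differ from the figure.

$$
\begin{aligned}
h^{\langle t \rangle} &= g\left(W_{hh}\, h^{\langle t-1 \rangle} + W_{hx}\, x^{\langle t \rangle} + b_h\right) \\
\hat{y}^{\langle t \rangle} &= g\left(W_{yh}\, h^{\langle t \rangle} + b_y\right)
\end{aligned}
$$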

<br>
<br>
<img src="img/rnn_element_wise.png" align="left"/>

The circle symbol stands for the Hadamard (element-wise) product: "*a binary operation that takes two matrices of the same dimensions and produces another matrix of the same dimension as the operands, where each element i, j is the product of elements i, j of the original two matrices. It is to be distinguished from the more common matrix product.*" [(Hadamard product)](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) Also see: [How to do it in Python](https://stackoverflow.com/questions/40034993/how-to-get-element-wise-matrix-multiplication-hadamard-product-in-numpy)
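
A minimal sketch of the difference in NumPy, using two small made-up matrices `a` and `b` (they are illustration values only, not taken from the course): `*` (or `np.multiply`) is the Hadamard product, while `@` is the ordinary matrix product.

```python
import numpy as np

# Two small example matrices (illustrative values only)
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

# Hadamard product: each element i, j is a[i, j] * b[i, j]
hadamard = a * b
# np.multiply is the function form of the same operation
assert np.array_equal(hadamard, np.multiply(a, b))

# Ordinary matrix product: rows of a times columns of b
matrix_product = a @ b

print(hadamard)        # [[ 10  40] [ 90 160]]
print(matrix_product)  # [[ 70 100] [150 220]]
```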

<br>
<br>

# What are we training?
![image.png](img/rnn_what_to_train.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/eaLt6/math-in-simple-rnns)

<br>
<br>

# Calculate Hidden State Activation `h` in Python
*From h_t_prev to h_t*

In [1]:
import numpy as np

In [2]:
w_hh = np.random.standard_normal((3,2))
w_hx = np.random.standard_normal((3,3))
h_t_prev = np.random.standard_normal((2,1))
x_t = np.random.standard_normal((3,1))

In [3]:
print("w_hh",w_hh,"w_hx",w_hx,"h_t_prev",h_t_prev,"x_t",x_t, sep="\n\n")

w_hh

[[ 0.05024214  1.40632344]
 [-1.32553537 -0.01148908]
 [ 0.63987199  0.21935045]]

w_hx

[[ 1.5932815   0.6259053  -0.92402778]
 [-0.73604864 -0.9747867  -1.45184666]
 [ 0.1358952   0.42801815  0.79667667]]

h_t_prev

[[0.18624689]
 [1.12470923]]

x_t

[[ 0.27707348]
 [-0.25004927]
 [ 2.08299281]]


In [4]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# The bias is added to the hidden pre-activation, so it needs one row per row of w_hh
bias = np.random.standard_normal((w_hh.shape[0], 1))
bias

array([[-0.12165709],
       [ 0.31815551],
       [ 0.22575993]])

In [5]:
h_t = sigmoid(np.matmul(w_hh, h_t_prev) + np.matmul(w_hx, x_t) + bias)
print(h_t)

A = h_t

[[0.4575055 ]
 [0.05088198]
 [0.89859761]]


In [6]:
# Another way: stack the weights side by side and the inputs on top of each other, then multiply once
h_t = sigmoid(np.matmul(np.hstack((w_hh, w_hx)), np.vstack((h_t_prev, x_t))) + bias)
print(h_t)

B = h_t

[[0.4575055 ]
 [0.05088198]
 [0.89859761]]


In [7]:
np.allclose(A,B)

True
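
The two variants agree because of a block-matrix identity: placing the weight matrices side by side and stacking the vectors vertically gives the same sum of products. Written out in the notation of the code (a sketch, with $W_{hh}$ = `w_hh`, $W_{hx}$ = `w_hx`, $h^{\langle t-1 \rangle}$ = `h_t_prev`, $x^{\langle t \rangle}$ = `x_t`):

$$
\begin{bmatrix} W_{hh} & W_{hx} \end{bmatrix}
\begin{bmatrix} h^{\langle t-1 \rangle} \\ x^{\langle t \rangle} \end{bmatrix}
= W_{hh}\, h^{\langle t-1 \rangle} + W_{hx}\, x^{\langle t \rangle}
$$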

In [8]:
# The same stacked form, using the @ operator instead of np.matmul
sigmoid(np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t)) + bias)

array([[0.4575055 ],
       [0.05088198],
       [0.89859761]])

In [9]:
# Same as In [5], using the @ operator (the most compact form)
sigmoid(w_hh @ h_t_prev + w_hx @ x_t + bias)

array([[0.4575055 ],
       [0.05088198],
       [0.89859761]])

In [10]:
%timeit sigmoid(np.matmul(w_hh, h_t_prev) + np.matmul(w_hx, x_t) + bias)

5.41 µs ± 53.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [11]:
%timeit sigmoid(np.matmul(np.hstack((w_hh, w_hx)), np.vstack((h_t_prev, x_t))) + bias)

13.1 µs ± 238 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [12]:
%timeit sigmoid(np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t)) + bias)

12.9 µs ± 314 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [13]:
%timeit sigmoid(w_hh @ h_t_prev + w_hx @ x_t + bias)

5.21 µs ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


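Let's use `@` for matrix multiplication (it is not an element-wise operator; element-wise multiplication is `*`). It's shorter and easier to remember than `np.matmul`, and the timings above show it is just as fast. The slower variants are the ones that first build stacked arrays with `np.hstack` and `np.vstack`.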

# Costs
For a single example, the cost can be calculated like this:

<img src="img/rnn_costs_single.png" align="left"/>

<br>





If several time steps *T* are involved, the cost is averaged over all of them.

<img src="img/rnn_costs_all.png" align="left"/>

[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/KBmVE/cost-function-for-rnns)
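
A minimal sketch of the averaged cost, assuming the figures show the cross-entropy loss (the usual choice for this kind of model, but an assumption here since only images are given). The function name `average_cross_entropy`, the `(T, K)` array layout and the example values are hypothetical.

```python
import numpy as np

def average_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy averaged over T time steps.

    y_true: one-hot targets, shape (T, K)
    y_pred: predicted probabilities, shape (T, K)
    """
    # Clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    # Cost of each single time step: -sum_j y_j * log(y_hat_j)
    per_step = -np.sum(y_true * np.log(y_pred), axis=1)
    # Average over the T time steps
    return per_step.mean()

# Two time steps, three classes (illustrative values only)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.5, 0.25, 0.25],
                   [0.25, 0.5, 0.25]])

cost = average_cross_entropy(y_true, y_pred)
print(cost)
```

In this example the correct class gets probability 0.5 at both steps, so the averaged cost equals `-log(0.5)`.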

# Scan functions
- Abstract the RNN recurrence so that it can be computed quickly
- Needed for GPU usage & parallel computing

![image.png](img/rnn_scan_functions.png)

[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/rhso8/implementation-note)
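
The idea can be sketched in plain Python: a scan function takes a step function, a sequence of inputs and an initial state, and threads the state through the sequence. This is an illustrative sketch only. The names `scan` and `rnn_step`, the argument order and the shapes are assumptions and need not match the figure. A plain Python loop is not parallel by itself; frameworks ship optimized versions of the same pattern (for example `tf.scan` or `jax.lax.scan`).

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def rnn_step(h_prev, x_t, params):
    """One simple RNN step; returns (new hidden state, output for this step)."""
    w_hh, w_hx, b_h = params
    h_t = sigmoid(w_hh @ h_prev + w_hx @ x_t + b_h)
    # In this sketch the hidden state doubles as the step output
    return h_t, h_t

def scan(fn, elems, params, h_0):
    """Apply fn to every element of elems, carrying the hidden state along."""
    h_t = h_0
    outputs = []
    for x_t in elems:
        h_t, y_t = fn(h_t, x_t, params)
        outputs.append(y_t)
    return outputs, h_t

# Hypothetical sizes: 4 hidden units, 3 input features, 5 time steps
rng = np.random.default_rng(0)
hidden, features, steps = 4, 3, 5
params = (
    rng.standard_normal((hidden, hidden)),    # w_hh
    rng.standard_normal((hidden, features)),  # w_hx
    rng.standard_normal((hidden, 1)),         # b_h
)
xs = [rng.standard_normal((features, 1)) for _ in range(steps)]

outputs, h_last = scan(rnn_step, xs, params, np.zeros((hidden, 1)))
print(len(outputs), h_last.shape)
```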