# Recurrent Neural Networks
- The same weights are reused for every "word cell" (time step); this reuse is the recurring part
![image.png](img/rnn.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/TyJuk/recurrent-neural-networks)
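
A minimal sketch of what weight sharing means in practice, assuming a sigmoid hidden-state update like the one computed later in these notes. The helper `run_rnn` and all sizes are hypothetical and chosen only for illustration; the point is that a single set of weights is reused on every pass through the loop.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def run_rnn(xs, h0, w_hh, w_hx, b):
    """Hypothetical helper: apply the SAME weights at every time step."""
    h = h0
    states = []
    for x_t in xs:  # one iteration per "word cell"
        h = sigmoid(w_hh @ h + w_hx @ x_t + b)
        states.append(h)
    return states

rng = np.random.default_rng(0)
hidden, features, steps = 4, 3, 5  # illustrative sizes (assumption)
w_hh = rng.standard_normal((hidden, hidden))
w_hx = rng.standard_normal((hidden, features))
b = rng.standard_normal((hidden, 1))
xs = [rng.standard_normal((features, 1)) for _ in range(steps)]

states = run_rnn(xs, np.zeros((hidden, 1)), w_hh, w_hx, b)
print(len(states), states[-1].shape)
```

Whatever the sequence length, the number of trainable values stays the same, because `w_hh`, `w_hx` and `b` are created once and never change inside the loop.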
<br>
<br>

# Typical RNN Tasks
- 1:n - e.g. give a word and get a sentence
- n:1 - e.g. give a sentence and figure out if it's offensive or not (binary classification)
- n:n - e.g. translate a sentence into another language
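
One rough way to picture the difference between these task types is which hidden states are read out. The sketch below uses random stand-in values instead of a trained network, and the sizes are assumptions made for illustration.

```python
import numpy as np

# Stand-in for the hidden states of a 6-step sequence with 4 hidden units
# (random values; sizes are illustrative assumptions).
rng = np.random.default_rng(0)
states = rng.standard_normal((6, 4))  # shape: (time steps, hidden units)

# n:1 -> read only the last hidden state, e.g. to classify the whole sentence
last_state = states[-1]

# n:n -> read one hidden state per input step
all_states = states

print(last_state.shape, all_states.shape)
```

A 1:n model would instead start from a single input and feed each output back in as the next input.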
<br>
<br>

# Why?
![image.png](img/rnn_q_matrix.png)
<br>
<br>

# Simple RNN
![image.png](img/simple_rnn.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/eaLt6/math-in-simple-rnns)
<br>
<br>

# Formulas
![image-2.png](img/rnn_math.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/eaLt6/math-in-simple-rnns)
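
For reference, the hidden-state update that the Python section of these notes computes can be written as below. This is a transcription of that code, with $\sigma$ the sigmoid and $b_h$ standing for the `bias` variable; the image above may use different notation or a different activation function.

$$
h_t = \sigma\left(W_{hh}\, h_{t-1} + W_{hx}\, x_t + b_h\right)
= \sigma\left(\begin{bmatrix} W_{hh} & W_{hx} \end{bmatrix}
\begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} + b_h\right)
$$

The second form stacks the two weight matrices side by side and the two vectors on top of each other, which is why the `hstack`/`vstack` variant in the Python section gives the same result.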

<br>
<br>
<img src="img/rnn_element_wise.png" align="left"/>

The circle symbol denotes the Hadamard (element-wise) product: "*a binary operation that takes two matrices of the same dimensions and produces another matrix of the same dimension as the operands, where each element i, j is the product of elements i, j of the original two matrices. It is to be distinguished from the more common matrix product.*" [(Hadamard product)](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) Also see: [How to do it in Python](https://stackoverflow.com/questions/40034993/how-to-get-element-wise-matrix-multiplication-hadamard-product-in-numpy)
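
A small sketch of the difference in NumPy, using two made-up 2×2 matrices: `*` gives the element-wise (Hadamard) product, while `@` gives the matrix product.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

hadamard = a * b        # element-wise, same as np.multiply(a, b)
matrix_product = a @ b  # matrix product, same as np.matmul(a, b)

print(hadamard)
# [[ 10  40]
#  [ 90 160]]
print(matrix_product)
# [[ 70 100]
#  [150 220]]
```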

<br>
<br>

# What are we training?
![image.png](img/rnn_what_to_train.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/eaLt6/math-in-simple-rnns)

<br>
<br>

# Calculate Hidden State Activation `h` in Python
*From h_t_prev to h_t*

In [1]:
import numpy as np

In [2]:
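# Toy shapes: h_t_prev has 2 entries but the resulting h_t has 3.
# In an actual RNN, w_hh is square so the hidden state keeps its size across steps.
w_hh = np.random.standard_normal((3,2))      # weights applied to the previous hidden state
w_hx = np.random.standard_normal((3,3))      # weights applied to the current input
h_t_prev = np.random.standard_normal((2,1))  # previous hidden state
x_t = np.random.standard_normal((3,1))       # current input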

In [6]:
print("w_hh",w_hh,"w_hx",w_hx,"h_t_prev",h_t_prev,"x_t",x_t, sep="\n\n")

w_hh

[[ 0.22457062 -0.63607552]
 [ 0.38489911  0.09436209]
 [ 0.35757583 -0.8582695 ]]

w_hx

[[-1.89119573 -1.15237173 -2.64837489]
 [-0.65459109 -1.15981748  0.9526242 ]
 [-1.48586    -0.27073782 -0.5992768 ]]

h_t_prev

[[-1.82289398]
 [ 0.19193647]]

x_t

[[ 0.3404886 ]
 [-1.21314068]
 [ 0.30124242]]


In [12]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# One bias entry per row of the weight matrices (the size of h_t)
bias = np.random.standard_normal((w_hh.shape[0],1))
bias

array([[ 0.1271755 ],
       [ 1.76832631],
       [-0.8434474 ]])

In [16]:
h_t = sigmoid(np.matmul(w_hh, h_t_prev) + np.matmul(w_hx, x_t) + bias)
print(h_t)

A = h_t

[[0.38983081]
 [0.92797018]
 [0.11732529]]


In [17]:
# Another way: stack the weights side by side and the inputs on top of each other, then do a single matmul
h_t = sigmoid(np.matmul(np.hstack((w_hh, w_hx)), np.vstack((h_t_prev, x_t))) + bias)
print(h_t)

B = h_t

[[0.38983081]
 [0.92797018]
 [0.11732529]]


In [19]:
np.allclose(A,B)

True

In [21]:
# Same stacking, written with the @ operator instead of np.matmul
sigmoid(np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t)) + bias)

array([[0.38983081],
       [0.92797018],
       [0.11732529]])

In [26]:
# Shortest form: the @ operator without any stacking
sigmoid(w_hh @ h_t_prev + w_hx @ x_t + bias)

array([[0.38983081],
       [0.92797018],
       [0.11732529]])

In [32]:
%timeit sigmoid(np.matmul(w_hh, h_t_prev) + np.matmul(w_hx, x_t) + bias)

5.83 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [29]:
%timeit sigmoid(np.matmul(np.hstack((w_hh, w_hx)), np.vstack((h_t_prev, x_t))) + bias)

14.9 µs ± 428 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [30]:
%timeit sigmoid(np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t)) + bias)

14.4 µs ± 578 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [33]:
%timeit sigmoid(w_hh @ h_t_prev + w_hx @ x_t + bias)

5.41 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


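Let's use `@` for matrix multiplication (it is not element-wise; element-wise multiplication is `*`). It's shorter and easier to remember than `np.matmul`, and the timings above show the two performing about the same. The stacked variants are slower because of the extra `np.hstack` and `np.vstack` calls.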