# Hidden State Activation
The hidden state activation of a vanilla RNN is
$$h^{<t>} = g\left(W_h\left[h^{<t-1>}, x^{<t>}\right] + b_h \right)$$
which can also be written as
$$h^{<t>} = g\left(W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} + b_h \right)$$

Where
- $W_h$ in the first formula is the horizontal concatenation of $W_{hh}$ and $W_{hx}$ from the second formula.
- $W_h$ is then multiplied by $[h^{<t-1>}, x^{<t>}]$, the concatenation of the previous hidden state and the current input from the second formula, this time stacked vertically.
- $\oplus$ in the second formula denotes element-wise addition.
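
The equivalence of the two formulas follows from block matrix multiplication. As a short worked step, reading $\oplus$ as ordinary element-wise addition and assuming the shapes are compatible (the column count of $W_{hh}$ matches the length of $h^{<t-1>}$, and likewise for $W_{hx}$ and $x^{<t>}$):

$$W_h\left[h^{<t-1>}, x^{<t>}\right] = \begin{bmatrix} W_{hh} & W_{hx} \end{bmatrix} \begin{bmatrix} h^{<t-1>} \\ x^{<t>} \end{bmatrix} = W_{hh}h^{<t-1>} + W_{hx}x^{<t>}$$

Adding $b_h$ and applying $g$ to both sides then gives the same $h^{<t>}$ from either formula.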



In [1]:
import numpy as np

### Joining (Concatenation)
#### Weights
A join along the vertical boundary is called a horizontal concatenation or horizontal stack.

Visually it looks like this: $W_h = [W_{hh}|W_{hx}]$
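
Because the join runs along a vertical boundary, the arrays being joined need the same number of rows. The following is a minimal sketch of that constraint; the arrays `a`, `b`, `c` and their shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical arrays, used only to illustrate the shape constraint
a = np.ones((3, 2))
b = np.ones((3, 4))  # same number of rows as a
c = np.ones((2, 4))  # different number of rows from a

# Row counts match, so the columns are placed side by side
joined = np.hstack((a, b))
print(joined.shape)  # (3, 6)

# Row counts differ, so NumPy refuses to join the arrays
try:
    np.hstack((a, c))
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("Cannot join:", err)
```

The same rule applies to `np.concatenate(..., axis=1)`: every dimension except the one being joined has to match.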

In [8]:
# Create some dummy data

w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)

# Some random initialization
# w_hh = np.random.standard_normal((3, 2))
# w_hx = np.random.standard_normal((3, 3))

print("__Data__\n")
print("w_hh : ")
print(w_hh)
print("w_hh shape :", w_hh.shape, "\n")
print("w_hx :")
print(w_hx)
print("w_hx shape :", w_hx.shape, "\n")

__Data__

w_hh : 
[[1 1]
 [1 1]
 [1 1]]
w_hh shape : (3, 2) 

w_hx :
[[9 9 9]
 [9 9 9]
 [9 9 9]]
w_hx shape : (3, 3) 



In [9]:
# Joining the array
print("__Joining__\n")
w_h1 = np.concatenate((w_hh, w_hx), axis=1)
print("Option 1: concatenate\n")
print("w_h : ")
print(w_h1)
print(f'w_h shape: {w_h1.shape}')

__Joining__

Option 1: concatenate

w_h : 
[[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
w_h shape: (3, 5)


In [10]:
# Option 2: hstack
w_h2 = np.hstack((w_hh, w_hx))
print("option 2 : hstack\n")
print("w_h :")
print(w_h2)
print("w_h shape :", w_h2.shape)

option 2 : hstack

w_h :
[[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
w_h shape : (3, 5)


In [14]:
# Reversing the argument order reverses the order of the column blocks
np.hstack((w_hx, w_hh))

array([[9, 9, 9, 1, 1],
       [9, 9, 9, 1, 1],
       [9, 9, 9, 1, 1]])

#### Hidden State & Inputs
Joining along a horizontal boundary is called a vertical concatenation or vertical stack. Visually it looks like this:

$$[h^{<t-1>}, x^{<t>}] = \left[\begin{array}{c} h^{<t-1>} \\ \hline x^{<t>} \end{array}\right]$$

In [15]:
# Create some more dummy data
h_t_prev = np.full((2, 1), 1)
x_t = np.full((3, 1), 9)

# Some random initialization
# h_t_prev = np.random.standard_normal((2,1))
# x_t = np.random.standard_normal((3,1))

print("-- Data --\n")
print("h_t_prev :")
print(h_t_prev)
print("h_t_prev shape :", h_t_prev.shape, "\n")
print("x_t :")
print(x_t)
print("x_t shape :", x_t.shape, "\n")


-- Data --

h_t_prev :
[[1]
 [1]]
h_t_prev shape : (2, 1) 

x_t :
[[9]
 [9]
 [9]]
x_t shape : (3, 1) 



In [16]:
# Joining the arrays
print("-- Joining --\n")

# Option 1: concatenate - vertical
ax_1 = np.concatenate(
    (h_t_prev, x_t), axis=0
)  # note the difference in axis parameter vs earlier
print("option 1 : concatenate\n")
print("ax_1 :")
print(ax_1)
print("ax_1 shape :", ax_1.shape, "\n")

-- Joining --

option 1 : concatenate

ax_1 :
[[1]
 [1]
 [9]
 [9]
 [9]]
ax_1 shape : (5, 1) 



In [17]:
# Option 2: vstack
ax_2 = np.vstack((h_t_prev, x_t))
print("option 2 : vstack\n")
print("ax_2 :")
print(ax_2)
print("ax_2 shape :", ax_2.shape)

option 2 : vstack

ax_2 :
[[1]
 [1]
 [9]
 [9]
 [9]]
ax_2 shape : (5, 1)


In [18]:
# Reversing the argument order reverses the order of the row blocks
np.vstack((x_t, h_t_prev))

array([[9],
       [9],
       [9],
       [1],
       [1]])
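
The two cells that reverse the argument order, `np.hstack((w_hx, w_hh))` and `np.vstack((x_t, h_t_prev))`, hint at a practical point: the block order of the weights has to agree with the block order of the stacked vector. The sketch below reuses the dummy data from this notebook; the names `matched`, `swapped_both`, and `mismatched` are introduced here for illustration only.

```python
import numpy as np

# Same dummy data as used elsewhere in this notebook
w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)
h_t_prev = np.full((2, 1), 1)
x_t = np.full((3, 1), 9)

# Weights and vector stacked in the same order: [W_hh | W_hx] with [h; x]
matched = np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t))

# Both orders reversed together: [W_hx | W_hh] with [x; h]
swapped_both = np.hstack((w_hx, w_hh)) @ np.vstack((x_t, h_t_prev))

# Only the vector reversed: shapes still line up, but the result is wrong
mismatched = np.hstack((w_hh, w_hx)) @ np.vstack((x_t, h_t_prev))

print(matched.ravel())       # [245 245 245]
print(swapped_both.ravel())  # [245 245 245]
print(mismatched.ravel())    # [117 117 117]
```

The mismatched case raises no error because the joined shapes are still (3, 5) and (5, 1), so this kind of mistake is only caught by checking values, not shapes.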

### Verify Formulas
The hidden state activation of a vanilla RNN is
$$h^{<t>} = g\left(W_h\left[h^{<t-1>}, x^{<t>}\right] + b_h \right) \tag{1}$$
which can also be written as
$$h^{<t>} = g\left(W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} + b_h \right) \tag{2}$$


In [20]:
# Data
w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)
h_t_prev = np.full((2, 1), 1)
x_t = np.full((3, 1), 9)

# Results
print("__Results__")
stack_1 = np.hstack((w_hh, w_hx))
stack_2 = np.vstack((h_t_prev, x_t))

print("\nFormula 1")
print(f'Term 1:\n {stack_1}')
print(f'Term 2:\n {stack_2}')
formula_1 = np.matmul(stack_1, stack_2)
print("\nOutput:")
print(formula_1)

__Results__

Formula 1
Term 1:
 [[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
Term 2:
 [[1]
 [1]
 [9]
 [9]
 [9]]

Output:
[[245]
 [245]
 [245]]


In [21]:
# Formula 2
mul_1 = np.matmul(w_hh, h_t_prev)
mul_2 = np.matmul(w_hx, x_t)
print("\nFormula 2")
print("Term1:\n",mul_1)
print("Term2:\n",mul_2)


Formula 2
Term1:
 [[2]
 [2]
 [2]]
Term2:
 [[243]
 [243]
 [243]]


In [22]:
formula_2 = np.matmul(w_hh, h_t_prev) + np.matmul(w_hx, x_t)
print("\nOutput:")
print(formula_2)


Output:
[[245]
 [245]
 [245]]


In [23]:
print("-- Verify --")
print("Results are the same :", np.allclose(formula_1, formula_2))


-- Verify --
Results are the same : True


In [24]:
# Try adding a sigmoid activation function and bias term as a final check
# Activation
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Bias and check
b = np.random.standard_normal((formula_1.shape[0],1))
print("Formula 1 Output:\n",sigmoid(formula_1+b))
print("Formula 2 Output:\n",sigmoid(formula_2+b))

all_close = np.allclose(sigmoid(formula_1+b), sigmoid(formula_2+b))
print("Results after activation are the same :",all_close)

Formula 1 Output:
 [[1.]
 [1.]
 [1.]]
Formula 2 Output:
 [[1.]
 [1.]
 [1.]]
Results after activation are the same : True
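
With the dummy values, the pre-activation is 245 in every row, so the sigmoid saturates at 1.0 whatever the bias is, and this last check says little beyond the earlier comparison. A less degenerate check might use small random values, as in the sketch below. The sizes `n_h` and `n_x`, the seed, and the names `h_concat` and `h_split` are assumptions made for illustration. Here $W_{hh}$ is taken to be square, as it is in an actual RNN where $h^{<t>}$ and $h^{<t-1>}$ have the same size.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
n_h, n_x = 4, 3                 # hypothetical hidden and input sizes

w_hh = rng.standard_normal((n_h, n_h))    # square recurrent weights
w_hx = rng.standard_normal((n_h, n_x))
b_h = rng.standard_normal((n_h, 1))
h_t_prev = rng.standard_normal((n_h, 1))
x_t = rng.standard_normal((n_x, 1))

# Formula 1: concatenated weights times the stacked vector
h_concat = sigmoid(np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t)) + b_h)

# Formula 2: separate products added together
h_split = sigmoid(w_hh @ h_t_prev + w_hx @ x_t + b_h)

print("h_t shape :", h_concat.shape)
print("Results are the same :", np.allclose(h_concat, h_split))
```

`np.allclose` is used instead of exact equality because the two formulas add the floating-point terms in a different order, which can produce tiny rounding differences.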
