# Hidden state activation

In [1]:
import numpy as np

## Activations

$h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$                                                    

Equivalently, with the weight matrix split into its two blocks:

$h^{<t>}=g(W_{hh}h^{<t-1>} + W_{hx}x^{<t>} + b_h)$                                        


## Joining matrices - horizontal stack

$W_h = \left [ W_{hh} \ | \ W_{hx} \right ]$

In [2]:
w_hh = np.full((3,2), 1)
w_hx = np.full((3,3), 9)

In [3]:
w_hh

array([[1, 1],
       [1, 1],
       [1, 1]])

In [4]:
w_hx

array([[9, 9, 9],
       [9, 9, 9],
       [9, 9, 9]])

### Method 1: Use `np.concatenate()`

In [5]:
w_h1 = np.concatenate((w_hh, w_hx), axis=1)

In [6]:
w_h1

array([[1, 1, 9, 9, 9],
       [1, 1, 9, 9, 9],
       [1, 1, 9, 9, 9]])

### Method 2: Use `np.hstack()`

In [7]:
w_h2 = np.hstack((w_hh, w_hx))

In [8]:
w_h2

array([[1, 1, 9, 9, 9],
       [1, 1, 9, 9, 9],
       [1, 1, 9, 9, 9]])
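
As a quick sanity check, the sketch below confirms that the two methods give identical arrays, and shows what happens when the row counts do not match. The mismatched `(2, 3)` array is a hypothetical input added only for illustration; it is not part of the notebook's data.

```python
import numpy as np

w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)

# Both calls join the arrays side by side (along the column axis).
w_h1 = np.concatenate((w_hh, w_hx), axis=1)
w_h2 = np.hstack((w_hh, w_hx))

print(np.array_equal(w_h1, w_h2))  # True
print(w_h1.shape)                  # (3, 5)

# Horizontal stacking requires the same number of rows in every input.
# This (2, 3) array is a hypothetical mismatched input.
raised = False
try:
    np.hstack((w_hh, np.full((2, 3), 9)))
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The column counts (2 and 3) may differ, but the row count (3) must agree, which is why the result has shape `(3, 5)`.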

## Joining matrices - vertical stack

$[h^{<t-1>},x^{<t>}] = \begin{bmatrix} h^{<t-1>} \\ x^{<t>} \end{bmatrix}$

In [9]:
h_t_prev = np.full((2,1), 1)
x_t = np.full((3,1), 9)

In [10]:
h_t_prev

array([[1],
       [1]])

In [11]:
x_t

array([[9],
       [9],
       [9]])

### Method 1: Use `np.concatenate()`

In [12]:
ax_1 = np.concatenate((h_t_prev, x_t), axis=0)

In [13]:
ax_1

array([[1],
       [1],
       [9],
       [9],
       [9]])

### Method 2: Use `np.vstack()`

In [14]:
ax_2 = np.vstack((h_t_prev, x_t))

In [15]:
ax_2

array([[1],
       [1],
       [9],
       [9],
       [9]])

## Verify formulas

Now that you know how to do both the horizontal and the vertical concatenation, let's verify that the two formulas produce the same result.

__Formula 1:__ $h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$ 

__Formula 2:__ $h^{<t>}=g(W_{hh}h^{<t-1>} + W_{hx}x^{<t>} + b_h)$


To prove: __Formula 1__ $\Leftrightarrow$ __Formula 2__

You can ignore the bias term $b_h$ and the activation function $g(\ )$ because they are applied identically in both formulas. What you really need to compare is the expression inside $g$, without the bias:

$W_{h}[h^{<t-1>},x^{<t>}] \quad \Leftrightarrow \quad W_{hh}h^{<t-1>} + W_{hx}x^{<t>} $
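
The reason the two sides should agree is block matrix multiplication. As a short sketch, using only the horizontal and vertical stacks defined earlier in this notebook:

$W_{h}[h^{<t-1>},x^{<t>}] = \left[ W_{hh} \ | \ W_{hx} \right] \begin{bmatrix} h^{<t-1>} \\ x^{<t>} \end{bmatrix} = W_{hh}h^{<t-1>} + W_{hx}x^{<t>}$

The columns of $W_{hh}$ only ever multiply the entries of $h^{<t-1>}$, and the columns of $W_{hx}$ only ever multiply the entries of $x^{<t>}$, so the single product splits into the sum of two products. With the shapes used here, a $3 \times 5$ matrix times a $5 \times 1$ vector equals a $3 \times 2$ by $2 \times 1$ product plus a $3 \times 3$ by $3 \times 1$ product, all giving a $3 \times 1$ result.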

You will do this by using matrix multiplication combined with the data and techniques (stacking/concatenating) from above.

* Try adding a sigmoid activation function and bias term to the checks for completeness.
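
One possible way to carry out that suggestion is sketched below. It is an illustration, not part of the original notebook: the `sigmoid` helper, the bias `b_h`, the random values, and the seed are all assumptions. Random floats are used instead of the constant arrays because a sigmoid saturates to 1 for inputs as large as 245, which would make the comparison uninformative. With floats, the two formulas can differ by rounding error, so the check uses `np.allclose` rather than exact equality.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid (hypothetical helper for this check)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Same shapes as the arrays in this notebook, but with random float values.
w_hh = rng.normal(size=(3, 2))
w_hx = rng.normal(size=(3, 3))
h_t_prev = rng.normal(size=(2, 1))
x_t = rng.normal(size=(3, 1))
b_h = rng.normal(size=(3, 1))  # assumed bias term, one entry per output row

# Formula 1: stacked weights times stacked inputs.
h_1 = sigmoid(np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t)) + b_h)

# Formula 2: two separate products, then summed.
h_2 = sigmoid(w_hh @ h_t_prev + w_hx @ x_t + b_h)

print(h_1.shape)              # (3, 1)
print(np.allclose(h_1, h_2))  # True
```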


In [17]:
w_hh = np.full((3,2), 1)
w_hx = np.full((3,3), 9)
h_t_prev = np.full((2,1), 1)
x_t = np.full((3,1), 9)

### Formula 1 verification

In [26]:
stack_1 = np.hstack((w_hh, w_hx))

In [32]:
print(f"Stack 1:\n {stack_1}")
print()
print(f"Stack 1 (Shape): {stack_1.shape}")

Stack 1:
 [[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]

Stack 1 (Shape): (3, 5)


In [33]:
stack_2 = np.vstack((h_t_prev, x_t))

In [34]:
print(f"Stack 2:\n {stack_2}")
print()
print(f"Stack 2 (Shape): {stack_2.shape}")

Stack 2:
 [[1]
 [1]
 [9]
 [9]
 [9]]

Stack 2 (Shape): (5, 1)


In [20]:
formula_1 = stack_1 @ stack_2

In [35]:
print(f"Formula 1:\n {formula_1}")
print()
print(f"Formula 1 (Shape): {formula_1.shape}")

Formula 1:
 [[245]
 [245]
 [245]]

Formula 1 (Shape): (3, 1)


### Formula 2 verification

In [37]:
mul_1 = w_hh @ h_t_prev

In [38]:
print(f"Mul 1:\n {mul_1}")
print()
print(f"Mul 1 (Shape): {mul_1.shape}")

Mul 1:
 [[2]
 [2]
 [2]]

Mul 1 (Shape): (3, 1)


In [39]:
mul_2 = w_hx @ x_t

In [40]:
print(f"Mul 2:\n {mul_2}")
print()
print(f"Mul 2 (Shape): {mul_2.shape}")

Mul 2:
 [[243]
 [243]
 [243]]

Mul 2 (Shape): (3, 1)


In [41]:
formula_2 = mul_1 + mul_2

In [42]:
print(f"Formula 2:\n {formula_2}")
print()
print(f"Formula 2 (Shape): {formula_2.shape}")

Formula 2:
 [[245]
 [245]
 [245]]

Formula 2 (Shape): (3, 1)


## Conclusion

For this example, both formulas produce the same result: a $3 \times 1$ column vector whose entries are all 245.
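
The comparison can also be made in code rather than by eye. The following is a minimal sketch that rebuilds the notebook's arrays and checks the two results with `np.array_equal`; exact equality is safe here because every value is an integer.

```python
import numpy as np

w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)
h_t_prev = np.full((2, 1), 1)
x_t = np.full((3, 1), 9)

# Formula 1: one product of the stacked matrix and the stacked vector.
formula_1 = np.hstack((w_hh, w_hx)) @ np.vstack((h_t_prev, x_t))

# Formula 2: two products added together.
formula_2 = w_hh @ h_t_prev + w_hx @ x_t

print(np.array_equal(formula_1, formula_2))  # True
```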