# Demystifying Neural Networks 

---

# Exercises - ANN Weights

We will generate matrices that can be used as an ANN.

You can generate matrices with any function from `numpy.random`.
You can provide a tuple to the `size=` parameter to get an array
of that shape.  For example, `np.random.normal(0, 1, (3, 6))`
generates a matrix of 3 rows and 6 columns.
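
As a quick check of the `size=` convention, the short sketch below draws from two `numpy.random` functions and inspects the resulting shapes. The variable names `normal_sample` and `uniform_sample` are illustrative only and are not used in the exercises.

```python
import numpy as np

# A tuple passed as the size gives (rows, columns).
normal_sample = np.random.normal(0, 1, (3, 6))
uniform_sample = np.random.uniform(-1, 1, size=(3, 6))

print(normal_sample.shape)   # (3, 6)
print(uniform_sample.shape)  # (3, 6)
```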

In [1]:
import numpy as np

#### 1. Generate the following matrices with values from the normal distribution

A) $A_{2 \times 3}$

B) $B_{7 \times 5}$

In [2]:
A = np.random.normal(0, 1, (2, 3))
B = np.random.normal(0, 1, (7, 5))
print(A)
print(B)

[[ 0.41368386 -0.46197109 -0.31394177]
 [-0.76525426 -0.12080901 -1.15777574]]
[[-1.24587997  0.9302691   2.48813365  0.02748313 -2.26752876]
 [-0.6345608   0.91266004  1.07964376 -0.27285384  0.58175895]
 [ 1.11946317  0.75285805  1.0308717   1.50849316 -1.07940968]
 [ 0.69269657  1.19468001  1.1692544  -0.88120546  0.55920565]
 [-1.57520478  0.04654562  1.06973026  1.78777095 -0.99759735]
 [-0.30326333  0.42675967 -1.58884367 -0.91625    -0.20531768]
 [-0.30970048 -0.44785164 -1.25832689 -0.87159608  2.22253136]]


#### 2. Generate matrices of the same sizes as those used in the `pytorch` network

$$
W_{25 \times 8}, W_{B\: 25 \times 1},
W'_{10 \times 25}, W'_{B\: 10 \times 1},
W''_{2 \times 10}, W''_{B\: 2 \times 1}
$$

In [3]:
W = np.random.normal(0, 1, (25, 8))
W_B = np.random.normal(0, 1, (25, 1))
Wx = np.random.normal(0, 1, (10, 25))
Wx_B = np.random.normal(0, 1, (10, 1))
Wxx = np.random.normal(0, 1, (2, 10))
Wxx_B = np.random.normal(0, 1, (2, 1))

weights = [W, W_B, Wx, Wx_B, Wxx, Wxx_B]
print([x.shape for x in weights])

[(25, 8), (25, 1), (10, 25), (10, 1), (2, 10), (2, 1)]


---

Weight generation is a big topic in ANN research.
We will use one well-accepted way of generating weights, but there are a plethora of others.

To generate a matrix of size $p \times n$,
we draw every value of the matrix from the normal distribution
with mean and standard deviation:

$$
\mu = 0 \\
\sigma = \frac{1}{n + p}
$$

In `numpy`, the mean is passed as `loc=` and the standard deviation as `scale=`.
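
The rule above can be wrapped in a small helper. This is only a sketch: the function name `random_weights` and the variable `W_example` are hypothetical and not part of the exercise material.

```python
import numpy as np

def random_weights(p, n):
    """Draw a p x n matrix from a normal distribution with
    mean 0 and standard deviation 1 / (n + p)."""
    sigma = 1 / (n + p)
    return np.random.normal(loc=0, scale=sigma, size=(p, n))

W_example = random_weights(25, 8)
print(W_example.shape)  # (25, 8)
print(W_example.std())  # close to 1/33, exact value varies per run
```

Because the scale shrinks as the matrix grows, larger layers start with smaller weights.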

#### 3. Generate the same matrices as above but use the distribution described above, then evaluate

$$
X = \left[
\begin{matrix}
102.50781 & 58.88243 &  0.46532 & -0.51509 & 1.67726 & 14.86015 & 10.57649 & 127.39358 \\
142.07812 & 45.28807 & -0.32033 &  0.28395 & 5.37625 & 29.00990 &  6.07627 &  37.83139 \\
138.17969 & 51.52448 & -0.03185 &  0.04680 & 6.33027 & 31.57635 &  5.15594 &  26.14331 \\
\end{matrix}
\right]
$$

(These are the first three rows in the pulsar dataset)

$$
\hat{Y}_{2 \times 3} = \tanh(W''_{2 \times 10} \times
    \tanh(W'_{10 \times 25} \times
        \tanh(W_{25 \times 8} \times X^T + W_{B\: 25 \times 1})
    + W'_{B\: 10 \times 1})
+ W''_{B\: 2 \times 1})
$$

In [4]:
X = np.array([
    [102.50781, 58.88243,  0.46532, -0.51509, 1.67726, 14.86015, 10.57649, 127.39358],
    [142.07812, 45.28807, -0.32033,  0.28395, 5.37625, 29.00990,  6.07627,  37.83139],
    [138.17969, 51.52448, -0.03185,  0.04680, 6.33027, 31.57635,  5.15594,  26.14331],
])
# Each matrix of shape (p, n) is drawn with loc=0 and scale=1/(n + p)
W = np.random.normal(0, 1/(8+25), (25, 8))
W_B = np.random.normal(0, 1/(25+1), (25, 1))
Wx = np.random.normal(0, 1/(10+25), (10, 25))
Wx_B = np.random.normal(0, 1/(10+1), (10, 1))
Wxx = np.random.normal(0, 1/(2+10), (2, 10))
Wxx_B = np.random.normal(0, 1/(2+1), (2, 1))

In [5]:
Y_hat = np.tanh(Wxx @ np.tanh(Wx @ np.tanh(W @ X.T + W_B) + Wx_B) + Wxx_B)
print(Y_hat.T)

[[-0.36091669  0.03487534]
 [-0.38056382  0.04280491]
 [-0.38015046  0.04409401]]
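
The nested `tanh` expression can also be written as a loop over layers. The sketch below is one possible way to do that: the names `forward`, `X_demo`, `layers_demo`, and `Y_demo` are hypothetical, and the input is random stand-in data rather than the pulsar rows. It also makes visible that each bias of shape $p \times 1$ is broadcast across the sample columns.

```python
import numpy as np

def forward(X, layers):
    """Apply tanh(W @ a + b) for each (W, b) pair, starting from a = X.T."""
    a = X.T  # shape (features, samples)
    for W, b in layers:
        # b has shape (p, 1) and broadcasts across the sample columns
        a = np.tanh(W @ a + b)
    return a

# Stand-in data: 3 samples with 8 features each (random, not the pulsar rows)
X_demo = np.random.normal(0, 1, (3, 8))

# (p, n) sizes of the three weight matrices used in this exercise
sizes = [(25, 8), (10, 25), (2, 10)]
layers_demo = [
    (np.random.normal(0, 1 / (n + p), (p, n)),   # weight matrix
     np.random.normal(0, 1 / (1 + p), (p, 1)))   # bias column
    for p, n in sizes
]

Y_demo = forward(X_demo, layers_demo)
print(Y_demo.shape)  # (2, 3)
```

Transposing the result, as in the cell above, gives one row per input sample.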
