# Feed-Forward Neural Network

Note: This notebook assumes a basic knowledge of neural nets: layers and layer sizes, activation functions, batching, softmax, and so on.

By the end of the notebook, we will have built a simple feed-forward neural net that learns to recognize handwritten digits using the [MNIST-dataset](http://yann.lecun.com/exdb/mnist/).

We'll start by training a simple neural network to classify XOR:

<table>
    <thead><tr><td>a</td><td>b</td><td>a XOR b</td></tr></thead>
    <tbody>
        <tr><td>0</td><td>0</td><td>0</td></tr>
        <tr><td>0</td><td>1</td><td>1</td></tr>
        <tr><td>1</td><td>0</td><td>1</td></tr>
        <tr><td>1</td><td>1</td><td>0</td></tr>
    </tbody>
</table>
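
The truth table above can be reproduced in NumPy as a quick check. This is only a small sketch; the names `a`, `b`, and `xor` are chosen here for illustration.

```python
import numpy as np

# All four combinations of the two binary inputs.
a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])

# Bitwise XOR on integer arrays gives the third column of the table.
xor = a ^ b
print(xor)  # [0 1 1 0]
```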

---

First, we define the structure of our network:
<img src="XOR-nn.png" width="60%">

- The first layer (aka the input layer) has two inputs corresponding to $a$ and $b$.
- The middle / hidden layer is composed of three neurons.
- The final layer (aka the output layer) has two outputs. 

The output of the neural network is a vector of length 2 where the first entry is the probability of the result being 0 and the second entry is the probability of the result being 1.
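
To turn such a probability vector into a predicted label, one common option is to take the index of the largest entry. The sketch below uses made-up probabilities; `Yhat_example` and `predictions` are names introduced only for this illustration.

```python
import numpy as np

# Hypothetical network outputs for two inputs (made-up numbers).
# Each row is [P(result = 0), P(result = 1)] and sums to 1.
Yhat_example = np.array([[0.9, 0.1],
                         [0.2, 0.8]])

# The index of the largest probability in each row is the predicted class.
predictions = Yhat_example.argmax(axis=1)
print(predictions)  # [0 1]
```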

## Feed-Forward

It's called a **Feed-Forward Neural Net** because we **feed the input forward** through the network, from the input layer through to the output layer.

Here's how we implement the feed-forward pass:

$$
\begin{aligned}
Z_1 &= X_1 \cdot W_1 \\
X_2 &= \text{ReLU}(Z_1) \\
Z_2 &= X_2 \cdot W_2 \\
\hat{Y} &= \text{Softmax}(Z_2)
\end{aligned}
$$

Note: $\large \cdot$ represents matrix multiplication.

Some notation for the equations above:
1. **X1** is the input.
    - It can be either a single instance, e.g. \[0, 0\] (1 x 2), or a batch of instances, e.g. \[[0,0],[0,1],[1,0]] (3 x 2).
2. **W1** is the first weight matrix, with a shape of (2 x 3).
3. **W2** is the second weight matrix, with a shape of (3 x 2).
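
To make the shapes concrete, here is a rough sketch that traces a batch of three instances through the two matrix multiplications. The weights are random placeholders (not trained values), and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

X1 = np.array([[0, 0], [0, 1], [1, 0]])  # batch of 3 instances, shape (3, 2)
W1 = rng.normal(size=(2, 3))             # random placeholder weights
W2 = rng.normal(size=(3, 2))             # random placeholder weights

Z1 = X1 @ W1            # (3, 2) @ (2, 3) -> (3, 3)
X2 = np.maximum(Z1, 0)  # ReLU keeps the shape: (3, 3)
Z2 = X2 @ W2            # (3, 3) @ (3, 2) -> (3, 2)

print(Z1.shape, X2.shape, Z2.shape)  # (3, 3) (3, 3) (3, 2)
```

Each row of `Z2` corresponds to one input instance, which is why softmax is later applied row by row.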

In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np

In [2]:
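def softmax(X):
    # Shift each row by its max before exponentiating (numerical stability).
    exp = np.exp(X - X.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def ReLU(X):
    # Element-wise max(x, 0).
    return np.maximum(X, 0)

def forward(Ws, X):
    # Treat a single instance as a batch of one.
    X = np.atleast_2d(X)
    W1, W2 = Ws
    
    Z1 = X @ W1          # hidden-layer pre-activations
    X2 = ReLU(Z1)        # hidden-layer activations
    Z2 = X2 @ W2         # output-layer pre-activations
    Yhat = softmax(Z2)   # class probabilities, one row per instance
    # Return the input, the hidden-layer values, and the prediction.
    return [X, Z1, X2, Yhat]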
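
The `softmax` above subtracts the row-wise maximum before exponentiating. Shifting every entry in a row by the same constant does not change the softmax result, but it keeps `np.exp` from overflowing on large inputs. The check below is a small sketch with made-up inputs; `naive_softmax` is a hypothetical unshifted version included only for comparison.

```python
import numpy as np

def softmax(X):
    # Same definition as above: shift by the row maximum, then normalize.
    exp = np.exp(X - X.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def naive_softmax(X):
    # Hypothetical unshifted version, for comparison only.
    exp = np.exp(X)
    return exp / exp.sum(axis=1, keepdims=True)

small = np.array([[1.0, 2.0]])
large = np.array([[1000.0, 1001.0]])  # np.exp(1000.0) would overflow to inf

# On small inputs both versions agree.
print(np.allclose(softmax(small), naive_softmax(small)))  # True

# The two rows differ only by a constant shift, so the results match,
# and the shifted version stays finite on the large row.
print(np.allclose(softmax(small), softmax(large)))  # True
print(np.isfinite(softmax(large)).all())  # True
```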

In [6]:
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

y = np.array([0, 1, 1, 0])

print(X, '\n')
print(y)

[[0 0]
 [0 1]
 [1 0]
 [1 1]] 

[0 1 1 0]
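
Because the network outputs two probabilities per input, it can help to view the labels in the same two-column form. One way to do that is a one-hot encoding, sketched below. Whether the rest of the notebook uses this form is an assumption, and the name `Y_onehot` is introduced only for illustration.

```python
import numpy as np

y = np.array([0, 1, 1, 0])

# Row i of the 2 x 2 identity matrix is the one-hot vector for class i,
# so indexing with y picks one row per label.
Y_onehot = np.eye(2)[y]
print(Y_onehot.shape)  # (4, 2)
```

Each row of `Y_onehot` has a 1 in the column of the true class, matching the layout of the network's output.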
