# Part 1: Implementing a feedforward neural network using NumPy

We'll implement a neural network on the MNIST dataset from scratch, using only the NumPy library. That will let us build an intuition for how our model actually works, beyond the surface-level PyTorch implementation that we'll do later in this project.

Like I said, this will be a feedforward neural network, meaning information only flows forward during inference. To do this, we'll make use of dense layers, the ReLU activation function, categorical cross-entropy loss, and a softmax activation function for the output. I've made a little diagram of what we're going to be implementing below:
> INSERT IMG

I'll work through everything in sections, implementing each piece with example classes in this Jupyter notebook and adding the real classes that we'll use to the classes.py file in this same folder (filepath: fnn/p1-numpy/classes.py).

Before anything else, let's import the packages we need into the workspace.

In [5]:
# Package Initialization
import numpy as np

Now we can begin the implementation :)

Dense layers, otherwise known as fully-connected layers, are the foundation of neural networks. A dense layer is really quite simple if you understand how matrix multiplication works. I'll briefly go over it, but if you need a better introduction to dense layers I would recommend the book *Neural Networks from Scratch* by Kinsley and Kukieła.

# Section 1: The Forward Pass of Dense Layers

At its core, a neural network layer works by matrix multiplication. We can work through an example of this. Let's think of it as creating a function that takes a list X of three numbers and outputs a list Y of four numbers, with sum(X) in each entry. You can look at the diagram below for an example:

<br/>

<img src="../../diagrams/ex1.png" alt="A diagram showing our proposed 'function.'"/>

Skip this if you understand matrix multiplication: effectively, what we're doing is just a function that carries out matrix multiplication. Matrix multiplication takes two matrices of sizes (n x m) and (m x p) and produces a matrix of size (n x p). Each output entry is the dot product of a row of the first matrix with a column of the second: multiply the matching elements and add them up. As you can see, the middle dimension (m) lining up is key, and that's because we're doing row x column. I won't go too far into the weeds, but let's look at a mini-example. Let's say we're trying to multiply a matrix X of size (1 x 2) by another matrix Y of size (2 x 2). That can be done because the middle dimension lines up. Say matrix X is [a, b] and matrix Y is [[c, d],[e, f]], where every column of Y represents the weights of one neuron as a column vector. Then the product is just [(a\*c + b\*e), (a\*d + b\*f)]. That's it for the mini-example, and I hope that makes sense.
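To check the row x column rule numerically, here is a minimal sketch. The concrete numbers standing in for a through f are made up purely for illustration; any values would do.

```python
import numpy as np

# Hypothetical values standing in for a, b and c, d, e, f
a, b = 2., 3.
c, d, e, f = 4., 5., 6., 7.

X = np.array([[a, b]])      # shape (1, 2)
Y = np.array([[c, d],
              [e, f]])      # shape (2, 2); each column is one neuron's weights

# Row of X times each column of Y, written out by hand
by_hand = np.array([[a*c + b*e, a*d + b*f]])

# The same thing done by NumPy
product = np.dot(X, Y)

print(product)                        # [[26. 31.]]
print(np.allclose(product, by_hand))  # True
```

Note that the result has shape (1 x 2): the outer dimensions of (1 x 2) and (2 x 2) survive, and the matching middle dimension disappears.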

Now, I'll quickly use NumPy to show the main example I'm referring to above. Please note, np.array() is the standard way of creating arrays or matrices in NumPy.

In [9]:
# Create the input array: a 1-D array of length 3, treated as a (1 x 3) row in the product
inputs = np.array([1., 2., 4.])
# Create the weight array of size (3 x 4) as we have 3 inputs and 4 desired outputs
weights = np.array([[1., 1., 1., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 1.]])

# Multiply the inputs by the weights. np.dot() handles both dot products
# (for vectors) and matrix multiplication (for matrices)
outputs = np.dot(inputs, weights)

print(f"The matrix mult. output is: {outputs}")

The matrix mult. output is: [7. 7. 7. 7.]


That should about give you an intuition of how a neural network layer really works, but there's one more thing to address in this part of things: biases. 

Biases are trainable parameters that shift a neuron's output up or down. Each neuron has its own bias, which we usually initialize to 0 and then adjust according to the gradients during training. It's really quite a simple concept and there's not much more to it.

In practice, each bias is a single number (a scalar) that gets added to its neuron's weighted sum. Let's look at how this happens, using our inputs and weights from the previous code cell.

In [12]:
# INPUTS AND WEIGHTS ARE PRE-LOADED FROM THE EARLIER CELL

# NEW: create the biases 
biases = np.array([1., 2., 3., 4.])

# Same matrix multiplication as in the earlier cell
outputs = np.dot(inputs, weights)

print(f"The outputs before adding biases are: {outputs}")

outputs += biases

print(f"The outputs after adding biases are: {outputs}")

The outputs before adding biases are: [7. 7. 7. 7.]
The outputs after adding biases are: [ 8.  9. 10. 11.]


Now we understand the basics of how the whole thing works, at least in the forward pass. All in all, each neuron is really just carrying out a multi-input version of ```y=mx+b```: every input x is multiplied by its own weight m, the products are summed, and the neuron's bias b is added to give the output y.

Great, so we can now implement the forward pass of our dense layer. Like I said above, I'll implement an example class here but all the changes will be saved in the corresponding classes.py file.
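To make the per-neuron view concrete, here is a small sketch that recomputes the layer one neuron at a time and compares the result with the single matrix multiplication. It reuses the same inputs, weights, and biases as the cells above; the loop and the variable names `per_neuron` and `layer_output` are just for illustration, not how we'd write the real layer.

```python
import numpy as np

inputs = np.array([1., 2., 4.])
weights = np.ones((3, 4))               # same all-ones (3 x 4) weights as before
biases = np.array([1., 2., 3., 4.])

# Compute each neuron's output separately: weighted sum of the inputs plus its bias
per_neuron = []
for j in range(weights.shape[1]):
    neuron_weights = weights[:, j]      # column j holds the weights of neuron j
    per_neuron.append(np.dot(inputs, neuron_weights) + biases[j])
per_neuron = np.array(per_neuron)

# The whole layer in one shot
layer_output = np.dot(inputs, weights) + biases

print(per_neuron)                               # [ 8.  9. 10. 11.]
print(np.allclose(per_neuron, layer_output))    # True
```

The matrix form is just a compact (and much faster) way of running that loop for every neuron at once.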

In [17]:
class exDenseLayer:
    # Initialization method to initialize a dense layer object
    def __init__(self, numInputs, numNeurons):
        # Create random weights and scale them down, of size (inputs x neurons)
        self.weights = 0.01 * np.random.randn(numInputs, numNeurons)
        # Create an array filled with 0's with a bias for each neuron
        self.biases = np.zeros((1, numNeurons))
    
    # Our forward pass method
    def forward(self, inputs):
        # Calculate outputs using the method previously discussed
        self.outputs = np.dot(inputs, self.weights) + self.biases
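As a quick sanity check on shapes, here is a sketch of the same two lines of math applied to a whole batch of samples at once. The sizes (3 inputs, 4 neurons, a batch of 5) and the random batch are assumptions made up for this example, and the fixed seed is only there so the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the example is repeatable

# Hypothetical sizes for illustration
num_inputs, num_neurons, batch_size = 3, 4, 5

# Same initialization as in the class above
weights = 0.01 * np.random.randn(num_inputs, num_neurons)   # (3 x 4)
biases = np.zeros((1, num_neurons))                         # (1 x 4)

# A made-up batch: 5 samples with 3 features each
batch = np.random.randn(batch_size, num_inputs)             # (5 x 3)

# (5 x 3) times (3 x 4) gives (5 x 4); the (1 x 4) biases are
# broadcast across all 5 rows when added
outputs = np.dot(batch, weights) + biases

print(outputs.shape)  # (5, 4)
```

This is why the biases are stored with shape (1, numNeurons): NumPy broadcasting adds the same row of biases to every sample in the batch, so one forward call can process many inputs at once.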

That's about everything we need for a forward pass! Now we get to talk about the ReLU activation function!