<h1><center>Forward Propagation in Neural Networks</center></h1>
<h3><center>Reading time: 10 minutes</center></h3>

![Example of a neural network](./Images/neuralnetwork.svg)

*Forward propagation is the process of calculating the value of each neuron in a neural network, layer by layer. Together, forward propagation and backpropagation form the basis of training a neural network. Forward propagation calculates the outputs of the network: we take some inputs and send them through the hidden layers to calculate the output. Backpropagation is the mathematical method of using that output to calculate a loss, then moving backwards (right to left) through the network and updating the weights and biases of each neuron to minimize this loss.*

**In this walkthrough we will develop a simple neural network by hand and run only the forward pass to gain a strong foundation on what this step of training looks like.**



<h2><center>Mathematical Foundation - Preactivation</center></h2>

*Forward propagation is a two-step process for each neuron. First we do what's called "preactivation", and then we do what's called "activation". This is done for each neuron in the hidden and output layers. Preactivation is the raw calculation of a neuron's value: each neuron after the input layer takes every incoming neuron value, multiplies it by the weight of the connecting edge, and sums the results. Most of the time a bias is also added to each neuron's sum.*

**_Preactivation is just a weighted sum of inputs. It can be written as_**

<center>$\sum{w_ix_i} + b_l$</center>

*Each edge connecting one neuron to another carries the weight of that connection. Summing every incoming neuron value multiplied by the weight of its edge gives the preactivation value of a neuron. Each layer normally also has a bias, which is added to the weighted sum at the very end. In this notebook we will keep the bias at 0, but in most neural networks you will have a shared bias for the entire layer or even a separate bias for each neuron in a layer. Let's do an example.*

![Example of a neural network](./Images/neuralnetwork.png)

**Let's compute the value for $h_1$. Our formula is $\sum{w_ix_i} + b_l$, so the value for $h_1$ is $h_1 = i_1w_1 + i_2w_2 + b_1$. Let's write out each neuron manually using this equation to make the preactivation step very clear.**

* $h_1 = i_1w_1 + i_2w_2 + b_1 $
* $h_2 = i_1w_3 + i_2w_4 + b_1 $
* $output = h_1w_5 + h_2w_6 + b_2$
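
The three equations above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `preactivation` and all of the numeric values are assumptions picked so the arithmetic is easy to follow, not values taken from the diagram.

```python
def preactivation(inputs, weights, bias=0.0):
    """Weighted sum of the incoming neuron values, plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical inputs, weights, and biases (chosen only for illustration)
i1, i2 = 1.0, 2.0
w1, w2, w3, w4, w5, w6 = 0.5, 0.25, 1.0, 0.5, 2.0, 1.0
b1, b2 = 0.0, 0.0

h1 = preactivation([i1, i2], [w1, w2], b1)      # 1*0.5 + 2*0.25 + 0 = 1.0
h2 = preactivation([i1, i2], [w3, w4], b1)      # 1*1.0 + 2*0.5  + 0 = 2.0
output = preactivation([h1, h2], [w5, w6], b2)  # 1*2.0 + 2*1.0  + 0 = 4.0

print(h1, h2, output)  # 1.0 2.0 4.0
```

Note that no activation function is applied here, so these are raw weighted sums only.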


<h2><center>Mathematical Foundation - Activation</center></h2>


**Great, preactivation is clearly just a weighted sum of the inputs of any given neuron! However, this generally isn't enough. The next step is to apply an activation function to each neuron after calculating its value through preactivation. A common activation function is sigmoid (it is also a very easily understood one). Sigmoid takes the value $z$ from our preactivation step and squashes it into the interval (0, 1). Sigmoid is simply 1 divided by $1 + e$ to the power of the negative preactivation value**

<center><h2>$\frac{1}{1+e^{-z}}$</h2></center>

**All we need to do to finish the process of forward propagation is apply this activation function after we do preactivation on each neuron. Let's do the same thing as before and do this manually for each neuron**

* $Preh_1 = i_1w_1 + i_2w_2 + b_1 $
    * $h_1 = \frac{1}{1+e^{-Preh_1}}$
* $Preh_2 = i_1w_3 + i_2w_4 + b_1 $
    * $h_2 = \frac{1}{1+e^{-Preh_2}}$
* $Preoutput = h_1w_5 + h_2w_6 + b_2$
    * $output = \frac{1}{1+e^{-Preoutput}}$
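
As a rough sketch of the two steps for a single neuron, the snippet below computes the preactivation and then applies sigmoid. The function name `neuron` and the sample values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Step 1: preactivation (weighted sum plus bias). Step 2: activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# A preactivation of exactly 0 lands in the middle of the interval
print(sigmoid(0.0))  # 0.5

# Large negative values approach 0, large positive values approach 1
print(sigmoid(-5.0) < 0.5 < sigmoid(5.0))  # True

# Hypothetical neuron whose inputs are all zero: preactivation 0, activation 0.5
print(neuron([0.0, 0.0], [0.3, 0.7]))  # 0.5
```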



<h2><center>Mathematical Foundation - Numerical Example</center></h2>

**Let's do an example with numerical values. Traditionally, we initialize the bias values for each layer to be 0 and the weights to be within a range of [0,1] (this can vary, but for this example we will randomly initialize all weights between 0 - 1). Obviously, the values for the input neurons will depend on our dataset, so we will just take random inputs as well. Let's take both bias terms to be 0 and $w_1 = 0.1, w_2 = 0.6, w_3 = 0.9, w_4 = 1, w_5 = 0.1, w_6 = 1, i_1 = 1, i_2 = 0$.**

**Now we can do preactivation and activation for each neuron.**

Preactivation -> Activation

* $h_1 = 1*0.1 + 0*0.6 + 0  = 0.1$
    * $h_1 = \frac{1}{1+e^{-0.1}} = 0.525$
    
Preactivation -> Activation

* $h_2 = 1*0.9 + 0*1 + 0  = 0.9$
    * $h_2 = \frac{1}{1+e^{-0.9}} = 0.711$

Preactivation -> Activation

* $output = 0.525*0.1 + 0.711*1 + 0 = 0.7635 $
    * $output = \frac{1}{1+e^{-0.7635}} = 0.682$


<h2><center>Code Example</center></h2>

**Let's continue to use the first diagram. We know we have 6 weights, 2 inputs, and 2 bias terms (one per layer). Remember, we really just randomly initialize the weights and set the biases to zero.**

In [1]:
import numpy as np

# Sigmoid
def sigmoid(x):
     return 1.0/(1.0 + np.exp(-x))

# Our six weights
Weights = [0.1, 0.6,0.9,1,0.1,1]

# All bias terms in this example are zero
Bias = 0

# Let's do 3 iterations of 3 different inputs
inputs = [[1,0],[0,1],[1,1]]


# Three examples of forward propagation. The first one is the same example we did above
for input in inputs:
    i1 = input[0]
    i2 = input[1]
    
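    # h1 & h2 neuron (above diagram): preactivation, then activation
    h1 = i1*Weights[0] + i2*Weights[1] + Bias
    h1 = sigmoid(h1)

    h2 = i1*Weights[2] + i2*Weights[3] + Bias
    h2 = sigmoid(h2)

    # Output neuron: weighted sum of the hidden activations, then sigmoid
    output = h1*Weights[4] + h2*Weights[5] + Bias
    output = sigmoid(output)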
    
    print("Output for inputs {}: {}".format(input, output))
    


Output for inputs [1, 0]: 0.6821017378863259
Output for inputs [0, 1]: 0.6890376797993403
Output for inputs [1, 1]: 0.7184346752219324


<h2><center>Code Example - Numpy</center></h2>

<b><p>With NumPy we can do this much more cleanly by taking advantage of matrix multiplication. Here we have an array that holds the weights going into each neuron. Each layer must have a set of weights for each of its neurons, and the size of that set equals the number of neurons in the previous layer. Our hidden layer is fed by 2 input neurons, so each hidden neuron needs 2 incoming weights. Since we have 2 neurons in the hidden layer, its weights are simply 2 sets of 2 (a 2x2 matrix). Our output layer is one neuron, so we need one set of weights, and the previous layer has 2 neurons. Therefore we need 1 set of 2 weights for our final (output) layer. Let's look at how this works in linear algebra.</p></b>


$\begin{pmatrix}x_1&x_2\end{pmatrix}\begin{pmatrix}w_1&w_3\\ w_2&w_4\end{pmatrix}=\begin{pmatrix}x_1w_1+x_2w_2&x_1w_3+x_2w_4\end{pmatrix}$

If we think of the first matrix as the input neuron values and the second matrix as our weights, we get the preactivation value for every neuron in the hidden layer through one matrix multiplication. Note that each column of the weight matrix holds the weights going into one hidden neuron.

$\begin{pmatrix}1&0\end{pmatrix}\begin{pmatrix}0.1&0.9\\ 0.6&1\end{pmatrix}=\begin{pmatrix}0.1&0.9\end{pmatrix}$

We can also do this for the output layer. First make sure to take the sigmoid of this matrix (0.1 and 0.9):

$\begin{pmatrix}0.52498&0.71095\end{pmatrix}\begin{pmatrix}0.1\\ 1\end{pmatrix}=\begin{pmatrix}0.76345\end{pmatrix}$

Clearly, after we do the activation function (sigmoid) on this result we get the same output value as the previous example: ~0.682

In [2]:
import numpy as np


def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))

# Weights for hidden layer
w1 = np.array([[0.1, 0.6],[0.9,1.0]]).T

# Weights for output layer
w2 = np.array([0.1,1.0]).T

# All bias terms in this example are zero
Bias = 0

# Let's do 3 iterations of 3 different inputs
inputs = np.array([[1,0],[0,1],[1,1]])

for input in inputs:
    # h1 & h2 neuron from above diagram
    hidden = sigmoid(input @ w1)
        
    # output neuron from above diagram
    output = sigmoid(hidden @ w2)
        
    print("Output for inputs {}: {}".format(input, output))
    


Output for inputs [1 0]: 0.6821017378863259
Output for inputs [0 1]: 0.6890376797993403
Output for inputs [1 1]: 0.7184346752219324
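
The loop over inputs is not strictly needed. As a sketch, if we stack the three inputs into a single matrix (one row per example), the same two matrix multiplications process all of them at once. The weights are the ones used above, written out already transposed so that each column belongs to one hidden neuron.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hidden layer weights: column 0 feeds h1, column 1 feeds h2
w1 = np.array([[0.1, 0.9],
               [0.6, 1.0]])

# Output layer weights: one weight per hidden neuron
w2 = np.array([0.1, 1.0])

# Each row is one input example
inputs = np.array([[1, 0], [0, 1], [1, 1]])

hidden = sigmoid(inputs @ w1)   # shape (3, 2): two hidden activations per example
outputs = sigmoid(hidden @ w2)  # shape (3,): one output per example

print(outputs)
```

The three values printed should match the three outputs from the loop version, since the arithmetic is identical.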


<h2><center>Robust Code</center></h2>

<b><p>Now that we understand how to do forward propagation by hand using matrix multiplication, let's develop a more robust system from scratch. Note: if we use PyTorch, this is all handled for us, but this is great for learning and understanding how to develop neural networks from scratch!</p></b>

In [3]:
import numpy as np


class NeuralNetwork:
    
    def __init__(self):
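        # Weights are drawn from a standard normal distribution
        self.W1 = np.random.randn(2,2)
        self.W2 = np.random.randn(2,1)
        # One bias per neuron, initialized to zero
        self.B1 = np.zeros(2)
        self.B2 = np.zeros(1)
        
    def sigmoid(self, neurons):
        return 1.0/(1.0 + np.exp(-neurons))
    
    def forward(self, input):
        # Preactivation (weighted sum plus bias), then activation, per layer
        hidden = self.sigmoid(input @ self.W1 + self.B1)
        output = self.sigmoid(hidden @ self.W2 + self.B2)
        return output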
    
net = NeuralNetwork()
train = np.array([1,0])
net.forward(train)


array([0.38609474])

<h2><center>Production Code</center></h2>

<b><p>We need a much more flexible code base for forward propagation, one that lets us configure our layers and inputs. With this code base we can size our neural network as we please: more hidden neurons, more hidden layers, more output neurons, and more input neurons.</p></b>

In [3]:
import numpy as np


# Each layer in our neural network
class NeuralLayer:
    # Randomly initialize weights and zero the biases, sized from the layer dimensions
    def __init__(self, input_neurons, output_neurons):
        self.weights = np.random.randn(input_neurons, output_neurons)
        self.bias = np.zeros((1,output_neurons))

    # Two different activations, sigmoid by default
    def sigmoid(self, neurons):
        return 1.0/(1.0 + np.exp(-neurons))
    
    def relu(self, neurons):
        return neurons * (neurons > 0)

    # Forward pass
    def forward(self, input, activation):
        if activation == 'sigmoid':
            return self.sigmoid(input @ self.weights + self.bias)
        else:
            return self.relu(input @ self.weights + self.bias)


# Our neural net
class NeuralNetwork:
    
    # Dynamically create all layers 
    def __init__(self, input_neurons, hidden_neurons, layer_count, output_neurons = 1):
        
        # Used to ensure input neurons match inputted data
        self.neuron_safety = input_neurons
        
        # Assert we have an input and output layer at the least
        assert layer_count >= 2 and output_neurons >= 1
        
        # Input layer
        self.layers = [NeuralLayer(input_neurons, hidden_neurons)]
                
        # Hidden Layers
        for i in range(layer_count - 2):
            self.layers.append(NeuralLayer(hidden_neurons, hidden_neurons))
            
        # Output layer
        self.layers.append(NeuralLayer(hidden_neurons, output_neurons))
    
    # Forward pass for each layer
    def forward(self, inp, activation = 'sigmoid'):
        
        # The last axis holds the features, so this also works for a batch of rows
        assert inp.shape[-1] == self.neuron_safety
        
        
        for layer in self.layers:
            inp = layer.forward(inp, activation)
            
        return inp 

In [4]:
# Create a neural network with 3 inputs, 6 hidden neurons in each layer, and 5 layers 
net = NeuralNetwork(3,6,5)

# Input data
X = np.array(([1,0,6]))

X = net.forward(X)
print(X)

[[0.82857646]]
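
To see how the shapes flow through such a stack of layers, here is a compact functional sketch of the same idea. It is not the class above: the helper names `build_layers` and `forward`, the fixed seed, and the two input rows are assumptions made for illustration. The sizes mirror `NeuralNetwork(3,6,5)`, which builds five weight matrices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_layers(sizes, seed=0):
    """Create one (weights, bias) pair per pair of consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_in, n_out)), np.zeros((1, n_out)))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply preactivation and sigmoid activation layer by layer."""
    for weights, bias in layers:
        x = sigmoid(x @ weights + bias)
    return x

# 3 inputs, four hidden layers of 6 neurons, 1 output: five weight matrices
sizes = [3, 6, 6, 6, 6, 1]
layers = build_layers(sizes)

# Two hypothetical input rows processed as one batch
batch = np.array([[1, 0, 6], [0, 2, 1]])
out = forward(batch, layers)

for weights, bias in layers:
    print(weights.shape, bias.shape)
print(out.shape)  # (2, 1): one output value per input row
```

Each weight matrix has shape (neurons in previous layer, neurons in this layer), which is what lets the output of one layer feed directly into the next.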


<h2><center>Summary</center></h2>


**Forward propagation is simply the process of calculating values for neurons while moving from the input layer to the output layer. This is how our neural network calculates its predictions. We've seen in this notebook how to do forward propagation through preactivation and activation. The next step is to use backpropagation to update our weights and biases until our model performs well!**


