# Implementing an artificial neuron from scratch

An example of a single artificial neuron and its activation function.

In [None]:
import numpy as np

def sigmoid(x):
    y = 1/(1+ np.exp(-x))
    return y

def activate(inputs, weights):
    # Net input: dot product of the inputs and the weights
    h = np.dot(inputs, weights)

    # Apply the sigmoid activation function to the net input
    a = sigmoid(h)

    # Return the activation
    return a

In [None]:
#if __name__=="__main__":
inputs = [0.5, 0.3, 0.2]
weights = [0.4, 0.7, 0.2]
output = activate(inputs, weights) #Computational unit of the ANN.
print(output)

The first neuron has now been implemented from scratch in Python.
Artificial neurons are loosely inspired by biological neurons.
Artificial neurons are computational units that transform inputs into outputs using an activation function.

## Computation in neural networks

Introduction to the multi-layer perceptron (MLP):
1. A single neuron only works for linear problems.
2. Real-world problems are complex and often non-linear.
3. ANNs can reproduce highly non-linear functions.

The artificial neuron does two things:
1. It computes the net input h, the sum of the inputs multiplied by their weights (the dot product w.x).
2. It modulates the net input with an activation function (prediction function): a = f(h).
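
As a worked check of these two steps, the sketch below reuses the inputs and weights from the cell above. The variable names `h` and `a` follow the notation in the list and are otherwise only illustrative.

```python
import numpy as np

x = np.array([0.5, 0.3, 0.2])   # inputs from the example above
w = np.array([0.4, 0.7, 0.2])   # weights from the example above

# Step 1: net input, the weighted sum of the inputs
h = np.dot(w, x)                # 0.5*0.4 + 0.3*0.7 + 0.2*0.2 = 0.45

# Step 2: activation, here the sigmoid
a = 1 / (1 + np.exp(-h))        # roughly 0.61

print(h, a)
```

The sigmoid squashes any net input into the range (0, 1), which is why the output of the neuron can be read as a graded "firing" strength.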

Computation in MLP (1st layer or input layer):
1. x is passed as an input vector, so a_(1st layer) = x.

Computation in MLP (2nd layer or 1st hidden layer):
1. h_(2nd layer) = np.dot(a_(1st layer), W_(1st layer))
2. a_(2nd layer) = f(h_(2nd layer))

Computation in MLP (3rd layer or 2nd hidden layer):
1. h_(3rd layer) = np.dot(a_(2nd layer), W_(2nd layer))
2. a_(3rd layer) = f(h_(3rd layer))

This continues for as many layers as are present in the topology of the artificial neural network you are working on. Each layer takes the activation of the previous layer as its input, not the original x.

Computation in MLP (last layer or output layer, layer n):
1. h_(last layer) = np.dot(a_(n-1 layer), W_(n-1 layer))
2. a_(last layer) = f(h_(last layer))
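
The per-layer computation can be sketched as one loop in which each layer receives the activation of the previous layer. This is a minimal illustration, not a full implementation: the function name `forward` and the names `W1` and `W2` are assumptions, and the numbers are the small two-input example used in the next cell.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def forward(x, weight_matrices):
    """Return the activations of every layer, input layer included."""
    activations = [np.asarray(x, dtype=float)]   # a_(1st layer) = x
    for W in weight_matrices:
        h = np.dot(activations[-1], W)           # net input of the next layer
        activations.append(sigmoid(h))           # activation of the next layer
    return activations

# Hypothetical names for the two weight matrices of a 2-3-1 network
W1 = np.array([[1.2, 0.7, 1.0],
               [2.0, 0.6, 1.8]])
W2 = np.array([1.0, 0.9, 1.5])

layers = forward([0.8, 1.0], [W1, W2])
for i, a in enumerate(layers):
    print(f"layer {i}: shape {np.shape(a)}")
```

Keeping every intermediate activation, rather than only the final one, is a design choice that becomes useful later, because back propagation needs them.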

In [None]:
inputs = [0.8, 1]
weights = [[1.2, 0.7, 1], 
           [2, 0.6, 1.8]]
#output = activate(inputs, weights) #Computational unit of the ANN.
hidden_output = activate(inputs, weights)
print(hidden_output)

#Weight matrix between hidden layer and output layer given
weights_2 = [1, 0.9, 1.5]
output = activate(hidden_output, weights_2)
print(output)

## Implementing a neural network

In [3]:
import numpy as np

In [8]:
class MLP: #MLP: Multi-Layer Perceptron
    def __init__(self, num_inputs=3, num_hidden=[3, 5], num_outputs=2):
        #By default there are 3 inputs, a first hidden layer with 3 neurons and a second hidden layer with 5 neurons.
        #The output layer has 2 neurons, one per outcome or category that the prediction can fall into.
        
        self.num_inputs = num_inputs
        self.num_hidden = num_hidden
        self.num_outputs = num_outputs
        
        #Internal representation of the layers, as a list.
        #Each item is the number of neurons in a layer; index 0 is the input layer and the last index is the output layer.
        layers = [self.num_inputs] + self.num_hidden + [self.num_outputs] #Concatenating the lists with +
        
        #Initialize random weights; weights represent the connections between neurons and their strengths (hence the word weights).
        self.weights = [] #List of weight matrices, one per pair of adjacent layers
        
        #Iterate through the layers to create a weight matrix for each pair of adjacent layers
        for i in range(len(layers) - 1):
            #Rows correspond to the neurons of the current layer and columns to the neurons of the subsequent layer,
            #so entry [j, k] is the weight of the connection from neuron j in layer i to neuron k in layer i+1.
            w = np.random.rand(layers[i], layers[i+1]) #2-D array of shape (layers[i], layers[i+1])
            self.weights.append(w) #Store weight matrices; the number of weight matrices equals the number of layers minus 1.
            
    def forward_propagate(self, inputs): #Do computation forward into the neural network. 
        activations = inputs #For 1st layer, activation is the inputs.
        
        for w in self.weights: #Loop through all weight matrices, i.e. through every layer after the input layer.
            #Calculating net inputs of given layer
            net_inputs = np.dot(activations, w) #Matrix multiplication of activation of previous layer with weight matrix. 
            
            #Calculating activation of given layer
            activations = self._sigmoid(net_inputs) #Passing net inputs to the sigmoid function
            
        return activations
        
    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

In [9]:
if __name__ == "__main__":
    
    #create an MLP
    mlp = MLP()
    
    #create some inputs
    inputs = np.random.rand(mlp.num_inputs) #1-D array or vector; size is number of neurons receiving the inputs 
    
    #perform forward prop; outputs of the previous layer are the inputs of the next layer
    outputs = mlp.forward_propagate(inputs)
    
    #print the results
    print("The network input is: {}".format(inputs))
    #print(f"The network output is: {outputs}")
    print("The network output is: {}".format(outputs))

The network input is: [0.55010556 0.73036645 0.10473932]
The network output is: [0.8970457  0.86531052]
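
To see why the output has two values, it can help to look at the shapes of the weight matrices that the default `MLP()` creates. The sketch below rebuilds only the layer list and the matrices, outside the class, so the names `layers`, `weights` and `shapes` here are illustrative.

```python
import numpy as np

layers = [3, 3, 5, 2]  # default MLP: 3 inputs, hidden layers of 3 and 5 neurons, 2 outputs

# One matrix per pair of adjacent layers
weights = [np.random.rand(layers[i], layers[i + 1]) for i in range(len(layers) - 1)]
shapes = [w.shape for w in weights]
print(shapes)  # [(3, 3), (3, 5), (5, 2)]
```

A 3-element input multiplied through these matrices becomes a vector of length 3, then 5, then 2, which matches the two output values printed above.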


## Training a neural network: Backward propagation and gradient descent

1. The goal is to tweak the weights of the connections between neurons so that the network's predictions are as accurate as possible.
2. To do this, we feed training data (input + target) to the network.
3. We look at the predictions and calculate the error between each prediction and the expected outcome.
4. We calculate the gradient of the error function with respect to the weights. The gradient points in the direction in which the loss function increases fastest, so moving against it reduces the loss.
5. We use this information to perform iterative adjustments to the weights; back propagation carries the error signal back through the layers.
6. Gradient descent then updates the parameters.
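
The steps above can be sketched for the simplest possible case, a single sigmoid neuron. Everything specific here is an assumption made for illustration: the toy data set (logical OR with a constant 1 appended as a bias input), the learning rate and the number of epochs.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

# Toy training data (assumed for illustration): logical OR,
# with a constant 1 appended to each input to act as a bias term.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

w = np.zeros(3)          # weights to be learned
learning_rate = 0.5      # hypothetical value
losses = []

for epoch in range(2000):
    p = sigmoid(X @ w)                            # feed the inputs forward: predictions
    error = p - y                                 # prediction minus target
    losses.append(np.mean(0.5 * error ** 2))      # quadratic error, averaged over the data
    grad = X.T @ (error * p * (1 - p)) / len(y)   # gradient of the averaged error w.r.t. w
    w -= learning_rate * grad                     # step against the gradient

print(losses[0], losses[-1])
```

The recorded loss should shrink over the epochs, which is the observable sign that the weight adjustments are moving in a useful direction.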

The loss function (or error function) used here is E(p, y) = 1/2 (p - y)^2, where p is the prediction and y is the target.
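
As a small numerical illustration, the quadratic error and its derivative with respect to the prediction can be written as two functions. The function names and the sample values of `p` and `y` are assumptions chosen for this sketch.

```python
def quadratic_error(p, y):
    """E(p, y) = 1/2 (p - y)^2"""
    return 0.5 * (p - y) ** 2

def quadratic_error_derivative(p, y):
    """dE/dp = p - y; the factor 1/2 cancels the exponent 2."""
    return p - y

p, y = 0.8, 1.0  # hypothetical prediction and target
print(quadratic_error(p, y))             # about 0.02
print(quadratic_error_derivative(p, y))  # about -0.2
```

The factor 1/2 is there for convenience: it makes the derivative simply the difference between prediction and target.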

To calculate the gradient of the error function, we take its partial derivative with respect to the weights of each layer: ∂E/∂W_n. Therefore, by the chain rule, the gradient of the quadratic error function is ∂E/∂W_n = (p - y) ∂p/∂W_n.

In machine learning, a loss function and an optimizer are two essential components that help to improve the performance of a model. A loss function measures the difference between the predicted output of a model and the actual output, while an optimizer adjusts the model's parameters to minimize the loss function.

F = F(x,W)

E = E(p,y) = E(F(x,W),y)

The error is therefore itself a function of the weights. Since E depends on the weights, it makes sense to calculate the gradient of the error function with respect to the weights.
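
One way to make this dependence concrete is to compute the gradient for a single sigmoid neuron in two ways: analytically with the chain rule, and numerically by nudging each weight slightly. This is a sketch under assumptions; the target `y`, the step `eps` and all function names are hypothetical.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def error(w, x, y):
    p = sigmoid(np.dot(x, w))   # p = F(x, W)
    return 0.5 * (p - y) ** 2   # E(p, y)

def analytic_gradient(w, x, y):
    p = sigmoid(np.dot(x, w))
    # Chain rule: dE/dp * dp/dh * dh/dw = (p - y) * p(1 - p) * x
    return (p - y) * p * (1 - p) * x

def numerical_gradient(w, x, y, eps=1e-6):
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        # Central difference: change in E when only weight i is nudged
        grad[i] = (error(w + step, x, y) - error(w - step, x, y)) / (2 * eps)
    return grad

x = np.array([0.5, 0.3, 0.2])
w = np.array([0.4, 0.7, 0.2])
y = 1.0  # hypothetical target

print(analytic_gradient(w, x, y))
print(numerical_gradient(w, x, y))
```

The two printed vectors should agree to several decimal places, which is a common sanity check for a hand-derived gradient.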

## Backpropagation (Moving the error signal from right to left)

Going backwards, from the output layer to the first layer (input layer) of an ANN:
1. Calculate the error of the prediction against the actual target.
2. Once we have the error, we use it to calculate the first derivative of the error with respect to W_n-1_layer.
3. We back propagate this information to calculate the derivative of the error function with respect to W_n-2_layer.
4. This step is repeated as we move backwards through the ANN until we reach the input layer. The last calculation is for the weights between the input layer and the first hidden layer.
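
The backward pass can be sketched for a small network with one hidden layer, using sigmoid activations and the quadratic error. This is an illustrative sketch, not the notebook's own implementation: the layer sizes, the target `y` and the names `delta1`, `delta2`, `dW1` and `dW2` are assumptions.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

rng = np.random.default_rng(0)
x = rng.random(2)            # input vector
y = np.array([1.0])          # hypothetical target
W1 = rng.random((2, 3))      # input layer -> hidden layer
W2 = rng.random((3, 1))      # hidden layer -> output layer

# Forward pass, keeping the activations for the backward pass
a1 = sigmoid(x @ W1)
p = sigmoid(a1 @ W2)

# 1. Error on the prediction
error = p - y                          # dE/dp for E = 1/2 (p - y)^2

# 2. Derivative with respect to the last weight matrix
delta2 = error * p * (1 - p)           # error signal at the output layer
dW2 = np.outer(a1, delta2)

# 3. Propagate the error signal back one layer
delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
dW1 = np.outer(x, delta1)

print(dW1.shape, dW2.shape)  # (2, 3) (3, 1)
```

Each gradient has the same shape as the weight matrix it belongs to, so it can be subtracted from that matrix directly in the update step.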

## Update parameters

To update the parameters, we use a very important algorithm called gradient descent. It works by taking a step in the opposite direction to the gradient. The size of the step is controlled by the learning rate, a hyperparameter: W_new = W - learning_rate * ∂E/∂W.

What we mean by taking a step in the opposite direction to the gradient:
1. On a graph of the loss function, the gradient is represented by an arrow pointing in the direction in which the loss function increases fastest.
2. We want to minimize the loss/error function, so we take learning steps in the direction opposite to the gradient.
3. Steps are taken until we reach a minimum, ideally the global minimum; gradient descent can also settle in a local minimum.
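
A one-dimensional example makes the idea easy to follow. The sketch below minimizes a made-up loss, (w - 3)^2, whose minimum is at w = 3; the loss, the starting point, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)   # derivative of the loss with respect to w

w = 0.0                      # starting point
learning_rate = 0.1          # hypothetical value

for step in range(100):
    w = w - learning_rate * gradient(w)   # step in the direction opposite to the gradient

print(w)  # very close to 3.0
```

With a learning rate that is too large the steps would overshoot the minimum, and with one that is too small progress would be very slow, which is why the learning rate is treated as a hyperparameter to tune.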