# Building Neural Networks

### Let's start by building an Artificial Neural Network (ANN)
>In the figure below, we can see an analogy between biological neurons and artificial systems. Both contain a main processing element, a `neuron`, with input signals $(x_1, x_2, \ldots, x_n)$ and an output (Mohammed 8).

<div>
<img src="images/cv15.png" width="400"/> 
</div>
   


Fig.ANN1: Single Neuron - Single layer Network (`Perceptron`)


<div>
<img src="images/cv16.png" width="600"/> 
</div>


Fig.ANN2: Multilayer Network (`Multilayer perceptron`)<br>
Deep learning involves layers of neurons in a network, or a `Multilayer perceptron`.<br>
Fig.7 (both `ANN1` and `ANN2` above): source (Mohammed 9).

* **An ANN imitates how the human brain processes information. When millions of neurons (`perceptrons` in an ANN) are stacked in layers and connected together, the result is a multilayer neural network, and learning with such networks is called `deep learning`.** 

# So, what is a `Perceptron`? 
* **Let's zoom in on the `Multilayer perceptron (MLP)` above.**

### The perceptron is the fundamental building block of `Neural Networks` in Deep Learning. 
- If we really want to know how a neural network works, we should look closely at how a perceptron works.<br>
According to the [Wikipedia](https://en.wikipedia.org/wiki/Perceptron) definition, a `Perceptron` is an algorithm for learning a binary classifier called a `threshold function`: a function that maps its input $X$ (a real-valued vector) to an output value $f(X)$ (a single binary value):

$$
f(X) = \left\{
\begin{array}{ll}
      1 & \text{if } W \cdot X + b > 0, \\
      0 & \text{otherwise}
\end{array} \right.
$$

where $W$ is a vector of real-valued weights, $W \cdot X$ is the dot product $\sum_{i=1}^{m} w_i x_i$, $m$ is the number of inputs to the perceptron, and $b$ is the bias. The bias shifts the decision boundary away from the origin and does not depend on any input value. <br>
In a binary classification problem, the value of $f(X)$ ($0$ or $1$) classifies $X$ as either a positive or a negative instance.
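
The threshold function above can be sketched in a few lines of NumPy. This is only a minimal illustration: the function name `perceptron_predict` and the weights, bias, and inputs are hypothetical values chosen for the example, not values taken from the source.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Return 1 if w . x + b > 0, otherwise 0 (the threshold function f(X))."""
    z = np.dot(w, x) + b  # weighted sum of the inputs plus the bias
    return 1 if z > 0 else 0

# Hypothetical weights, bias, and inputs, chosen only for illustration.
w = np.array([2.0, -1.0])
b = -0.5

pos = perceptron_predict(np.array([1.0, 0.5]), w, b)  # z = 2.0 - 0.5 - 0.5 = 1.0
neg = perceptron_predict(np.array([0.0, 1.0]), w, b)  # z = 0.0 - 1.0 - 0.5 = -1.5
print(pos, neg)
```

The first input lands on the positive side of the decision boundary and the second on the negative side, so the two calls return different classes.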

In the context of neural networks, a perceptron is an `artificial neuron` using the Heaviside step function as the activation function. The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from a multilayer perceptron, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest feedforward neural network (source: [Wikipedia](https://en.wikipedia.org/wiki/Perceptron)).
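
As a small illustration of the Heaviside step activation, NumPy provides `np.heaviside`, whose second argument sets the output at exactly zero. Choosing `0.0` there is an assumption made here so that the result agrees with the rule "1 if $W \cdot X + b > 0$, 0 otherwise"; the helper name `step` is hypothetical.

```python
import numpy as np

def step(z):
    # np.heaviside(z, h0) gives 0 where z < 0, h0 where z == 0, and 1 where z > 0.
    # h0 = 0.0 matches the perceptron rule, which outputs 1 only for z > 0.
    return np.heaviside(z, 0.0)

z = np.array([-2.0, 0.0, 0.5])  # example pre-activation values
out = step(z)
print(out)  # [0. 0. 1.]
```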

## Notation:
The dot product of two vectors `A` and `B`, where <br>
$A = [a_1, a_2, \ldots, a_n]$ and $B = [b_1, b_2, \ldots, b_n]$, is given by:<br>
$$
A \cdot B = \sum_{i=1}^{n} a_i b_i
$$

which is simply the matrix product $A^T B$ when $A$ and $B$ are treated as column vectors (source: [Dot product in matrix notation](https://mathinsight.org/dot_product_matrix_notation)).
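
A quick numerical check of this notation, using two made-up vectors: the element-wise sum, `np.dot`, and the matrix product of a row vector with a column vector should all give the same number.

```python
import numpy as np

# Example vectors, chosen only for illustration.
A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])

# Definition: sum of element-wise products, 1*4 + 2*5 + 3*6.
elementwise_sum = sum(a * b for a, b in zip(A, B))

# NumPy's built-in dot product.
dot = np.dot(A, B)

# Matrix form: treat A and B as n x 1 column vectors, so A^T B is a 1 x 1 matrix.
matrix_form = (A.reshape(-1, 1).T @ B.reshape(-1, 1)).item()

print(elementwise_sum, dot, matrix_form)  # 32.0 32.0 32.0
```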

Now, getting back to our perceptron concept, assume we have the following vectors: an input vector $X = [x_1, x_2, x_3]$ and a weight vector $W = [w_1, w_2, w_3]$. For the sake of simplicity, let's assume $x_1 = 3$, $x_2 = -2$, and a weight $w_0 = 1$, which by the usual convention plays the role of the bias $b$.

>You might get the impression that neural networks only understand the most useful features, but that’s not entirely true. Neural networks scoop up all the features available and give them random weights. During the training process, the neural network adjusts these weights to reflect their importance and how they should impact the output prediction. The patterns with the highest appearance frequency will have higher weights and are considered more useful features. Features with the lowest weights will have very little impact on the output (Mohammed 32).

<div>
<img src="images/cv18.png" width="600"/> 
</div>

   Fig.8: Source (Mohammed 40)

>In both artificial and biological neural networks, a neuron does not just output the bare input it receives. Instead, there is one more step, called an activation function; this is the decision-making unit of the brain. In ANNs, the activation function takes the same weighted sum input from before ($z = \sum x_i \cdot w_i + b$) and activates (fires) the neuron if the weighted sum is higher than a certain `threshold`. This activation happens based on the activation function calculations (Mohammed 42). <br>


### As we can see, a perceptron consists of 4 parts:
1. Input values (one input layer)
1. Weights and bias
1. Net sum
1. Activation function

According to `Mohammed`, the perceptron's learning logic goes like this:<br>
>1. The neuron calculates the weighted sum and applies the activation function to make a prediction $\hat y$. This is called the `feedforward process`: 

$$\hat y = \text{activation}\left(\sum x_i \cdot w_i + b\right)$$

>2. It compares the output prediction with the correct label to calculate the error:<br> $\text{error} = y - \hat y$. <br>


>3. It then updates the weight. If the prediction is too high, it adjusts the weight to make a lower prediction the next time, and vice versa. <br>


>4. Repeat! <br>
>This process is repeated many times, and the neuron continues to update the weights to improve its predictions until step 2 produces a very small error (close to zero), which means the neuron’s prediction is very close to the correct value. At this point, we can stop the training and save the weight values that yielded the best results to apply to future cases where the outcome is unknown.
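
The four steps above can be sketched as a short training loop. This is a minimal illustration under stated assumptions, not the book's implementation: the source does not give the exact update formula, so the sketch uses the classic perceptron rule (`w += lr * error * x`), a step activation, and a toy AND-gate dataset. The names `train_perceptron`, `lr`, and `max_epochs` are hypothetical.

```python
import numpy as np

def step(z):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron learning rule (an assumption; the text does not fix the formula)."""
    w = np.zeros(X.shape[1])  # start from zero weights
    b = 0.0
    for _ in range(max_epochs):
        total_error = 0
        for x_i, y_i in zip(X, y):
            y_hat = step(np.dot(w, x_i) + b)  # step 1: feedforward
            error = y_i - y_hat               # step 2: error = y - y_hat
            w += lr * error * x_i             # step 3: lower w if too high, raise if too low
            b += lr * error
            total_error += abs(error)
        if total_error == 0:                  # step 4: repeat until no mistakes remain
            break
    return w, b

# Toy dataset (illustrative only): the logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
preds = [step(np.dot(w, x_i) + b) for x_i in X]
print(preds)  # [0, 0, 0, 1]
```

AND is linearly separable, so a single perceptron can learn it; with a step activation the error reaches exactly zero instead of merely getting close to it.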

* We'll get back to the [code notebook](https://github.com/sthirpa/Data_Scince_Immersive-at-General-Assembly-/blob/Hirpa/CIFAR-10-SH.ipynb) for the implementation of this theory.