<h1 style="color:white;background-color:rgb(255, 108, 0);padding-top:1em;padding-bottom:0.7em;padding-left:1em;">2.1 Artificial Neuron Structure</h1>
<hr>

<h2>Introduction</h2>

In this lesson we will learn how the basic unit of a neural network, a single neuron, works.
<br>
First we will discuss the theory behind the artificial neuron, and then we will code examples
<br>
with the help of the NumPy module.

First of all, let's import the NumPy module:

In [None]:
import numpy as np

<h2>Artificial neuron</h2>

The functioning of an artificial neuron can be best understood through its analogy with the biological neuron.

<center>
<img src="https://cdn-images-1.medium.com/max/1600/0*v4f4-nMoRMNrtUZG.png" width="80%"/>
</center>

Biological neurons receive impulses from other neurons through their dendrites. If the intensity of these impulses
<br>
exceeds a given level, the neuron also emits a signal on its axon. The impulses from other neurons can be either
<br>
amplified or inhibited, so the neuron can weigh different inputs with different significance when producing
<br>
its own output. In the artificial neuron model, the inputs of the model $(x_i)$ represent the intensity of impulses
<br>
coming from the other neurons. The amplification/inhibition is achieved by assigning weights $(w_i)$ to the inputs.
<br>
The cell body accumulates the input impulses, which is modelled with a summation in the
<br>
artificial neuron model. The property that the biological neuron only activates when the intensity of the inputs
<br>
is greater than a certain level is achieved with a nonlinear function, called the activation function, in the artificial
<br>
neuron. The activation function serves as a threshold on the output. The final output of the neuron can be
<br>
fed to other neurons through the axon terminals.

So an artificial neuron does the following:

 - Take an input vector $\mathbf{x}$ and a weight vector $\mathbf{w}$
 - Compute the sum of the weighted inputs $net$ as the dot product of the vectors $\mathbf{x}$ and $\mathbf{w}$:
 
 $$net = \sum_{i=0}^n w_ix_i$$
 
 - Apply the activation function $f()$ on $net$ to compute the final result $y$:
 
 $$y = f(net)$$
 
Here $n$ is the number of inputs. Notice that there is an extra input (indexed with $0$). This input is not counted
<br>
in the number of inputs $(n)$ and its value is always $1$. The associated weight $w_0$ is often referred to as the bias
<br>
and is denoted by $b$ instead of $w_0$.

So, in compact form, the output of a single artificial neuron can be computed as follows. Since the bias $b$ replaces the term $w_0x_0$, the sum now starts at $i=1$:

$$y=f\left(b+\sum_{i=1}^n w_ix_i\right)$$
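
The computation above can be sketched in NumPy as follows. This is only a minimal illustration: the function name `neuron_output`, the example values, and the choice of the sigmoid as $f$ are assumptions made for this example. The second half checks that keeping the bias separate gives the same result as prepending the constant input $x_0 = 1$ with weight $w_0 = b$.

```python
import numpy as np

def sigmoid(z):
    # Example activation function (an assumption for this sketch)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, f=sigmoid):
    # Weighted sum of the n inputs plus the bias, then the activation
    net = b + np.dot(w, x)
    return f(net)

# Hypothetical example values with n = 3 inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

y = neuron_output(x, w, b)

# Equivalent form: treat the bias as weight w_0 on a constant input x_0 = 1
x_aug = np.concatenate(([1.0], x))
w_aug = np.concatenate(([b], w))
y_aug = sigmoid(np.dot(w_aug, x_aug))

print(y, y_aug)
```

Both printed values should agree, which shows that the two notations describe the same neuron.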

The only thing left to discuss is the activation function $f()$.
<br>
The purpose of the activation function is to threshold the weighted sum of the inputs and, in most cases, to squash the
<br>
output into a bounded range, so the output value will not be too large or too small, which would make further computations difficult.

The first idea that comes to mind is a simple step function (with a hard threshold) like:

$$
f(\mathrm{x})=    
\begin{cases}
      1 & \text{if $\mathrm{x} >$ threshold}\\
      0 & \text{otherwise}
\end{cases}
$$
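
A hard-threshold step function like the one above could be written in NumPy as shown below. The function name `step` and the default threshold of `0.0` are assumptions chosen for illustration.

```python
import numpy as np

def step(x, threshold=0.0):
    # 1 where the input is strictly greater than the threshold, 0 otherwise
    return np.where(np.asarray(x) > threshold, 1, 0)

print(step([-1.0, 0.0, 0.5, 2.0]))  # [0 0 1 1]
```

Note that an input exactly equal to the threshold maps to $0$, because the comparison is strict.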

But it turns out that other activation functions work better for more complex problems.
<br>
The most popular activation functions are the sigmoid, the hyperbolic tangent, the ReLU, the Leaky ReLU and the softmax
<br>
activations. Except for softmax, these can be computed as follows:

Sigmoid:

$$\sigma (\mathrm{x})=\frac{1}{1+e^{-\mathrm{x}}}$$

Hyperbolic tangent:

$$f_{tanh}(\mathrm{x})=\tanh{(\mathrm{x})}=\frac{\sinh{(\mathrm{x})}}{\cosh{(\mathrm{x})}}$$

ReLU:

$$R(\mathrm{x}) = \max{(0,\mathrm{x})}$$


Leaky ReLU:

$$LR(\mathrm{x}) = \max{(\alpha \mathrm{x},\mathrm{x})}, \text{ where $\alpha = 0.01$}$$
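
The four formulas above translate almost directly into NumPy. The sketch below is one possible way to write them; the function names and the sample input are assumptions for this example, and `alpha=0.01` follows the value given above.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real input into the interval (-1, 1)
    return np.tanh(x)

def relu(x):
    # Element-wise maximum of 0 and x
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs are scaled by alpha instead of zeroed
    return np.maximum(alpha * x, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))
print(leaky_relu(x))
```

All four functions operate element-wise, so they accept scalars as well as NumPy arrays.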

The softmax function is the odd one out among these. We will discuss it a bit later. First we have to see what these
<br>
activation functions look like and what they are good for.

Let's plot the above-mentioned activation functions: