# Introduction to Artificial Neural Networks with Keras

An Artificial Neural Network (ANN) is a Machine Learning model inspired by the networks of biological neurons found in our brains.

#### Biological Neuron
![Biological Neuron](images/biological-neuron.jpg)


Biological neurons produce short electrical impulses called *action potentials* (APs, or just *signals*), which travel along the axons and make the synapses release chemical signals called *neurotransmitters*. When a neuron receives a sufficient amount of these neurotransmitters within a few milliseconds, it fires its own electrical impulses.

These individual neurons are organized in a vast network of billions, with each neuron typically connected to thousands of other neurons.

### Logical computations with Neurons

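The diagram below shows how small networks of artificial neurons can perform logical computations, assuming that a neuron is activated when at least two of its input connections are active.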

![Logical Computations](images/log_comp.png)

* If A is activated, then C also gets activated (identity function).
* C is activated only when both A and B are activated (logical *AND*).
* C is activated if either A or B is activated (logical *OR*).
* C is activated only if A is active and B is off (*A AND NOT B*).
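The four networks can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the original neuron model: a neuron fires when the weighted count of its active inputs reaches 2, a weight of 2 stands for two parallel connections from the same neuron, and the inhibitory connection from B is modeled as a negative weight. The names `neuron`, `identity`, `logical_and`, `logical_or` and `a_and_not_b` are hypothetical.

```python
def neuron(inputs, weights, threshold=2):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0.

    Assumption: a weight of 2 models two parallel connections from the same
    neuron, and a negative weight models an inhibitory connection.
    """
    total = sum(w * x for w, x in zip(weights, inputs))
    return int(total >= threshold)

def identity(a):
    return neuron([a], [2])          # A reaches C through two connections

def logical_and(a, b):
    return neuron([a, b], [1, 1])    # one connection each: both must be active

def logical_or(a, b):
    return neuron([a, b], [2, 2])    # two connections each: either one suffices

def a_and_not_b(a, b):
    return neuron([a, b], [2, -1])   # B inhibits C

for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b}  AND={logical_and(a, b)}  OR={logical_or(a, b)}  "
              f"A AND NOT B={a_and_not_b(a, b)}")
```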

### The Perceptron
The *Perceptron* is one of the simplest ANN architectures. It is based on a slightly different artificial neuron called a *threshold logic unit* (TLU), or sometimes a *linear threshold unit* (LTU): the inputs and outputs are numbers (instead of binary on/off values), and each input connection is associated with a weight.

The TLU computes a weighted sum of its inputs ($z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = x^\top w$), then applies a step function to that sum and outputs the result: $h_w(x) = \text{step}(z)$, where $z = x^\top w$.
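A single TLU can be sketched in a few lines of NumPy. This is an illustrative sketch, assuming a Heaviside step (output 1 when $z \ge 0$); the function name `tlu` and the weight and input values are hypothetical, chosen so the arithmetic is exact.

```python
import numpy as np

def tlu(x, w):
    """Threshold logic unit: z = x^T w, then a Heaviside step (1 if z >= 0 else 0)."""
    z = x @ w                      # z = w_1 x_1 + ... + w_n x_n
    return int(z >= 0)

w = np.array([0.5, 0.25, 1.0])     # hypothetical weights
x1 = np.array([1.0, -2.0, 0.5])    # z = 0.5 - 0.5 + 0.5 = 0.5  -> outputs 1
x2 = np.array([0.0, 2.0, -1.0])    # z = 0.0 + 0.5 - 1.0 = -0.5 -> outputs 0
print(tlu(x1, w), tlu(x2, w))
```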


![TLU](images/tlu.png)


The most common step function used in Perceptrons is the *Heaviside step function*. Sometimes the sign function is used instead.

 $$ \operatorname{heaviside}(z) = \left\{
\begin{array}{ll}
      0 & \text{if } z < 0 \\
      1 & \text{if } z \ge 0 \\
\end{array}
\right.  $$

 $$ \operatorname{sgn}(z) = \left\{
\begin{array}{ll}
      -1 & \text{if } z < 0 \\
      0 & \text{if } z = 0 \\
      +1 & \text{if } z > 0 \\
\end{array}
\right.  $$
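Both step functions are available in NumPy, which makes their behavior at $z = 0$ easy to check. A short sketch: `np.heaviside` takes the value to return at zero as its second argument (1.0 here, so that heaviside(0) = 1), and `np.sign` returns 0 at zero. The sample values in `z` are arbitrary.

```python
import numpy as np

z = np.array([-2.0, 0.0, 3.0])

# Second argument of np.heaviside: the value returned where z == 0.
h = np.heaviside(z, 1.0)
# np.sign returns -1, 0 or +1; it gives 0 where z == 0.
s = np.sign(z)

print(h)
print(s)
```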


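A Perceptron is simply composed of a single layer of TLUs, with each TLU connected to all inputs. When all the neurons in a layer are connected to every neuron in the previous layer (i.e., its input neurons), the layer is called a *fully connected layer*, or a *dense layer*.  
The inputs of the Perceptron are fed to special passthrough neurons called *input neurons*: they output whatever input they are fed. All the input neurons form the *input layer*. Moreover, an extra bias feature is generally added ($x_0 = 1$); it is typically represented using a special type of neuron called a *bias neuron*, which outputs 1 all the time.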

![Perceptron](images/perceptron.png)


The above Perceptron can classify instances simultaneously into three different binary classes, which makes it a multioutput classifier.

Computing the outputs of a fully connected layer:

$$h_{W,b}(X) = \phi(XW + b)$$

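where:
* $X$ represents the matrix of input features. It has one row per instance and one column per feature.
* $W$ is the weight matrix, containing all the connection weights except for the ones from the bias neuron. It has one row per input neuron and one column per artificial neuron in the layer.
* $b$ is the bias vector, containing all the connection weights between the bias neuron and the artificial neurons. It has one bias term per artificial neuron.
* $\phi$ is the activation function (here, a *step function*, since the artificial neurons are TLUs).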
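The layer equation $h_{W,b}(X) = \phi(XW + b)$ maps directly onto a NumPy expression, with broadcasting adding the bias vector to every row. A minimal sketch, assuming a Heaviside step as $\phi$ and random (hypothetical) data: 5 instances with 2 features feeding a layer of 3 TLUs, as in a three-output Perceptron. The function name `fully_connected_layer` is hypothetical.

```python
import numpy as np

def fully_connected_layer(X, W, b):
    """Outputs of a layer of TLUs: phi(XW + b), with a Heaviside step as phi."""
    return np.heaviside(X @ W + b, 1.0)   # b is broadcast across all rows

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 2))   # 5 instances, 2 input features (hypothetical data)
W = rng.normal(size=(2, 3))   # one row per input neuron, one column per TLU
b = rng.normal(size=3)        # one bias term per TLU

outputs = fully_connected_layer(X, W, b)
print(outputs.shape)          # one row per instance, one column per TLU
print(outputs)
```

Each row of `outputs` holds the three binary predictions for one instance, which is what makes the layer a multioutput classifier.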

### How is a Perceptron trained?

The Perceptron training algorithm was largely inspired by *Hebb's rule* (also called **Hebbian learning**), which states that the connection between two neurons tends to strengthen when they fire simultaneously.
Perceptrons are trained using a variant of this rule that takes into account the error made by the network when it makes a prediction: the Perceptron learning rule reinforces connections that help reduce the error.

Perceptron learning rule:

$$ w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \, (y_j - \hat{y}_j) \, x_i $$

In this equation:
* $w_{i,j}$ is the connection weight between the $i^\text{th}$ input neuron and the $j^\text{th}$ output neuron.
* $x_i$ is the $i^\text{th}$ input value of the current training instance.
* $y_j$ is the target output of the $j^\text{th}$ output neuron for the current training instance.
* $\hat{y}_j$ is the output of the $j^\text{th}$ output neuron for the current training instance.
* $\eta$ is the learning rate.
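The learning rule can be sketched in NumPy, assuming training instances are processed one at a time, weights start at zero, and a Heaviside step is used. The names `heaviside`, `train_perceptron` and `predict` are hypothetical, and the toy dataset (the AND and OR truth tables, learned as two outputs at once) is made up for illustration. Because each output's update depends only on its own error, the two columns of `W` are learned independently; both tasks are linearly separable, so the rule is expected to converge.

```python
import numpy as np

def heaviside(z):
    # Step function: 1 where z >= 0, else 0
    return (z >= 0).astype(int)

def train_perceptron(X, Y, eta=1.0, n_epochs=10):
    """Train a single layer of TLUs with the Perceptron learning rule.

    X: (m, n) input features; Y: (m, k) binary targets, one column per output.
    A bias feature x_0 = 1 is prepended, so row 0 of W holds the bias weights.
    """
    X_b = np.c_[np.ones(len(X)), X]
    W = np.zeros((X_b.shape[1], Y.shape[1]))    # W[i, j]: input i -> output j
    for _ in range(n_epochs):
        for x, y in zip(X_b, Y):                # one training instance at a time
            y_hat = heaviside(x @ W)
            # w_ij <- w_ij + eta * (y_j - y_hat_j) * x_i, for all i, j at once
            W += eta * np.outer(x, y - y_hat)
    return W

def predict(W, X):
    X_b = np.c_[np.ones(len(X)), X]
    return heaviside(X_b @ W)

# Hypothetical toy data: learn AND (column 0) and OR (column 1) simultaneously
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0, 0], [0, 1], [0, 1], [1, 1]])
W = train_perceptron(X, Y)
print(predict(W, X))
```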

## Setup

```python
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

# TensorFlow ≥2.0 is required
import tensorflow as tf
assert tf.__version__ >= "2.0"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images")
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")
```

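Scikit-Learn provides a `Perceptron` class that implements a single-TLU network. Here it is used to classify iris flowers as *Iris setosa* or not, based on petal length and petal width.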

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()

X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0 (np.int was removed from NumPy)

per_clf = Perceptron()
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])

y_pred
```

```
array([0])
```

Because each of its output neurons has a linear decision boundary, a Perceptron is incapable of solving some trivial problems (e.g., the XOR classification problem). Some of these limitations can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a *Multilayer Perceptron* (MLP).
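The XOR limitation can be observed by running the Perceptron learning rule on the XOR truth table. In this sketch (assumptions: a single TLU, zero initial weights, learning rate 1, 100 epochs, all chosen for illustration), no straight line separates the two classes, so no weight vector classifies all four instances correctly and the best training accuracy stays below 1.

```python
import numpy as np

# XOR truth table: no straight line separates the 1s from the 0s
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

X_b = np.c_[np.ones(len(X)), X]   # prepend the bias feature x_0 = 1
w = np.zeros(X_b.shape[1])
eta = 1.0
best_accuracy = 0.0

for epoch in range(100):
    for x_i, y_i in zip(X_b, y):            # one instance at a time
        y_hat = int(x_i @ w >= 0)           # Heaviside step
        w += eta * (y_i - y_hat) * x_i      # Perceptron learning rule
    accuracy = np.mean((X_b @ w >= 0).astype(int) == y)
    best_accuracy = max(best_accuracy, accuracy)

print(f"best training accuracy over 100 epochs: {best_accuracy:.2f}")
```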

### Multilayer Perceptron and Backpropagation

An MLP is composed of one (passthrough) input layer, one or more layers of TLUs, called *hidden layers*, and one final layer of TLUs called the output layer. Every layer except for the output layer includes a bias neuron and is fully connected to the next layer.
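A forward pass through such a stack is just the fully connected layer computation applied layer after layer. The sketch below is illustrative: the names `step` and `mlp_forward` are hypothetical, each `(W, b)` pair stands for one hidden or output layer (with `b` playing the role of the previous layer's bias neuron), and the weights are set by hand rather than learned. With one hidden layer whose neurons compute AND and OR, the output neuron computes OR AND NOT AND, which is XOR.

```python
import numpy as np

def step(z):
    # Heaviside step: 1 where z >= 0, else 0
    return (z >= 0).astype(int)

def mlp_forward(X, layers):
    """Forward pass through a stack of fully connected TLU layers.

    layers: list of (W, b) pairs, one per hidden/output layer.
    """
    A = X
    for W, b in layers:
        A = step(A @ W + b)
    return A

# Hand-picked (hypothetical) weights.
# Hidden layer: neuron 1 fires if x1 + x2 >= 1.5 (AND), neuron 2 if x1 + x2 >= 0.5 (OR).
hidden = (np.array([[1.0, 1.0],
                    [1.0, 1.0]]), np.array([-1.5, -0.5]))
# Output layer: fires if OR is active and AND is not, i.e. XOR.
output = (np.array([[-1.0],
                    [1.0]]), np.array([-0.5]))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(mlp_forward(X, [hidden, output]).ravel())
```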

![MLP](images/mlp.png)

When an ANN contains a deep stack of hidden layers, it is called a *deep neural network* (DNN).