### Part II. Neural Networks and Deep Learning

# 10. Introduction to Artificial Neural Networks with Keras

An Artificial Neural Network (ANN) is a Machine Learning model inspired by the networks of biological neurons found in our brains.  

Although ANNs draw on biological brains, they have gradually evolved to become quite different from them.

### Why is it different this time?

ANNs have been around for quite some time, with efforts going back to a seminal paper by McCulloch and Pitts (1943). However, after a long winter that started in the 1960s, they seem to be back in town as the cool kid.

Why would this time be different? According to the author:

**1. Data quantity**, which allows ANNs to outperform traditional ML techniques on large and complex problems

**2. Computing power**, thanks to Moore's Law, the gaming industry (for GPUs) and cloud computing 

**3. Improved algorithms** (not that different from the 1990s, but those differences had a huge impact)

**4. Theoretical limitations** (e.g. getting stuck in local optima) are **rather rare** in practice, or **not as serious** as previously thought

**5. Virtuous circle** of applications > research + funding > more and better applications

### Logical Computations with Neurons

_**Note**: Skipping the part on biological neurons_

McCulloch and Pitts proposed a simple model of the biological neuron later known as an **artificial neuron** characterized by one or more binary inputs and one binary output. 

Even with this simple neuron, it is possible to build an ANN that computes any logical proposition:

![ANN](images/10.ANN.png)

Assuming a neuron is activated when at least two of its inputs are active, the picture above shows how this could work.
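The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a neuron fires when at least two of its excitatory connections are active, an input may be wired to the same neuron twice, and an active inhibitory connection blocks firing. The helper `neuron` and the four small networks are hypothetical names and may not match the figure exactly.

```python
def neuron(excitatory, inhibitory=(), threshold=2):
    """Binary artificial neuron (illustrative sketch).

    Fires (returns 1) when at least `threshold` excitatory connections are
    active and no inhibitory connection is active; otherwise returns 0.
    """
    if any(inhibitory):
        return 0
    return int(sum(excitatory) >= threshold)


def identity(a):
    # A is wired to the neuron twice, so A alone reaches the threshold.
    return neuron([a, a])


def logical_and(a, b):
    # One connection from each input: both must be active.
    return neuron([a, b])


def logical_or(a, b):
    # Two connections from each input: either one is enough.
    return neuron([a, a, b, b])


def a_and_not_b(a, b):
    # A is wired twice; B is inhibitory and blocks the neuron when active.
    return neuron([a, a], inhibitory=[b])


# Print the truth table: A, B, A AND B, A OR B, A AND NOT B
for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b), a_and_not_b(a, b))
```

Combining such neurons gives AND, OR and NOT, which is enough to express any logical proposition.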

### The Perceptron

The next step in complexity is the _Perceptron_. Invented in 1957 by Frank Rosenblatt, it is based on an artificial neuron called a _threshold logic unit_ (TLU), sometimes called a _linear threshold unit_ (LTU). Inputs and outputs are numbers (instead of binary on/off values), and each input connection is associated with a weight.

A TLU computes a weighted sum of its inputs and then applies a step function to that sum:

1. $z = w_1x_1 + w_2x_2 + \cdots + w_nx_n = x^Tw$

2. $h_w(x) = \text{step}(z) = \text{step}(x^Tw)$

![Threshold Logic Unit](images/10.TLU.png)

Two common step functions used in Perceptrons:

1. Heaviside (z) $ = \begin{cases}
0 & z < 0\\
1 & z \ge 0
\end{cases}$

2. Sign (z) $ = \begin{cases}
-1 & z < 0 \\
0 & z = 0 \\
+1 & z > 0
\end{cases}$
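As a minimal sketch, the two steps of a TLU and both step functions can be written with NumPy. The function names (`heaviside`, `sign`, `tlu`) and the example values are made up for illustration, and no bias term is included so that the code mirrors the weighted sum above.

```python
import numpy as np


def heaviside(z):
    # 0 where z < 0, 1 where z >= 0
    return np.where(np.asarray(z) >= 0, 1, 0)


def sign(z):
    # -1 where z < 0, 0 where z == 0, +1 where z > 0
    return np.sign(np.asarray(z)).astype(int)


def tlu(x, w, step=heaviside):
    """Threshold logic unit: weighted sum of the inputs, then a step function."""
    z = x @ w  # z = w_1 x_1 + ... + w_n x_n
    return step(z)


x = np.array([1.0, 2.0, -1.0])   # illustrative input values
w = np.array([0.5, -1.0, 1.0])   # illustrative weights

# Here z = 0.5 - 2.0 - 1.0 = -2.5, so Heaviside gives 0 and sign gives -1.
print(tlu(x, w, step=heaviside))
print(tlu(x, w, step=sign))
```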

A single TLU can be used for linear binary classification: it computes a linear combination of the inputs and outputs the positive class if the result exceeds a threshold (similarly to Logistic Regression or a linear SVM).  
A Perceptron is simply a single layer of TLUs, with each TLU connected to all the inputs. Having multiple output TLUs makes it possible to perform multioutput classification.

It is then possible to compute the outputs of a layer of artificial neurons for several instances at once:

$h_{W,b} (X) = \phi (XW + b)$

$X$ = matrix of input features. One row per instance and one column per feature.

$W$ = connection weights (except those from the bias neuron). One row per input neuron and one column per artificial neuron.

$b$ = vector containing all the connection weights between the bias neuron and the artificial neurons. It has one bias term per artificial neuron.

$\phi$ = activation function. In our case (artificial neuron = TLU), the activation function is a step function.
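To make the shapes concrete, here is a small NumPy sketch of $\phi(XW + b)$ for a layer of two TLUs applied to four instances at once. The weights and biases are hand-picked for illustration (they are not from the source) so that the first neuron behaves like AND and the second like OR.

```python
import numpy as np


def heaviside(z):
    # Step activation: 0 where z < 0, 1 where z >= 0
    return np.where(z >= 0, 1, 0)


def layer_output(X, W, b, phi=heaviside):
    """Output of a fully connected layer for a batch of instances.

    X: (n_instances, n_inputs), W: (n_inputs, n_neurons), b: (n_neurons,)
    Returns an array of shape (n_instances, n_neurons).
    """
    return phi(X @ W + b)  # b is broadcast across the rows of X @ W


X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])        # 4 instances, 2 features
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])        # 2 inputs, 2 neurons
b = np.array([-1.5, -0.5])        # one bias term per neuron

outputs = layer_output(X, W, b)
print(outputs)  # column 0 acts like AND, column 1 like OR
```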

#### Training

The Perceptron is then trained by reinforcing the connections that help **reduce the error**.

More specifically, the Perceptron is fed one training instance at a time, and for each instance it makes its predictions. For every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction.

More formally:

$w_{i,j}^{(\text{next step})} = w_{i,j} + \eta (y_j - \hat{y}_j)x_i$

* $w_{i,j}$ is the connection weight between the $i^{th}$ input neuron and the $j^{th}$ output neuron

* $x_i$ is the $i^{th}$ input value of the current training instance

* $\hat{y}_j$ is the output of the $j^{th}$ output neuron for the current training instance

* $y_j$ is the target output of the $j^{th}$ output neuron for the current training instance

* $\eta$ is the learning rate
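The update rule above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: it assumes a Heaviside step activation, zero-initialized weights, and a bias updated like a weight whose input is always 1. The function name `train_perceptron`, the learning rate and the number of epochs are arbitrary choices, and the toy dataset is the (linearly separable) AND function.

```python
import numpy as np


def heaviside(z):
    # Step activation: 0 where z < 0, 1 where z >= 0
    return np.where(z >= 0, 1, 0)


def train_perceptron(X, Y, eta=1.0, n_epochs=20):
    """Train a single layer of TLUs with the Perceptron learning rule.

    X: (n_instances, n_inputs), Y: (n_instances, n_outputs) with 0/1 targets.
    """
    n_inputs, n_outputs = X.shape[1], Y.shape[1]
    W = np.zeros((n_inputs, n_outputs))
    b = np.zeros(n_outputs)
    for _ in range(n_epochs):
        for x, y in zip(X, Y):                  # one training instance at a time
            y_hat = heaviside(x @ W + b)        # current predictions
            error = y - y_hat                   # (y_j - y_hat_j) for each output j
            W += eta * np.outer(x, error)       # w_ij += eta * (y_j - y_hat_j) * x_i
            b += eta * error                    # bias input is always 1
    return W, b


# Toy problem: learn the AND function
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0], [0], [0], [1]])

W, b = train_perceptron(X, Y)
predictions = heaviside(X @ W + b)
print(predictions.ravel())
```

Weights only change when a prediction is wrong, since the error term is zero otherwise. For real work, scikit-learn provides a `Perceptron` class in `sklearn.linear_model`.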

**Note**: the decision boundary of each output neuron is linear, so Perceptrons cannot learn complex patterns. However, if the training instances are linearly separable, the algorithm is guaranteed to converge to a solution.

#### Limitations

The Perceptron is a fairly rudimentary ANN architecture, incapable, for example, of solving the exclusive OR (XOR) classification problem.

It turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons: the resulting ANN is known as **Multilayer Perceptron (MLP)**. 
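As a small illustration of why stacking helps, the sketch below builds a two-layer network of TLUs that computes XOR. The weights are hand-picked for this example rather than learned (an assumption made purely for illustration): the hidden layer computes AND and OR of the inputs, and the output neuron fires when OR is active but AND is not.

```python
import numpy as np


def heaviside(z):
    # Step activation: 0 where z < 0, 1 where z >= 0
    return np.where(z >= 0, 1, 0)


# Hidden layer: neuron 0 behaves like AND, neuron 1 like OR
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-1.5, -0.5])

# Output layer: fires when OR is active and AND is not
W_out = np.array([[-1.0],
                  [1.0]])
b_out = np.array([-0.5])


def xor_mlp(X):
    hidden = heaviside(X @ W_hidden + b_hidden)
    return heaviside(hidden @ W_out + b_out)


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(xor_mlp(X).ravel())
```

No single TLU can produce this output, because the XOR classes are not linearly separable in the original input space; the hidden layer remaps the inputs so that they become separable.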

#### Multilayer Perceptron and Backpropagation

