# Gradient Descent
Gradient descent is an iterative algorithm used to find a minimum of a function.

Finding such a minimum is called an optimization problem, and it is a central topic in mathematics.

Optimization problems are just as common in data science, especially when we are trying to compute a maximum likelihood estimator.

![alt text](images/fig4.png)

Imagine standing on a hillside and wanting to reach the bottom. Your approach will be to face the descending slope, walk in that direction for a few minutes, then check the slope again and repeat.

## What about the math?
The slope of a function at a point is given by its derivative:
* If the derivative is positive, the slope goes up (when going to the right!).
* If the derivative is negative, the slope goes down.
* If the derivative is equal to 0, the curve is flat at that point: it goes neither up nor down.
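The three cases above can be checked on a small example. This is a minimal sketch: the function `g(x) = x**2` and the helper `describe_slope` are hypothetical, chosen only for illustration.

```python
def g(x):
    # Hypothetical example function, chosen only for illustration.
    return x ** 2

def dg(x):
    # Exact derivative of g.
    return 2 * x

def describe_slope(slope):
    # Translate the sign of the derivative into words.
    if slope > 0:
        return "goes up"
    if slope < 0:
        return "goes down"
    return "is flat"

for point in (-2.0, 0.0, 2.0):
    print(f"x = {point}: slope = {dg(point)}, the curve {describe_slope(dg(point))}")
```

To the left of the minimum the derivative is negative, to the right it is positive, and at the minimum itself it is zero.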

### Example

Consider the following function:

$$ \Large f(x)=2x^{2}\cos(x) - 5x $$

![alt text](images/fig5.png)

#### Steps
Our goal is to find the minimum, the one you see on the right, with x between 3 and 4.

We could, in this simple case, compute the derivative, solve f′(x)=0, etc. But our goal is to understand gradient descent, so let’s do it!

1. Take a random starting point x0.
2. Compute the value of the slope at that point, f′(x0).
3. Walk in the direction opposite to the slope: x1 = x0 − α·f′(x0). Here, α is the learning rate, a small positive number that sets the size of each step. The minus sign makes us move against the slope, that is, downhill.
4. Repeat steps 2 and 3 from the new point.
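Step 2 needs the derivative of f. As a worked derivation for the example function, applying the product rule to the first term gives:

$$ f'(x) = \frac{d}{dx}\left[2x^{2}\cos(x)\right] - 5 = 4x\cos(x) - 2x^{2}\sin(x) - 5 $$

With the values used in the cells that follow (x0 = −1 and α = 0.05), the first update works out to:

$$ f'(-1) \approx -5.478, \qquad x_1 = x_0 - \alpha f'(x_0) \approx -1 - 0.05 \times (-5.478) \approx -0.726 $$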

In [1]:
import numpy as np
def f(x):
    """Objective function f(x) = 2x^2 cos(x) - 5x."""
    return 2 * x * x * np.cos(x) - 5 * x
x = [-1.]  # list of iterates, starting at x0 = -1
f(x[0])

6.0806046117362795

In [2]:
def df(x):
    """Derivative f'(x) of the objective function."""
    return 4 * x * np.cos(x) - 2 * x * x * np.sin(x) - 5

slope = df(x[0])
slope

-5.478267253856766

In [3]:
alpha = 0.05  # learning rate

x.append(x[0] - alpha * slope)  # x1 = x0 - alpha * f'(x0)
x[1]

-0.7260866373071617

In [4]:
x.append(x[1] - alpha * df(x[1]))
x[2]

-0.4024997370140509

In [5]:
x = [-1.]  # restart from x0 = -1
for i in range(20):  # 20 gradient descent updates
    x.append(x[i] - alpha * df(x[i]))
x

[-1.0,
 -0.7260866373071617,
 -0.4024997370140509,
 -0.08477906213634434,
 0.18205499002642517,
 0.39684580640116923,
 0.5797318757542436,
 0.7511409760238664,
 0.929843593497496,
 1.1379425635322518,
 1.4100262396071885,
 1.8111367982460322,
 2.4659523010837896,
 3.481091120446543,
 3.9840239754024296,
 3.5799142362878964,
 3.9342838641256046,
 3.6341484369757358,
 3.900044342976242,
 3.670089111844099,
 3.8747793435314155]
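In the run above, the last iterates still alternate between roughly 3.67 and 3.87, so 20 steps with α = 0.05 have not settled yet. One common way to handle this is to use a smaller learning rate and to stop once the slope is close to zero. The sketch below is only an illustration: the function name `gradient_descent` and the values `alpha=0.01`, `tol=1e-6` and `max_iter=10_000` are assumptions, not part of the original notebook.

```python
import numpy as np

def df(x):
    # Derivative of f(x) = 2x^2 cos(x) - 5x, same as the df defined earlier.
    return 4 * x * np.cos(x) - 2 * x * x * np.sin(x) - 5

def gradient_descent(df, x0, alpha=0.01, tol=1e-6, max_iter=10_000):
    """Repeat x <- x - alpha * df(x) until |df(x)| < tol or max_iter is reached.

    Returns the final point and the number of updates performed.
    """
    x = x0
    for step in range(max_iter):
        slope = df(x)
        if abs(slope) < tol:
            # The curve is (almost) flat here: stop.
            return x, step
        x = x - alpha * slope
    return x, max_iter

x_min, n_steps = gradient_descent(df, x0=-1.0)
print(x_min, n_steps)
```

The stopping test uses the third case from the derivative rules: when the slope is (nearly) zero, the curve is flat and there is nothing left to descend.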

# Linear Regression

## Single Layer Perceptron
The Perceptron is one of the simplest ANN architectures. It is
based on an artificial neuron called a linear threshold unit (LTU): the
inputs and output are now numbers (instead of binary on/off values) and each input connection is
associated with a weight.

![alt text](images/fig1.png)

A single LTU can be used for simple linear binary classification. It computes a linear combination of the
inputs and if the result exceeds a threshold, it outputs the positive class or else outputs the negative class
(just like a Logistic Regression classifier or a linear SVM).
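The computation performed by a single LTU can be sketched in a few lines. This is a minimal illustration, not a trained model: the function name `ltu`, the weights and the threshold are all hypothetical values picked by hand.

```python
import numpy as np

def ltu(inputs, weights, threshold):
    """Return 1 (positive class) if the weighted sum of the inputs
    exceeds the threshold, else 0 (negative class)."""
    z = np.dot(weights, inputs)  # linear combination of the inputs
    return int(z > threshold)

weights = np.array([0.5, -1.0])  # hypothetical weights, one per input
threshold = 0.2                  # hypothetical threshold

print(ltu(np.array([1.0, 0.1]), weights, threshold))  # weighted sum is 0.4
print(ltu(np.array([0.2, 0.5]), weights, threshold))  # weighted sum is -0.4
```

The first instance lands above the threshold and gets the positive class, the second lands below it and gets the negative class.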

A Perceptron with two inputs and three outputs is represented below. This Perceptron can
classify instances simultaneously into three different binary classes, which makes it a multioutput
classifier.
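A layer with two inputs and three outputs amounts to three LTUs sharing the same inputs, which can be written as one matrix product. The sketch below assumes hand-picked weights `W` (one column per output LTU) and biases `b` (each bias plays the role of a negative threshold); the name `perceptron_layer` is hypothetical.

```python
import numpy as np

def perceptron_layer(X, W, b):
    """Apply one layer of LTUs to every row of X.

    X: (n_instances, n_inputs), W: (n_inputs, n_outputs), b: (n_outputs,).
    Returns an integer array of 0/1 decisions, one column per output LTU.
    """
    z = X @ W + b
    return (z > 0).astype(int)

# Hypothetical parameters: 2 inputs, 3 output LTUs.
W = np.array([[ 1.0, -1.0, 0.5],
              [-1.0,  1.0, 0.5]])
b = np.array([0.0, 0.0, -1.0])

X_demo = np.array([[2.0, 1.0],
                   [1.0, 2.0]])
outputs = perceptron_layer(X_demo, W, b)
print(outputs)
```

Each row of the result holds three independent binary decisions for one instance, which is what makes the layer a multioutput classifier.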

![alt text](images/fig2.png)

## Multilayer Perceptron
A Multilayer Perceptron (MLP) is composed of one (passthrough) input layer, one or more layers of LTUs, called hidden layers,
and one final layer of LTUs called the output layer. Every layer except the output layer
includes a bias neuron and is fully connected to the next layer. When an ANN has two or more hidden
layers, it is called a deep neural network (DNN).
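To make the layer structure concrete, here is a sketch of a forward pass through a tiny MLP with one hidden layer of two LTUs and one output LTU. The weights are chosen by hand (an assumption for illustration, nothing is trained) so that the network computes XOR of its two binary inputs; the names `step` and `mlp_forward` are hypothetical.

```python
import numpy as np

def step(z):
    # LTU activation: 1 where the weighted sum is non-negative, else 0.
    return (z >= 0).astype(int)

def mlp_forward(X, W1, b1, W2, b2):
    """Input layer -> hidden layer of LTUs -> output layer of LTUs."""
    hidden = step(X @ W1 + b1)      # hidden layer activations
    return step(hidden @ W2 + b2)   # output layer activations

# Hand-picked parameters; the bias terms stand in for the bias neurons.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])   # hidden unit 1 acts as OR, hidden unit 2 as AND
W2 = np.array([[1.0],
               [-1.0]])
b2 = np.array([-0.5])         # output fires for "OR but not AND"

X_bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_out = mlp_forward(X_bits, W1, b1, W2, b2).ravel()
print(xor_out)
```

The hidden layer is what lets the network separate these four points with stacked linear decisions.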

![alt text](images/fig3.png)

## Training a single LTU


In [6]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris Setosa, else 0
per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)
y_pred = per_clf.predict([[2, 0.5]])  # petal length 2, petal width 0.5

In [7]:
X.shape

(150, 2)

In [8]:
y.shape

(150,)

In [9]:
X

array([[1.4, 0.2],
       [1.4, 0.2],
       [1.3, 0.2],
       [1.5, 0.2],
       [1.4, 0.2],
       [1.7, 0.4],
       [1.4, 0.3],
       [1.5, 0.2],
       [1.4, 0.2],
       [1.5, 0.1],
       [1.5, 0.2],
       [1.6, 0.2],
       [1.4, 0.1],
       [1.1, 0.1],
       [1.2, 0.2],
       [1.5, 0.4],
       [1.3, 0.4],
       [1.4, 0.3],
       [1.7, 0.3],
       [1.5, 0.3],
       [1.7, 0.2],
       [1.5, 0.4],
       [1. , 0.2],
       [1.7, 0.5],
       [1.9, 0.2],
       [1.6, 0.2],
       [1.6, 0.4],
       [1.5, 0.2],
       [1.4, 0.2],
       [1.6, 0.2],
       [1.6, 0.2],
       [1.5, 0.4],
       [1.5, 0.1],
       [1.4, 0.2],
       [1.5, 0.2],
       [1.2, 0.2],
       [1.3, 0.2],
       [1.4, 0.1],
       [1.3, 0.2],
       [1.5, 0.2],
       [1.3, 0.3],
       [1.3, 0.3],
       [1.3, 0.2],
       [1.6, 0.6],
       [1.9, 0.4],
       [1.4, 0.3],
       [1.6, 0.2],
       [1.4, 0.2],
       [1.5, 0.2],
       [1.4, 0.2],
       [4.7, 1.4],
       [4.5, 1.5],
       [4.9,

In [10]:
y

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [11]:
y_pred

array([1])
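To connect the trained model back to the LTU description, the learned weights and bias can be read from the fitted classifier and the prediction reproduced by hand. This is a sketch under the same setup as the cells above; `sample` and `manual_pred` are names introduced here for illustration, and no particular weight values or accuracy are claimed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]             # petal length, petal width
y = (iris.target == 0).astype(int)   # 1 if Iris Setosa, else 0

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

sample = np.array([[2.0, 0.5]])

# Linear combination of the inputs plus bias, as an LTU computes it.
score = sample @ per_clf.coef_.T + per_clf.intercept_
manual_pred = (score.ravel() > 0).astype(int)

print("weights:", per_clf.coef_)
print("bias:", per_clf.intercept_)
print("manual prediction:", manual_pred)
print("training accuracy:", per_clf.score(X, y))
```

`coef_` holds one weight per input feature and `intercept_` holds the bias, so thresholding the weighted sum at zero gives the same class as `predict`.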