## **Introduction to Multi-Layer Perceptrons**

In [1]:
import numpy as np
import pandas as pd

## create the data set
fat_score = [.2, .1, .2, .2, .4, .3]
salt_score = [.9, .1, .4, .5, .5, .8]
acceptance = [1, 0, 0, 0, 1, 1]

## combine in a df
df = pd.DataFrame({'fat_score': fat_score, 'salt_score': salt_score, 'acceptance': acceptance})
df

| | fat_score | salt_score | acceptance |
|---|---|---|---|
| 0 | 0.2 | 0.9 | 1 |
| 1 | 0.1 | 0.1 | 0 |
| 2 | 0.2 | 0.4 | 0 |
| 3 | 0.2 | 0.5 | 0 |
| 4 | 0.4 | 0.5 | 1 |
| 5 | 0.3 | 0.8 | 1 |


In [2]:
## X and y
X = df[['fat_score', 'salt_score']]
y = df['acceptance']

#### **Perceptron**

A perceptron has only ONE neuron and NO hidden layers. There is just the INPUT layer and the OUTPUT layer, and the OUTPUT layer holds that single neuron.

In [7]:
from sklearn.neural_network import MLPClassifier

## hyper-parameter: hidden_layer_sizes
## hidden_layer_sizes = ()          NO hidden layers
## hidden_layer_sizes = (4,)        1 hidden layer with 4 neurons
## hidden_layer_sizes = (10, 5, 2)  3 hidden layers with 10, 5 and 2 neurons (1st, 2nd and 3rd)

nn = MLPClassifier(hidden_layer_sizes = (), max_iter = 5000, random_state = 13)

## fit
## weights are initialized RANDOMLY
## a gradient-based optimizer then updates them to minimize the loss
nn.fit(X, y)
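
To make the `hidden_layer_sizes` convention concrete, the sketch below lists the shape of each weight matrix that a given setting implies. The helper `layer_shapes` is a hypothetical name introduced here for illustration and is not part of scikit-learn; it assumes 2 predictors and a single output neuron, as in this data set.

```python
def layer_shapes(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Return the (rows, cols) shape of each weight matrix in a fully connected net."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # one weight matrix connects each pair of consecutive layers
    return [(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

print(layer_shapes(2, ()))          # perceptron: [(2, 1)]
print(layer_shapes(2, (4,)))        # [(2, 4), (4, 1)]
print(layer_shapes(2, (10, 5, 2)))  # [(2, 10), (10, 5), (5, 2), (2, 1)]
```

These are the shapes one would expect to see in `nn.coefs_` after fitting.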

#### **Multi-Layer Perceptron**

These NNs have one or more hidden layers. If the number of hidden layers is greater than 1, the model is usually described as deep learning. They are also called feed-forward neural nets.

In [8]:
## 1 hidden layer with 4 neurons
nn = MLPClassifier(hidden_layer_sizes = (4,), max_iter = 5000, random_state = 13)

## fit
nn.fit(X, y)

#### **Weights and Biases**

Neural nets ONLY need to estimate the weights and biases. Both are optimized using gradient descent.

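* weights: think of slopes. They are initialized randomly, typically with small values drawn from a normal or uniform distribution centered at 0.
* biases: think of y-intercepts. They are often initialized at 0, although scikit-learn's `MLPClassifier` draws them randomly along with the weights.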
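
As a rough illustration of what "optimized using gradient descent" means, here is a minimal sketch that trains a single logistic neuron on the six rows of the data set with plain batch gradient descent. The starting values, the learning rate and the number of iterations are arbitrary assumptions; `MLPClassifier` uses a more elaborate optimizer (`adam`) by default, so this is not a reproduction of what `nn.fit` does.

```python
import math

# same six rows as the data set above
X_rows = [(.2, .9), (.1, .1), (.2, .4), (.2, .5), (.4, .5), (.3, .8)]
y_rows = [1, 0, 0, 0, 1, 1]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_loss(w1, w2, b):
    """Average cross-entropy of the neuron's predictions."""
    total = 0.0
    for (x1, x2), y in zip(X_rows, y_rows):
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_rows)

w1, w2, b = 0.1, -0.1, 0.0   # arbitrary starting values (assumption)
learning_rate = 0.5          # arbitrary step size (assumption)
loss_before = log_loss(w1, w2, b)

n = len(y_rows)
for _ in range(1000):
    # gradient of the average log-loss with respect to w1, w2 and b
    g1 = g2 = gb = 0.0
    for (x1, x2), y in zip(X_rows, y_rows):
        err = sigmoid(w1 * x1 + w2 * x2 + b) - y
        g1 += err * x1
        g2 += err * x2
        gb += err
    # step against the gradient
    w1 -= learning_rate * g1 / n
    w2 -= learning_rate * g2 / n
    b -= learning_rate * gb / n

loss_after = log_loss(w1, w2, b)
print(f'loss before: {loss_before:.4f}, loss after: {loss_after:.4f}')
```

Each pass nudges the weights (slopes) and the bias (intercept) in the direction that lowers the loss.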

In [12]:
## instance of perceptron
nn = MLPClassifier(hidden_layer_sizes=(), max_iter=5000, random_state=13)

## fit
nn.fit(X, y)

In [13]:
## weights
nn.coefs_

[array([[2.60706861],
        [1.30688719]])]

In [14]:
## biases
nn.intercepts_

[array([-0.92760001])]
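
The fitted weights and bias can be applied by hand to see how the perceptron scores each row. The sketch below assumes a logistic output and a 0.5 cut-off for the predicted class. That should correspond to what `nn.predict` returns for a binary response, but it was not run against the fitted model here.

```python
import math

# fitted values printed above (nn.coefs_ and nn.intercepts_)
w1, w2 = 2.60706861, 1.30688719
b = -0.92760001

# (fat_score, salt_score, acceptance) for the six rows of the data set
rows = [(.2, .9, 1), (.1, .1, 0), (.2, .4, 0), (.2, .5, 0), (.4, .5, 1), (.3, .8, 1)]

predictions = []
for fat, salt, actual in rows:
    z = w1 * fat + w2 * salt + b   # weighted sum plus bias
    p = 1 / (1 + math.exp(-z))     # logistic output: estimated P(acceptance = 1)
    predicted = int(p > 0.5)       # assumed 0.5 cut-off
    predictions.append(predicted)
    print(f'fat={fat}, salt={salt}: p={p:.3f}, predicted={predicted}, actual={actual}')
```

With these weights, rows 2 and 3 fall just above the cut-off even though their actual acceptance is 0.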

In [15]:
## 1 hidden layer
## 4 neurons
nn = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=13)

## fit
nn.fit(X, y)

In [16]:
## weights
nn.coefs_

[array([[-8.25552510e-01,  1.52313287e+00,  9.51346292e-08,
          3.41976125e+00],
        [-7.18271333e-02,  1.40168683e+00,  3.82722973e-46,
          2.51699120e+00]]),
 array([[-1.40466276e+00],
        [ 2.86681285e+00],
        [-1.81338372e-35],
        [ 2.68533518e+00]])]

In [17]:
## biases
nn.intercepts_

[array([ 1.49274775, -0.77081557, -0.92992695, -1.38690639]),
 array([-1.73216471])]
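
To show how the two weight matrices and two bias vectors fit together, here is a sketch of a forward pass for the first row of the data set. It assumes the default `relu` activation in the hidden layer (the cell above does not set `activation`) and a logistic output. The fitted values are rounded to 4 decimals, and the near-zero entries are set to 0.

```python
import math

# fitted values printed above, rounded (near-zero entries set to 0)
W1 = [[-0.8256, 1.5231, 0.0, 3.4198],   # weights from fat_score to the 4 hidden neurons
      [-0.0718, 1.4017, 0.0, 2.5170]]   # weights from salt_score to the 4 hidden neurons
b1 = [1.4927, -0.7708, -0.9299, -1.3869]
W2 = [-1.4047, 2.8668, 0.0, 2.6853]     # weights from the hidden neurons to the output
b2 = -1.7322

def forward(x):
    """Forward pass: input -> ReLU hidden layer -> logistic output."""
    hidden = []
    for j in range(len(b1)):
        z_j = sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j]
        hidden.append(max(0.0, z_j))     # ReLU activation
    z_out = sum(h * w for h, w in zip(hidden, W2)) + b2
    return hidden, 1 / (1 + math.exp(-z_out))

hidden, prob = forward([0.2, 0.9])       # first row: fat_score=0.2, salt_score=0.9
print(f'hidden activations: {hidden}')
print(f'estimated P(acceptance = 1): {prob:.3f}')
```

The third hidden neuron has essentially zero weights and a negative bias, so its ReLU output is 0 and it contributes nothing to the prediction.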

The number of estimated parameters is weights + biases. It depends on the sizes of the input layer, the hidden layers and the output layer.

In [18]:
## Suppose you have
## predictors: 80
## 3 hidden layers: 100, 50 and 25 neurons
## response: 1

(80*100) + 100  + (100*50) + 50  + (50*25) + 25 + (25*1) + 1

14451
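
The same count can be written as a small function. `count_parameters` is a hypothetical helper introduced here for illustration. It assumes a fully connected network in which every neuron after the input layer has one bias.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Count weights + biases in a fully connected feed-forward net."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out   # weights between two consecutive layers
        total += n_out          # one bias per neuron in the receiving layer
    return total

print(count_parameters(80, (100, 50, 25)))  # 14451
print(count_parameters(2, (4,)))            # 17
print(count_parameters(2, ()))              # 3
```

The last two calls correspond to the two networks fitted above: 8 + 4 weights and 4 + 1 biases for the 4-neuron MLP, and 2 weights and 1 bias for the perceptron.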

#### **Summation and Activation in Neurons**

In [23]:
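## instance of perceptron
## note: in MLPClassifier, `activation` applies to the hidden layers only,
## so with no hidden layers the fitted weights match the earlier perceptron
nn = MLPClassifier(hidden_layer_sizes=(), max_iter=5000, random_state=13, activation = 'identity')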

## fit
nn.fit(X, y)

## weights and biases
print(nn.coefs_)
print(nn.intercepts_)

[array([[2.60706861],
       [1.30688719]])]
[array([-0.92760001])]


In [22]:
## Identity Activation (linear activation) f(z) = z

## w1 = 2.61
## w2 = 1.31
## b = -0.93

## x1 = fat_score  = 0.2
## x2 = salt_score = 0.9

##  What is the output of the IDENTITY activation function?

## 1) summation
z = 2.61*0.2 +  1.31*0.9 - 0.93
print(f'The summation in the neuron gets z: {z}')

## 2) activation
f_z = z
print(f'The activation in the neuron gets f(z): {f_z}')

The summation in the neuron gets z: 0.771
The activation in the neuron gets f(z): 0.771


In [25]:
## Logistic Activation (sigmoid activation) f(z) = 1 / (1+exp(-z))

## w1 = 2.61
## w2 = 1.31
## b = -0.93

## x1 = fat_score  = 0.2
## x2 = salt_score = 0.9

##  What is the output of the LOGISTIC activation function?

## 1) summation
z = 2.61*0.2 +  1.31*0.9 - 0.93
print(f'The summation in the neuron gets z: {z}')

## 2) activation
f_z = 1 / (1+np.exp(-z))
print(f'The activation in the neuron gets f(z): {f_z}')

The summation in the neuron gets z: 0.771
The activation in the neuron gets f(z): 0.6837371741078874


In [27]:
## ReLU Activation (rectified linear unit) f(z) = max(0,z)

## w1 = 2.61
## w2 = 1.31
## b = -0.93

## x1 = fat_score  = 0.2
## x2 = salt_score = 0.9

##  What is the output of the ReLU activation function?

## 1) summation
z = 2.61*0.2 +  1.31*0.9 - 0.93
print(f'The summation in the neuron gets z: {z}')

## 2) activation
f_z = max(0,z)
print(f'The activation in the neuron gets f(z): {f_z}')

The summation in the neuron gets z: 0.771
The activation in the neuron gets f(z): 0.771
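
Because z = 0.771 is positive, the identity and ReLU activations give the same output in the cells above. The sketch below evaluates the three activation functions on a few other values of z, including a negative one, to show where they differ. The chosen z values are arbitrary.

```python
import math

def identity(z):
    return z

def logistic(z):
    return 1 / (1 + math.exp(-z))

def relu(z):
    return max(0, z)

for z in (-2.0, 0.0, 0.771):
    print(f'z = {z:6.3f} | identity: {identity(z):6.3f} | '
          f'logistic: {logistic(z):.3f} | ReLU: {relu(z):.3f}')
```

For negative z, ReLU returns 0 while the identity passes the negative value through, and the logistic output always stays between 0 and 1.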
