# Multilayer Perceptron and Neural Network

## Perceptron
A perceptron is an algorithm for supervised learning of binary classifiers.
<img src="perceptron_2.jpg"/>

### Update rule
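$\mathbf{w}_{t+1}=\mathbf{w}_{t}+\left(1-H\left(y_{i} \mathbf{w}_{t}^{\top} \mathbf{x}_{i}\right)\right) y_{i} \mathbf{x}_{i}$

Here $H$ is the Heaviside step function (taking $H(0)=0$) and $y_{i} \in \{-1,+1\}$, so the weights change only when sample $i$ is misclassified, that is, when $y_{i} \mathbf{w}_{t}^{\top} \mathbf{x}_{i} \le 0$.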
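To make the rule concrete, here is a minimal sketch of a single update step in NumPy. The helper names `heaviside` and `perceptron_step`, the toy vector, and the convention $H(0)=0$ are assumptions made for illustration; they are not part of the class defined later in this notebook.

```python
import numpy as np

def heaviside(z):
    """Step function H(z): 1 if z > 0, else 0 (H(0) = 0 is an assumed convention)."""
    return 1.0 if z > 0 else 0.0

def perceptron_step(w, x_i, y_i):
    """Apply one update of the rule above; y_i must be -1 or +1."""
    margin = y_i * np.dot(w, x_i)
    # (1 - H(margin)) is 1 on a mistake and 0 on a correct prediction.
    return w + (1.0 - heaviside(margin)) * y_i * x_i

w = np.zeros(3)
x_i = np.array([1.0, 2.0, -1.0])  # first entry plays the role of the bias input

# The margin is 0 for zero weights, which counts as a mistake: the weights move.
w_after_mistake = perceptron_step(w, x_i, y_i=1)

# The same sample is now classified correctly, so the weights stay put.
w_after_correct = perceptron_step(w_after_mistake, x_i, y_i=1)
```

After the first call the weights equal `x_i`; the second call leaves them unchanged because the margin is positive.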

In [53]:
import numpy as np
import pandas as pd
from sklearn import datasets

In [54]:
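class perceptron:
    """Binary perceptron trained with the mistake-driven update rule above."""

    def __init__(self, lr=0.1, n_iter=200, init_param='random'):
        # lr is stored for reference only: the update rule above uses a unit
        # step, so fit() does not scale the update by lr.
        self.lr = lr
        self.n_iter = n_iter          # number of passes over the training set
        self.init_param = init_param  # 'zero' or 'random'
        self.theta = None             # weights with the bias first; set by fit()

    @staticmethod
    def _add_bias(X):
        # Flatten each sample to a row and prepend a constant 1 for the bias term.
        X = X.reshape(X.shape[0], -1)
        return np.hstack((np.ones((X.shape[0], 1)), X))

    def fit(self, X, y):
        X = self._add_bias(X)
        y = np.where(y == 0, -1, 1)  # map labels {0, 1} to {-1, +1}

        if self.init_param == 'zero':
            self.theta = np.zeros(X.shape[1])
        elif self.init_param == 'random':
            self.theta = np.random.rand(X.shape[1])
        else:
            raise ValueError("init_param must be 'zero' or 'random'")

        for _ in range(self.n_iter):
            for ind in range(X.shape[0]):
                # Update only on a mistake (a score of exactly 0 counts as one).
                if np.sign(self.theta.dot(X[ind])) != y[ind]:
                    self.theta = self.theta + y[ind] * X[ind]

    def predict(self, X):
        pred = np.sign(self._add_bias(X).dot(self.theta))
        # Only a strictly positive score becomes class 1; -1 and 0 become class 0.
        return np.where(pred == 1, 1, 0)

    def accuracy(self, pred, label):
        # Fraction of predictions that match the labels.
        return np.sum(pred == label) / len(label)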

In [55]:
iris = datasets.load_iris()
X = iris.data[:, :]  
y = iris.target
y = (y>0)*1

In [68]:
data = np.hstack((X,y.reshape(-1,1)))
np.random.shuffle(data)
data = pd.DataFrame(data,columns=['Feature1','Feature2','Feature3','Feature4','Target'])

In [69]:
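data.head(6)

| | Feature1 | Feature2 | Feature3 | Feature4 | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.4 | 1.5 | 0.2 | 0.0 |
| 1 | 5.6 | 2.8 | 4.9 | 2.0 | 1.0 |
| 2 | 7.7 | 3.0 | 6.1 | 2.3 | 1.0 |
| 3 | 6.2 | 2.9 | 4.3 | 1.3 | 1.0 |
| 4 | 5.0 | 3.5 | 1.6 | 0.6 | 0.0 |
| 5 | 5.8 | 2.8 | 5.1 | 2.4 | 1.0 |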


In [60]:
model = perceptron(n_iter=300,init_param='random')
print('Model Parameters: ',model.theta)

Model Parameters:  None


In [62]:
model.fit(X,y)
print('Model Parameters: ',model.theta)

Model Parameters:  [-0.68541919 -1.12122172 -3.4261289   5.38552346  2.85796252]


In [65]:
print('Training accuracy: ',model.accuracy(model.predict(X),y))

Training accuracy:  1.0
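The accuracy above is measured on the same samples that were used for fitting. As a hedged sketch of how a held-out check could look, the block below runs a compact functional version of the same training loop on synthetic, linearly separable data and scores it on a 20% test split. The data, the split ratio, the seed, and the helper names `fit_perceptron` and `predict_labels` are all assumptions for illustration, not part of the notebook's class or of the iris experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: two well-separated Gaussian blobs (assumption).
n = 100
X = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(n, 2)),  # class 0
    rng.normal(loc=2.0, scale=0.5, size=(n, 2)),   # class 1
])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Shuffle indices, then hold out the last 20% for testing.
perm = rng.permutation(len(y))
split = int(0.8 * len(y))
train_idx, test_idx = perm[:split], perm[split:]

def add_bias(X):
    """Prepend a constant 1 to every sample for the bias term."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_perceptron(X, y, n_iter=50):
    """Mistake-driven perceptron training with zero initialization."""
    Xb = add_bias(X)
    t = np.where(y == 0, -1, 1)  # labels {0, 1} -> {-1, +1}
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        for x_i, t_i in zip(Xb, t):
            if np.sign(theta @ x_i) != t_i:
                theta = theta + t_i * x_i
    return theta

def predict_labels(theta, X):
    """Class 1 for a strictly positive score, class 0 otherwise."""
    return np.where(add_bias(X) @ theta > 0, 1, 0)

theta = fit_perceptron(X[train_idx], y[train_idx])
train_acc = np.mean(predict_labels(theta, X[train_idx]) == y[train_idx])
test_acc = np.mean(predict_labels(theta, X[test_idx]) == y[test_idx])
print('Training accuracy: ', train_acc)
print('Test accuracy: ', test_acc)
```

Fitting only on the training indices keeps the test samples unseen, so the second number is a fairer estimate of how the classifier generalizes.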


## Multi-layer Perceptron
<img src="mlp_1.jpeg"/>

MLPs are typically represented as a composition of several functions, one per layer: $f(\boldsymbol{x})=f^{(3)}\left(f^{(2)}\left(f^{(1)}(\boldsymbol{x})\right)\right)$
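The composition can be sketched directly in NumPy by building each layer as a function and nesting the calls. The layer sizes, the ReLU activation, the random weights, and the helper `make_layer` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def make_layer(n_in, n_out, activation):
    """Return a function x -> activation(W x + b) with randomly drawn W."""
    W = rng.normal(scale=0.5, size=(n_out, n_in))
    b = np.zeros(n_out)

    def layer(x):
        return activation(W @ x + b)

    return layer

# Three layers: 4 inputs -> 8 hidden -> 8 hidden -> 1 output (assumed sizes).
f1 = make_layer(4, 8, relu)
f2 = make_layer(8, 8, relu)
f3 = make_layer(8, 1, identity)

def f(x):
    """The full network is the composition f3(f2(f1(x)))."""
    return f3(f2(f1(x)))

x = np.array([5.1, 3.4, 1.5, 0.2])  # one four-feature sample
out = f(x)
```

Each `f1`, `f2`, `f3` plays the role of $f^{(1)}$, $f^{(2)}$, $f^{(3)}$; adding depth means nesting one more call.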

### Forward pass and back-prop
<img src="MLP.png"/>
$\frac{\partial L}{\partial W_{2}}=\frac{\partial L}{\partial X_{2}} \frac{\partial X_2}{\partial W_{2}}$ 

$\frac{\partial L}{\partial W_{1}}=\frac{\partial L}{\partial X_{2}} \frac{\partial X_2}{\partial X_{1}} \frac{\partial X_1}{\partial W_{1}}$

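#### Parameter update: $W \leftarrow W-\alpha \nabla_{W} L$
Assume a squared-error loss $L(X_2, Y) = \lVert X_2 - Y \rVert^2$ and layers without activation functions, $X_1 = W_1 X$ and $X_2 = W_2 X_1$, which is the setting the local derivatives below correspond to.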

$\frac{\partial L}{\partial X_{2}} = 2(X_2 - Y)$ 

$\frac{\partial X_2}{\partial W_{2}} = X_1$

$\frac{\partial X_2}{\partial X_{1}} = W_2$

$\frac{\partial X_1}{\partial W_{1}} = X$

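$\nabla_{W_2} L = 2(X_2 - Y)X_1$ and $\nabla_{W_1} L = 2(X_2 - Y)W_2X$

These products are written in scalar form. For matrix-valued weights and column-vector inputs, the factors must be ordered and transposed so that the shapes match: $\nabla_{W_2} L = 2(X_2 - Y)X_1^{\top}$ and $\nabla_{W_1} L = 2\,W_2^{\top}(X_2 - Y)X^{\top}$.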
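A minimal NumPy sketch of these gradients follows, assuming two layers without activation functions ($X_1 = W_1 X$, $X_2 = W_2 X_1$) and a single column-vector input, so that the factors are ordered and transposed for the shapes to match. It checks one gradient entry against a finite difference and then applies the update $W \leftarrow W-\alpha \nabla_{W} L$ for a few steps. The shapes, the seed, the step size `alpha`, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1))              # input column vector
Y = rng.normal(size=(2, 1))              # target column vector
W1 = rng.normal(scale=0.5, size=(4, 3))  # first-layer weights
W2 = rng.normal(scale=0.5, size=(2, 4))  # second-layer weights

def loss(W1, W2):
    """Forward pass followed by the squared-error loss ||X2 - Y||^2."""
    X2 = W2 @ (W1 @ X)
    return float(np.sum((X2 - Y) ** 2))

def gradients(W1, W2):
    """Backward pass: chain rule applied layer by layer."""
    X1 = W1 @ X
    X2 = W2 @ X1
    dL_dX2 = 2.0 * (X2 - Y)        # dL/dX2
    grad_W2 = dL_dX2 @ X1.T        # dL/dX2 * dX2/dW2
    grad_W1 = W2.T @ dL_dX2 @ X.T  # dL/dX2 * dX2/dX1 * dX1/dW1
    return grad_W1, grad_W2

# Finite-difference check on a single entry of W1.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(W1_plus, W2) - loss(W1_minus, W2)) / (2 * eps)
grad_W1, grad_W2 = gradients(W1, W2)
print('analytic:', grad_W1[0, 0], 'numeric:', numeric)

# Gradient descent with a small, assumed step size.
alpha = 1e-3
initial_loss = loss(W1, W2)
for _ in range(100):
    g1, g2 = gradients(W1, W2)
    W1 = W1 - alpha * g1
    W2 = W2 - alpha * g2
final_loss = loss(W1, W2)
print('loss before:', initial_loss, 'loss after:', final_loss)
```

The finite-difference comparison is a cheap way to catch a misplaced transpose before training a larger network.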
