## Regression with a Perceptron Model

The **features** are $x_1, x_2$, the **weights** are $w_1, w_2$, and the **bias** is $b$.
The **predictions** are therefore modelled as:

__Prediction Function__
$\hat{y} = w_1 x_1 + w_2 x_2 + b$
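
As a minimal sketch, the prediction function can be written directly in Python. The function name `predict` and the numeric values are assumptions chosen only for illustration.

```python
def predict(x1, x2, w1, w2, b):
    """Linear perceptron output: y_hat = w1*x1 + w2*x2 + b."""
    return w1 * x1 + w2 * x2 + b

# Hypothetical feature and parameter values, for illustration only.
y_hat = predict(x1=2.0, x2=3.0, w1=0.5, w2=-1.0, b=0.25)
print(y_hat)  # 0.5*2.0 + (-1.0)*3.0 + 0.25 = -1.75
```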

Given existing observations (training data) with known targets $y$, the per-sample loss is half the squared error; averaged over the training set it is the mean squared error (MSE) up to the factor $\frac{1}{2}$:

__Loss Function__
$L(y, \hat{y}) = \frac{1}{2} (y-\hat{y})^2$
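
The loss for a single sample can be sketched as follows. The name `squared_error_loss` and the example values are assumptions, not part of the original notes.

```python
def squared_error_loss(y, y_hat):
    """Half the squared error for one sample: L = 1/2 * (y - y_hat)^2."""
    return 0.5 * (y - y_hat) ** 2

# Hypothetical target and prediction, for illustration only.
loss = squared_error_loss(y=1.0, y_hat=-1.75)
print(loss)  # 0.5 * 2.75**2 = 3.78125
```

The factor $\frac{1}{2}$ does not change where the minimum is; it only cancels the 2 that appears when differentiating the square.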

**Optimization goal**: find the $w_1, w_2$ and $b$ that minimize the loss $L$ over the training data.

To apply the gradient descent (GD) algorithm with learning rate $\alpha$, we need the following update rules:

$ w_1 \longrightarrow w_1 - \alpha \frac{\partial L}{\partial w_1}$

$ w_2 \longrightarrow w_2 - \alpha \frac{\partial L}{\partial w_2}$

$ b \longrightarrow b - \alpha \frac{\partial L}{\partial b} $


Applying the chain rule to each partial derivative,

$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_1}$

$\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_2}$

$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial b}$

Calculating individual partial derivatives, we get,

$\frac{\partial L}{\partial \hat{y}} = -(y-\hat{y}) $

$\frac{\partial \hat{y}}{\partial w_1} = x_1, \quad \frac{\partial \hat{y}}{\partial w_2} = x_2, \quad \frac{\partial \hat{y}}{\partial b} = 1$

$ \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_1} = -x_1 (y-\hat{y})$

$ \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_2} = -x_2 (y-\hat{y})$

$ \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial b} = -(y-\hat{y})$
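
One way to sanity-check these derivatives is to compare them against central finite differences. The sketch below assumes hypothetical helper names (`analytic_gradients`, `numeric_gradients`) and arbitrary sample values; it is a check of the formulas, not a training routine.

```python
def predict(x1, x2, w1, w2, b):
    return w1 * x1 + w2 * x2 + b

def loss(y, y_hat):
    return 0.5 * (y - y_hat) ** 2

def analytic_gradients(x1, x2, y, w1, w2, b):
    """(dL/dw1, dL/dw2, dL/db) from the chain-rule result above."""
    error = y - predict(x1, x2, w1, w2, b)
    return (-x1 * error, -x2 * error, -error)

def numeric_gradients(x1, x2, y, w1, w2, b, eps=1e-6):
    """Central finite differences, used here only as a sanity check."""
    def f(w1, w2, b):
        return loss(y, predict(x1, x2, w1, w2, b))
    return (
        (f(w1 + eps, w2, b) - f(w1 - eps, w2, b)) / (2 * eps),
        (f(w1, w2 + eps, b) - f(w1, w2 - eps, b)) / (2 * eps),
        (f(w1, w2, b + eps) - f(w1, w2, b - eps)) / (2 * eps),
    )

# Hypothetical sample and parameters, chosen only for illustration.
args = dict(x1=2.0, x2=3.0, y=1.0, w1=0.5, w2=-1.0, b=0.25)
print(analytic_gradients(**args))  # (-5.5, -8.25, -2.75)
print(numeric_gradients(**args))   # approximately the same values
```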


Finally, the update rules become:

$ w_1 \longrightarrow w_1 - \alpha (-x_1 (y-\hat{y}))$

$ w_2 \longrightarrow w_2 - \alpha (-x_2 (y-\hat{y}))$

$ b \longrightarrow b - \alpha(-(y-\hat{y}))$
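
The update rules can be sketched as a small per-sample gradient descent loop. The training data below is synthetic and noise-free, generated from hypothetical "true" parameters ($w_1 = 2$, $w_2 = -3$, $b = 1$); the learning rate and epoch count are likewise assumptions picked so that this toy example converges.

```python
# Synthetic, noise-free data from hypothetical parameters w1=2, w2=-3, b=1.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
targets = [2.0 * x1 - 3.0 * x2 + 1.0 for x1, x2 in samples]

def train(samples, targets, alpha=0.05, epochs=1000):
    """Per-sample gradient descent for the linear perceptron."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, targets):
            y_hat = w1 * x1 + w2 * x2 + b
            error = y - y_hat
            # w - alpha * (-x * error) is the same as w + alpha * x * error
            w1 += alpha * x1 * error
            w2 += alpha * x2 * error
            b += alpha * error
    return w1, w2, b

w1, w2, b = train(samples, targets)
print(w1, w2, b)  # should approach 2, -3, 1 on this noise-free data
```

With noisy data or a larger learning rate the loop may converge more slowly or diverge, so both `alpha` and `epochs` usually need tuning.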

## Classification with a Perceptron

**Activation**: sigmoid function $\sigma$

__Prediction Function__
$\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b)$

$\sigma(z) = \frac{1}{1+e^{-z}} $

$\sigma'(z) = \sigma(z) [1 - \sigma(z)] $
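
A short sketch of the sigmoid and its derivative, with a finite-difference check of the identity $\sigma'(z) = \sigma(z)[1 - \sigma(z)]$. The function names and the test point `z = 0.7` are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative expressed through the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the closed form with a central finite difference at one point.
z, eps = 0.7, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(sigmoid_prime(z), numeric)  # the two values should agree closely
```

This direct form can overflow inside `math.exp` for very large negative `z`; it is fine for the small values used in these notes.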

__Loss Function__
$L(y, \hat{y}) = -y\ln(\hat{y}) - (1-y)\ln(1-\hat{y}) $
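
The following sketch evaluates this loss for a few predictions to show how it penalizes confident mistakes. The name `binary_cross_entropy` and the example probabilities are assumptions.

```python
import math

def binary_cross_entropy(y, y_hat):
    """Per-sample loss for a label y in {0, 1} and a prediction y_hat in (0, 1)."""
    # y_hat must be strictly between 0 and 1, otherwise math.log raises ValueError.
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

# Hypothetical predictions for a sample whose true label is 1.
confident_right = binary_cross_entropy(y=1, y_hat=0.9)
confident_wrong = binary_cross_entropy(y=1, y_hat=0.1)
print(confident_right, confident_wrong)  # the wrong prediction costs far more
```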


Calculating individual partial derivatives, we get,

$\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} = \frac{-(y - \hat{y})}{\hat{y}(1 - \hat{y})}$

$ \frac{\partial \hat{y}}{\partial w_1} = x_1 \hat{y}(1-\hat{y})$

$ \frac{\partial \hat{y}}{\partial w_2} = x_2 \hat{y}(1-\hat{y})$

$ \frac{\partial \hat{y}}{\partial b} = \hat{y}(1-\hat{y})$

Multiplying the two factors, the $\hat{y}(1-\hat{y})$ terms cancel, giving the full derivatives:

$ \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_1} = -x_1 (y - \hat{y}) $

$ \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_2} = -x_2 (y - \hat{y})$

$ \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial b} = -(y - \hat{y})$
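
The cancellation can be checked numerically by computing the two chain-rule factors separately and comparing their product with the simplified expression. The sample values and variable names below are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sample and parameters, chosen only for illustration.
x1, x2, y = 2.0, 3.0, 1.0
w1, w2, b = 0.5, -1.0, 0.25

y_hat = sigmoid(w1 * x1 + w2 * x2 + b)

# The two chain-rule factors for w1, computed separately.
dL_dyhat = -(y - y_hat) / (y_hat * (1.0 - y_hat))
dyhat_dw1 = x1 * y_hat * (1.0 - y_hat)

chain_rule_product = dL_dyhat * dyhat_dw1
simplified = -x1 * (y - y_hat)
print(chain_rule_product, simplified)  # equal up to floating-point rounding
```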

Finally, the update rules have the same form as for the linear regression perceptron model:

$ w_1 \longrightarrow w_1 - \alpha (-x_1 (y-\hat{y}))$

$ w_2 \longrightarrow w_2 - \alpha (-x_2 (y-\hat{y}))$

$ b \longrightarrow b - \alpha(-(y-\hat{y}))$
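
As a minimal sketch, the same loop used for regression trains the classifier once the prediction passes through the sigmoid. The dataset (logical AND), learning rate, epoch count, and the 0.5 decision threshold are all assumptions chosen for a small, linearly separable example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logical AND as a toy, linearly separable dataset (illustrative choice).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0.0, 0.0, 0.0, 1.0]

def train(samples, labels, alpha=0.5, epochs=2000):
    """Per-sample gradient descent for the sigmoid perceptron."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            y_hat = sigmoid(w1 * x1 + w2 * x2 + b)
            error = y - y_hat
            # Same update form as the regression case; only y_hat differs.
            w1 += alpha * x1 * error
            w2 += alpha * x2 * error
            b += alpha * error
    return w1, w2, b

w1, w2, b = train(samples, labels)
predictions = [sigmoid(w1 * x1 + w2 * x2 + b) for x1, x2 in samples]
predicted_labels = [1.0 if p >= 0.5 else 0.0 for p in predictions]
print(predicted_labels)  # thresholded predictions for the four AND inputs
```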