In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

## Basic Perceptron
Below code reprsents working of a perceptron (with 1 Nueron) using **Gradient Descent Algorithm**.

But will apply a sigmoid activation function at out neuron (Green Circle)
<img src="Basic_Preceptron.png">


## Sigmoid function

$g(z) = \frac{1}{1+{e}^{-z}}$

### Derivative of sigmoid function

$g^{\prime}(z) = g(z)(1-g(z)$

<img src="sigmoid.png">

In [14]:
# define activation function sigmoid
def sigmoid(x):
    return 1/(1+np.exp(-x))

# define derivative of sigmoid function
def sigmoid_deriv(x):
    return x*(1-x)

## Gradient Descent Explained (Maths) using sigmoid activation function

$\hat{y} = sigmoid(w_1x_1 + w_2x_2) $

### Cost

$C = \frac{1}{2}(y - \hat{y})^2$

$C = \frac{1}{2}(y - sigmoid(w_1x_1 - w_2x_2))^2$

### Getting Gradients
$\frac{dC}{dw_1} = \frac{1}{2}*2(y - sigmoid(w_1x_1 - w_2x_2))\frac{d\,sigmoid(w_1x_1 - w_2x_2)}{dw_1} $

$\frac{dC}{dw_1} = \frac{1}{2}*2(y - \hat{y})\frac{d\,\hat{y}}{dw_1}$

$\frac{dC}{dw_1} = \frac{1}{2}*2(y - \hat{y})(\hat{y})(1 - \hat{y})(-x_1)$

similarly 

$\frac{dC}{dw_2} = \frac{1}{2}*2(y - \hat{y})(\hat{y})(1 - \hat{y})(-x_2)$
### Updating weights
$w_1 \to w_1 - lr*\frac{dC}{dw_1} $  where : lr is learning rate or step size

$w_2 \to w_2 - lr*\frac{dC}{dw_2}$

### Calculate new $\hat{y}$ using new weights
$\hat{y} = sigmoid(w_1x_1 + w_2x_2) $

**Repeat the above process till we reach global minima of cost function i.e. minimum cost.**

## Generalized Perceptron using Sigmoid Activation Function

Below model is same as we did in previous model exercise but we apply sigmoid activation function on neuron and while doing gradient descent we use the derivation of sigmoid (explained above)

In [17]:
def nn_batch_generalized(x,y,w,lr,n_epoch):
    cost_list = []  # initialize list to store epochs and cost
    y_hat = sigmoid(np.dot(x,w))  # calculate y_hat for first iteration note activation function sigmoid applied
    for epoch in range(n_epoch):
        dcostdw = 2*0.5*np.dot(-x.T,np.multiply((y-y_hat),sigmoid_deriv(y_hat)))
        w = w - lr*dcostdw
        y_hat = sigmoid(np.dot(x,w))  # new y_hat using updated weights
        cost = 0.5*np.sum(y - y_hat)**2
        cost_list.append([epoch,cost])
        if epoch%100==0:
            print("epoch :{:d} cost:{:f}".format(epoch,cost))
        if cost <= 0.001:
            print("epoch :{:d} cost:{:f}".format(epoch,cost))
            break

    cost_list = pd.DataFrame(cost_list,columns=['epoch','cost'])
    return w,cost_list

#### Try using different values in x 

In [25]:
x = np.array([[10,10,20],
              [8,24,25],
              [30,5,23],
              [18,25,28],
              [5,2,9]])

y = [[0],[0],[1],[1],[0]]
w = np.zeros((x.shape[1],1))
lr = 0.01
n_epoch = 10000
w,cost_list = nn_batch_generalized(x,y,w,lr,n_epoch)

epoch :0 cost:0.337389
epoch :51 cost:0.000035


In [27]:
print "y_hat_final:\n{:s}".format(sigmoid(np.dot(x,w)))
print "trained weights :\n{:s}".format(w)
# values >=0.5 represent 1 else 0 # refer sigmoid plot above

y_hat_final:
[[ 0.14114532]
 [ 0.10988658]
 [ 0.93456342]
 [ 0.53576365]
 [ 0.27025969]]
trained weights :
[[ 0.30264395]
 [ 0.1328052 ]
 [-0.30801506]]
