
# Assignment No 5c
###### *Sibt ul Hussain*
----
## Goal

Your goal in this assignment is to implement a Perceptron classifier.

**Note:** You may use only the libraries discussed in class, i.e. numpy, scipy, and pandas.

## Submission Instructions
You are required to submit the original notebook file (with the .ipynb extension) on Slate, with the complete set of outputs. Students who fail to do so will get zero marks.

*Please read each step carefully and understand it fully before you start writing code.*

## Plagiarism
Plagiarism in any form will not be tolerated and will result in zero marks.

## For Graphical Debugging:
You can use [PyCharm](https://www.jetbrains.com/pycharm/download/#section=linux), an IDE with excellent graphical debugging support.



### Tasks

1. Complete the missing function definitions in the file "perceptron.py". You will need to write the functions hypothesis, cost_function and derivative_cost_function. **Please read each function definition before you start writing code**.
2. Complete the missing function definition gradient_descent in the file "optimizer.py".
3. Run the complete notebook and check that you are getting the right results from your classifiers.

In [None]:
%pylab inline
import scipy.stats
from collections import defaultdict  # default dictionary 
plt.style.use('ggplot')
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
%load_ext autoreload 
%autoreload 2

In [None]:
import pandas as pd
import tools as t # set of tools for plotting, data splitting, etc..

### Perceptron
You are given a set of $m$ $d$-dimensional labelled training examples $X$ and their labels $Y$ ($Y \in \{-1, +1\}$).
Your goal in this assignment is to implement a perceptron classifier. Recall that a perceptron uses the hypothesis $h_\theta(x) = x^T\theta$ with the classification rule $sign(h_\theta(x))$.

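The perceptron is trained by minimizing the following cost function. The second term is an $L_2$ regularization penalty weighted by $\lambda$; set $\lambda = 0$ to train without regularization:

$$\begin{equation} J_\theta = \frac{1}{2m}\sum_{i=1}^m \max(0,-y^{(i)} x^{(i) T}\theta)+\frac{\lambda}{2}\sum_{j} \theta_j^2\end{equation}$$


Here $m$ is the number of training examples, and the second sum runs over the components of $\theta$.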
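
As a rough illustration of the cost above, here is a minimal NumPy sketch. The function name `perceptron_cost`, its argument order and the toy data are assumptions made for this example only; they are not the interface expected in "perceptron.py".

```python
import numpy as np

def perceptron_cost(X, Y, theta, lam=0.0):
    """Hypothetical helper: (1/2m) * sum(max(0, -y * x^T theta)) + (lam/2) * sum(theta^2)."""
    m = X.shape[0]
    margins = Y * X.dot(theta)           # shape (m, 1): y^(i) * x^(i)T theta
    losses = np.maximum(0.0, -margins)   # zero wherever the margin is non-negative
    return losses.sum() / (2.0 * m) + 0.5 * lam * np.sum(theta ** 2)

# Toy data: two examples; the last column is the constant 1 for the offset.
X = np.array([[1.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0]])
Y = np.array([[1.0], [-1.0]])

theta_good = np.array([[1.0], [1.0], [0.0]])   # classifies both examples correctly
theta_bad = -theta_good                         # misclassifies both examples

cost_good = perceptron_cost(X, Y, theta_good)   # 0.0
cost_bad = perceptron_cost(X, Y, theta_bad)     # (3 + 3) / (2 * 2) = 1.5
```

Only misclassified examples contribute to the first term, which is why a separating $\theta$ drives the unregularized cost to zero.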


We will add an extra column to our input matrix $X$ for the offset, so that the hypothesis can be written as a matrix-vector product. Earlier, for a single feature, we wrote the hypothesis as $h_\theta(x^i)=\theta_0+ x^i \theta_1$. [*Remember the notation we are using: the superscript denotes the example and the subscript denotes the feature, so $x^i_j$ means the j-th feature of the i-th example in our set.*]

With the extra constant feature, this expression becomes a dot product, i.e. $h_\theta(x^i)=x^{(i)T}\theta$.

So, to simplify the calculations, we append an extra 1 to each example and perform these computations as a matrix-vector product. The code cells in this notebook place this 1 in the last column, so the offset is the last entry of $\theta$.
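
The following sketch shows the idea on a tiny made-up matrix. The variable names and values are illustrative assumptions; the column of ones is placed last, matching the `np.hstack` calls in this notebook's code cells.

```python
import numpy as np

# Two examples with two features each (made-up values).
X_raw = np.array([[2.0, 3.0],
                  [-1.0, 0.5]])

# Append a column of ones so that the offset becomes an ordinary entry of theta.
X_aug = np.hstack((X_raw, np.ones((X_raw.shape[0], 1))))

# The last entry of theta multiplies the constant 1, i.e. it is the offset.
theta = np.array([[1.0], [1.0], [-0.5]])

scores = X_aug.dot(theta)   # h_theta(x) for all examples at once: [[4.5], [-1.0]]
labels = np.sign(scores)    # classification rule: [[1.], [-1.]]
```

Note that `np.sign` returns 0 for a score of exactly 0, so an implementation may want to decide explicitly which class such points belong to.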
 
Recall that the partial derivative of the cost function with respect to $\theta_j$ for a single example is:

$$
\frac{\partial J}{\partial \theta_j}= \lambda \theta_j+ \begin{cases}-y\cdot x_j & \text{if $y\cdot x^T \theta <0$}, \\ 0 &
\text{otherwise}.\end{cases}
$$
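
A vectorized version of this derivative over all $m$ examples might look like the sketch below. It assumes the per-example terms are averaged with the same $\frac{1}{2m}$ factor that appears in the cost; the function name `perceptron_gradient` and the toy data are hypothetical and not taken from "perceptron.py".

```python
import numpy as np

def perceptron_gradient(X, Y, theta, lam=0.0):
    """Hypothetical helper: gradient of the perceptron cost with respect to theta."""
    m = X.shape[0]
    margins = Y * X.dot(theta)               # shape (m, 1)
    active = (margins < 0).astype(float)     # 1 for misclassified examples, else 0
    # Sum of -y * x over the misclassified examples, scaled by 1/(2m).
    grad_loss = -(X * (Y * active)).sum(axis=0).reshape(-1, 1) / (2.0 * m)
    return grad_loss + lam * theta

X = np.array([[1.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0]])
Y = np.array([[1.0], [-1.0]])

theta_bad = np.array([[-1.0], [-1.0], [0.0]])    # misclassifies both examples
grad_bad = perceptron_gradient(X, Y, theta_bad)  # [[-0.5], [-1.0], [0.0]]

theta_good = -theta_bad                          # classifies both correctly
grad_good = perceptron_gradient(X, Y, theta_good)  # all zeros
```

Moving against `grad_bad` pushes $\theta$ towards the direction that classifies the examples correctly, and the gradient vanishes once no example is misclassified.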





In [None]:
from perceptron import * 
from preprocessing import * 

#### Create some dummy data for testing
Please read the code carefully and see what it is doing.

In [None]:
# Create some dummy data for testing

np.random.seed(seed=99)

# make some data up
mean1 = [-3,-3]
mean2 = [2,2]
cov = [[1.0,0.0],[0.0,1.0]] 

#create some points
nexamples=500
# use integer division so that the sizes stay integers
x1 = np.random.multivariate_normal(mean1,cov,nexamples//2)
x2 = np.random.multivariate_normal(mean2,cov,nexamples//2)

X=np.vstack((x1,x2))
Y=np.vstack((1*np.ones((nexamples//2,1)),-1*np.ones((nexamples//2,1))))

plt.scatter(x1[:,0],x1[:,1], c='r', s=100)
plt.scatter(x2[:,0],x2[:,1], c='b', s=100)



plt.title("Linear Classification")
plt.xlabel("feature $x_1$")
plt.ylabel("feature $x_2$")

fig_ml_in_10 = plt.gcf()
plt.savefig('linear-class-percep.svg',format='svg')

In [None]:
print X.shape,Y.shape, max(Y),min(Y)

In [None]:
#Scale the features....
preprocess=PreProcessing(X)
X=preprocess.process_features(X)

In [None]:
#Lets append a column of dummy 1's at the end of X to simplify the calculations...
X=np.hstack((X,np.ones((X.shape[0],1))))

In [None]:
print X.shape,Y.shape

### Create the Classifier Object

In [None]:
#create a perceptron class object
percep=Perceptron(0)

### Let's Check the Derivatives...
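
To give a feel for what a derivative check does, here is a minimal central-difference sketch. It is not the code behind `Optimizer.gradient_check`, whose internals are not shown in this notebook; the helpers `cost`, `grad` and `numerical_grad`, and the random data, are assumptions for illustration (unregularized cost only).

```python
import numpy as np

def cost(X, Y, theta):
    # Unregularized perceptron cost.
    return np.maximum(0.0, -Y * X.dot(theta)).sum() / (2.0 * X.shape[0])

def grad(X, Y, theta):
    # Analytic gradient of the cost above.
    active = (Y * X.dot(theta) < 0).astype(float)
    return -(X * (Y * active)).sum(axis=0).reshape(-1, 1) / (2.0 * X.shape[0])

def numerical_grad(f, theta, eps=1e-6):
    # Central differences: perturb one component of theta at a time.
    g = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return g

rng = np.random.RandomState(0)
X = np.hstack((rng.randn(20, 2), np.ones((20, 1))))   # made-up data with a ones column
Y = np.where(rng.rand(20, 1) > 0.5, 1.0, -1.0)
theta = rng.randn(3, 1)

num = numerical_grad(lambda th: cost(X, Y, th), theta)
max_diff = np.abs(num - grad(X, Y, theta)).max()
```

The two gradients agree closely as long as no example sits right at the kink $y\cdot x^T\theta = 0$; near the kink the finite difference straddles two linear pieces and the comparison can fluctuate.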

In [None]:
# Let's check the derivatives of the perceptron.

# Note that this numerical check can fluctuate because the cost has a kink at zero.
# A more reliable check: after gradient descent, the cost function value must be zero.
from optimizer import *
Optimizer.gradient_check(X,Y,percep.cost_function,percep.derivative_cost_function)

### Training Time
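
For orientation, a bare-bones batch gradient descent loop for the unregularized perceptron cost could look like the sketch below. The function `gradient_descent_sketch`, its parameters and the toy data are hypothetical; the `gradient_descent` method you write in "optimizer.py" should follow the definition given in that file.

```python
import numpy as np

def cost(X, Y, theta):
    return np.maximum(0.0, -Y * X.dot(theta)).sum() / (2.0 * X.shape[0])

def grad(X, Y, theta):
    active = (Y * X.dot(theta) < 0).astype(float)
    return -(X * (Y * active)).sum(axis=0).reshape(-1, 1) / (2.0 * X.shape[0])

def gradient_descent_sketch(X, Y, theta0, alpha=0.1, max_iters=100):
    theta = theta0.copy()
    for _ in range(max_iters):
        if cost(X, Y, theta) == 0.0:   # no misclassified example is left
            break
        theta = theta - alpha * grad(X, Y, theta)
    return theta

# Linearly separable toy data; the last column is the constant 1.
X = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [2.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0], [-2.0, -2.0, 1.0]])
Y = np.array([[1.0], [1.0], [1.0], [-1.0], [-1.0], [-1.0]])

# Start from a nonzero theta: with the strict inequality in the derivative,
# theta = 0 already has zero cost and zero gradient, so descent would never move.
theta0 = np.array([[-0.1], [-0.1], [0.0]])
theta = gradient_descent_sketch(X, Y, theta0)
final_cost = cost(X, Y, theta)     # 0.0
preds = np.sign(X.dot(theta))      # equals Y
```

On this toy set the initial $\theta$ misclassifies every example, and two updates are enough to reach a separating $\theta$ with zero cost.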

In [None]:
percep.train(X,Y,Optimizer(alpha=0.01)) # your cost function at the minimum must be zero...

In [None]:
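#Lets plot the decision boundary...
# theta was learned on the scaled features, so plot the scaled points (columns 0 and 1 of X)
pos=(Y.ravel()==1)
plt.scatter(X[pos,0],X[pos,1], c='r', s=100)
plt.scatter(X[~pos,0],X[~pos,1], c='b', s=100)

minx=min(X[:,0])
maxx=max(X[:,0])

# boundary: theta[0]*x + theta[1]*y + theta[2] = 0, solved for y at both ends of the x range
y1=(-percep.theta[2]-percep.theta[0]*minx)/percep.theta[1]
y2=(-percep.theta[2]-percep.theta[0]*maxx)/percep.theta[1]
print y1, y2
# plot takes the x coordinates first, then the y coordinates
plt.plot([minx,maxx],[y1,y2], c='g', linewidth=5.0)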

plt.title("Linear Classification")
plt.xlabel("feature $x_1$")
plt.ylabel("feature $x_2$")

fig_ml_in_10 = plt.gcf()
plt.savefig('linear-class-temp.svg',format='svg')



In [None]:
npts=10000
model=percep
ax=plt.gca()
x0spr = max(X[:,0])-min(X[:,0])
x1spr = max(X[:,1])-min(X[:,1])

tx=np.random.rand(npts,2)
tx[:,0] = tx[:,0]*x0spr + min(X[:,0])
tx[:,1] = tx[:,1]*x1spr + min(X[:,1])

tx=np.hstack((tx,np.ones((tx.shape[0],1))))
print tx.shape
cs= model.predict(tx)
print cs, np.unique(cs)
ax.scatter(tx[:,0],tx[:,1],c=cs, alpha=.35)

ax.hold(True)
ax.scatter(X[:,0],X[:,1],
             c=list(map(lambda x:'r' if x==1 else 'lime',Y)), 
             linewidth=0,s=25,alpha=1)
ax.set_xlim([min(X[:,0]), max(X[:,0])])
ax.set_ylim([min(X[:,1]), max(X[:,1])])

### Testing on IRIS dataset

In [None]:
#load the data set
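# the file has no header row, so pass the column names explicitly (otherwise the first example is used as the header)
data=pd.read_csv('./iris.data', header=None,
                 names=['SepalLength','SepalWidth','PetalLength','PetalWidth','Class'])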
print data.describe()

In [None]:
# Get your data in matrix
X=np.asarray(data[['SepalLength','SepalWidth','PetalLength','PetalWidth']].dropna())
Y=np.asarray(data['Class'].dropna())
print " Data Set Dimensions=", X.shape, " True Class labels dimensions", Y.shape   

In [None]:
preprocess=PreProcessing(X)
X=preprocess.process_features(X)

In [None]:
Y[Y=='Iris-virginica']='Iris-versicolor'
print Y, len(Y), np.unique(Y)

In [None]:
Y[Y=='Iris-versicolor']=-1
Y[Y=='Iris-setosa']=+1
#Lets append a column of dummy 1's at the end of X to simplify the calculations...
X=np.hstack((X,np.ones((X.shape[0],1))))

In [None]:
print X

In [None]:
percep=Perceptron(lembda=0.00)
feat=[0,1,4]

In [None]:
print Y

In [None]:
# see the documentation of split_data in tools for further information...
Xtrain,Ytrain,Xtest,Ytest=t.split_data(X,Y)
Ytrain=Ytrain.reshape(len(Ytrain),1)
Ytest=Ytest.reshape(len(Ytest),1)
print " Training Data Set Dimensions=", Xtrain.shape, "Training True Class labels dimensions", Ytrain.shape   
print " Test Data Set Dimensions=", Xtest.shape, "Test True Class labels dimensions", Ytest.shape   


In [None]:
percep.train(Xtrain,Ytrain,Optimizer(alpha=0.0001)) # your cost function at the minimum must be zero...

In [None]:
#Lets test it on the set of unseen examples...
pclasses=percep.predict(Xtest)

In [None]:
#Lets see how good we are doing, by finding the accuracy on the test set..
print np.sum(pclasses==Ytest)
print "Accuracy = ", np.sum(pclasses==Ytest)/float(Ytest.shape[0])
t.print_confusion_matrix(pclasses.ravel(),Ytest.ravel())

In [None]:
from nose.tools import assert_greater_equal
acc = np.sum(pclasses==Ytest)/float(Ytest.shape[0])
assert_greater_equal(acc, 0.97)