**Name:** Byaravalli Arun Suhag

**EID:** 53265857

# CS4487 - Tutorial 10
## Stochastic Gradient Descent

In this tutorial you will use stochastic gradient descent to train classifiers quickly.
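To make the idea concrete, here is a minimal pure-Python sketch of SGD for a binary linear SVM. Each step looks at one random training sample and takes a (sub)gradient step on the hinge loss plus an L2 penalty. The function names, the labels in {+1, -1} and the step-size schedule are assumptions chosen for illustration. scikit-learn's `SGDClassifier` uses its own learning-rate schedule and handles the 10 MNIST classes with a one-versus-all scheme.

```python
import random


def sgd_hinge(X, y, alpha=0.01, epochs=10, eta0=0.1, seed=0):
    """Train a binary linear SVM (labels +1/-1) with plain SGD.

    Minimises (1/n) * sum(max(0, 1 - y_i * (w.x_i + b))) + (alpha/2) * ||w||^2
    by stepping on one randomly ordered sample at a time.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            t += 1
            eta = eta0 / (1.0 + eta0 * alpha * t)  # step size decaying roughly like 1/t
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # gradient of the L2 penalty: shrink every weight a little
            w = [wj * (1.0 - eta * alpha) for wj in w]
            if margin < 1.0:
                # hinge loss is active: move w and b towards classifying x_i correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Sign of the linear score w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Each update costs O(d) for d features, independent of the number of training samples. This is why SGD scales well to the 50,000 MNIST images.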

First we need to initialize Python. Run the cell below.

In [1]:
%matplotlib inline
import IPython.core.display         
# setup output image format (Chrome works best)
IPython.core.display.set_matplotlib_formats("svg")
import matplotlib.pyplot as plt
import matplotlib
from numpy import *
from sklearn import *
import glob
import os
import IPython.utils.warn as warn
import cPickle, gzip, numpy
import time

random.seed(100)
rbow = plt.get_cmap('rainbow')



We will use a larger version of the MNIST digits dataset. Download "mnist.pkl.gz" from Canvas and put it in the same directory as this ipynb file. The training set has 50,000 images and the test set has 10,000 images. The file also contains a validation set (`valid_set`), which is loaded below but not used.

In [2]:
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()

trainX,trainY = train_set
valX,valY = valid_set
testX,testY = test_set

print trainX.shape
print testX.shape

(50000L, 784L)
(10000L, 784L)
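The loading cell above is Python 2 code (`cPickle`, `print` statements). Below is a minimal Python 3 equivalent. It assumes the same `mnist.pkl.gz` file and that, as the `cPickle` call suggests, the file was written by Python 2. The helper name `load_mnist_pkl` is hypothetical.

```python
import gzip
import pickle


def load_mnist_pkl(path="mnist.pkl.gz"):
    """Return the (train, valid, test) tuple stored in a gzipped pickle.

    encoding="latin1" lets Python 3 decode the raw byte strings that a
    Python 2 pickle (e.g. of NumPy arrays) contains.
    """
    with gzip.open(path, "rb") as f:
        train_set, valid_set, test_set = pickle.load(f, encoding="latin1")
    return train_set, valid_set, test_set

# Usage in this notebook:
# (trainX, trainY), (valX, valY), (testX, testY) = load_mnist_pkl()
```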


Now we will train a linear SVM (`svm.LinearSVC`) using the standard algorithm and time how long it takes. Run the code below; it may take a few minutes to finish.

In [3]:
starttime = time.clock()
clfo = svm.LinearSVC(C=1.0)
clfo.fit(trainX, trainY)
print "elapsed time (sec):", time.clock() - starttime

elapsed time (sec): 67.0571290864
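`time.clock()` exists only in Python 2 and early Python 3. It was deprecated in 3.3 and removed in 3.8. In Python 2 it measured processor time on Unix but wall-clock time on Windows. Below is a minimal Python 3 timing helper. It assumes an estimator with a scikit-learn-style `fit(X, y)` method, and the name `timed_fit` is hypothetical.

```python
import time


def timed_fit(model, X, y):
    """Fit `model` on (X, y); return the fitted model and the elapsed seconds.

    time.perf_counter() measures elapsed real time, which is what matters when
    comparing a single-process fit with work spread over several engines.
    """
    start = time.perf_counter()
    model.fit(X, y)
    return model, time.perf_counter() - start

# Usage in this notebook:
# clfo, secs = timed_fit(svm.LinearSVC(C=1.0), trainX, trainY)
```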


Here are the training and test errors.

In [4]:
Ypred = clfo.predict(trainX)
trainerr_svm = mean(Ypred != trainY)

Ypred = clfo.predict(testX)
testerr_svm = mean(Ypred != testY)
print "SVM errors:", trainerr_svm, testerr_svm

SVM errors: 0.07384 0.0846


## SGD Classifier
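Now train an SGD classifier using the SVM (hinge) loss and an L2 penalty. Measure how long it takes to fit the classifier (use the `fit` function), and calculate the training and test error of the SGD classifier. Use `alpha=0.1`. Remember that alpha plays the role of 1/C: a larger alpha means stronger regularization. Because of how scikit-learn scales the two objectives, the exact correspondence is alpha = 1/(C·N), where N is the number of training samples.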
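The alpha/C relationship follows from the two objectives scikit-learn documents. `SGDClassifier` minimizes the mean loss plus alpha times ½‖w‖². `LinearSVC` minimizes ½‖w‖² plus C times the summed loss. The sketch below, with the hypothetical helper names `alpha_from_C` and `C_from_alpha`, converts between them. The mapping is exact only when the loss functions match, and `LinearSVC` defaults to the squared hinge rather than the hinge.

```python
def alpha_from_C(C, n_samples):
    """SGDClassifier alpha that matches a LinearSVC-style C on n_samples points.

    SGDClassifier minimises   (1/n) * sum(loss_i) + alpha * 0.5 * ||w||^2
    LinearSVC minimises       C * sum(loss_i) + 0.5 * ||w||^2
    Dividing the first objective by alpha gives C = 1 / (alpha * n).
    """
    return 1.0 / (C * n_samples)


def C_from_alpha(alpha, n_samples):
    """Inverse mapping: the C implied by a given alpha."""
    return 1.0 / (alpha * n_samples)


n = 50000  # MNIST training-set size used in this tutorial
print(alpha_from_C(1.0, n))  # 2e-05: alpha equivalent to the SVM's C=1.0
print(C_from_alpha(0.1, n))  # 0.0002: C implied by alpha=0.1
```

So `alpha=0.1` regularizes far more strongly than the `C=1.0` used for `LinearSVC` above.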


In [6]:
starttime = time.clock()
clf = linear_model.SGDClassifier(
    loss='hinge',  # SVM loss (change to 'log' for logistic regression)
    penalty='l2',  # standard penalty (change to 'l1' for feature selection)
    alpha=1.0,     # regularization strength (larger alpha = stronger L2 penalty)
    average=True)  # use a running average for classifier weights
clf.fit(trainX, trainY)
print "elapsed time (sec):", time.clock() - starttime

elapsed time (sec): 5.87798202469
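With `average=True`, `SGDClassifier` stores in `coef_` the average of the weight vectors visited during SGD rather than the last iterate. This damps the step-to-step noise. The incremental form of such an average is sketched below in pure Python with the hypothetical helper `update_average`. scikit-learn's internal implementation differs.

```python
def update_average(w_avg, w, t):
    """Fold the t-th weight vector w (t = 1, 2, ...) into the running mean w_avg.

    After t calls, w_avg equals the arithmetic mean of all t iterates, without
    storing them: avg_t = avg_{t-1} + (w_t - avg_{t-1}) / t.
    """
    return [a + (wi - a) / t for a, wi in zip(w_avg, w)]


# Noisy SGD iterates bouncing around the point (1, 1).
iterates = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
w_avg = [0.0, 0.0]
for t, w in enumerate(iterates, start=1):
    w_avg = update_average(w_avg, w, t)
print(w_avg)  # [1.0, 1.0]
```

The individual iterates are far from (1, 1), but their running average lands on it.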


In [7]:
Ypred = clf.predict(trainX)
trainerr_sgdsvm = mean(Ypred != trainY)

Ypred = clf.predict(testX)
testerr_sgdsvm = mean(Ypred != testY)
print "SGDClassifier SVM errors:", trainerr_sgdsvm, testerr_sgdsvm

SGDClassifier SVM errors: 0.21466 0.2007


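_How do the speed and the accuracy compare with the original SVM?_
- **Speed: training with SGD (hinge loss, L2 penalty) finishes in about 5.9 s, while `LinearSVC` takes about 67.1 s. SGD is roughly an order of magnitude faster.**
- **Accuracy: SGD is much less accurate, with training/test errors of 0.215/0.201 versus 0.074/0.085 for `LinearSVC`. This run used `alpha=1.0` rather than the requested 0.1, i.e. much stronger regularization than the SVM's `C=1.0`. This likely contributes to the higher error.**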

## Parallel SGD Classifier
Now train a parallel SGD classifier using IPython clusters, and measure the fitting time.  Use the same value for alpha as your SGD Classifier. Try different batch sizes (B) and number of processes (K).  Calculate the training and test error.

First start the IPython clusters using the "IPython Clusters" tab in Jupyter. If the tab says "Clusters tab now provided by IPython parallel", run `ipcluster nbextension enable` to enable it. Alternatively, run `ipcluster start -n 4` on the command line to start 4 engines directly.
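The scheme used in this section can be sketched without any parallel machinery. It draws K random batches of B rows, fits one linear model per batch, then averages the weights and intercepts, as `combine_sgd` does. The sketch below is pure Python. `parallel_average_train` and `train_fn` are hypothetical names, and the `mapper` argument stands in for something like ipyparallel's `lview.map`.

```python
import random


def parallel_average_train(X, y, train_fn, K, B, mapper=map, seed=0):
    """Fit K models on random size-B subsets of (X, y) and average them.

    train_fn(Xb, yb) must return (w, b) with w a list of floats.
    mapper(f, seq1, seq2) applies f pairwise, like the builtin map (which
    runs the K fits one after another); a parallel map can be swapped in.
    """
    rng = random.Random(seed)
    n = len(X)
    X_batches, y_batches = [], []
    for _ in range(K):
        idx = rng.sample(range(n), B)  # B distinct rows; different batches may overlap
        X_batches.append([X[i] for i in idx])
        y_batches.append([y[i] for i in idx])

    models = list(mapper(train_fn, X_batches, y_batches))

    # Parameter averaging: mean of the weight vectors and of the intercepts.
    d = len(models[0][0])
    w_avg = [sum(w[j] for w, _ in models) / K for j in range(d)]
    b_avg = sum(b for _, b in models) / K
    return w_avg, b_avg
```

Averaging works here because every model is linear, so one averaged weight vector gives a single classifier with the same cost at prediction time.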

In [8]:
# load the client interface
import ipyparallel

clients = ipyparallel.Client()
clients.block = True   # wait for calculations to finish
print clients.ids      # client process ids

# get the load-balanced scheduler
lview = clients.load_balanced_view()

[0, 1, 2, 3]


In [9]:
%%px
# load libraries on all clients
from numpy import *
from sklearn import *




In [11]:
def par_sgd(data, param):
    # run SGD on a dataset
    clf = linear_model.SGDClassifier(
        loss='hinge', 
        penalty='l2',
        alpha=param['alpha'],
        n_iter=10,       # number of epochs
        average=False)  # don't use averaging, since we will do it later

    clf.fit(data['trainX'], data['trainY'])
    return clf

In [12]:
def combine_sgd(clfs):
    # combine sgd classifiers
    
    # make a copy of the first one
    import copy
    clfout = copy.deepcopy(clfs[0])
    K = len(clfs)

    # add all the remaining ones to it
    for i in range(1,K):
        clfout.coef_ += clfs[i].coef_
        clfout.intercept_ += clfs[i].intercept_

    # take the average
    clfout.coef_ /= K
    clfout.intercept_ /= K

    return clfout

In [13]:
param = {'alpha': 0.1}

K = 10           # number of batches/tasks (the load-balanced view spreads them over the 4 engines)
N = len(trainX)  # dataset size
B = int(0.25*N)  # batch size

random.seed(612)

starttime = time.clock()

# split data into batches
data_batches = []
for i in range(K):
    rp = random.permutation(N)
    trainX_shuffle = trainX[rp[:B]]
    trainY_shuffle = trainY[rp[:B]]
    data_batches.append({'trainX': trainX_shuffle, 'trainY': trainY_shuffle})

# run par_sgd on each batch of data
lview.block = True
clfs = lview.map(par_sgd, data_batches, [param]*K)

# without load-balanced view (for testing)
#clfs = map(par_sgd, data_batches, [param]*K)

# combine classifiers
clf = combine_sgd(clfs)  

# training error
Ypred = clf.predict(trainX)
err = mean(Ypred != trainY)
print "final classifier error:", err

# compare with individual classifiers
errs = []
for myclf in clfs:
    mypred = myclf.predict(trainX)
    errs.append( mean(mypred != trainY) )

print "individual clf errors:"
for i, err in enumerate(errs):
    print "Classifier %-2s: Error: %-10s" %(str(i + 1), str(err))
    
print "elapsed time (sec):", time.clock() - starttime

final classifier error: 0.13326
individual clf errors:
Classifier 1 : Error: 0.13296   
Classifier 2 : Error: 0.13704   
Classifier 3 : Error: 0.1338    
Classifier 4 : Error: 0.13704   
Classifier 5 : Error: 0.1348    
Classifier 6 : Error: 0.13484   
Classifier 7 : Error: 0.13916   
Classifier 8 : Error: 0.13298   
Classifier 9 : Error: 0.13436   
Classifier 10: Error: 0.1332    
elapsed time (sec): 46.2383533827
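The snippet below checks the numbers printed above, copied in by hand. The averaged classifier's training error is below the mean of the ten batch classifiers and below seven of them individually. The best single batch classifier is marginally better.

```python
# Training errors copied from the output above.
individual = [0.13296, 0.13704, 0.1338, 0.13704, 0.1348,
              0.13484, 0.13916, 0.13298, 0.13436, 0.1332]
combined = 0.13326

mean_individual = sum(individual) / len(individual)
n_worse = sum(e > combined for e in individual)

print("mean individual error: %.4f" % mean_individual)  # 0.1350
print("best individual error: %.4f" % min(individual))  # 0.1330
print("combined classifier beats %d of %d" % (n_worse, len(individual)))  # 7 of 10
```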


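_How do the speed and the accuracy compare with the original SVM?_
- **The original SVM has a lower error than the parallel SGD classifier, but it takes longer to train.**
- **Original SVM: training time = 67.1 s, training error = 0.074, test error = 0.085**
- **Parallel SGD classifier: training time = 46.2 s, training error = 0.133 (the test error was not computed in this run)**

Thus the experiment shows a trade-off between training time and accuracy. The 46.2 s for the parallel version also includes building the batches and predicting with all eleven classifiers on the training set, so the fitting itself takes less time than this figure suggests.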
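The trade-off can be put in numbers using ratios of the values measured above:

$$
\begin{aligned}
\text{SGD vs. LinearSVC:}\quad & \frac{67.06\ \text{s}}{5.88\ \text{s}} \approx 11.4\times \text{ faster}, & \frac{0.2007}{0.0846} &\approx 2.4\times \text{ the test error} \\
\text{parallel SGD vs. LinearSVC:}\quad & \frac{67.06\ \text{s}}{46.24\ \text{s}} \approx 1.45\times \text{ faster}, & \frac{0.13326}{0.07384} &\approx 1.8\times \text{ the training error}
\end{aligned}
$$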
