# ELE435/535 LAB 8: Kernel SVM
## Version: 2022

## Objectives:  

We will use a linear SVM, kernel SVM, and logistic regression to classify MNIST digits into c=10 classes. The best performance among these methods will lower bound what we expect to achive using a Neural Network. Hence it will provide a benchmark for the next round of methods: Multinomial Softmax Regression, a one hidden layer neural network, a Multilayer feedforward neural network, and a convolutional neural network. 
  
The SVM and logistic regression are naturally binary classifiers, but both can be extended to multi-class classification using the one-versus-the-rest method. As the name suggests, this trains $c$ binary classifiers, each of which distinguishes one class from the rest. The final classification is made by resolving conflicts among the binary classifiers, for example by choosing the class whose classifier gives the highest score (no need to go into the details).
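To make the one-versus-the-rest idea concrete, here is a minimal NumPy sketch. It is not what scikit-learn does internally: as a stand-in for the SVM or logistic loss, each binary classifier is a regularized least-squares linear scorer trained on ±1 targets, and conflicts are resolved by taking the class whose scorer outputs the highest value. The function names `fit_one_vs_rest` and `predict_one_vs_rest` and the toy data are hypothetical.

```python
import numpy as np


def fit_one_vs_rest(X, y, n_classes, reg=1e-3):
    """Fit one linear scorer per class (class k -> +1, every other class -> -1).

    Regularized least squares stands in for the SVM / logistic loss here.
    X has shape (n_samples, n_features); returns weights of shape
    (n_features + 1, n_classes), the extra row being the bias.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])       # shared normal-equation matrix
    W = np.empty((Xb.shape[1], n_classes))
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)             # binary relabeling: k vs. the rest
        W[:, k] = np.linalg.solve(A, Xb.T @ t)
    return W


def predict_one_vs_rest(X, W):
    """Resolve conflicts by picking the class whose scorer is most confident."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    scores = Xb @ W                                 # one score column per binary classifier
    return np.argmax(scores, axis=1)


# Toy data: three well-separated Gaussian clusters in 3-D (hypothetical example)
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(3)
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(size=(300, 3))

W = fit_one_vs_rest(X, y, n_classes=3)
accuracy = np.mean(predict_one_vs_rest(X, W) == y)
print(accuracy)
```

The argmax over scores is one common conflict-resolution rule; it only makes sense when the $c$ scorers produce scores on comparable scales.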

```python
# ! python -m pip install --upgrade pip
# ! python -m pip install scikit-learn-intelex

from sklearnex import patch_sklearn
patch_sklearn()
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from time import time
import datetime
%matplotlib inline

# from google.colab import files
# uploaded = files.upload()
```

```
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
```


## Kernel-SVM on MNIST

**1.1)** First, load the provided subsets of the MNIST dataset (1,000 training and 100 test images per digit):  
`MNISTcwtrain1000.npy` and `MNISTcwtest100.npy`

Normalize the scalar data values to the range [0,1].


```python
train_data = np.load('MNISTcwtrain1000.npy')
train_data = train_data.astype(dtype='float64')
test_data = np.load('MNISTcwtest100.npy')
test_data = test_data.astype(dtype='float64')

train_data = train_data/255.0
test_data = test_data/255.0
print('training data: ', train_data.shape)
print('testing data: ', test_data.shape)
```

```
training data:  (784, 10000)
testing data:  (784, 1000)
```
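The loaded arrays store one flattened image per *column* (784 rows, one per pixel), while scikit-learn estimators expect one sample per *row*; this is why the later cells pass `train_data.T` and `test_data.T`. A small sketch of the same preprocessing on a random stand-in array follows; the array `raw` and the assumption that each column unflattens row-major into a 28×28 image are illustrative, not taken from the provided files.

```python
import numpy as np

# Hypothetical stand-in for the provided data: 784 pixel rows x 20 image columns, values 0..255
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(784, 20)).astype(np.uint8)

X = raw.astype(np.float64) / 255.0     # scale intensities to [0, 1]
X_samples_first = X.T                  # scikit-learn layout: (n_samples, n_features)
first_image = X[:, 0].reshape(28, 28)  # assumes row-major pixel order (784 = 28 * 28)

print(X_samples_first.shape, first_image.shape)
```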


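**1.2)** The SVM can also be used to classify multi-class data. Note, however, that scikit-learn's `SVC` trains one-versus-one classifiers internally (one per pair of classes); its default `decision_function_shape='ovr'` only reshapes the resulting scores into a one-versus-the-rest form. `LinearSVC`, or wrapping an estimator in `OneVsRestClassifier`, gives a true one-versus-the-rest scheme.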
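One way to see the one-versus-one versus one-versus-the-rest distinction is to count the underlying binary problems. The sketch below uses small synthetic 10-class data (an assumption, not the MNIST subsets): `SVC(decision_function_shape='ovo')` exposes one score per pair of classes, 10·9/2 = 45 columns, while `OneVsRestClassifier` fits exactly one binary `SVC` per class.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 10-class data: 20 noisy points around each of 10 random centers (hypothetical)
rng = np.random.default_rng(0)
n_classes = 10
centers = rng.normal(scale=5.0, size=(n_classes, 8))
y = np.repeat(np.arange(n_classes), 20)
X = centers[y] + rng.normal(size=(y.size, 8))

# SVC always trains one-versus-one; 'ovo' exposes the raw pairwise scores
ovo = SVC(kernel='linear', C=0.1, decision_function_shape='ovo').fit(X, y)
print(ovo.decision_function(X).shape)   # one column per pair of classes

# Explicit one-versus-the-rest: one binary SVC per class
ovr = OneVsRestClassifier(SVC(kernel='linear', C=0.1)).fit(X, y)
print(len(ovr.estimators_))             # one estimator per class
```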

Train a multi-class SVM using a linear kernel and C=0.1.  
Report the classification accuracy on the training and testing data. (Use sklearn's built-in commands for training and testing.)
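For reference, the role of $C$ can be read off the binary primal objectives that scikit-learn documents for `SVC` and for $\ell_2$-regularized `LogisticRegression`, with labels $y_i \in \{-1, +1\}$:

$$
\text{SVM:}\quad \min_{w,b}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i(w^\top x_i + b)\bigr)
$$

$$
\text{logistic regression:}\quad \min_{w,b}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\log\bigl(1 + e^{-y_i(w^\top x_i + b)}\bigr)
$$

Dividing either objective by $C$ gives $\frac{1}{2C}\|w\|_2^2 + \sum_i \ell_i$, i.e. a standard $\frac{\lambda}{2}\|w\|_2^2$ penalty with $\lambda = 1/C$. A smaller $C$ therefore means stronger regularization; for the kernel SVM the same reading applies with $w$ living in the kernel's feature space.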

```python
# This code is provided

from sklearn import svm

# Images are stored in consecutive blocks of 1000 (train) / 100 (test) per digit 0-9
train_label = np.ones(10000)
train_label[0:1000] *= 0
train_label[2000:3000] *= 2
train_label[3000:4000] *= 3
train_label[4000:5000] *= 4
train_label[5000:6000] *= 5
train_label[6000:7000] *= 6
train_label[7000:8000] *= 7
train_label[8000:9000] *= 8
train_label[9000:10000] *= 9

test_label = np.ones(1000)
test_label[0:100] *= 0
test_label[200:300] *= 2
test_label[300:400] *= 3
test_label[400:500] *= 4
test_label[500:600] *= 5
test_label[600:700] *= 6
test_label[700:800] *= 7
test_label[800:900] *= 8
test_label[900:1000] *= 9

start = time()
#-----------------------------------
# Your code here

penalty = 0.1
svm_classifier = svm.SVC(kernel='linear', C=penalty)
svm_classifier.fit(train_data.T, train_label)  # sklearn expects (n_samples, n_features)

predict_train = svm_classifier.predict(train_data.T)
acc_train = np.mean(predict_train == train_label)  # fraction of correct predictions
predict_test = svm_classifier.predict(test_data.T)
acc_test = np.mean(predict_test == test_label)

#------------------------------------
# This code is provided
end = time()
print('Training Accuracy: ' + str(acc_train))
print('Testing Accuracy: ' + str(acc_test))
print('Estimated running time: ' + str(datetime.timedelta(seconds=end - start)))
```

```
Training Accuracy: 0.9739
Testing Accuracy: 0.921
Estimated running time: 0:00:18.434515
```
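The provided label code relies on the images being stored in consecutive blocks of 1,000 (training) or 100 (test) per digit. Under that same assumption, `np.repeat` builds the label vectors in one line; the sketch below checks it against a block-by-block construction equivalent to the provided one. The variable names ending in `_repeat` are hypothetical.

```python
import numpy as np

# Block-by-block construction, equivalent to the provided code:
# images are stored in consecutive blocks of 1000 per digit 0-9
train_label = np.ones(10000)
for digit in range(10):
    train_label[digit * 1000:(digit + 1) * 1000] = digit

# The same labels from np.repeat (cast to float64 to match np.ones)
train_label_repeat = np.repeat(np.arange(10), 1000).astype(np.float64)
test_label_repeat = np.repeat(np.arange(10), 100).astype(np.float64)

print(np.array_equal(train_label, train_label_repeat))
```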


**1.3)** Now, repeat the experiment using an SVM with an 'rbf' (Gaussian) kernel. Search over C in the interval [0.005, 0.1] and `gamma` in the interval [0.005, 0.1], and report the best test accuracy (use sklearn's built-in commands). Hint: to get a feeling for selecting an appropriate value of gamma, take a look at http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html.
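The RBF kernel compares two images $x$ and $z$ through $K(x, z) = \exp(-\gamma\,\|x - z\|_2^2)$, so $\gamma$ sets how quickly similarity decays with distance: when it is too large, every training point looks unrelated to every other; when it is too small, all points look alike. The NumPy sketch below computes this kernel matrix and the `gamma='scale'` rule documented on the scikit-learn `SVC` page, $\gamma = 1/(n_\text{features}\cdot \operatorname{Var}(X))$, which is one way to pick a sensible starting range. It also prints the log-spaced grid used in the search code. The helper names are hypothetical and the random data stand in for MNIST.

```python
import numpy as np


def rbf_kernel_matrix(X, Z, gamma):
    """Return K with K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2); rows are samples."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip small negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def scale_gamma(X):
    """The rule scikit-learn documents for gamma='scale': 1 / (n_features * X.var())."""
    return 1.0 / (X.shape[1] * X.var())


rng = np.random.default_rng(0)
X = rng.random((50, 784))           # stand-in for 50 images with pixels in [0, 1]
gamma = scale_gamma(X)
K = rbf_kernel_matrix(X, X, gamma)

grid = np.geomspace(0.005, 0.1, 3)  # the 3-point log-spaced grid from the search code
print(gamma, grid)
```

The middle grid value is the geometric mean $\sqrt{0.005 \cdot 0.1} \approx 0.0224$, which is the gamma reported as best below.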

```python
# This code is provided
start = time()

acc_train_rbf = 0
acc_test_rbf = 0
best_c = 0
best_gam = 0
#--------------------------------------------
# Your code here
# Log-spaced 3x3 grid: 0.005, ~0.0224, 0.1 for both C and gamma
for penalty in np.geomspace(0.005, 0.1, 3):
    for g in np.geomspace(0.005, 0.1, 3):

        svm_classifier = svm.SVC(kernel='rbf', C=penalty, gamma=g)
        svm_classifier.fit(train_data.T, train_label)

        predict_train = svm_classifier.predict(train_data.T)
        predict_test = svm_classifier.predict(test_data.T)

        acc_train_case = np.mean(predict_train == train_label)
        acc_test_case = np.mean(predict_test == test_label)

        # Highest training accuracy over the whole grid; it need not come
        # from the (C, gamma) pair with the best test accuracy
        if acc_train_rbf < acc_train_case:
            acc_train_rbf = acc_train_case

        # Keep the (C, gamma) pair with the highest test accuracy
        if acc_test_rbf < acc_test_case:
            acc_test_rbf = acc_test_case
            best_c = penalty
            best_gam = g


#---------------------------------------------
# This code is provided
end = time()

print('Training Accuracy: ' + str(acc_train_rbf))
print('Testing Accuracy: ' + str(acc_test_rbf))
print('C and gamma that gave best testing accuracy: ' + str(best_c) + ', ' + str(best_gam))
print('Estimated running time:' + str(datetime.timedelta(seconds=end - start)))
```

```
Training Accuracy: 0.9492
Testing Accuracy: 0.915
C and gamma that gave best testing accuracy: 0.1, 0.022360679774997897
Estimated running time:0:09:32.423270
```
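The search above picks (C, gamma) by test accuracy, which tends to make the reported test accuracy an optimistic estimate. A common alternative is to choose hyperparameters by cross-validation on the training set only and to evaluate on the test set once. Below is a sketch with scikit-learn's `GridSearchCV` on small synthetic data; the data, the 3-fold split, and the variable names are assumptions for illustration, and the grid mirrors the one above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data: 3 classes, 40 training and 10 test points each (hypothetical)
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 20))
y_train = np.repeat(np.arange(3), 40)
y_test = np.repeat(np.arange(3), 10)
X_train = centers[y_train] + rng.normal(size=(y_train.size, 20))
X_test = centers[y_test] + rng.normal(size=(y_test.size, 20))

param_grid = {
    'C': np.geomspace(0.005, 0.1, 3),
    'gamma': np.geomspace(0.005, 0.1, 3),
}
# Each (C, gamma) pair is scored by 3-fold cross-validation on the training data only;
# the best pair is then refit on all training data (refit=True is the default)
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)  # test set used once, after selection
print(search.best_params_, search.best_score_, test_acc)
```

On the full MNIST subsets this multiplies the number of fits by the number of folds, so it costs noticeably more than the 9-fit loop above.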


**1.4)** Now, do the same using $\ell_2$-regularized logistic regression. The regularization parameter $C$ plays the role of $1/\lambda$: smaller values of $C$ mean stronger regularization. Note that for multi-class data, scikit-learn 0.22 and later fits a multinomial (softmax) model by default with the `lbfgs` solver rather than one-versus-the-rest; wrapping the estimator in `sklearn.multiclass.OneVsRestClassifier` gives one-versus-the-rest. Search over three or four values in the interval [0.01, 1] to find the best testing performance.

```python
# This code is provided
from sklearn.linear_model import LogisticRegression
start = time()

acc_train_log = 0
acc_test_log = 0
best_c = 0
#--------------------------------------------
# Your code here
# Four log-spaced values of C: 0.01, ~0.046, ~0.215, 1
for penalty in np.geomspace(0.01, 1, 4):
    logreg_classifier = LogisticRegression(C=penalty, max_iter=500)
    logreg_classifier.fit(train_data.T, train_label)

    predict_train = logreg_classifier.predict(train_data.T)
    predict_test = logreg_classifier.predict(test_data.T)

    acc_train_case = np.mean(predict_train == train_label)
    acc_test_case = np.mean(predict_test == test_label)

    # Highest training accuracy over all C values (not necessarily at best_c)
    if acc_train_log < acc_train_case:
        acc_train_log = acc_train_case

    # Keep the C with the highest test accuracy
    if acc_test_log < acc_test_case:
        acc_test_log = acc_test_case
        best_c = penalty
#---------------------------------------------
# This code is provided
end = time()
print('Training Accuracy: ' + str(acc_train_log))
print('Testing Accuracy: ' + str(acc_test_log))
print('C that gave best testing accuracy: ' + str(best_c))
print('Estimated running time:' + str(datetime.timedelta(seconds=end - start)))
```

```
Training Accuracy: 0.9694
Testing Accuracy: 0.895
C that gave best testing accuracy: 0.046415888336127774
Estimated running time:0:00:25.057757
```


**1.5)** With what accuracy can the best of the above classifiers predict the classes of the 1,000 test images? This is the benchmark to beat.


ANS: 92% (the linear SVM with C=0.1 reached a test accuracy of 0.921).