A Support Vector Machine (SVM) is a model that can be used for both classification and 
regression. 

<img src="svm1.png" width=300 height=200>

References: https://commons.wikimedia.org/wiki/File:SVM_Example_of_Hyperplanes.png

Let's look at another example

<img src="svm2.png" width=300 height=200>

References: https://commons.wikimedia.org/wiki/File:SVM_margin.png

Support vectors are the training points that lie closest to the decision boundary. They alone determine where the boundary is placed.

The dataset consists of pairs $(x_i, y_i)$, where $y_i$ is either 1 or -1 
and indicates the class to which $x_i$ belongs.

Our goal is to find the hyperplane that separates the two classes with the maximum margin. This hyperplane can be represented by 
$\vec{w}\cdot\vec{x} - b = 0,$ where $\vec{w}$ is the normal vector of the hyperplane and $b$ is a scalar offset.

Using the training dataset, we compute $\vec{w}$ and $b.$ 

Any point that lies on or above 

$\vec{w}\cdot\vec{x} - b = 1$ will be classified as class 1, 
and any point that lies on or below

$\vec{w}\cdot\vec{x} - b = -1$ will be classified as 
class -1.
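
As a minimal sketch of this decision rule, the snippet below scores a few points against a hyperplane. The values of `w` and `b` are picked by hand for illustration (they are not learned from data), and the helper names are hypothetical. It also assumes the common convention that a point falling between the two margin hyperplanes is assigned by the sign of its score.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen by hand (not learned from data).
w = np.array([1.0, 0.0])
b = 1.0

def decision_value(x):
    """Signed score w.x - b for a single point x."""
    return float(np.dot(w, x) - b)

def predict(x):
    """Return +1 on the positive side of the hyperplane, otherwise -1."""
    return 1 if decision_value(x) >= 0 else -1

points = np.array([[3.0, 2.0],    # score  2.0: on or above the +1 hyperplane
                   [-1.0, 0.5],   # score -2.0: on or below the -1 hyperplane
                   [1.5, 0.0]])   # score  0.5: inside the margin, positive side

scores = [decision_value(p) for p in points]
labels = [predict(p) for p in points]
print(scores)
print(labels)
```

The third point shows that a score strictly between -1 and 1 still gets a label under this convention; the margin hyperplanes constrain the training points, not the prediction rule.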

The distance between the two hyperplanes is $\frac{2}{||\vec{w}||}.$ We want to maximize this distance, which is the same as minimizing $||\vec{w}||.$ 
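
A short derivation of this distance may help. The symbols $\vec{x}_+$ and $\vec{x}_-$ are introduced here only for the derivation: $\vec{x}_+$ is any point on the hyperplane with right-hand side $1$, and $\vec{x}_-$ is any point on the hyperplane with right-hand side $-1$.

$$
\begin{aligned}
\vec{w}\cdot\vec{x}_+ - b &= 1, \qquad \vec{w}\cdot\vec{x}_- - b = -1 \\
\vec{w}\cdot(\vec{x}_+ - \vec{x}_-) &= 2 \\
d = \frac{\vec{w}}{||\vec{w}||}\cdot(\vec{x}_+ - \vec{x}_-) &= \frac{2}{||\vec{w}||}
\end{aligned}
$$

Subtracting the two equations removes $b$, and projecting the difference onto the unit normal $\vec{w}/||\vec{w}||$ gives the perpendicular distance $d$ between the hyperplanes.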

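In a hard-margin SVM, we are strict about the margin: no data points can lie within it. This requires the classes to be linearly separable, and the resulting margin tends to be narrow.

In a soft-margin SVM, some data points are allowed to lie within the margin at a penalty, so the margin can be wider. In scikit-learn, the parameter `C` controls this trade-off: a smaller `C` tolerates more violations and gives a wider margin.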
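
The effect can be sketched with scikit-learn's `SVC`, where a small `C` gives a softer margin and a large `C` approaches a hard margin. The data below is synthetic (two overlapping clusters from `make_blobs`), and `margin_width` is a hypothetical helper that applies the $\frac{2}{||\vec{w}||}$ formula to the fitted weights.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (illustrative data only).
X_blobs, y_blobs = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                              cluster_std=1.5, random_state=0)

def margin_width(C):
    """Fit a linear SVC with penalty C and return the margin width 2 / ||w||."""
    clf = SVC(kernel="linear", C=C).fit(X_blobs, y_blobs)
    return 2.0 / np.linalg.norm(clf.coef_[0])

wide = margin_width(0.01)     # soft margin: many violations tolerated
narrow = margin_width(100.0)  # close to a hard margin: violations are expensive

print("margin width with C=0.01:", wide)
print("margin width with C=100 :", narrow)
```

On overlapping data like this, the small-`C` model is expected to report the wider margin.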

The loss function for SVM (the hinge loss) for a sample $(x_i, y_i)$ is defined by

$\max(0, 1-y_i(\vec{w}\cdot\vec{x}_i - b))$ 
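
To make the formula concrete, here is a small NumPy sketch that evaluates this loss for three samples. The parameters `w` and `b` and the sample points are made up for illustration, and averaging over samples is an assumption of this sketch (the formula above is per sample).

```python
import numpy as np

def hinge_losses(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y_i * (w.x_i - b))."""
    scores = X @ w - b
    return np.maximum(0.0, 1.0 - y * scores)

# Hypothetical parameters and samples, chosen by hand.
w = np.array([1.0, 0.0])
b = 1.0
X_demo = np.array([[3.0, 0.0],   # score 2.0, y = +1: outside the margin, loss 0
                   [1.5, 0.0],   # score 0.5, y = +1: inside the margin, loss 0.5
                   [2.0, 0.0]])  # score 1.0, y = -1: wrong side, loss 2
y_demo = np.array([1, 1, -1])

per_sample = hinge_losses(w, b, X_demo, y_demo)
mean_loss = per_sample.mean()
print(per_sample)
print(mean_loss)
```

The loss is zero only for points that are correctly classified and lie on or outside the margin; it grows linearly as a point moves into the margin or onto the wrong side.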


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# https://archive.ics.uci.edu/ml/datasets/banknote+authentication

df = pd.read_csv("bill_authentication.csv")
df1 = df.copy(deep=True)

In [3]:
df.shape

(1372, 5)

In [4]:
df.head()

   Variance  Skewness  Curtosis   Entropy  Class
0    3.6216    8.6661   -2.8073  -0.44699      0
1    4.5459    8.1674   -2.4586   -1.4621      0
2     3.866   -2.6383    1.9242   0.10645      0
3    3.4566    9.5228   -4.0112   -3.5944      0
4   0.32924   -4.4552    4.5718   -0.9888      0


In [5]:
df["Class"].unique()

array([0, 1], dtype=int64)

Useful links for understanding skewness, kurtosis (spelled "Curtosis" in the dataset), and entropy:

https://www.mathsisfun.com/data/skewness.html

https://www.statology.org/can-kurtosis-be-negative/
    
https://towardsdatascience.com/entropy-is-a-measure-of-uncertainty-e2c000301c2c

In [6]:
xc = df.drop('Class', axis=1)
yc = df['Class']

In [7]:
from sklearn.model_selection import train_test_split
xc_train, xc_test, yc_train, yc_test = train_test_split(xc, yc, test_size = 0.20)

In [8]:
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

from sklearn import metrics

In [32]:
svclassifier = SVC(kernel='linear') # creating an SVC object with a linear kernel
svclassifier.fit(xc_train, yc_train) # fitting the model to the training data

yc_pred = svclassifier.predict(xc_test)

print(confusion_matrix(yc_test,yc_pred))
print(classification_report(yc_test,yc_pred))



# Shapes of the training features and labels
print(xc_train.shape)
print(yc_train.shape)
# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(yc_test, yc_pred))

[[148   1]
 [  1 125]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       149
           1       0.99      0.99      0.99       126

    accuracy                           0.99       275
   macro avg       0.99      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275

(1097, 4)
(1097,)
Accuracy: 0.9927272727272727


In [10]:
# building an SVC with kernel = "rbf" (the default kernel of SVC)

svc2 = SVC() # creating an object
svc2.fit(xc_train, yc_train) # fitting the model to the training data

yc_pred = svc2.predict(xc_test)

print(confusion_matrix(yc_test,yc_pred))
print(classification_report(yc_test,yc_pred))



# Model Accuracy, how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(yc_test, yc_pred))

[[145   0]
 [  0 130]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       145
           1       1.00      1.00      1.00       130

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275

Accuracy: 1.0


In [11]:
# building an SVC with a polynomial kernel of degree 3


svc3 = SVC(kernel="poly", degree=3) # creating an object
svc3.fit(xc_train, yc_train) # fitting the model to the training data

yc_pred = svc3.predict(xc_test)

print(confusion_matrix(yc_test,yc_pred))
print(classification_report(yc_test,yc_pred))



# Model Accuracy, how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(yc_test, yc_pred))

[[140   5]
 [  0 130]]
              precision    recall  f1-score   support

           0       1.00      0.97      0.98       145
           1       0.96      1.00      0.98       130

    accuracy                           0.98       275
   macro avg       0.98      0.98      0.98       275
weighted avg       0.98      0.98      0.98       275

Accuracy: 0.9818181818181818


### Kernel Trick


References: http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

https://prateekvjoshi.com/2012/09/01/kernel-functions-for-machine-learning/

<img src="kernel1.png" width=500 height=400>

#### Different kernel functions

References: https://www.slideshare.net/okamoto-laboratory/families-of-triangular-norm-based-kernel-function-and-its-application-to-kernel-kmeans-conference

<img src="kernel2.png" width=400 height=300>

When to use which kernel?

Use a linear SVM when the classes are (approximately) linearly separable, and a non-linear kernel such as RBF when they are not.
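
A quick way to see the difference is a dataset that no straight line can separate. The sketch below uses scikit-learn's synthetic `make_circles` data (an assumption of this example, not one of the datasets in this notebook) and compares a linear kernel with an RBF kernel on a held-out split.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line.
X_circ, y_circ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    X_circ, y_circ, test_size=0.25, random_state=0
)

# Same data, two kernels; score() returns accuracy on the test split.
linear_acc = SVC(kernel="linear").fit(Xc_tr, yc_tr).score(Xc_te, yc_te)
rbf_acc = SVC(kernel="rbf").fit(Xc_tr, yc_tr).score(Xc_te, yc_te)

print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy   :", rbf_acc)
```

The linear kernel should do little better than guessing here, while the RBF kernel should separate the rings almost perfectly.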

Let us now apply an SVM with the kernel trick to the Iris dataset.

In [12]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)

print(irisdata.columns)

Index(['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class'], dtype='object')


In [13]:
print(irisdata.shape)

(150, 5)


In [14]:
print(irisdata.head())

   sepal-length  sepal-width  petal-length  petal-width        Class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [15]:
print(irisdata['Class'].unique())

['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


In [16]:
irisdata1 = irisdata.copy(deep=True)


In [17]:
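# Map each class name to an integer label
c = {'Iris-setosa':0, 'Iris-versicolor':1, 'Iris-virginica':2}

# Alternative using apply (shown here on the copy irisdata1):
#irisdata1.Class = irisdata1.Class.apply(lambda x: c[x])

irisdata.Class = [c[item] for item in irisdata.Class] # list comprehension 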

In [18]:
print(irisdata.Class[0:2])
print(irisdata.head())

0    0
1    0
Name: Class, dtype: int64
   sepal-length  sepal-width  petal-length  petal-width  Class
0           5.1          3.5           1.4          0.2      0
1           4.9          3.0           1.4          0.2      0
2           4.7          3.2           1.3          0.2      0
3           4.6          3.1           1.5          0.2      0
4           5.0          3.6           1.4          0.2      0


In [19]:
x = irisdata.drop('Class', axis=1)
y = irisdata['Class']

In [20]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state=42)

Link for understanding the `gamma` and `degree` parameters:

https://scikit-learn.org/stable/modules/svm.html
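
As a rough illustration of what the two string settings for `gamma` mean: per the scikit-learn documentation, `gamma='auto'` uses `1 / n_features`, and `gamma='scale'` (the default) uses `1 / (n_features * X.var())`. The sketch below computes both values by hand for the Iris features, loaded here through `load_iris` so the snippet is self-contained. `degree` is only used by the polynomial kernel and is ignored by the others.

```python
from sklearn.datasets import load_iris

X_iris = load_iris().data          # 150 samples, 4 features
n_features = X_iris.shape[1]

# Values that SVC derives internally from the string settings of gamma.
gamma_auto = 1.0 / n_features
gamma_scale = 1.0 / (n_features * X_iris.var())

print("gamma='auto'  ->", gamma_auto)
print("gamma='scale' ->", gamma_scale)
```

A larger `gamma` makes each training point's influence more local, which lets the decision boundary bend more tightly around the data.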

In [21]:
# polynomial kernel

from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=2, gamma='auto')
svclassifier.fit(x_train, y_train)

SVC(degree=2, gamma='auto', kernel='poly')

In [22]:
y_pred = svclassifier.predict(x_test)

In [23]:
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Model Accuracy, how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

[[10  0  0]
 [ 0  8  1]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30

Accuracy: 0.9666666666666667


In [24]:
# RBF kernel

from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf', gamma='auto')
svclassifier.fit(x_train, y_train)

SVC(gamma='auto')

In [25]:
y_pred = svclassifier.predict(x_test)

In [26]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Accuracy: 1.0


#### Hyperparameter tuning

In [27]:
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

from sklearn.model_selection import GridSearchCV

parameters = {'kernel':['linear', 'rbf', 'poly'], 'C':[1, 5, 10]}

svc = SVC()

clf = GridSearchCV(svc, param_grid = parameters, cv = 3, verbose=True, n_jobs=-1)

final_clf = clf.fit(xc_train, yc_train)


print(sorted(final_clf.cv_results_.keys()))

Fitting 3 folds for each of 9 candidates, totalling 27 fits
['mean_fit_time', 'mean_score_time', 'mean_test_score', 'param_C', 'param_kernel', 'params', 'rank_test_score', 'split0_test_score', 'split1_test_score', 'split2_test_score', 'std_fit_time', 'std_score_time', 'std_test_score']


In [28]:
print(final_clf.best_estimator_)
print(final_clf.best_params_)

SVC(C=5)
{'C': 5, 'kernel': 'rbf'}
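
Beyond the single best setting, it can be useful to see how all nine candidates compare. The sketch below runs the same grid on the Iris data from `load_iris` (used here only so the snippet runs without the local CSV file) and turns `cv_results_` into a sorted table. The names `search` and `summary` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)

param_grid = {"kernel": ["linear", "rbf", "poly"], "C": [1, 5, 10]}
search = GridSearchCV(SVC(), param_grid=param_grid, cv=3)
search.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked first.
columns = ["param_kernel", "param_C", "mean_test_score",
           "std_test_score", "rank_test_score"]
summary = pd.DataFrame(search.cv_results_)[columns].sort_values("rank_test_score")

print(summary.to_string(index=False))
print("best:", search.best_params_, search.best_score_)
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the top-ranked combination is clearly better or whether several candidates are effectively tied.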


In [29]:
# Using the best parameters from the above GridSearchCV,
# we build another SVC model. (With the default refit=True,
# final_clf.best_estimator_ is already this model fitted on the training data.)

from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

svc = SVC(kernel = 'rbf', C=5)
svc.fit(xc_train, yc_train) # fitting the model to the training data

yc_pred = svc.predict(xc_test)
print(confusion_matrix(yc_test,yc_pred))
print(classification_report(yc_test,yc_pred))

# Model accuracy on the banknote test set
print("Accuracy:", metrics.accuracy_score(yc_test, yc_pred))

[[145   0]
 [  0 130]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       145
           1       1.00      1.00      1.00       130

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275

Accuracy: 1.0


In [30]:
"""
In-class activity: consider the Titanic dataset and apply SVM along with GridSearchCV to determine the best parameters 
for kernel and C. 
"""
