# SVM (Support Vector Machine) 

# Coding Examples

In [8]:
# doing the minimum necessary imports
# more modules would be imported as and when needed

import pandas as pd  
import numpy as np  
import matplotlib.pyplot as plt  
%matplotlib inline

# reading data from CSV file. 
# reading bank currency note data into pandas dataframe.
bankdata = pd.read_csv("bill_authentication.csv")  

# Exploratory Data Analysis
print(bankdata.shape)  
print("------------")
print(bankdata.head()) 

(1372, 5)
------------
   Variance  Skewness  Curtosis  Entropy  Class
0   3.62160    8.6661   -2.8073 -0.44699      0
1   4.54590    8.1674   -2.4586 -1.46210      0
2   3.86600   -2.6383    1.9242  0.10645      0
3   3.45660    9.5228   -4.0112 -3.59440      0
4   0.32924   -4.4552    4.5718 -0.98880      0


In [9]:
# Data Preprocessing
# Data preprocessing involves 
# (1) Dividing the data into attributes and labels and 
# (2) dividing the data into training and testing sets.

# To divide the data into attributes and labels, do :
X = bankdata.drop('Class', axis=1)  
y = bankdata['Class']  

# the final preprocessing step is to divide data into training and test sets
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=10)


# Training the Algorithm. Here we use the simple SVM,
# i.e., a linear SVM.
from sklearn.svm import SVC  

svclassifier = SVC(kernel='linear')  # linear kernel for linearly separable data
# kernel accepts 'linear', 'poly' (polynomial), 'rbf' (Gaussian),
# 'sigmoid', 'precomputed', or a callable.
# default kernel = 'rbf' (Radial Basis Function)

svclassifier.fit(X_train, y_train)  

# Making Predictions
y_pred = svclassifier.predict(X_test)

# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix  

print(confusion_matrix(y_test,y_pred))  

print(classification_report(y_test,y_pred)) 

# Remember: to evaluate a classification model, use
# confusion_matrix, classification_report and accuracy_score.

[[151   1]
 [  1 122]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       152
           1       0.99      0.99      0.99       123

    accuracy                           0.99       275
   macro avg       0.99      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275
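
The comment above mentions accuracy_score, which the cell does not call. As a minimal sketch, the snippet below rebuilds hypothetical label arrays (`y_true` and `y_hat` are illustrative names, not the notebook's variables) that reproduce the printed confusion matrix, and shows that accuracy is the diagonal of that matrix divided by the sample count.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels constructed to match the confusion matrix printed above:
# 151 correct class-0 predictions, 1 class-0 sample predicted as 1,
# 122 correct class-1 predictions, 1 class-1 sample predicted as 0.
y_true = np.array([0] * 152 + [1] * 123)
y_hat = np.array([0] * 151 + [1] * 1 + [1] * 122 + [0] * 1)

cm = confusion_matrix(y_true, y_hat)
acc = accuracy_score(y_true, y_hat)

# Accuracy equals the sum of the diagonal divided by the total number of samples.
acc_from_cm = np.trace(cm) / cm.sum()

print(cm)
print(round(acc, 4), round(acc_from_cm, 4))
```

In the notebook itself, the equivalent call would be `accuracy_score(y_test, y_pred)`.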



# Applying SVM over non-linear data
 
When the data is not linearly separable, a linear SVM performs poorly. Instead, a modified version of SVM, called kernel SVM, is used.

In essence, kernel SVM implicitly maps data that is not linearly separable in its original, lower-dimensional space into a higher-dimensional space where the classes can be separated by a hyperplane. The kernel function computes inner products in that space directly, so the mapping never has to be carried out explicitly. The mathematics behind this is involved, but you do not need it in order to use SVM: Python's Scikit-Learn library provides kernel SVM out of the box.
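
To make the idea of moving to a higher dimension concrete, here is a small sketch on synthetic data (two concentric rings from `make_circles`, which is not a dataset used elsewhere in this notebook). It compares a linear SVM on the raw 2-D points, a linear SVM after appending a hand-picked third feature (the squared radius), and an RBF kernel SVM on the raw points. The feature map is an assumption chosen for this toy case, not what the RBF kernel computes internally.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in 2-D separates them.
X_rings, y_rings = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear SVM on the original 2-D points.
linear_2d = SVC(kernel="linear").fit(X_rings, y_rings)

# Explicit lift to 3-D: append the squared distance from the origin.
# In this space a plane can separate the inner ring from the outer one.
X_lifted = np.column_stack([X_rings, (X_rings ** 2).sum(axis=1)])
linear_3d = SVC(kernel="linear").fit(X_lifted, y_rings)

# The RBF kernel achieves a comparable effect without building features by hand.
rbf_2d = SVC(kernel="rbf", gamma="scale").fit(X_rings, y_rings)

acc_linear_2d = linear_2d.score(X_rings, y_rings)
acc_linear_3d = linear_3d.score(X_lifted, y_rings)
acc_rbf_2d = rbf_2d.score(X_rings, y_rings)

print("linear, 2-D:", acc_linear_2d)
print("linear, lifted to 3-D:", acc_linear_3d)
print("rbf, 2-D:", acc_rbf_2d)
```

These are training accuracies, used here only to show separability, not as an estimate of generalisation.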

Implementing Kernel SVM with Scikit-Learn is similar to the simple SVM. In this section, we will use the famous iris dataset to predict the category to which a plant belongs based on four attributes: sepal-width, sepal-length, petal-width and petal-length.

We will try three common kernels: polynomial, Gaussian, and sigmoid.

In [3]:
import seaborn as sns
import numpy as np
import pandas as pd  
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# import some data to play with
irisdata = sns.load_dataset('iris')
irisdata.head()  # have a look at the attributes (=> X) and labels (=> y)

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [4]:
# Preprocessing data
X = irisdata.drop('species', axis=1)  
y = irisdata['species']

# Train Test Split
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=10)  

# Training the Algorithm
To train the kernel SVM, we use the same SVC class from Scikit-Learn's svm module.

We will implement polynomial, Gaussian, and sigmoid kernels to see which one works better for our problem.

# 1. Polynomial Kernel

For background, see these graphs of polynomial functions: https://slideplayer.com/slide/9163126/27/images/8/Graphs+of+Polynomial+Functions.jpg

For the polynomial kernel, you also pass a value for the degree parameter of the SVC class, which sets the degree of the polynomial. Here is how to use a polynomial kernel to implement kernel SVM:

In [5]:
from sklearn.svm import SVC  
svclassifier = SVC(kernel='poly', degree=8, gamma='auto')  
# gamma is optional. Older scikit-learn versions (before 0.22) emit a
# FutureWarning when it is omitted; to avoid it, specify gamma as 'auto' or 'scale'.

svclassifier.fit(X_train, y_train)

# Making Predictions
# Now once we have trained the algorithm, 
# the next step is to make predictions on the test data.
y_pred = svclassifier.predict(X_test)  


# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))

# Note: one 'versicolor' sample is misclassified as 'virginica'

[[10  0  0]
 [ 0 12  1]
 [ 0  0  7]]
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.92      0.96        13
   virginica       0.88      1.00      0.93         7

    accuracy                           0.97        30
   macro avg       0.96      0.97      0.96        30
weighted avg       0.97      0.97      0.97        30
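
For reference, the polynomial kernel that scikit-learn documents is K(x, x') = (gamma * <x, x'> + coef0) ** degree. The sketch below checks that formula against `sklearn.metrics.pairwise.polynomial_kernel` on two iris-like rows. The values `gamma = 0.25` (what `gamma='auto'` gives for four features) and `coef0 = 0.0` (the SVC default) are stated assumptions that mirror the cell above, and the arrays `a` and `b` are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Assumed settings mirroring SVC(kernel='poly', degree=8, gamma='auto'):
# gamma='auto' means 1 / n_features = 1 / 4, and SVC's default coef0 is 0.0.
degree, gamma, coef0 = 8, 0.25, 0.0

# Two sample rows with the four iris measurements.
a = np.array([[5.1, 3.5, 1.4, 0.2]])
b = np.array([[4.9, 3.0, 1.4, 0.2]])

# Kernel value computed by hand from the formula.
manual_poly = (gamma * (a @ b.T) + coef0) ** degree

# The same value from scikit-learn's helper.
library_poly = polynomial_kernel(a, b, degree=degree, gamma=gamma, coef0=coef0)

print(manual_poly, library_poly)
```

A high degree such as 8 makes the kernel value grow very quickly with the inner product, which is one reason lower degrees are often tried first.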



# 2. Gaussian Kernel

To use the Gaussian kernel, specify 'rbf' as the value of the kernel parameter of the SVC class.

In [6]:
from sklearn.svm import SVC  
svclassifier = SVC(kernel='rbf', gamma='auto')  
svclassifier.fit(X_train, y_train) 

# Prediction and Evaluation
y_pred = svclassifier.predict(X_test)  

from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))  

# Note: the Gaussian kernel classifies all 30 test samples correctly

[[10  0  0]
 [ 0 13  0]
 [ 0  0  7]]
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00         7

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
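
The Gaussian (RBF) kernel is K(x, x') = exp(-gamma * ||x - x'||^2), so nearby points get a value close to 1 and distant points a value close to 0. The following sketch verifies that formula against `sklearn.metrics.pairwise.rbf_kernel` and shows what the two string options for gamma resolve to according to the scikit-learn documentation. `X_demo` is a small illustrative array, not the notebook's training set.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Three illustrative rows: two setosa-like samples and one virginica-like sample.
X_demo = np.array([[5.1, 3.5, 1.4, 0.2],
                   [4.9, 3.0, 1.4, 0.2],
                   [6.3, 3.3, 6.0, 2.5]])

n_features = X_demo.shape[1]
gamma_auto = 1.0 / n_features                    # gamma='auto'
gamma_scale = 1.0 / (n_features * X_demo.var())  # gamma='scale'

# Pairwise squared Euclidean distances, then the kernel by hand.
sq_dists = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=-1)
manual_rbf = np.exp(-gamma_auto * sq_dists)

# The same matrix from scikit-learn's helper.
library_rbf = rbf_kernel(X_demo, gamma=gamma_auto)

print(np.round(manual_rbf, 4))
print("gamma auto:", gamma_auto, "gamma scale:", round(gamma_scale, 4))
```

The two similar rows have a kernel value near 1, while the dissimilar row has a value near 0 against both.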



# 3. Sigmoid Kernel
Finally, let's use a sigmoid kernel to implement kernel SVM.
To use the sigmoid kernel, specify 'sigmoid' as the value of the kernel parameter of the SVC class. Take a look at the following script:

In [7]:
from sklearn.svm import SVC  
svclassifier = SVC(kernel='sigmoid', gamma='auto')  
svclassifier.fit(X_train, y_train)

# Prediction and Evaluation
y_pred = svclassifier.predict(X_test)  

from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))

# Note: the sigmoid kernel performs very poorly; every sample is predicted as 'virginica'

[[ 0  0 10]
 [ 0  0 13]
 [ 0  0  7]]
              precision    recall  f1-score   support

      setosa       0.00      0.00      0.00        10
  versicolor       0.00      0.00      0.00        13
   virginica       0.23      1.00      0.38         7

    accuracy                           0.23        30
   macro avg       0.08      0.33      0.13        30
weighted avg       0.05      0.23      0.09        30





# Comparison of Kernel Performance

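If we compare the performance of the different kernels, the sigmoid kernel clearly performs the worst: it assigns every test sample to virginica. This is not because the sigmoid kernel is restricted to binary problems; SVC handles multiple classes with any kernel through a one-vs-one scheme. A more likely cause is that the sigmoid kernel, tanh(gamma * <x, x'> + coef0), saturates on the unscaled iris features: every pair of samples has a large positive inner product, so the kernel value is close to 1 for all pairs and the classes become nearly indistinguishable.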
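
One way to probe the sigmoid kernel's behaviour is to look at the kernel matrix itself. The sketch below is an exploratory check rather than a definitive diagnosis. It uses `load_iris` from scikit-learn instead of the seaborn copy so that it runs offline, and assumes `gamma = 1 / 4` and `coef0 = 0.0` to mirror `SVC(kernel='sigmoid', gamma='auto')`. It computes tanh(gamma * <x, x'> + coef0) on raw and on standardised features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.preprocessing import StandardScaler

X_iris = load_iris().data
gamma = 1.0 / X_iris.shape[1]  # what gamma='auto' resolves to for four features

# Raw features are all positive, so every inner product is large and positive
# and tanh is pushed into its flat region near 1.
K_raw = sigmoid_kernel(X_iris, gamma=gamma, coef0=0.0)

# After standardisation the inner products are centred around zero,
# so the kernel values spread over a much wider range.
X_std = StandardScaler().fit_transform(X_iris)
K_scaled = sigmoid_kernel(X_std, gamma=gamma, coef0=0.0)

print("raw features:    min", K_raw.min(), "max", K_raw.max())
print("scaled features: min", K_scaled.min(), "max", K_scaled.max())
```

When almost every entry of the kernel matrix is the same, the classifier has very little information to separate the classes, so scaling the features or tuning gamma and coef0 is worth trying before ruling the kernel out.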

Between the Gaussian and polynomial kernels, the Gaussian kernel classified all 30 test samples correctly, while the polynomial kernel misclassified one. The Gaussian kernel therefore performed slightly better here. However, there is no hard and fast rule as to which kernel performs best in every scenario. In practice, you try several kernels and select the one with the best results on held-out data.
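
A single 30-sample test split is a small basis for choosing a kernel. As a sketch of one alternative, the loop below scores each kernel with 5-fold cross-validation. The pipeline with `StandardScaler`, the use of `load_iris`, and the choice of `cv=5` are assumptions made for this illustration and are not part of the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

cv_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling happens inside the pipeline, so each fold is standardised
    # using only its own training portion.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma="scale"))
    cv_scores[kernel] = cross_val_score(model, X_iris, y_iris, cv=5).mean()

for kernel, score in cv_scores.items():
    print(f"{kernel:8s} mean accuracy = {score:.3f}")
```

`GridSearchCV` from `sklearn.model_selection` extends the same idea to searching over degree, gamma, and C together.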