### Support Vector Machine Algorithm

Implementing a kernel SVM with Scikit-Learn is similar to implementing the simple (linear) SVM. In this section, we will use the famous iris dataset to predict the category to which a plant belongs based on four attributes: sepal-width, sepal-length, petal-width and petal-length.
The dataset can be downloaded from the following link:
https://archive.ics.uci.edu/ml/datasets/iris
The rest of the steps are typical machine learning steps and need very little explanation until we reach the part where we train our Kernel SVM.



In [1]:
# Importing Libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Importing the Dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [3]:
# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

In [4]:
# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)

In [5]:
irisdata.head()

|   | sepal-length | sepal-width | petal-length | petal-width | Class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
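If the UCI download link is unavailable, scikit-learn bundles a copy of the same dataset. The sketch below is an alternative to cells In [2] to In [4] and is not part of the original notebook: it rebuilds `irisdata` with the same column names and the same UCI-style class labels (such as `Iris-setosa`) using `load_iris(as_frame=True)`. The bundled copy may differ from the UCI file in a few values, so downstream scores can differ slightly.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of iris as pandas objects (no network needed)
iris = load_iris(as_frame=True)

# Rename the feature columns to match the colnames list used in this notebook
irisdata = iris.data.copy()
irisdata.columns = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Map the integer targets 0, 1, 2 to UCI-style labels such as 'Iris-setosa'
irisdata['Class'] = ['Iris-' + iris.target_names[t] for t in iris.target]

print(irisdata.head())
print(irisdata['Class'].value_counts())
```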


In [6]:
# Preprocessing
X = irisdata.drop('Class', axis=1)
y = irisdata['Class']

In [7]:
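# Train Test Split
# No random_state is set, so each run draws a different 80/20 split
# and the scores shown below will vary from run to run.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)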
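Because `train_test_split` is called without a `random_state`, every run draws a different test set, so the confusion matrices in this notebook will not reproduce exactly. A minimal sketch of a reproducible variant follows: `stratify=y` keeps the equal class balance in both halves, and `random_state=42` is an arbitrary seed chosen for illustration. To stay self-contained it loads the bundled copy of iris instead of the UCI file.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # y holds integer labels 0, 1, 2

# stratify=y preserves class proportions in both splits;
# random_state fixes the shuffle so every run gives the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_test))  # number of test samples per class
```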

##### **Training the Algorithm**

To train the kernel SVM, we use the same SVC class from Scikit-Learn's svm module. The difference lies in the value of the kernel parameter of the SVC class. For the simple SVM we used "linear" as the value of the kernel parameter. For a kernel SVM you can instead use a Gaussian ("rbf"), polynomial ("poly") or sigmoid ("sigmoid") kernel, a precomputed kernel matrix ("precomputed"), or your own callable. We will implement the polynomial, Gaussian, and sigmoid kernels to see which one works better for our problem.
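For reference, scikit-learn documents these kernels as polynomial `(gamma * <x, x'> + coef0) ** degree`, Gaussian/RBF `exp(-gamma * ||x - x'||**2)` and sigmoid `tanh(gamma * <x, x'> + coef0)`. The sketch below checks those formulas against `sklearn.metrics.pairwise` on made-up random data, then shows the "precomputed" option, where you pass a kernel matrix instead of raw features. The values of `gamma`, `coef0` and `degree` are illustrative assumptions. Note that `SVC` defaults to `coef0=0.0` whereas the pairwise helpers default to `coef0=1`, so it is passed explicitly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))  # 5 made-up samples with 4 features
B = rng.normal(size=(3, 4))  # 3 more made-up samples

gamma, coef0, degree = 0.25, 0.0, 3  # illustrative values only

dot = A @ B.T  # all pairwise inner products <x, x'>
sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # ||x - x'||^2

K_poly = (gamma * dot + coef0) ** degree
K_rbf = np.exp(-gamma * sq_dist)
K_sig = np.tanh(gamma * dot + coef0)

# Each hand-written formula should match scikit-learn's implementation
print(np.allclose(K_poly, polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0)))
print(np.allclose(K_rbf, rbf_kernel(A, B, gamma=gamma)))
print(np.allclose(K_sig, sigmoid_kernel(A, B, gamma=gamma, coef0=coef0)))

# kernel='precomputed': fit on the (n_train x n_train) Gram matrix and predict
# from the kernel matrix between new samples and the training samples
X, y = load_iris(return_X_y=True)
K_train = rbf_kernel(X, X, gamma=gamma)
clf = SVC(kernel='precomputed').fit(K_train, y)
train_accuracy = (clf.predict(K_train) == y).mean()
print(f"training accuracy with a precomputed RBF kernel: {train_accuracy:.3f}")
```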

###### Polynomial Kernel

In [8]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=8)
svclassifier.fit(X_train, y_train)

SVC(degree=8, kernel='poly')

In [9]:
# Making Predictions
# Now that the algorithm is trained, the next step is to make predictions on the test data.
# Execute the following script to do so:
y_pred = svclassifier.predict(X_test)

In [10]:
# Evaluating the Algorithm
# As usual, the final step is to evaluate the model, here the polynomial-kernel SVM. Execute the following script:

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))


[[ 8  0  0]
 [ 0 11  1]
 [ 0  2  8]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       0.85      0.92      0.88        12
 Iris-virginica       0.89      0.80      0.84        10

       accuracy                           0.90        30
      macro avg       0.91      0.91      0.91        30
   weighted avg       0.90      0.90      0.90        30
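The report's numbers can be recomputed by hand from the confusion matrix, where rows are true classes and columns are predicted classes (in the order setosa, versicolor, virginica). The sketch below copies the polynomial-kernel matrix printed above; because the split is random, your own matrix will likely differ.

```python
import numpy as np

# Polynomial-kernel confusion matrix copied from the output above
# (rows = true class, columns = predicted class)
cm = np.array([[8, 0, 0],
               [0, 11, 1],
               [0, 2, 8]])

tp = np.diag(cm)                    # correctly classified samples per class
precision = tp / cm.sum(axis=0)     # TP / all samples predicted as that class
recall = tp / cm.sum(axis=1)        # TP / all samples truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()      # correct predictions / all predictions
macro_precision = precision.mean()  # unweighted mean over classes

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2))
print(round(accuracy, 2), round(macro_precision, 2))
```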



###### Gaussian Kernel


In [11]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)

SVC()

In [12]:
# Prediction and Evaluation
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))


[[ 8  0  0]
 [ 0 12  0]
 [ 0  0 10]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       1.00      1.00      1.00        12
 Iris-virginica       1.00      1.00      1.00        10

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30



###### Sigmoid Kernel

In [13]:
# Finally, let's use a sigmoid kernel for implementing Kernel SVM. Take a look at the following script:
from sklearn.svm import SVC
svclassifier = SVC(kernel='sigmoid')
svclassifier.fit(X_train, y_train)

SVC(kernel='sigmoid')

In [14]:
# Prediction and Evaluation
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[ 8  0  0]
 [12  0  0]
 [10  0  0]]
                 precision    recall  f1-score   support

    Iris-setosa       0.27      1.00      0.42         8
Iris-versicolor       0.00      0.00      0.00        12
 Iris-virginica       0.00      0.00      0.00        10

       accuracy                           0.27        30
      macro avg       0.09      0.33      0.14        30
   weighted avg       0.07      0.27      0.11        30



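(scikit-learn also emits an UndefinedMetricWarning for this report: precision and F-score are ill-defined for Iris-versicolor and Iris-virginica because the model never predicted them, so they are reported as 0.0. The zero_division parameter of classification_report controls this behavior.)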


**Comparison of Kernel Performance**

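If we compare the performance of the different kernels, we can clearly see that the sigmoid kernel performs the worst: it predicted Iris-setosa for every test sample. The number of classes is not the cause. The sigmoid kernel computes tanh(gamma * <x, x'> + coef0), a continuous value between -1 and 1 rather than just 0 or 1, and SVC handles multi-class problems with a one-vs-one scheme whatever the kernel. A more likely explanation is that the sigmoid kernel is not positive semi-definite for all parameter values, and that on these unscaled, all-positive features with the default settings it saturates close to 1 for every pair of samples, leaving the model very little to separate the classes with.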
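One way to probe the sigmoid kernel's failure, an analysis added here for illustration, is to look at the kernel values themselves. The sketch below computes `tanh(gamma * <x, x'> + coef0)` for every pair of iris samples with `coef0=0.0` and with `gamma` set the way `SVC`'s default `gamma='scale'` does, `1 / (n_features * X.var())`, once on raw features and once after `StandardScaler`. It uses the bundled iris data and all 150 samples rather than the notebook's training split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

def scale_gamma(X):
    # Same formula SVC uses for its default gamma='scale'
    return 1.0 / (X.shape[1] * X.var())

# Raw features are all positive, so every inner product is large and positive
K_raw = sigmoid_kernel(X, gamma=scale_gamma(X), coef0=0.0)

# Standardized features are centred on 0, so inner products can be negative
X_std = StandardScaler().fit_transform(X)
K_std = sigmoid_kernel(X_std, gamma=scale_gamma(X_std), coef0=0.0)

print(f"raw:    min={K_raw.min():.3f}, max={K_raw.max():.3f}")
print(f"scaled: min={K_std.min():.3f}, max={K_std.max():.3f}")
```

On the raw features every kernel value lies above 0.9, so the kernel barely distinguishes one pair of samples from another; after standardization the values spread out and include negative ones. Refitting the sigmoid SVC on standardized features is one thing to try, though this sketch makes no claim about how well it would then score.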

Between the Gaussian and polynomial kernels, the Gaussian kernel classified all 30 test samples correctly, while the polynomial kernel misclassified three of them (90% accuracy). On this split, therefore, the Gaussian kernel performed better. However, there is no hard and fast rule as to which kernel performs best in every scenario, and a 30-sample test set from a single random split is a noisy basis for comparison. The usual approach is to compare the kernels on validation data (for example with cross-validation) and keep the test set for a final check of the chosen model.
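A sketch of that approach compares the three kernels with 5-fold cross-validation on the bundled iris data. It assumes feature scaling is acceptable for this problem (the original notebook does not scale): wrapping `StandardScaler` and `SVC` in a pipeline refits the scaler on each training fold only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The three kernels tried in this notebook, with the same settings
candidates = {
    'poly (degree=8)': SVC(kernel='poly', degree=8),
    'rbf': SVC(kernel='rbf'),
    'sigmoid': SVC(kernel='sigmoid'),
}

results = {}
for name, svc in candidates.items():
    # The scaler is refit on each training fold, so the validation fold
    # never influences the preprocessing
    model = make_pipeline(StandardScaler(), svc)
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on 5 stratified folds
    results[name] = scores
    print(f"{name:16s} mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```

Each sample is used for validation exactly once, so the mean scores are a steadier basis for choosing a kernel than a single 30-sample test split.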


###### Grid Search

Now we will tune the hyperparameters and check whether performance improves.
Tuning the hyperparameters of a machine learning algorithm, here C and gamma of the Gaussian-kernel SVC, can noticeably improve model performance.


In [16]:
# Import GridSearchCV from scikit-learn
from sklearn.model_selection import GridSearchCV

In [17]:
# Create a dictionary called param_grid with candidate values for C and gamma
param_grid = {'C':[0.1,1,10,100], 'gamma':[1,0.1,0.01,0.001]}
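To put these candidate values in context: in the Gaussian kernel `exp(-gamma * ||x - x'||**2)`, a larger `gamma` makes each training sample's influence more local, while a larger `C` penalizes training errors more heavily at the cost of a narrower margin. The default `gamma='scale'` resolves to `1 / (n_features * X.var())`. The sketch below, which uses the bundled iris data as an assumption for self-containment, computes that value for the raw features and checks that passing it explicitly gives the same predictions as the default.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# gamma='scale' (SVC's default) resolves to 1 / (n_features * X.var())
default_gamma = 1.0 / (X.shape[1] * X.var())
print(f"default gamma on raw iris features: {default_gamma:.4f}")

# Passing that value explicitly should reproduce the default model
clf_default = SVC(kernel='rbf').fit(X, y)
clf_explicit = SVC(kernel='rbf', gamma=default_gamma).fit(X, y)
same_predictions = np.array_equal(clf_default.predict(X), clf_explicit.predict(X))
print(same_predictions)
```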

In [18]:
# Create a GridSearchCV object and fit it to the training data
# Grid search is a model hyperparameter optimization technique.
grid = GridSearchCV(SVC(), param_grid, refit = True, verbose=3)
grid.fit(X_train, y_train)

Fitting 5 folds for each of 16 candidates, totalling 80 fits
[CV 1/5] END ....................C=0.1, gamma=1;, score=1.000 total time=   0.0s
[CV 2/5] END ....................C=0.1, gamma=1;, score=0.792 total time=   0.0s
[CV 3/5] END ....................C=0.1, gamma=1;, score=1.000 total time=   0.0s
[CV 4/5] END ....................C=0.1, gamma=1;, score=0.958 total time=   0.0s
[CV 5/5] END ....................C=0.1, gamma=1;, score=0.958 total time=   0.0s
[CV 1/5] END ..................C=0.1, gamma=0.1;, score=0.875 total time=   0.0s
[CV 2/5] END ..................C=0.1, gamma=0.1;, score=0.792 total time=   0.0s
[CV 3/5] END ..................C=0.1, gamma=0.1;, score=0.958 total time=   0.0s
[CV 4/5] END ..................C=0.1, gamma=0.1;, score=0.917 total time=   0.0s
[CV 5/5] END ..................C=0.1, gamma=0.1;, score=0.958 total time=   0.0s
[CV 1/5] END .................C=0.1, gamma=0.01;, score=0.667 total time=   0.0s
[CV 2/5] END .................C=0.1, gamma=0.01; ...
(remaining output truncated)

GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.1, 1, 10, 100],
                         'gamma': [1, 0.1, 0.01, 0.001]},
             verbose=3)
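Conceptually, `GridSearchCV` scores every combination in `param_grid` with k-fold cross-validation (5 folds here, as the log above shows) and, with `refit=True`, retrains the best combination on all the data passed to `fit`. The hand-written loop below is a simplified sketch of that behavior, not scikit-learn's actual implementation; it runs on the full bundled iris data rather than `X_train` to stay self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}

# Score every combination (4 x 4 = 16 candidates) with 5-fold CV...
best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

# ...then retrain the best candidate on all the data (what refit=True does)
best_model = SVC(**best_params).fit(X, y)

# Compare with the library version
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(best_params, round(best_score, 3))
print(grid.best_params_, round(grid.best_score_, 3))
```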

In [23]:
print(grid.best_estimator_)

SVC(C=10, gamma=0.1)


In [19]:
# Predict with the tuned model (refit=True retrained the best candidate on all of X_train)
pred_grid = grid.predict(X_test)

In [20]:
# Let us compute the confusion matrix
print(confusion_matrix(y_test, pred_grid))

[[ 8  0  0]
 [ 0 12  0]
 [ 0  1  9]]


In [21]:
# Let us print the report also
print(classification_report(y_test, pred_grid))

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       0.92      1.00      0.96        12
 Iris-virginica       1.00      0.90      0.95        10

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.97        30
   weighted avg       0.97      0.97      0.97        30
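One misclassification in a 30-sample test set moves accuracy by about 3.3 percentage points, so the gap between this tuned model (0.97) and the default Gaussian kernel earlier (1.00) is a single sample and is not strong evidence either way. The sketch below, which assumes a fixed stratified split and adds feature scaling (neither is in the original), tunes a `Pipeline` and reports the cross-validated `best_score_` alongside the test accuracy. Parameters of a pipeline step are addressed as `<step name>__<parameter>`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)  # arbitrary fixed seed

pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
# Same candidate values as above, addressed through the 'svc' step
param_grid = {'svc__C': [0.1, 1, 10, 100], 'svc__gamma': [1, 0.1, 0.01, 0.001]}

grid = GridSearchCV(pipe, param_grid, cv=5).fit(X_train, y_train)
test_accuracy = grid.score(X_test, y_test)

print('best params:', grid.best_params_)
print(f'mean CV accuracy on the training folds: {grid.best_score_:.3f}')
print(f'test accuracy: {test_accuracy:.3f}')
```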

