# SVM (Support Vector Machine) Notebook

This notebook is designed as complementary material for the DSGT workshop.

In this notebook, we'll look at how SVM classifiers behave and perform in supervised learning as we tune the following parameters:

- Regularizer/C (how strongly we want to avoid misclassifying training points)
- Kernel (the transformation applied to the data)
- Gamma (how far the influence of each data point reaches)
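
To make the role of gamma concrete, here is a minimal sketch that assumes the RBF kernel in its usual form, K(x, x') = exp(-gamma * ||x - x'||^2). The helper `rbf_kernel_value` and the two sample points are hypothetical and only serve the illustration. A larger gamma makes the similarity between two fixed points drop faster, so each point influences a smaller neighborhood.

```python
import numpy as np

def rbf_kernel_value(x, x_prime, gamma):
    """Hypothetical helper: RBF similarity exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

point = [0.0, 0.0]
neighbor = [1.0, 1.0]  # squared distance from point is 2

# Similarity of the same pair of points under three gamma values
similarities = {g: rbf_kernel_value(point, neighbor, g) for g in (0.1, 1.0, 10.0)}
for g, s in similarities.items():
    print(f"gamma={g}: similarity={s:.6f}")
```

The printed similarities shrink as gamma grows, which is why a large gamma tends to produce tight, wiggly decision boundaries and a small gamma tends to produce smooth ones.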

In [None]:
# General-purpose imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Import our sklearn packages for SVM 
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

We will be loading a banknote dataset that involves predicting whether a given banknote is authentic given a number of measures taken from a photograph.

It is a binary (2-class) classification problem. There are 1,372 observations with 4 input variables and 1 output variable. The variable names are as follows:

- Variance of Wavelet Transformed image (continuous).
- Skewness of Wavelet Transformed image (continuous).
- Kurtosis of Wavelet Transformed image (continuous).
- Entropy of image (continuous).
- Class (0 for authentic, 1 for inauthentic).

In [None]:
# Load dataset from csv
# What pandas function should we call?
data = ___________('data_banknote_authentication.csv')

# Preview the first few rows of the DataFrame
data.head()

Goal: Classify the banknotes into correct classes (authentic or not)

Recall that in a supervised learning problem, the correct answers (targets/labels) are given.

So, which columns are our features (X)? Which column is our target (y)?

In [None]:
# Let's use the Variance and Skewness as our features

# How can we extract those two columns from data?
# Keyword: slicing
X = data.__________

#X = StandardScaler().fit_transform(data.iloc[:,0:4])
#pca = PCA(n_components = 2)
#X = pd.DataFrame(data = pca.fit_transform(X), columns = ['component_1', 'component_2'])

# Take a look at the first few rows of X
X.head()

In [None]:
# Again, how would you get our y (label) from data?
# Hint: we need only one column
y = data________

y.head()

### Linear SVM

Recall the standard flow of training our model:
- CREATE a model instance

- FIT your data to it (feed your model the data you want it to learn)

- Use the model to PREDICT (show your model data it's never seen and see how well it performs)
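
The three steps above can be sketched on a small synthetic dataset. Everything here (the toy blobs, the variable names, the chosen parameters) is an assumption made for illustration and is separate from the banknote data used in the exercise.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical toy data: two well-separated Gaussian blobs in 2D
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
                   rng.normal(2.0, 0.5, size=(50, 2))])
y_toy = np.array([0] * 50 + [1] * 50)

# Hold out 20% of the points so the model is scored on unseen data
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)

model = svm.SVC(kernel='linear', C=1)  # CREATE a model instance
model.fit(X_tr, y_tr)                  # FIT it to the training data
y_pred = model.predict(X_te)           # PREDICT on data it has never seen

toy_accuracy = accuracy_score(y_te, y_pred)
print(toy_accuracy)
```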

In [None]:
# Let's shuffle and split our data into training and testing sets (train : test = 8 : 2)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size = 0.2, random_state = 34) # shuffled by default

In [None]:
# Create a model instance
# Let's start with a linear SVC with C = 1 and gamma = 'auto'
# How should we pass in the parameters above?
classifier1 = svm.SVC(_________________________________)

# Fit our data to the model
svc1 = classifier1.fit(X_train, Y_train)

In [None]:
# Create a mesh to plot in
x_min, x_max = X_test.iloc[:, 0].min() - 1, X_test.iloc[:, 0].max() + 1 # Max and min for the 1st feature
y_min, y_max = X_test.iloc[:, 1].min() - 1, X_test.iloc[:, 1].max() + 1 # Max and min for the 2nd feature
x_h = (x_max - x_min) / 100 # Step size: divide the x range into 100 steps
y_h = (y_max - y_min) / 100 # Step size: divide the y range into 100 steps
xx, yy = np.meshgrid(np.arange(x_min, x_max, x_h), # Build the grid of points to classify
                     np.arange(y_min, y_max, y_h))
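
If the mesh is unfamiliar, this tiny sketch shows what `np.meshgrid` and `np.c_` produce. The arrays `xs` and `ys` are made-up values chosen to keep the output readable. Every x value is paired with every y value, and `ravel` plus `np.c_` turns the two grids into a two-column array of points that a classifier can predict on.

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])  # hypothetical x-axis ticks
ys = np.array([10.0, 20.0])     # hypothetical y-axis ticks

# Both grids have shape (len(ys), len(xs)) = (2, 3)
xx_demo, yy_demo = np.meshgrid(xs, ys)

# Flatten both grids and stack them column-wise: one row per grid point
grid_points = np.c_[xx_demo.ravel(), yy_demo.ravel()]
print(grid_points)
```

Predictions made on `grid_points` come back as a flat array, which is why the plotting cells reshape `Z` to the shape of `xx` before drawing the contour.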

In [None]:
plt.subplot(1, 1, 1)
Z = svc1.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Create a contour of the boundary output by our model
plt.contourf(xx, yy, Z, cmap = plt.cm.RdYlGn, alpha = 0.8)

# Add the data points with correct label color to our contour
plt.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c = Y_test, cmap = plt.cm.RdYlGn)

# Other elements of our plot
plt.xlabel('Variance')
plt.ylabel('Skewness')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with linear kernel')
plt.show()

### RBF SVM

In [None]:
classifier2 = svm.SVC(kernel = 'rbf', C = 1, gamma = 'auto')
svc2 = classifier2.fit(X_train, Y_train) # Fit on the training set only, as with the linear model

In [None]:
plt.subplot(1, 1, 1)
Z = svc2.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap = plt.cm.RdYlGn, alpha = 0.8)
plt.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c = Y_test, cmap = plt.cm.RdYlGn)
plt.xlabel('Variance')
plt.ylabel('Skewness')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with rbf kernel')
plt.show()

Notice any difference between linear SVC and rbf SVC?
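
The difference between the two kernels is easiest to see on data that no straight line can separate. The sketch below is an illustration on a synthetic dataset (two concentric rings from `make_circles`), not on the banknote data. The dataset, its parameters, and the variable names are all assumptions.

```python
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Hypothetical data: an inner ring (class 1) surrounded by an outer ring (class 0)
X_c, y_c = make_circles(n_samples=400, noise=0.05, factor=0.3, random_state=0)
X_ctr, X_cte, y_ctr, y_cte = train_test_split(X_c, y_c, test_size=0.2, random_state=0)

# Same C for both models; only the kernel differs
linear_acc = svm.SVC(kernel='linear', C=1).fit(X_ctr, y_ctr).score(X_cte, y_cte)
rbf_acc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X_ctr, y_ctr).score(X_cte, y_cte)

print(f"linear kernel test accuracy: {linear_acc:.2f}")
print(f"rbf kernel test accuracy:    {rbf_acc:.2f}")
```

A linear kernel can only draw a straight boundary, so it struggles on the rings, while the rbf kernel can draw a closed curve around the inner ring.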

### Compare the four: linear, rbf, polynomial, sigmoid

Here's a helper function for graphing the SVM result.

In [None]:
def plot_svm_result(svc, ax, title):
    Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, cmap = plt.cm.RdYlGn, alpha = 0.8)
    ax.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c = Y_test, cmap = plt.cm.RdYlGn)
    ax.set_xlabel('Variance')
    ax.set_ylabel('Skewness')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_title(title)

Let's take a look at the plots for the linear, rbf, polynomial, and sigmoid kernels.

In [None]:
# Create a fig that contains four subplots for four different kernels
fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize = (14, 10))

# ADJUST the parameters here:
c = 1
g = 'auto' #gamma can also be a numeric value

# Create model instance and fit the data
svc1 = svm.SVC(kernel = 'linear', C = c, gamma = g).fit(X_train, Y_train)
svc2 = svm.SVC(kernel = 'rbf', C = c, gamma = g).fit(X_train, Y_train)
svc3 = svm.SVC(kernel = 'poly', C = c, gamma = g).fit(X_train, Y_train)
svc4 = svm.SVC(kernel = 'sigmoid', C = c, gamma = g).fit(X_train, Y_train)

# Plot linear SVM
plot_svm_result(svc1, ax1, 'SVC with linear kernel')

# Plot rbf SVM
plot_svm_result(svc2, ax2, 'SVC with rbf kernel')

# Plot polynomial SVM
plot_svm_result(svc3, ax3, 'SVC with polynomial kernel')

# Plot sigmoid SVM
plot_svm_result(svc4, ax4, 'SVC with sigmoid kernel')

Try tuning the three parameters (kernel, C, gamma) one at a time.

Which setting do you think performs the best?
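
One way to vary a single parameter while holding the others fixed is a small loop. This is a sketch on a synthetic dataset (`make_moons`); the candidate values and variable names are assumptions chosen for illustration, not recommended settings.

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Hypothetical data: two interleaving half-moons with some noise
X_m, y_m = make_moons(n_samples=300, noise=0.25, random_state=0)
X_mtr, X_mte, y_mtr, y_mte = train_test_split(X_m, y_m, test_size=0.2, random_state=0)

# Vary C with kernel and gamma held fixed
scores_by_c = {}
for c_val in (0.01, 1, 100):
    clf = svm.SVC(kernel='rbf', C=c_val, gamma=1).fit(X_mtr, y_mtr)
    scores_by_c[c_val] = clf.score(X_mte, y_mte)

# Vary gamma with kernel and C held fixed
scores_by_gamma = {}
for g_val in (0.01, 1, 100):
    clf = svm.SVC(kernel='rbf', C=1, gamma=g_val).fit(X_mtr, y_mtr)
    scores_by_gamma[g_val] = clf.score(X_mte, y_mte)

print("test accuracy by C:    ", scores_by_c)
print("test accuracy by gamma:", scores_by_gamma)
```

Changing only one parameter per loop makes it clear which change caused any shift in accuracy.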

In [None]:
accuracy = []
recall = []
precision = []

models = [svc1, svc2, svc3, svc4]

# Compute the accuracy, recall, and precision for the four classifiers
for model in models:
    accuracy.append(accuracy_score(Y_test, model.predict(X_test)))
    recall.append(recall_score(Y_test, model.predict(X_test)))
    precision.append(precision_score(Y_test, model.predict(X_test)))

# Calculate the True Negatives, False Positives, False Negatives, and True Positives
# For binary labels, confusion_matrix(...).ravel() returns them in the order TN, FP, FN, TP
metric1 = confusion_matrix(Y_test, svc1.predict(X_test)).ravel()
metric2 = confusion_matrix(Y_test, svc2.predict(X_test)).ravel()
metric3 = confusion_matrix(Y_test, svc3.predict(X_test)).ravel()
metric4 = confusion_matrix(Y_test, svc4.predict(X_test)).ravel()

# Put all metrics in one table (index 0 = TN, 1 = FP, 2 = FN, 3 = TP)
df = pd.DataFrame({'Kernel': ['linear', 'rbf', 'polynomial', 'sigmoid'], 
                  'TP': [metric1[3], metric2[3], metric3[3], metric4[3]],
                  'FP': [metric1[1], metric2[1], metric3[1], metric4[1]],
                  'FN': [metric1[2], metric2[2], metric3[2], metric4[2]],
                  'TN': [metric1[0], metric2[0], metric3[0], metric4[0]],
                  'accuracy': accuracy,
                  'recall': recall,
                  'precision': precision})
df.set_index('Kernel', inplace = True)
df
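
To see how the columns of the table relate to each other, here is a small hand-checkable sketch. The label lists are made up for illustration. For binary labels 0 and 1, scikit-learn's `confusion_matrix(...).ravel()` yields the counts in the order TN, FP, FN, TP, with class 1 treated as the positive class.

```python
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score

# Hypothetical labels: 2 negatives and 3 positives, with one mistake of each kind
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 1, 1, 1, 0]

# Order for binary labels is TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# The library scores, next to the same quantities computed from the raw counts
toy_scores = {
    'accuracy': (accuracy_score(y_true, y_hat), (tp + tn) / (tp + tn + fp + fn)),
    'recall': (recall_score(y_true, y_hat), tp / (tp + fn)),
    'precision': (precision_score(y_true, y_hat), tp / (tp + fp)),
}
for name, (from_sklearn, by_hand) in toy_scores.items():
    print(f"{name}: sklearn={from_sklearn:.3f}, by hand={by_hand:.3f}")
```

Recall asks how many of the true positives were found, while precision asks how many of the predicted positives were correct.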