# Support Vector Machines

_We copy in a plotting utility, found here, written by Meng-Fen Chiang._

In [6]:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np

def plot_svc_decision_function(model, features, labels):
    """Plot the data, the predicted class regions, and the SVM boundary/margins.

    Assumes binary labels encoded as -1 and 1.
    """
    plt.scatter(features[labels == -1, 0],
                features[labels == -1, 1],
                s=50, c='lightblue',
                marker='s', edgecolor='black',
                label='class 1')
    plt.scatter(features[labels == 1, 0],
                features[labels == 1, 1],
                s=50, c='orange',
                marker='o', edgecolor='black',
                label='class 2')
    
    plt.legend(scatterpoints=1)
    plt.tight_layout()

    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # Shade the predicted class regions on a fine grid (step 0.02) that
    # extends one unit beyond the current axis limits.
    colors = ('lightblue', 'orange', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:2])
    x1_min, x1_max = xlim[0] - 1, xlim[1] + 1
    x2_min, x2_max = ylim[0] - 1, ylim[1] + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, 0.02),
                           np.arange(x2_min, x2_max, 0.02))
    Z = model.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    ax.contourf(xx1, xx2, Z, alpha=0.1, cmap=cmap)

    # Evaluate the decision function on a coarser 30x30 grid and draw the
    # boundary (level 0, solid) and the margins (levels -1 and 1, dashed).
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)

    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])

    # Restore the original limits so the padded grid does not widen the view.
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)

# Q1.1 - 1.4

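Recall that $C$ is [the regularisation parameter](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn-svm-linearsvc) in scikit-learn's linear support vector machine implementation. $C$ sets where the SVM sits on the spectrum between a hard-margin and a soft-margin classifier by assigning a cost to margin violations, including misclassifications: a large $C$ makes violations expensive (closer to a hard margin), while a small $C$ tolerates them in exchange for a wider margin.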
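
For reference, scikit-learn documents the primal problem solved by `SVC` (the class used in the code cells here) in the form below, with slack variables $\xi_i$ and labels $y_i \in \{-1, 1\}$. This is a sketch of the standard soft-margin formulation; note that `LinearSVC` uses a squared hinge loss by default, so its objective differs slightly.

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n.
$$

In this form the trade-off is explicit: $C$ weighs the total slack $\sum_i \xi_i$ against $\tfrac{1}{2}\lVert w\rVert^2$, and the distance between the two margin lines is $2/\lVert w\rVert$.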

We carry out leave-one-out cross-validation with an SVM and show that an SVM with $C=1$ can be improved by setting $C=0.01$. We report both training and test performance, using accuracy as the metric.
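
As a cross-check on the hand-written loop, scikit-learn's `cross_validate` can run the same leave-one-out evaluation and return per-split train and test scores. This is a minimal sketch, assuming the same blob data as the next cell; because `cross_validate` fits a fresh clone of the estimator on each split and `SVC` is deterministic, it should reproduce the loop's numbers. The helper name `loo_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import LeaveOneOut, cross_validate
from sklearn.svm import SVC

# Same overlapping blobs as the cell below, with labels mapped to -1 / 1.
features, labels = make_blobs(n_samples=100, centers=2,
                              random_state=0, cluster_std=1.1)
labels = np.where(labels == 0, -1, 1)

def loo_accuracy(C):
    """Hypothetical helper: (mean train accuracy, mean LOO test accuracy)."""
    results = cross_validate(SVC(kernel='linear', C=C), features, labels,
                             cv=LeaveOneOut(), scoring='accuracy',
                             return_train_score=True)
    # One train score and one test score per held-out point (100 of each here).
    return results['train_score'].mean(), results['test_score'].mean()

for C in [0.01, 1]:
    train_acc, test_acc = loo_accuracy(C)
    print(f"C={C}: train {train_acc:.2f}, test {test_acc:.2f}")
```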

In [7]:
from sklearn.svm import SVC
from ipywidgets import interact
from sklearn.datasets import make_blobs
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score

features, labels = make_blobs(n_samples=100, centers=2,
                      random_state=0, cluster_std=1.1)
labels = np.where(labels==0, -1, 1)

from sklearn.base import clone

def evaluatePerformance(model, features, labels):
    """Mean training accuracy and leave-one-out test accuracy of `model`."""
    train_scores, test_scores = [], []

    for train_index, test_index in LeaveOneOut().split(features):
        training_features, testing_features = features[train_index], features[test_index]
        training_labels, testing_labels = labels[train_index], labels[test_index]

        # Fit a fresh copy so the caller's model (fit on all the data) is not
        # overwritten by the last fold and the later plot shows the full fit.
        fold_model = clone(model).fit(training_features, training_labels)

        training_labels_pred = fold_model.predict(training_features)
        train_scores.append(accuracy_score(training_labels, training_labels_pred))

        # Each test fold holds a single point, so this score is 0 or 1.
        testing_labels_pred = fold_model.predict(testing_features)
        test_scores.append(accuracy_score(testing_labels, testing_labels_pred))

    return np.mean(train_scores), np.mean(test_scores)

def interactivePlottingOfSVM(C):
    model = SVC(kernel='linear', C=C).fit(features, labels)
    train_score, test_score = evaluatePerformance(model, features, labels)

    print("Mean training accuracy: {:.2f}".format(train_score))
    print("Mean test accuracy: {:.2f}".format(test_score))
    plot_svc_decision_function(model, features, labels)

# Candidate costs, including C=1 and C=0.01 discussed in the text.
interact(interactivePlottingOfSVM, C=[0.01, 0.1, 1, 1000]);


# Q1.5

We've designed a dataset of 100 points for which the choice of $C$ in a linear SVM makes a difference. To dramatise the impact of $C$, we choose a dataset with some class overlap and compare SVMs with a larger margin (low $C$) and a smaller margin (high $C$).
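
The margin effect can be checked numerically for a linear kernel, where `SVC` exposes the weight vector as `coef_`: the margin width is $2/\lVert w\rVert$ and `support_` lists the support vectors. This is a minimal sketch reusing the same blobs; the helper `margin_stats` is hypothetical. For the soft-margin objective, the optimal $\lVert w\rVert$ cannot decrease as $C$ grows, so the printed width should shrink as $C$ increases.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same overlapping blobs as above (reused for illustration).
features, labels = make_blobs(n_samples=100, centers=2,
                              random_state=0, cluster_std=1.1)
labels = np.where(labels == 0, -1, 1)

def margin_stats(C):
    """Hypothetical helper: (margin width, number of support vectors)."""
    model = SVC(kernel='linear', C=C).fit(features, labels)
    w = model.coef_[0]                # weight vector (linear kernel only)
    width = 2.0 / np.linalg.norm(w)   # distance between the -1 and +1 margin lines
    return width, len(model.support_)

for C in [0.01, 1, 1000]:
    width, n_sv = margin_stats(C)
    print(f"C={C:>7}: margin width {width:.3f}, support vectors {n_sv}")
```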

Generalising about performance across different $C$ values is dangerous. On this particular dataset, though, a higher cost of misclassification (higher $C$) narrows the margin, perhaps too much, and overfits the training data (which explains the better training score). The lower $C$ yields a lower training score but appears to generalise better (higher test score).
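
Rather than comparing two hand-picked values, $C$ can be chosen by cross-validation over a grid. This is a minimal sketch using scikit-learn's `GridSearchCV` with the same leave-one-out splitter; the grid values are an assumption, not taken from the assignment.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

features, labels = make_blobs(n_samples=100, centers=2,
                              random_state=0, cluster_std=1.1)
labels = np.where(labels == 0, -1, 1)

# Assumed candidate costs, from soft (small C) to nearly hard (large C) margins.
param_grid = {'C': [0.01, 0.1, 1, 10, 100, 1000]}

search = GridSearchCV(SVC(kernel='linear'), param_grid,
                      cv=LeaveOneOut(), scoring='accuracy',
                      return_train_score=True)
search.fit(features, labels)

# With a single parameter, cv_results_ rows follow the order of param_grid['C'].
for C, train, test in zip(param_grid['C'],
                          search.cv_results_['mean_train_score'],
                          search.cv_results_['mean_test_score']):
    print(f"C={C:>7}: train {train:.2f}, LOO test {test:.2f}")
print("Selected C:", search.best_params_['C'])
```

Note that the best LOO score from the same search is an optimistic estimate of performance for the selected $C$; an unbiased estimate would need nested cross-validation.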

# Q2.1 - 2.2

The utilities above make evaluating and plotting SVMs with different kernels straightforward, and given the marking criteria there is no reason to change the evaluation method here.

We therefore carry out the same leave-one-out cross-validation to show that the `rbf` kernel performs considerably better than the `linear` kernel, and so we "choose" this kernel for the problem (directly answering Q2.2).
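
The same kernel comparison can be written without the plotting widget using `cross_val_score`. `D2.csv` is not reproduced here, so this minimal sketch uses `make_circles` as a hypothetical stand-in with one class enclosed by the other; the scores it prints say nothing about `D2.csv` itself.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Hypothetical stand-in for D2.csv: two noisy concentric rings.
features, labels = make_circles(n_samples=100, noise=0.1, factor=0.4,
                                random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf"]:
    # Default hyperparameters, matching SVC(kernel=kernel) in the cell below.
    scores[kernel] = cross_val_score(SVC(kernel=kernel), features, labels,
                                     cv=LeaveOneOut(), scoring='accuracy').mean()
    print(f"{kernel:>6}: mean LOO accuracy {scores[kernel]:.2f}")
```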

In [19]:
import pandas as pd

dataset = pd.read_csv('datasets/D2.csv', header=None)
features, labels = dataset.iloc[:, :-1].to_numpy(), dataset.iloc[:, -1].to_numpy()

def interactivePlottingOfSVM(kernel):
    # Default C=1; the plotting utility only draws points labelled -1 or 1.
    model = SVC(kernel=kernel).fit(features, labels)
    train_score, test_score = evaluatePerformance(model, features, labels)
    print("Mean training accuracy: {:.2f}".format(train_score))
    print("Mean test accuracy: {:.2f}".format(test_score))
    plot_svc_decision_function(model, features, labels)

interact(interactivePlottingOfSVM, kernel=["linear", "poly", "rbf"]);


# Q2.3
We are clearly dealing with a dataset that is not linearly separable: points from one class are surrounded by points from the other class, forming a pattern similar to... human lungs, maybe?

Just from the plot, we would expect the radial basis function (RBF) kernel to perform better, because it implicitly maps the input into a higher-dimensional feature space where the data may become linearly separable.

An RBF boundary is far more viable here than a straight line, or even a polynomial boundary, so the difference in mean training and test accuracy should come as no surprise.
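
To make the "map to a higher dimension" argument concrete, this minimal sketch lifts ring-shaped data with a hand-crafted feature, the squared radius $x_1^2 + x_2^2$, after which a linear SVM separates it. This explicit map is an illustration, not the map the RBF kernel uses: the RBF kernel $K(x, z) = \exp(-\gamma\lVert x - z\rVert^2)$ corresponds to an inner product in an infinite-dimensional space and is never computed explicitly. The data (`make_circles`) is a hypothetical stand-in for `D2.csv`, and `gamma = 0.5` is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Hypothetical ring-shaped data standing in for D2.csv.
features, labels = make_circles(n_samples=100, noise=0.1, factor=0.4,
                                random_state=0)

# Hand-crafted feature map (not the RBF map): append the squared radius.
lifted = np.column_stack([features, (features ** 2).sum(axis=1)])

flat = SVC(kernel='linear').fit(features, labels)
lift = SVC(kernel='linear').fit(lifted, labels)
print(f"linear SVM in 2D:        training accuracy {flat.score(features, labels):.2f}")
print(f"linear SVM in lifted 3D: training accuracy {lift.score(lifted, labels):.2f}")

# The RBF kernel compares points directly: exp(-gamma * ||x - z||^2).
gamma = 0.5
K = rbf_kernel(features[:3], features[:3], gamma=gamma)
diffs = features[:3, None, :] - features[None, :3, :]   # shape (3, 3, 2)
manual = np.exp(-gamma * (diffs ** 2).sum(axis=2))
print(np.allclose(K, manual))  # True
```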