###### The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2021 Semester 1

## Week 6 - Practical Workshop

Today, we will be examining the behaviour of some **Support Vector Machine classifiers.**

To do so, we are using the `IRIS` dataset again.


In [None]:
import numpy as np
from sklearn import datasets, svm
import matplotlib.pyplot as plt

iris = datasets.load_iris()

### Exercise 1. 
By only considering the first 2 features of this dataset (`'Sepal length'` and `'Sepal width'`) create a 2D projection of the iris dataset.
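
Before slicing, it can help to confirm which columns of `iris.data` hold the sepal measurements. The snippet below is a minimal sketch of that check; the name `X_sepal` is introduced here for illustration only and is not used elsewhere in the workshop.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Column order of iris.data, as reported by scikit-learn
print(iris.feature_names)

# Keep every row, but only the first two columns (sepal length, sepal width)
X_sepal = iris.data[:, :2]
print(iris.data.shape, '->', X_sepal.shape)
```

The same slicing pattern with `[:, 2:]` would select the last two columns instead.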

In [None]:
# Take the first two features. We could avoid this by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

# Plot the training points
plt.scatter(..., ..., c=..., cmap=plt.cm.Set1, edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

plt.show()

#### Exercise 1. (a)
The following code shows how to plot the decision surface for four SVM classifiers with different kernels:
        
   * LinearSVC()                        # Linear SVM
   * SVC(kernel='linear')               # SVM with linear Kernel
   * SVC(kernel='rbf', gamma = 0.7, C=1)     # SVM with Radial Basis Function (RBF) kernel
   * SVC(kernel='poly', degree = 3, C=1)     # SVM with Polynomial Kernel
        
Examine the visualisations of the four different SVMs, paying close attention to the decision boundaries. Which do you think has the best expressivity, based on the two-dimensional slice shown?
    
**Note:** Explaining the insight behind the kernels' hyper-parameters (e.g. $\gamma$ (gamma) as 'smoothing factor') is out of scope of this subject. 
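
The plotting code that follows draws each decision surface by predicting a class for every point on a dense grid and then colouring the grid by the predicted label. The sketch below shows the same mechanics on a deliberately tiny grid. The rule `x > y` is a hypothetical stand-in for `clf.predict`, used only so the arrays are easy to inspect.

```python
import numpy as np

# A deliberately coarse grid: 3 x-values by 2 y-values
xx, yy = np.meshgrid(np.arange(0.0, 3.0, 1.0),   # x: 0, 1, 2
                     np.arange(0.0, 2.0, 1.0))   # y: 0, 1

# Flatten both coordinate arrays and pair them up: one (x, y) row per grid point
grid_points = np.c_[xx.ravel(), yy.ravel()]

# Hypothetical stand-in for clf.predict: label 1 where x > y, else 0
labels = (grid_points[:, 0] > grid_points[:, 1]).astype(int)

# Reshape the flat predictions back to the grid shape expected by contourf
Z = labels.reshape(xx.shape)
print(grid_points.shape, Z.shape)
print(Z)
```

With a real classifier, a smaller step size `h` gives a smoother-looking boundary at the cost of more predictions.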

In [None]:
# Derived from the scikit-learn documentation example from:
# https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html

def make_meshgrid(x, y, h=.02):
    """Create a mesh of points to plot in

    Parameters
    ----------
    x: data to base x-axis meshgrid on
    y: data to base y-axis meshgrid on
    h: stepsize for meshgrid, optional

    Returns
    -------
    xx, yy : ndarray
    """
    # Use the arrays passed in, rather than the global X
    x_min, x_max = x.min() - .5, x.max() + .5
    y_min, y_max = y.min() - .5, y.max() + .5
    
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    """Plot the decision boundaries for a classifier.

    Parameters
    ----------
    ax: matplotlib axes object
    clf: a classifier
    xx: meshgrid ndarray
    yy: meshgrid ndarray
    params: dictionary of params to pass to contourf, optional
    """
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out


# Take the first two features. We could avoid this by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

# We create instances of the SVM models and fit our data. We do not scale our
# data since we want to plot the support vectors

C = 1.0  # SVM regularization parameter

models = (svm.SVC(kernel='linear', C=C),
          svm.LinearSVC(C=C, max_iter=10000),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, gamma='auto', C=C))

models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = ('SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2, figsize=(15,15))
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel('Sepal length')
    ax.set_ylabel('Sepal width')
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()


#### Exercise 1. (b)
Let's use another pair of attributes from the Iris dataset (`'petal length'` and `'petal width'`) to create a different 2D projection of the iris dataset.




In [None]:
# petal length and petal width are the final two attributes.
X = iris.data[:, 2:]
y = iris.target

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

# Plot the training points
plt.scatter(..., ..., c=..., cmap=plt.cm.Set1, edgecolor='k')
plt.xlabel('Petal length')
plt.ylabel('Petal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

plt.show()

Examine the visualisations of the four SVM models on this new projection. What do you conclude from comparing the graphs from part (a) and part (b)?

In [None]:
# Derived from the scikit-learn documentation example from:
# https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html

# We create instances of the SVM models and fit our data. We do not scale our
# data since we want to plot the support vectors

C = 1.0  # SVM regularization parameter

models = (svm.SVC(kernel='linear', C=C),
          svm.LinearSVC(C=C, max_iter=10000),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, gamma='auto', C=C))

models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = ('SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')


# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2, figsize=(15,15))
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = ..., ...
xx, yy = make_meshgrid(X0, X1)

for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(..., ..., c=..., cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xlabel(...)
    ax.set_ylabel(...)
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()

#### Exercise 1. (c)
The default value of the $C$ parameter (the "penalty" for misclassified examples, also called the regularisation factor) is 1. Increase (or decrease) this value and observe how the decision boundaries change.
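
Alongside the plots, one rough way to see the effect of $C$ is to count the support vectors each fitted model keeps. The sketch below assumes the petal features from part (b) and a linear kernel. The three values of `C_value` and the names `X_petal` and `sv_counts` are illustrative choices, not part of the workshop code.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
X_petal = iris.data[:, 2:]   # petal length and petal width, as in part (b)
y = iris.target

sv_counts = {}
# Illustrative C values spanning several orders of magnitude
for C_value in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel='linear', C=C_value).fit(X_petal, y)
    # n_support_ holds the number of support vectors found for each class
    sv_counts[C_value] = int(clf.n_support_.sum())
    print(f'C={C_value}: {sv_counts[C_value]} support vectors, '
          f'training accuracy {clf.score(X_petal, y):.3f}')
```

A small $C$ tolerates more margin violations, which typically means a wider margin and more support vectors. A large $C$ pushes the model to classify the training points correctly.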

#### Exercise 1. (d)
Calculate the training accuracy of the various SVM classifiers that you graphed in part (b). Do you see any evidence that any of these classifiers might be overfitting this data?

In [None]:
# Re-create the four models and fit them to the whole iris dataset.
# Note that they are trained on only the two features currently in X.

models = (svm.SVC(kernel='linear', C=C),
          svm.LinearSVC(C=C, max_iter=10000),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),
          svm.SVC(kernel='poly', degree=3, gamma='auto', C=C))

models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = ('SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

for title, model in zip(titles, models):
    acc = ...
    print(title, acc)

*The accuracy on the training data does look suspiciously high - especially in light of the different decision boundaries - but we need more information to be sure.*
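
One way to gather that extra information is to hold out part of the data and compare accuracy on seen and unseen instances. The following is a minimal sketch under assumed settings: the 70/30 split, `random_state=0`, and the choice of the RBF model are illustrative, not prescribed by the workshop.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_petal = iris.data[:, 2:]   # petal length and petal width
y = iris.target

# Illustrative 70/30 split; stratify keeps class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_petal, y, test_size=0.3, stratify=y, random_state=0)

clf = svm.SVC(kernel='rbf', gamma=0.7, C=1.0).fit(X_train, y_train)

# Accuracy on the data the model was fitted to, and on data it has not seen
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f'train {train_acc:.3f}, held-out {test_acc:.3f}')
```

A large gap between the two numbers would be a sign of overfitting. A single split is noisy, though, which is one motivation for cross-validation.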

### Exercise 2.
Let’s summarise some earlier work. Use all four attributes from the Iris data, and compare the training accuracy with the accuracy estimated by 10–fold (stratified) cross–validation, for the following models:

    (a) One-R
    (b) 1-Nearest Neighbour ( neighbors.KNeighborsClassifier )
    (c) 5-Nearest Neighbour
    (d) Decision Trees
    (e) LinearSVC()
    (f) SVMs with a cubic (polynomial degree 3) kernel 
    (g) SVMs with an RBF kernel
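
As a reminder of the mechanics, the sketch below contrasts training accuracy with 10-fold stratified cross-validation for a single model. It uses `SVC(kernel='linear')`, which is not one of the models listed above, and the names `X_all`, `cv` and `fold_scores` are illustrative. For classifiers, passing an integer such as `cv=10` to `cross_val_score` also produces stratified folds. The explicit `StratifiedKFold` is used here only to make the shuffling and seed visible.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X_all, y = iris.data, iris.target

clf = svm.SVC(kernel='linear', C=1.0)

# Training accuracy: fit and score on the same 150 instances
train_acc = clf.fit(X_all, y).score(X_all, y)

# 10-fold stratified CV: each test fold preserves the class proportions
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = cross_val_score(clf, X_all, y, cv=cv)
cv_acc = np.mean(fold_scores)

print(f'training accuracy {train_acc:.3f}, 10-fold CV accuracy {cv_acc:.3f}')
```

`cross_val_score` fits a fresh clone of the model on each training fold, so the earlier call to `fit` does not leak into the cross-validated scores.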

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

#print(cross_val_score(zero_r, X, y, cv=10))
X = iris.data
y = iris.target

#try to change C to 1000 or 0.001
C = 1.0  # SVM regularization parameter

models = [DecisionTreeClassifier(max_depth=1),  # depth-1 tree standing in for 1-R
          KNeighborsClassifier(n_neighbors=1),
          KNeighborsClassifier(n_neighbors=5),
          DecisionTreeClassifier(max_depth=None),
          svm.LinearSVC(C=C, max_iter=10000),
          # Order matches `titles` below: cubic kernel first, then RBF
          svm.SVC(kernel='poly', degree=3, C=C),
          svm.SVC(kernel='rbf', gamma=0.7, C=C)]

titles = ['1-R',
          '1-Nearest Neighbour',
          '5-Nearest Neighbour',
          'Decision Tree',
          'LinearSVC',
          'SVM with a cubic kernel',
          'SVM with an RBF kernel']

title_training_acc = {}
for title, model in zip(titles, models):
    model.fit(..., ...)
    title_training_acc[title] = ...

title_crossvalidation_acc = {}
for title, model in zip(titles, models):
    title_crossvalidation_acc[title] = ...

for title in titles:
    print(title, ': Training Acc', title_training_acc[title], '; X-Val Acc', title_crossvalidation_acc[title])
    