# Kernel SVM - Digits (+open questions)
The accompanying notebook loads the digits dataset, which contains 8x8 pixel images of the digits 0 to 9. In this exercise we train a kernel SVM to predict the digit shown in each image.

The following code has already been provided:

In [None]:
# Standard scientific Python imports
import numpy as np
import matplotlib.pyplot as plt

# sklearn imports
import sklearn
from sklearn import datasets
from sklearn.model_selection import train_test_split

In [None]:
# Load the data
digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Label %i" % label)

In [None]:
# flatten the images
n = len(digits.images)
D = digits.images.reshape((n, -1))
y = digits.target

# Split data into 70% train and 30% test subsets
D_train, D_test, y_train, y_test = train_test_split(
    D, y, test_size=0.3, shuffle=False
)

## 7a (3pts)
Train an RBF kernel SVM with parameters `gamma=0.0008` and `C=0.9`, using the `SVC` model from sklearn. Train the model on `D_train` (the 70% split) and test it on `D_test` (the remaining 30%).

What is the accuracy of the model on the test data?
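
A minimal sketch of one way to set this up, assuming the same data preparation as in the provided cells. The names `clf` and `test_accuracy` are illustrative and not part of the notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data preparation as in the provided notebook cells
digits = datasets.load_digits()
D = digits.images.reshape((len(digits.images), -1))
y = digits.target
D_train, D_test, y_train, y_test = train_test_split(
    D, y, test_size=0.3, shuffle=False
)

# RBF kernel SVM with the parameters given in the task
clf = SVC(kernel="rbf", gamma=0.0008, C=0.9)
clf.fit(D_train, y_train)

# score() returns the mean accuracy on the given data
test_accuracy = clf.score(D_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```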

## 7b (5pts)
The multiclass SVM from sklearn uses a one-vs-one scheme: one SVM is learned for each pair of classes. Consequently, we can interpret the support vectors based on our knowledge of what happens "under the hood". To that end, first explain how the prediction of a class $y\in\{-1,1\}$ is determined by the support vectors of one such binary SVM. State the prediction formula for SVMs, explain where the support vectors appear in the formula, and describe how the prediction depends on the kernel, the support vectors, and the learned parameters.
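
As a starting point, the standard decision rule of a binary kernel SVM with an RBF kernel can be written as below. The notation is an assumption of this sketch and may differ from the lecture's: $\mathcal{S}$ is the index set of support vectors, $\alpha_i$ are the dual coefficients, $b$ is the bias, and $k$ is the kernel.

$$
\hat{y}(x) = \operatorname{sign}\Big(\sum_{i \in \mathcal{S}} \alpha_i y_i \, k(x_i, x) + b\Big),
\qquad
k(x_i, x) = \exp\big(-\gamma \lVert x_i - x \rVert^2\big)
$$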

## 7c (4pts)
Sklearn has a peculiar way to denote the learned support vectors. To understand how this works, read section 1.4.1.1 on [this website](https://scikit-learn.org/stable/modules/svm.html#svm-multi-class), including the multi-class strategies, and answer the following question for the model obtained in part a.

How many support vectors are there to distinguish between classes 0 and 1?

## 7d (12pts)
1. Explain how you extract the support vectors for the SVM classifying between 0 and 1 from the sklearn model. Include screenshots of your code to make clear how you arrive at your result for part c.


2. Use the plotting function from the notebook to plot four of the support vectors for each class (four support vectors for class 0 and four for class 1) that are most influential for the SVM discriminating between class 0 and 1. Explain how and why you chose the plotted support vectors.


3. Based on the role that the support vectors play in the prediction, what would you expect the plotted support vectors to look like, and what characteristics would you expect them to have? Do you see these characteristics in the plotted support vectors, or are you surprised by the result?
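
The following sketch shows one possible way to index into the fitted model's `support_vectors_`, `n_support_`, and `dual_coef_` attributes for the 0-vs-1 classifier. It assumes the layout described in the sklearn documentation (support vectors grouped by class, one row of `dual_coef_` per opposing class) and treats a larger absolute dual coefficient as "more influential", which is only one reasonable interpretation. All variable names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Refit the model from part a
digits = datasets.load_digits()
D = digits.images.reshape((len(digits.images), -1))
y = digits.target
D_train, D_test, y_train, y_test = train_test_split(
    D, y, test_size=0.3, shuffle=False
)
clf = SVC(kernel="rbf", gamma=0.0008, C=0.9).fit(D_train, y_train)

# Support vectors are stored grouped by class; n_support_ gives the group sizes
bounds = np.concatenate(([0], np.cumsum(clf.n_support_)))
sv_0 = slice(int(bounds[0]), int(bounds[1]))  # support vectors of class 0
sv_1 = slice(int(bounds[1]), int(bounds[2]))  # support vectors of class 1

# For class-0 support vectors, row 0 of dual_coef_ belongs to the 0-vs-1 classifier.
# For class-1 support vectors, row 0 also belongs to 1-vs-0, since 0 is the first other class.
coef_0 = clf.dual_coef_[0, sv_0]
coef_1 = clf.dual_coef_[0, sv_1]

# Only support vectors with a nonzero coefficient take part in the 0-vs-1 classifier
n_sv_01 = np.count_nonzero(coef_0) + np.count_nonzero(coef_1)
print("Support vectors used for 0 vs 1:", n_sv_01)

# Assumption: rank influence by the absolute dual coefficient and keep the top four per class
top_0 = np.argsort(-np.abs(coef_0))[:4]
top_1 = np.argsort(-np.abs(coef_1))[:4]
images_0 = clf.support_vectors_[sv_0][top_0].reshape(-1, 8, 8)
images_1 = clf.support_vectors_[sv_1][top_1].reshape(-1, 8, 8)
```

The arrays `images_0` and `images_1` can then be passed to the plotting code from the notebook in place of `digits.images`. Note that several support vectors may share the same absolute coefficient (bounded at `C`), so the top four are not necessarily unique.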

## 7e (6pts)
Use the sklearn class `GridSearchCV` to determine the best combination of the parameters `gamma` and `C` according to a 5-fold cross-validation of the `SVC` model with RBF kernel. Run the search on the whole dataset `D`, not just `D_train`. Use accuracy as the scoring method, and set the candidate parameters to `gamma` $\in \{0.0001, 0.0006, 0.001, 0.006\}$ and `C` $\in \{0.6, 0.8, 1, 2, 3, 4, 6\}$.

Which parameters result in the highest cross-validated score?

What is the mean cross-validated accuracy for these parameters?
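
A minimal sketch of how such a grid search could be set up, assuming the flattened data matrix `D` and labels `y` from the provided cells. The names `param_grid` and `search` are illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Flattened digits data, as in the provided notebook cells
digits = datasets.load_digits()
D = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Candidate values from the task description
param_grid = {
    "gamma": [0.0001, 0.0006, 0.001, 0.006],
    "C": [0.6, 0.8, 1, 2, 3, 4, 6],
}

# 5-fold cross-validation, scored by accuracy, on the whole dataset
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="accuracy", cv=5)
search.fit(D, y)

print("Best parameters:", search.best_params_)
print("Mean cross-validated accuracy:", search.best_score_)
```

Here `best_score_` is the mean of the per-fold accuracies for the best parameter combination, and `cv_results_` holds the scores for every combination in the grid.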