# Machine Learning 3 - Support Vector Machines

An SVM classifier looks for a hyperplane (or, for multi-class problems, a set of hyperplanes) that separates the classes while maximizing the margin, i.e. the distance between the decision border and the closest data points.

![SVM](http://scikit-learn.org/stable/_images/sphx_glr_plot_separating_hyperplane_0011.png "Decision border in an SVM")

This separation is generally not possible in the original data space. The SVM therefore first maps the data into a high-dimensional (possibly infinite-dimensional) space in which a linear separation becomes possible. The mapping is defined implicitly by a kernel: linear (which keeps the original space), polynomial, or, more commonly, "RBF" (radial basis function).
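
To make the effect of the kernel concrete, here is a minimal sketch on a toy dataset. It does not use the CIFAR data: `make_circles` is an assumed stand-in that produces two concentric rings, which no straight line can separate. The degree of the polynomial kernel and the noise level are illustrative choices, not values from this lab.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data (assumption, not the lab dataset): two concentric rings.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Same classifier, three kernels. degree=2 is chosen for the polynomial
# kernel because a circle is a quadratic boundary.
classifiers = {
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=2, gamma="scale"),
    "rbf": SVC(kernel="rbf", gamma="scale"),
}

kernel_scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    kernel_scores[name] = clf.score(X_test, y_test)  # mean accuracy
    print(f"{name:>6}: test accuracy = {kernel_scores[name]:.3f}")
```

On data like this, the linear kernel should stay close to chance level while the RBF kernel separates the rings, because the separation only becomes linear after the implicit mapping.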

In [1]:
from lab_tools import CIFAR10, evaluate_classifier, get_hog_image

dataset = CIFAR10('./CIFAR10/')

Pre-loading training data
Pre-loading test data


**Build a simple SVM** using [the SVC (Support Vector Classification) from sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC). 
**Train** it on the CIFAR dataset.

In [2]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def cross_validation(C=1.0, kernel='rbf', gamma='scale'):
    """Return the accuracy of an SVC on each of 5 stratified folds of the training set."""
    kf = StratifiedKFold(n_splits=5)

    scores = []

    # Each iteration yields the row indices of the training and validation folds.
    for train, test in kf.split(dataset.train['hog'], dataset.train['labels']):
        train_x = dataset.train['hog'][train]
        train_y = dataset.train['labels'][train]

        test_x = dataset.train['hog'][test]
        test_y = dataset.train['labels'][test]

        # Fit on the training folds, score on the held-out fold.
        clf = SVC(C=C, kernel=kernel, gamma=gamma)
        clf.fit(train_x, train_y)

        pred = clf.predict(test_x)
        score = accuracy_score(test_y, pred)

        scores.append(score)
    return scores

In [13]:
clf = SVC(kernel='linear')
clf.fit(dataset.train["hog"], dataset.train["labels"])

SVC(kernel='linear')

In [9]:
print(len(dataset.train["hog"]))

15000


**Explore the classifier**. How many support vectors are there? What are support vectors?

In [14]:
all_support_vectors = clf.support_vectors_ # Each row = 1 support vector, i.e. one training sample in the feature space used for fitting (here the HOG features)
vectors_per_class = clf.n_support_ # Number of support vectors for each class

# -- Your code here -- #
print(len(all_support_vectors))
print(vectors_per_class)

12956
[3638 4757 4561]
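
As a hedged illustration of what support vectors are, the sketch below fits a linear SVC on a small two-class toy dataset (`make_blobs`, an assumed stand-in for the lab data). Support vectors are the training samples that lie on or inside the margin; only they determine the decision border. For a binary problem, the margin borders correspond to `decision_function` values of -1 and +1, which the code uses to check this property.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy data (assumption, not the lab dataset): two overlapping 2D blobs.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Signed score of each sample; positive values point towards class 1.
scores = clf.decision_function(X)
# Multiply by the label sign so that "correct side, outside the margin" is > 1.
signed = np.where(y == 1, scores, -scores)

print("total support vectors:", len(clf.support_vectors_))
print("per class:", clf.n_support_)
print("largest signed score among support vectors:",
      round(float(signed[clf.support_].max()), 3))
```

`clf.support_` holds the indices of the support vectors in the training set, so `X[clf.support_]` gives the same rows as `clf.support_vectors_`.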


**Try to find the best "C" (error penalty) and "gamma" parameters** using cross-validation. What influence does "C" have on the number of support vectors?
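
One possible way to organize this search is sketched below, assuming a small toy dataset (`make_moons`) in place of the CIFAR HOG features and using sklearn's `GridSearchCV` rather than a hand-written loop. The grid values are arbitrary starting points, not recommended settings. The second part counts support vectors for a few values of "C".

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Toy data (assumption, not the lab dataset): two noisy interleaved half-moons.
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Logarithmic grid over both parameters (illustrative values).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))

# Influence of C on the number of support vectors.
support_counts = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    support_counts[C] = int(clf.n_support_.sum())
    print(f"C={C:<6} -> {support_counts[C]} support vectors")
```

A small "C" tolerates many margin violations, so the margin is wide and many samples typically end up as support vectors; a large "C" penalizes violations more strongly and usually leaves fewer.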

In [None]:

# -- Your code here -- #


# Comparing algorithms

Using the best hyper-parameters that you found for each of the algorithms (kNN, Decision Trees, Random Forests, MLP, SVM):

* Re-train the models on the full training set.
* Compare their results on the test set.
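
The two steps above can be sketched as follows. This is only an outline under stated assumptions: the data is a synthetic 3-class problem from `make_classification`, not the CIFAR features, and every hyper-parameter value is a placeholder to be replaced with the best values found in the previous labs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the lab data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder hyper-parameters: substitute the tuned values.
models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0),
    "SVM": SVC(C=1.0, kernel="rbf", gamma="scale"),
}

test_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # re-train on the full training set
    test_scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report from best to worst test accuracy.
for name, score in sorted(test_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>14}: {score:.3f}")
```

The test set is used only once here, after all hyper-parameters are fixed, so the reported accuracies are not biased by the tuning.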

In [None]:

# -- Your code here -- #
