# 9. Support Vector Machines

## 9.1 Support Vector Machines and Kernel Methods

So far we have concentrated on extracting features from
our existing data to obtain a representation that uses fewer dimensions while retaining as much information as possible.
A different approach is to transform the data in such a way
that our learning tasks become easier to carry out than in
the original feature space.

In a sense, this is what we did when we applied a
logarithmic transformation to the mammals dataset in
Section 4.5: in the transformed feature space, the relationship between a mammal’s body and brain is better
approximated by a straight line than when using the
untransformed features. Once we have carried out the
learning task, we need to invert the transformation to return to the
original feature space.
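
As a minimal sketch of this idea, the snippet below fits a straight line in log-log space and then inverts the transformation with the exponential. The data are synthetic (an assumed power law with illustrative constants `a_true` and `k_true`), not the mammals dataset.

```python
import numpy as np

# Synthetic data following an assumed power law y = a * x**k (illustrative values)
a_true, k_true = 2.0, 0.75
x = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
y = a_true * x ** k_true

# In log space the relationship is a straight line: log y = log a + k * log x
k_hat, log_a_hat = np.polyfit(np.log(x), np.log(y), deg=1)

# Invert the transformation to return to the original feature space
y_hat = np.exp(log_a_hat + k_hat * np.log(x))

print(k_hat, np.exp(log_a_hat))
```

The fit itself is linear; only the final `np.exp` step brings the predictions back to the original units.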

### 9.1.1 Support Vector Machines

A support vector machine is a binary linear classifier
whose classification boundary is built so
as to minimise the generalisation error in our task. Unlike
other classifiers we have discussed, the support vector machine boundary is obtained using geometrical
rather than algebraic reasoning. With that in mind, the generalisation
error is associated with the geometrical notion of a margin,
which can be defined as the region along the classification
boundary that is free of data points.
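
In symbols, and using notation that is assumed here rather than taken from the text (a weight vector $\mathbf{w}$ and an offset $b$), a linear classifier labels a point according to the side of the hyperplane on which it falls, and the distance from a point to the boundary is the quantity that the margin measures:

$$
f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b),
\qquad
d(\mathbf{x}) = \frac{\lvert \mathbf{w}\cdot\mathbf{x} + b \rvert}{\lVert \mathbf{w} \rVert}
$$

The width of the point-free region on either side of the boundary is then the smallest value of $d(\mathbf{x})$ over the training points.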

In that manner, a support vector machine (SVM) has the
goal of discriminating among classes using a linear decision
boundary that has the largest margin, giving rise to the so-called maximum margin hyperplane (MMH). Having
the maximum margin is equivalent to minimising the
generalisation error. This is because using the MMH as the
classification boundary minimises the probability that a
small perturbation in the position of a data point results in a
classification error. Intuitively, it is easy to see that a wider
margin results in better-defined, well-separated classes.
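
To make the comparison of margins concrete, here is a small sketch that measures the margin left by two candidate boundaries on a toy dataset. The data, the two hyperplanes and the helper `geometric_margin` are illustrative assumptions, and no SVM is trained here.

```python
import numpy as np

# Toy, linearly separable data (illustrative values); labels are +1 / -1
X = np.array([[1.0, 1.0], [2.0, 0.0], [4.0, 4.0], [5.0, 3.0]])
y = np.array([-1, -1, 1, 1])

def geometric_margin(w, b, X, y):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive value means every point lies on its correct side.
    """
    w = np.asarray(w, dtype=float)
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Two candidate boundaries that both separate the classes
margin_a = geometric_margin([1.0, 1.0], -5.0, X, y)  # line x1 + x2 = 5
margin_b = geometric_margin([1.0, 0.0], -3.5, X, y)  # line x1 = 3.5

print(margin_a, margin_b)
```

Both lines classify every point correctly, but the first leaves a wider gap, so a small perturbation of a point is less likely to push it across the boundary.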

![alt text](images/svm_1.png "Title")

### 9.1.2 Kernel

A kernel is a function K(x, y) whose arguments x and y
can be real numbers, vectors, functions, etc. It is effectively
a map between these arguments and a real value. The
operation is independent of the order of the arguments.
This means that K(x, y) = K(y, x). We
are familiar with at least one such kernel: the well-known
dot product of two vectors.
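
The symmetry property can be checked numerically with the dot product of two vectors. This is a minimal sketch: the vectors and the helper name `dot_kernel` are made up for illustration.

```python
import numpy as np

def dot_kernel(x, y):
    """Dot product of two vectors, returned as a single real number."""
    return float(np.dot(x, y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

k_xy = dot_kernel(x, y)
k_yx = dot_kernel(y, x)
print(k_xy, k_yx)  # both argument orders give the same value

# The matrix of pairwise kernel values (the Gram matrix) is therefore symmetric
points = np.vstack([x, y])
gram = points @ points.T
```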

![alt text](images/svm_2.png "Title")

We have a choice of kernels to use. Some of the more
popular ones include:

- Linear
- Polynomial
- Gaussian
- Sigmoid
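
For reference, these kernels are commonly written as follows. The parameter names ($\gamma$, $c$, $d$) follow a widespread convention and are an assumption here, since the text does not fix a notation:

$$
\begin{aligned}
K_{\text{linear}}(\mathbf{x}, \mathbf{y}) &= \mathbf{x}\cdot\mathbf{y} \\
K_{\text{polynomial}}(\mathbf{x}, \mathbf{y}) &= (\gamma\, \mathbf{x}\cdot\mathbf{y} + c)^{d} \\
K_{\text{Gaussian}}(\mathbf{x}, \mathbf{y}) &= \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{y} \rVert^{2}\right) \\
K_{\text{sigmoid}}(\mathbf{x}, \mathbf{y}) &= \tanh(\gamma\, \mathbf{x}\cdot\mathbf{y} + c)
\end{aligned}
$$

Each expression is unchanged when $\mathbf{x}$ and $\mathbf{y}$ are exchanged.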

In [3]:
# SVM Regression
%pylab inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
mammals = pd.read_csv('Data/mammals.csv')

Populating the interactive namespace from numpy and matplotlib


In [5]:
body = mammals[['body']].values
brain = mammals[['brain']].values

In [17]:
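from sklearn.linear_model import LinearRegression
from sklearn import svm
# Fit all three models in log-log space; SVR expects a 1-D target, hence ravel()
log_body, log_brain = np.log(body), np.log(brain).ravel()
svm_lm = svm.SVR(kernel='linear',C=1e1).fit(log_body,log_brain) # SVR with linear kernel
svm_rbf = svm.SVR(kernel='rbf',C=1e1).fit(log_body,log_brain) # SVR with Gaussian (RBF) kernel
logfit = LinearRegression().fit(log_body,log_brain) # linear regression on log-transformed data

In [22]:
# Predict in log space, then invert the transformation with exp
mammals['log_regr'] = np.exp(logfit.predict(log_body))
mammals['linear_svm'] = np.exp(svm_lm.predict(log_body))
mammals['rbf_svm'] = np.exp(svm_rbf.predict(log_body))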

In [21]:
logfit

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [23]:
# SVM Classification
wine = pd.read_csv('Data/wine.csv')
X = wine.drop(['Wine'],axis=1).values
Y = wine['Wine'].values

In [26]:
X1 = wine[['Alcohol','Color.int']]

In [28]:
import sklearn.model_selection as ms
XTrain,XTest,YTrain,YTest = ms.train_test_split(X1,Y,test_size=0.3,random_state=7)

In [31]:
# SVC classifier with Gaussian (RBF) kernel, the default for svm.SVC
from sklearn import svm
from sklearn.model_selection import GridSearchCV
SVMclassifier = svm.SVC()
# Grid search over the regularisation parameter C
Cval = 2.**np.arange(-1,1.2,step=0.2)
n_grid = [{'C':Cval}]
cv_svc = GridSearchCV(estimator=SVMclassifier,param_grid=n_grid,cv=ms.KFold(n_splits=100))

In [32]:
# Train
cv_svc.fit(XTrain,YTrain)
best_c = cv_svc.best_params_['C']

In [33]:
print(best_c)

1.7411011265922478
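
To see where this value sits among the candidates, the grid can be rebuilt on its own. The sketch below only reconstructs the `Cval` array used in the search and looks up the entry closest to the printed value; it does not rerun the cross-validation.

```python
import numpy as np

# Candidate values of C: powers of two with exponents from -1 to 1 in steps of 0.2
Cval = 2.0 ** np.arange(-1, 1.2, step=0.2)

print(len(Cval))               # number of candidates in the grid
print(Cval.min(), Cval.max())  # range covered by the search

# Position in the grid of the candidate closest to the selected value
idx = int(np.argmin(np.abs(Cval - 1.7411011265922478)))
print(idx, Cval[idx])
```

The selected value is an interior point of the grid rather than one of its end points, which suggests the search range was wide enough for this problem.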


In [35]:
svc_clf = svm.SVC(C=best_c)
svc_clf.fit(XTrain,YTrain)

SVC(C=1.7411011265922478, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

In [36]:
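y_p = svc_clf.predict(XTest)
from sklearn import metrics
# Caution: classification_report expects (y_true, y_pred). Passing the predictions
# first, as here, swaps the precision and recall columns and makes 'support'
# count predicted labels rather than true ones.
print(metrics.classification_report(y_p,YTest))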

              precision    recall  f1-score   support

           1       0.85      0.69      0.76        16
           2       0.92      0.92      0.92        24
           3       0.76      0.93      0.84        14

   micro avg       0.85      0.85      0.85        54
   macro avg       0.84      0.84      0.84        54
weighted avg       0.86      0.85      0.85        54
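
To make the columns of the report concrete, here is a minimal sketch that computes precision and recall for one class by hand. The labels are made up for illustration and are not the wine data, and `precision_recall` is a hypothetical helper. It also shows why the argument order matters: `metrics.classification_report` takes the true labels first and the predictions second, and exchanging them exchanges precision and recall.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for a single class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels for illustration only
y_true = [1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 1, 2, 2, 2, 3, 1, 1]

p, r = precision_recall(y_true, y_pred, label=1)
p_swapped, r_swapped = precision_recall(y_pred, y_true, label=1)
print(p, r)
print(p_swapped, r_swapped)
```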



In [39]:
C = best_c
svc = svm.SVC(kernel='linear',C=C).fit(XTrain,YTrain)
y_p = svc.predict(XTest)
print(metrics.classification_report(y_p,YTest))

              precision    recall  f1-score   support

           1       0.92      0.71      0.80        17
           2       0.96      0.96      0.96        24
           3       0.76      1.00      0.87        13

   micro avg       0.89      0.89      0.89        54
   macro avg       0.88      0.89      0.88        54
weighted avg       0.90      0.89      0.89        54



In [40]:
C = best_c
rbf_svc = svm.SVC(kernel='rbf',C=C).fit(XTrain,YTrain)
y_p = rbf_svc.predict(XTest)
print(metrics.classification_report(y_p,YTest))

              precision    recall  f1-score   support

           1       0.85      0.69      0.76        16
           2       0.92      0.92      0.92        24
           3       0.76      0.93      0.84        14

   micro avg       0.85      0.85      0.85        54
   macro avg       0.84      0.84      0.84        54
weighted avg       0.86      0.85      0.85        54



In [41]:
C = best_c
poly_svc = svm.SVC(kernel='poly',degree=3,C=C).fit(XTrain,YTrain)
y_p = poly_svc.predict(XTest)
print(metrics.classification_report(y_p,YTest))

              precision    recall  f1-score   support

           1       0.92      0.75      0.83        16
           2       0.92      0.96      0.94        23
           3       0.76      0.87      0.81        15

   micro avg       0.87      0.87      0.87        54
   macro avg       0.87      0.86      0.86        54
weighted avg       0.88      0.87      0.87        54



In [42]:
C = best_c
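# Note: svm.SVC defaults to kernel='rbf', so despite its name lin_svc repeats the RBF fit of In [40]
lin_svc = svm.SVC(C=C).fit(XTrain,YTrain)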
y_p = lin_svc.predict(XTest)
print(metrics.classification_report(y_p,YTest))

              precision    recall  f1-score   support

           1       0.85      0.69      0.76        16
           2       0.92      0.92      0.92        24
           3       0.76      0.93      0.84        14

   micro avg       0.85      0.85      0.85        54
   macro avg       0.84      0.84      0.84        54
weighted avg       0.86      0.85      0.85        54

