# Support Vector Machines



In [None]:
from sklearn import svm

## Classification

SVC (Support Vector Classification), NuSVC and LinearSVC are classes capable of performing binary and multi-class classification on a dataset.
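The three classes share the same `fit`/`predict` interface, so they can be swapped on the same data. The sketch below uses made-up toy data (two well-separated clusters) and default hyperparameters for every model.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two well-separated clusters, labels 0 and 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

models = {
    "SVC": svm.SVC(),
    "NuSVC": svm.NuSVC(),
    "LinearSVC": svm.LinearSVC(),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    # All three classifiers expose the same predict() method
    predictions[name] = model.predict([[0.5, 0.5], [5.5, 5.5]])
    print(name, predictions[name])
```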

In [2]:
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = svm.SVC()
clf.fit(X, y)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

After being fitted, the model can then be used to predict new values:

In [3]:
clf.predict([[2., 2.]])

array([1])

The decision function of an SVM depends only on a subset of the training data, called the support vectors. Properties of these support vectors are stored in the attributes `support_vectors_`, `support_` and `n_support_`.
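This short sketch shows how the three attributes relate to each other, assuming a linear kernel and made-up, well-separated data. For dense input, `support_vectors_` holds the rows of `X` selected by the indices in `support_`, and `n_support_` counts them per class.

```python
import numpy as np
from sklearn import svm

# Hypothetical data: two clusters, so only points near the boundary matter
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = svm.SVC(kernel="linear").fit(X, y)

# support_ holds indices into X; support_vectors_ holds the matching rows
print(clf.support_)
print(clf.support_vectors_)
print(clf.n_support_)

same_rows = np.array_equal(clf.support_vectors_, X[clf.support_])
counts_match = clf.n_support_.sum() == len(clf.support_)
# Only a subset of the training points ends up as support vectors
is_subset = len(clf.support_) < len(X)
```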



In [4]:
# get support vectors
clf.support_vectors_

array([[0., 0.],
       [1., 1.]])

In [5]:
# get indices of support vectors
clf.support_

array([0, 1], dtype=int32)

In [6]:
# get number of support vectors for each class
clf.n_support_

array([1, 1], dtype=int32)

### Multi-class classification

SVC and NuSVC implement the "one-versus-one" approach for multi-class classification: in total, `n_classes * (n_classes - 1) / 2` classifiers are constructed, and each one is trained on data from two classes. To provide a consistent interface with other classifiers, the `decision_function_shape` option allows the results of the "one-versus-one" classifiers to be monotonically transformed into a "one-vs-rest" decision function of shape `(n_samples, n_classes)`. This `'ovr'` shape is the default; `decision_function_shape='ovo'` returns the raw pairwise scores instead, with shape `(n_samples, n_classes * (n_classes - 1) / 2)`.

In [7]:
X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovo', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [8]:
dec = clf.decision_function([[1]])
dec.shape[1]

6

In [9]:
clf.decision_function_shape = "ovr"
dec = clf.decision_function([[1]])
dec.shape[1]

4
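To make the shapes concrete, here is a sketch with five classes and three query samples (the data is made up for illustration). With `'ovo'` the number of columns equals the number of class pairs, counted here with `itertools.combinations`. With `'ovr'` the result has one row per sample and one column per class.

```python
from itertools import combinations
from sklearn import svm

# Hypothetical data: one training point per class, five classes
X = [[0], [1], [2], [3], [4]]
y = [0, 1, 2, 3, 4]
n_classes = len(set(y))

clf = svm.SVC(decision_function_shape="ovo").fit(X, y)
queries = [[0.5], [2.0], [3.7]]

# One column per pair of classes: (0, 1), (0, 2), ..., (3, 4)
n_pairs = len(list(combinations(range(n_classes), 2)))
ovo_shape = clf.decision_function(queries).shape

clf.decision_function_shape = "ovr"
ovr_shape = clf.decision_function(queries).shape

print(n_pairs, ovo_shape, ovr_shape)  # 10 (3, 10) (3, 5)
```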

LinearSVC implements the "one-vs-the-rest" multi-class strategy, thus training `n_classes` models.

In [13]:
lin_clf = svm.LinearSVC()
lin_clf.fit(X, Y)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)

In [14]:
dec = lin_clf.decision_function([[1]])
dec.shape[1]

4

LinearSVC also implements an alternative multi-class strategy, the so-called multi-class SVM formulated by Crammer and Singer, which is selected by setting `multi_class='crammer_singer'`.

For "one-vs-rest" LinearSVC, the attributes `coef_` and `intercept_` have shapes `(n_classes, n_features)` and `(n_classes,)` respectively. Each row of the coefficients corresponds to one of the `n_classes` "one-vs-rest" classifiers, and similarly for the intercepts, in the order of the "one" class.
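A sketch that checks these shapes on the iris dataset bundled with scikit-learn (4 features, 3 classes), for both the default one-vs-rest strategy and `multi_class='crammer_singer'`. The `max_iter=10000` value is an assumption made here only to reduce convergence warnings on unscaled data.

```python
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

shapes = {}
for strategy in ("ovr", "crammer_singer"):
    lin_clf = svm.LinearSVC(multi_class=strategy, max_iter=10000)
    lin_clf.fit(X, y)
    # coef_: one row per class; intercept_: one entry per class
    shapes[strategy] = (lin_clf.coef_.shape, lin_clf.intercept_.shape)
    print(strategy, shapes[strategy])
```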

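For the "one-vs-one" SVC and NuSVC the layout is different: `dual_coef_` has shape `(n_classes-1, n_SV)`, where `n_SV` is the total number of support vectors.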
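A quick sketch verifying the `dual_coef_` shape of a one-vs-one SVC, again assuming the bundled iris dataset (3 classes) and default hyperparameters.

```python
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # 3 classes
clf = svm.SVC().fit(X, y)

n_classes = len(clf.classes_)
n_sv = len(clf.support_)  # total number of support vectors over all classes
# Each support vector gets one coefficient for each of the n_classes - 1
# one-vs-one classifiers that involve its class
print(clf.dual_coef_.shape, (n_classes - 1, n_sv))
```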

### Scores and probabilities

The `decision_function` method of SVC and NuSVC gives per-class scores for each sample (or a single score per sample in the binary case). When the constructor option `probability` is set to `True`, class membership probability estimates (from the methods `predict_proba` and `predict_log_proba`) are enabled.
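In the binary case, `decision_function` returns a 1-D array with one score per sample, and positive scores correspond to `clf.classes_[1]`. The sketch below, on made-up data, recovers the predicted labels from the sign of the scores.

```python
import numpy as np
from sklearn import svm

# Hypothetical binary data: two well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clf = svm.SVC().fit(X, y)

queries = np.array([[0.5, 0.5], [1.5, 1.0], [5.5, 5.5]])
scores = clf.decision_function(queries)
print(scores.shape)  # one score per sample: (3,)

# Positive scores map to clf.classes_[1], negative ones to clf.classes_[0]
from_sign = clf.classes_[(scores > 0).astype(int)]
print(from_sign, clf.predict(queries))
```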

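In the binary case, the probabilities are calibrated using Platt scaling: a logistic regression on the SVM's scores, fit by an additional cross-validation on the training data. This is expensive for large datasets, is known to have theoretical issues, and can give probability estimates that are inconsistent with the scores (the class with the highest probability is not necessarily the one returned by `predict`). If confidence scores are required, but these do not have to be probabilities, then it's advisable to set `probability=False` and use `decision_function` instead of `predict_proba`.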
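A minimal sketch of enabling probability estimates, assuming the bundled iris dataset. `random_state=0` is an arbitrary choice that fixes the shuffling used by the internal cross-validation. Because the probabilities come from a separate calibration step, their argmax is not guaranteed to match `predict`, so the sketch only prints both for comparison.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# probability=True triggers the extra internal cross-validation for calibration
clf = svm.SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])
scores = clf.decision_function(X[:5])
print(proba.shape, scores.shape)  # (5, 3) (5, 3)
print(proba.sum(axis=1))          # each row should sum to 1

# Compare, do not assume equality: calibration may reorder close classes
print(proba.argmax(axis=1), clf.predict(X[:5]))
```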

### Unbalanced problems

In problems where it is desirable to give more importance to certain classes or certain individual samples, the parameters `class_weight` and `sample_weight` can be used.

SVC accepts the parameter `class_weight` in its constructor. It is a dictionary of the form `{class_label : value}`, where value is a floating point number > 0 that sets the parameter `C` of class `class_label` to `C * value`.
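A sketch on made-up unbalanced data (six samples of class 0, two of class 1). After fitting, the `class_weight_` attribute holds the resulting per-class multipliers of `C`. Classes missing from the dictionary keep a weight of 1. The string `'balanced'` is also accepted and derives the weights from the class frequencies as `n_samples / (n_classes * count)`.

```python
import numpy as np
from sklearn import svm

# Hypothetical unbalanced data: 6 samples of class 0, 2 of class 1
X = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])

# Give class 1 ten times the penalty C for misclassification
clf = svm.SVC(kernel="linear", C=1.0, class_weight={1: 10}).fit(X, y)
print(clf.class_weight_)  # per-class multipliers of C, expected [1, 10]

# 'balanced': weights inversely proportional to class frequencies
balanced = svm.SVC(kernel="linear", class_weight="balanced").fit(X, y)
print(balanced.class_weight_)  # expected [8 / (2 * 6), 8 / (2 * 2)]
```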

SVC, NuSVC, SVR, NuSVR, LinearSVC, LinearSVR and OneClassSVM also implement weights for individual samples, through the `sample_weight` parameter of the `fit` method. Similar to `class_weight`, this sets the parameter `C` for the i-th example to `C * sample_weight[i]`, which encourages the classifier to get these samples right. (Check problem 1.7)
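Since `sample_weight[i]` scales `C` for sample `i`, a uniform weight of 2 should be equivalent to doubling `C`. The sketch below checks this on made-up, slightly overlapping 1-D data with a linear kernel, by comparing decision functions on a grid of points.

```python
import numpy as np
from sklearn import svm

# Hypothetical overlapping 1-D data (the point at 1.8 has label 1), so C matters
X = np.array([[0.0], [1.0], [2.0], [2.5], [3.0], [4.0], [5.0], [1.8]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

# A uniform sample_weight of 2 multiplies every sample's C by 2 ...
weighted = svm.SVC(kernel="linear", C=1.0).fit(
    X, y, sample_weight=np.full(len(y), 2.0))
# ... which should match a model trained with C=2 and no weights
doubled_c = svm.SVC(kernel="linear", C=2.0).fit(X, y)

grid = np.linspace(-1, 6, 15).reshape(-1, 1)
dec_weighted = weighted.decision_function(grid)
dec_doubled = doubled_c.decision_function(grid)
print(np.abs(dec_weighted - dec_doubled).max())
```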

## Regression

The method of Support Vector Classification can be extended to solve regression problems. This method is called Support Vector Regression.

There are three different implementations of Support Vector Regression: SVR, NuSVR and LinearSVR. LinearSVR only considers the linear kernel.

As with the classification classes, the `fit` method takes as arguments vectors $X$ and $y$, except that in this case $y$ is expected to have floating point values instead of integer values:

In [22]:
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
regr = svm.SVR()
regr.fit(X, y)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [23]:
regr.predict([[1, 1]])

array([1.5])
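SVR fits a function that stays within `epsilon` of the targets where possible, penalizing larger deviations through `C`. A sketch on made-up noiseless data from the line $y = 2x + 1$, using a linear kernel and an assumed large `C=100`: the fitted slope stays close to 2 and the training residuals stay close to `epsilon=0.1`.

```python
import numpy as np
from sklearn import svm

# Hypothetical noiseless data on the line y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

# Linear kernel; a large C makes leaving the epsilon-tube expensive
regr = svm.SVR(kernel="linear", C=100.0, epsilon=0.1).fit(X, y)

pred = regr.predict([[4.5]])
residuals = np.abs(regr.predict(X) - y)
print(pred, regr.coef_, residuals.max())
```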

## Tips on practical use

*   **Avoiding data copy:** For SVC, SVR, NuSVC and NuSVR, data that is not C-ordered contiguous and double precision is copied before calling the underlying C implementation. You can check whether a given NumPy array is C-contiguous by inspecting its `flags` attribute.
*   **Kernel cache size:** For SVC, SVR, NuSVC and NuSVR, the size of the kernel cache has a strong impact on run times for larger problems. If you have enough RAM available, set `cache_size` to a higher value than the default of 200 MB, such as 500 MB or 1000 MB.
*   **Setting C:** `C` is `1` by default. If you have a lot of noisy observations, you should decrease it: a smaller `C` means more regularization.
*   **Scale your data:** SVM algorithms are not scale invariant, so it is highly recommended to scale your data and to apply the same scaling to your test vectors. Check [here](https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing) for more info on preprocessing with sklearn.
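A sketch of the first and last tips, assuming the bundled iris dataset. It inspects `flags` and converts an array with `np.ascontiguousarray`, then uses a `make_pipeline(StandardScaler(), SVC())` pipeline so that the scaler is fit on the training split only and the same transformation is reused on the test split. `cache_size=500` is only an example value.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Avoiding data copy: a Fortran-ordered array is not C-contiguous ...
X_f = np.asfortranarray(X)
print(X_f.flags["C_CONTIGUOUS"])  # False
# ... np.ascontiguousarray returns a C-ordered float64 array
X_c = np.ascontiguousarray(X_f, dtype=np.float64)
print(X_c.flags["C_CONTIGUOUS"], X_c.dtype)  # True float64

# Scaling: the pipeline fits StandardScaler on the training split and applies
# the same transformation to the test split when scoring
X_train, X_test, y_train, y_test = train_test_split(
    X_c, y, test_size=0.3, random_state=0, stratify=y)
model = make_pipeline(StandardScaler(), svm.SVC(cache_size=500))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy)
```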



## Kernel functions

The *kernel* can be any of the following:
*   linear: $\langle x, x' \rangle$
*   polynomial: $(\gamma \langle x, x' \rangle + r)^d$, where $d$ is specified by parameter `degree` and $r$ by `coef0`
*   rbf: $\exp(-\gamma \|x-x'\|^2)$, where $\gamma$ is specified by `gamma` (must be greater than 0; also used by the polynomial and sigmoid kernels)
*   sigmoid: $\tanh(\gamma \langle x, x' \rangle + r)$, where $r$ is specified by `coef0`
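The formulas above can be checked numerically. This sketch computes them by hand with NumPy and compares them against scikit-learn's pairwise kernel helpers in `sklearn.metrics.pairwise`. The random data and the values of $\gamma$, $r$ and $d$ are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
Y = rng.normal(size=(5, 3))
gamma, r, d = 0.5, 1.0, 3  # example values for gamma, coef0 and degree

dot = X @ Y.T  # <x, x'> for every pair of rows
sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # ||x - x'||^2

manual = {
    "linear": dot,
    "poly": (gamma * dot + r) ** d,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * dot + r),
}
library = {
    "linear": linear_kernel(X, Y),
    "poly": polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=r),
    "rbf": rbf_kernel(X, Y, gamma=gamma),
    "sigmoid": sigmoid_kernel(X, Y, gamma=gamma, coef0=r),
}
for name in manual:
    print(name, np.allclose(manual[name], library[name]))
```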

Kernels are specified by the `kernel` parameter:

In [24]:
linear_svc = svm.SVC(kernel="linear")
linear_svc.kernel

'linear'

### Custom Kernels

You can define your own kernels either by giving the kernel as a Python function (e.g. `svm.SVC(kernel=function_name)`) or by precomputing the Gram matrix.

Custom kernel example [here](https://scikit-learn.org/stable/auto_examples/svm/plot_custom_kernel.html).
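A minimal sketch of the Python-function route. `my_kernel` is a hypothetical example (a linear kernel scaled by 1/2). SVC calls it with two 2-D arrays and expects back a Gram matrix of shape `(n_samples_X, n_samples_Y)`. The toy data is made up.

```python
import numpy as np
from sklearn import svm

def my_kernel(X, Y):
    """Hypothetical custom kernel: <x, x'> / 2 for every pair of rows."""
    return np.dot(X, Y.T) / 2.0

# Hypothetical toy data: two well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = svm.SVC(kernel=my_kernel).fit(X, y)
pred = clf.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred)
```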

#### Using the Gram matrix

* Use `kernel="precomputed"` and pass the Gram matrix instead of $X$ to the `fit` and `predict` methods. The kernel values between *all* training vectors and the test vectors must be provided.
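A sketch of the precomputed route, assuming a plain linear kernel written out as dot products on made-up data. The training Gram matrix is `(n_train, n_train)`, while the matrix passed to `predict` is `(n_test, n_train)`, that is, each test vector against *all* training vectors. The predictions are compared with `kernel="linear"`.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5], [2.0, 1.0]])

# Linear kernel written out as Gram matrices
gram_train = X_train @ X_train.T  # shape (n_train, n_train)
gram_test = X_test @ X_train.T    # shape (n_test, n_train)

clf = svm.SVC(kernel="precomputed").fit(gram_train, y_train)
pred_pre = clf.predict(gram_test)

# Same kernel computed internally, for comparison
pred_lin = svm.SVC(kernel="linear").fit(X_train, y_train).predict(X_test)
print(pred_pre, pred_lin)
```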

# Reference
https://scikit-learn.org/stable/modules/svm.html