# Support Vector Machine Intro

An SVM is a powerful and versatile ML model, capable of:
- Linear & Non-linear Classification
- Linear & Non-linear Regression
- Outlier detection

🔥 - Unlike Logistic Regression, SVMs **do not** natively output probabilities for each class: they output a decision score whose sign gives the predicted class.
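Here is a minimal sketch of this difference in Scikit-Learn. It assumes the Iris petal features as a toy binary task (Iris virginica or not). `decision_function` returns unbounded signed scores rather than probabilities. `predict_proba` only exists when the model is built with `probability=True`, which adds a calibration step (Platt scaling with internal 5-fold cross-validation, so fitting is slower).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, 2:4]                 # petal length, petal width
y = (iris.target == 2).astype(int)    # 1 if Iris virginica, else 0

svm_clf = SVC(kernel="rbf").fit(X, y)

# Without probability=True, predict_proba is not exposed at all
has_proba = hasattr(svm_clf, "predict_proba")

# decision_function returns signed scores, not probabilities
scores = svm_clf.decision_function(X[:3])
print(has_proba, scores)

# Opting in: probabilities come from an extra calibration step (slower fit)
svm_proba = SVC(kernel="rbf", probability=True, random_state=42).fit(X, y)
proba = svm_proba.predict_proba(X[:3])
print(proba.round(3))
```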

🔥 - Always scale and center the training data: SVMs are sensitive to feature scales.

SVMs are natively `Binary` classifiers.

For `Multiclass` classification, Scikit-Learn's `SVC` uses the OvO (one-versus-one) strategy.
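The OvO strategy trains one binary classifier per pair of classes, so N classes give N × (N - 1) / 2 classifiers. Here is a minimal sketch on a hypothetical 4-class toy dataset from `make_blobs`. It exposes the raw pairwise scores through `decision_function_shape="ovo"`:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical toy data: 200 points in 4 clusters (classes 0..3)
X, y = make_blobs(n_samples=200, centers=4, random_state=42)

# decision_function_shape="ovo" exposes one score per pair of classes
ovo_clf = SVC(decision_function_shape="ovo").fit(X, y)
ovo_scores = ovo_clf.decision_function(X[:5])

# The default "ovr" shape is built from the same pairwise classifiers,
# reshaped into one score per class
ovr_clf = SVC().fit(X, y)
ovr_scores = ovr_clf.decision_function(X[:5])

n_classes = len(ovo_clf.classes_)
print(ovo_scores.shape)  # (5, 6): 4 * 3 / 2 class pairs
print(ovr_scores.shape)  # (5, 4): one column per class
```

Either way, `SVC` trains the same pairwise classifiers. `decision_function_shape` only changes how the scores are reported.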

Advantages:

- Particularly well suited for classification of complex small- or medium-sized datasets

Disadvantages:
- Scales poorly with the size of the training set

```python
from sklearn.svm import SVC

# Default settings: RBF kernel, C=1.0
svm_c = SVC()
```
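A common way to scale and center the training data is to put a `StandardScaler` in front of the `SVC` in a pipeline. The same transform is then reapplied automatically at prediction time. This minimal sketch assumes Scikit-Learn's bundled wine dataset (13 features on very different scales). It compares training accuracy only for illustration; a real evaluation would use a held-out set.

```python
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)  # features range from ~0.1 to ~1700

# Unscaled: the large-valued features dominate the RBF kernel's distances
unscaled_clf = SVC().fit(X, y)
acc_unscaled = unscaled_clf.score(X, y)

# Scaled and centered inside a pipeline
scaled_clf = make_pipeline(StandardScaler(), SVC()).fit(X, y)
acc_scaled = scaled_clf.score(X, y)

print(f"training accuracy without scaling: {acc_unscaled:.3f}")
print(f"training accuracy with scaling:    {acc_scaled:.3f}")
```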

## SVM Classification

<div>
<img src="attachment:image.png" width="600"/>
</div>

## SVM Regression

To use SVMs for Regression, the trick is to reverse the objective:
- Instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM Regression tries to **fit as many instances as possible `on` the street while limiting margin violations** (i.e. instances off the street)

The width of the street is controlled by the hyperparameter `epsilon` (ε).

Adding more training instances *within the margin* does not affect the model's predictions; thus the model is said to be ε-insensitive.
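The ε-insensitivity can be checked empirically. This sketch makes several assumptions. It uses hypothetical toy data y = 4 + 3x + noise. It also swaps in `SVR(kernel="linear")` for `LinearSVR`, because `SVR` lets us tighten the solver tolerance (`tol`) and optimizes a closely related ε-insensitive objective.

The code fits a model, then adds 20 new instances placed strictly inside the street (within ε/2 of the current predictions). It refits and measures how much the predictions move.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Hypothetical toy data: y = 4 + 3x + Gaussian noise
X = rng.uniform(0, 2, size=(50, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, size=50)

epsilon = 0.5
# tol is tightened so both fits converge close to the exact optimum
svr = SVR(kernel="linear", C=1.0, epsilon=epsilon, tol=1e-6).fit(X, y)

# New instances strictly inside the street: their targets stay within
# +/- epsilon/2 of the current predictions, so they incur zero loss
X_extra = rng.uniform(0, 2, size=(20, 1))
y_extra = svr.predict(X_extra) + rng.uniform(-epsilon / 2, epsilon / 2, size=20)

svr_more = SVR(kernel="linear", C=1.0, epsilon=epsilon, tol=1e-6)
svr_more.fit(np.vstack([X, X_extra]), np.concatenate([y, y_extra]))

X_grid = np.linspace(0, 2, 11).reshape(-1, 1)
max_diff = np.abs(svr.predict(X_grid) - svr_more.predict(X_grid)).max()
print(f"max prediction change: {max_diff:.2e}")
```

The printed change should be on the order of the solver tolerance, i.e. essentially zero. Instances inside the street add nothing to the loss, so the previous optimum remains optimal.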

<div>
<img src="attachment:image.png" width="500"/>
</div>

### Linear SVR

```python
from sklearn.svm import LinearSVR

# epsilon sets the width of the street
svm_reg = LinearSVR(epsilon=1.5)
```

### Nonlinear SVR

To tackle nonlinear regression tasks, you can use a kernelised SVM model.
- Hyperparameter `C`:
    - Large value of C for little regularisation
    - Small value of C for high regularisation

<div>
<img src="attachment:image.png" width="500"/>
</div>

```python
from sklearn.svm import SVR

# 2nd-degree polynomial kernel; a large C means little regularisation
svm_poly_reg = SVR(kernel='poly',
                   degree=2,
                   C=100,
                   epsilon=0.1)
```
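This sketch shows how `C` trades regularisation against fit, assuming a hypothetical noisy quadratic toy dataset. It fits the same polynomial `SVR` with a very small and a large `C`, then measures the total ε-insensitive training loss. Residuals inside the street cost nothing, and residuals outside cost their distance to the street. The helper `eps_insensitive_loss` is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Hypothetical toy data: noisy quadratic, x in [-1, 1)
m = 100
X = 2 * rng.random((m, 1)) - 1
y = 0.2 + 0.1 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.1, size=m)

def eps_insensitive_loss(model, X, y, epsilon):
    """Total training loss: zero inside the street, distance to the street outside it."""
    residuals = np.abs(y - model.predict(X))
    return np.maximum(0.0, residuals - epsilon).sum()

losses = {}
for C in (0.001, 100):
    # Same kernel, degree and epsilon as svm_poly_reg; only C changes
    model = SVR(kernel="poly", degree=2, C=C, epsilon=0.1).fit(X, y)
    losses[C] = eps_insensitive_loss(model, X, y, epsilon=0.1)
    print(f"C={C}: training loss = {losses[C]:.3f}, "
          f"support vectors = {len(model.support_)}")
```

With exact optimisation, increasing `C` can never increase this training loss, because `C` only shifts weight from the regularisation term to the loss term. The small-`C` model is the flatter, more heavily regularised one.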

## Determining the Margin

Based on the sign of the decision function $\mathbf{w}^\top \mathbf{x} + b$:
- Positive -> Instance belongs to Positive (1) class
- Negative -> Instance belongs to Negative (0) class
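For a linear kernel, Scikit-Learn exposes **w** as `coef_` and b as `intercept_`, so the sign rule can be checked by hand. This minimal sketch assumes the scaled Iris petal features, with "Iris virginica or not" as the binary task:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = StandardScaler().fit_transform(iris.data[:, 2:4])  # scaled petal length, width
y = (iris.target == 2).astype(int)                     # 1 = Iris virginica (positive class)

svm_clf = SVC(kernel="linear", C=1).fit(X, y)

w = svm_clf.coef_[0]        # weight vector w (only available for the linear kernel)
b = svm_clf.intercept_[0]   # bias term b

# Decision function of a linear SVM: w^T x + b
scores = X @ w + b

# Positive score -> class 1, negative score -> class 0
manual_pred = (scores > 0).astype(int)
```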

<div>
<img src="attachment:image.png" width="300"/>
</div>

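The decision boundary is the set of points where the decision function is equal to 0. With two features, the decision function is a plane, so the boundary is the intersection of that plane with the plane where the function equals 0. This intersection is a straight line (represented by the thick solid line).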

The dashed lines represent the points where the decision function is equal to 1 or -1. They are parallel to the decision boundary and at equal distance from it, forming a margin around it. Training a linear SVM classifier means finding the values of **w** and b that make this margin as wide as possible while avoiding margin violations (hard margin) or limiting them (soft margin).
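Since the dashed lines are where **w**·x + b = 1 and **w**·x + b = -1, the distance between them (the margin width) is 2 / ‖**w**‖. A wider margin therefore means a smaller ‖**w**‖.

This minimal sketch assumes the linearly separable Iris setosa vs. versicolor petal features. They are left unscaled so the width is in centimetres. A very large `C` approximates a hard margin. The code computes the margin width and checks that the support vectors lie on the dashed lines, with no training instance inside the margin.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
mask = iris.target < 2              # keep Iris setosa (0) and Iris versicolor (1)
X = iris.data[mask][:, 2:4]         # petal length, petal width (separable here)
y = iris.target[mask]

# A very large C approximates a hard margin
svm_clf = SVC(kernel="linear", C=1e3).fit(X, y)

w = svm_clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)  # distance between the two dashed lines

# Signed margin t * f(x), with t = +1 for class 1 and t = -1 for class 0
t = np.where(y == 1, 1, -1)
functional_margins = t * svm_clf.decision_function(X)

print(f"margin width: {margin_width:.3f} cm")
print("decision values of the support vectors:",
      svm_clf.decision_function(svm_clf.support_vectors_))
```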

<div>
<img src="attachment:image.png" width="500"/>
</div>

### Using the Kernel trick