# Linear SVM Classification

A linear SVM classifier fits the widest possible street between the classes, which is why it is called a large margin classifier. The street is "supported" (hence the name) by the instances (vectors) located on its edges, called the support vectors.

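This classifier is very sensitive to feature scales, so the features should be scaled first (for example with `StandardScaler`). In its strict form it is also very sensitive to outliers.
To correct this problem, we use a soft margin classifier.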

## Soft Margin Classifier

If we strictly impose that all instances must be off the street and on the correct side, we call that hard margin classification. This only works if the data is linearly separable, and it is very sensitive to outliers.

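In soft margin classification we try to find a good balance between keeping the street as wide as possible and limiting the margin violations (instances that end up inside the street or on the wrong side). This balance is controlled by the $C$ hyperparameter: a lower $C$ gives a wider street with more margin violations. If the model is overfitting, we can try to regularize it by reducing $C$.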

In [3]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline

from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge"))
])

svm_clf.fit(X,y)
svm_clf.predict([[5.5,1.7]])

array([1.])
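
To make the role of $C$ more concrete, here is a minimal sketch that fits the same Iris problem with a small and a large $C$ and compares the resulting street widths. It swaps in `SVC(kernel="linear")` instead of `LinearSVC`, and the names `X_iris`, `y_iris` and `margin_widths` are introduced only for this illustration. The width is computed as $2 / \|w\|$ in the scaled feature space.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_iris = iris["data"][:, (2, 3)]            # petal length, petal width
y_iris = (iris["target"] == 2).astype(int)  # 1 for Iris virginica, else 0

margin_widths = {}
for C in (0.01, 100):
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(kernel="linear", C=C)),
    ])
    clf.fit(X_iris, y_iris)
    w = clf.named_steps["svc"].coef_[0]
    # Width of the street, measured in the scaled feature space
    margin_widths[C] = 2 / np.linalg.norm(w)
    print(f"C={C}: margin width = {margin_widths[C]:.2f}")
```

The smaller $C$ should give the wider street, at the price of more instances ending up inside it.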

## Nonlinear SVM Classifier

One way to deal with nonlinear datasets is to add features, such as polynomial features, that transform the dataset and can make it linearly separable. To implement this idea, one can add a polynomial transformer to the data preprocessing pipeline.

In [32]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

X,y = make_moons(n_samples = 100, noise = 0.15)

polynomial_svm_clf = Pipeline([
    ("polynomial", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge", max_iter=2000))
])

polynomial_svm_clf.fit(X,y)

Pipeline(steps=[('polynomial', PolynomialFeatures(degree=3)),
                ('scaler', StandardScaler()),
                ('svm_clf', LinearSVC(C=10, loss='hinge', max_iter=2000))])

## Polynomial Kernel

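The polynomial kernel lets the SVM get the same result as if many polynomial features had been added, without actually adding them (the kernel trick). This avoids the explosion in the number of features that comes with high polynomial degrees.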
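
As a rough illustration, the polynomial kernel can be used through scikit-learn's `SVC` class. The hyperparameter values below (`degree=3`, `coef0=1`, `C=5`), the `random_state` and the names `X_moons`, `y_moons` and `train_accuracy` are assumptions chosen for this sketch, not tuned settings.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same kind of toy dataset as above, with a fixed seed for reproducibility
X_moons, y_moons = make_moons(n_samples=100, noise=0.15, random_state=42)

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    # kernel="poly" applies the kernel trick: no PolynomialFeatures step is needed
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)),
])
poly_kernel_svm_clf.fit(X_moons, y_moons)

# Accuracy on the training set (not a measure of generalization)
train_accuracy = poly_kernel_svm_clf.score(X_moons, y_moons)
print(f"training accuracy: {train_accuracy:.2f}")
```

Compared with the `PolynomialFeatures` pipeline, the only structural change is that the polynomial step disappears and the kernel is set on the classifier itself.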

## Similarity Features 

Another technique to tackle nonlinear problems is to add features computed using a similarity function, which measures how much each instance resembles a particular landmark. Each instance is turned into its similarity to (a function of its distance from) each landmark, so every landmark adds one dimension to the problem.
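
One common choice of similarity function is the Gaussian RBF, $\phi_\gamma(x, \ell) = \exp(-\gamma \|x - \ell\|^2)$. The notes above do not name a specific function, so the RBF, the helper name `rbf_similarity_features`, the landmarks and the value of `gamma` are all assumptions made for this sketch.

```python
import numpy as np

def rbf_similarity_features(X, landmarks, gamma):
    """Return one Gaussian RBF similarity feature per landmark."""
    # Squared Euclidean distance between every instance and every landmark
    diffs = X[:, np.newaxis, :] - landmarks[np.newaxis, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Similarity is 1 at the landmark and decays towards 0 far from it
    return np.exp(-gamma * sq_dists)

X_1d = np.array([[-1.0], [0.0], [2.0]])  # three 1D instances
landmarks = np.array([[-2.0], [1.0]])    # two hypothetical landmarks
features = rbf_similarity_features(X_1d, landmarks, gamma=0.3)
print(features.round(2))  # one row per instance, one column per landmark
```

Each 1D instance becomes a 2D point here, which is the sense in which similarity features add dimensions to the problem.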

# Implementing SVM for Regression

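Instead of trying to fit the largest possible street between two classes while keeping as many points as possible out of it, in regression you try to fit as many points as possible inside the street while limiting margin violations (points off the street). The width of the street is controlled by the `epsilon` hyperparameter.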

In [4]:
from sklearn.svm import SVR

svm_poly_reg = SVR(kernel="poly", degree = 2, C=100, epsilon=0.1)
svm_poly_reg.fit(X,y)

SVR(C=100, degree=2, kernel='poly')
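
To see what `epsilon` does, here is a small sketch on synthetic linear data. The data, the seed and the names `X_reg`, `y_reg` and `n_support` are made up for the illustration. Instances that fall strictly inside the street are not support vectors, so a wider street should leave fewer of them.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical noisy linear data: y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
X_reg = 2 * rng.random((100, 1))
y_reg = 4 + 3 * X_reg[:, 0] + rng.normal(scale=0.5, size=100)

n_support = {}
for epsilon in (0.1, 1.0):
    reg = SVR(kernel="linear", C=1, epsilon=epsilon).fit(X_reg, y_reg)
    # support_ holds the indices of the support vectors
    n_support[epsilon] = len(reg.support_)
    print(f"epsilon={epsilon}: {n_support[epsilon]} support vectors")
```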

# Under the hood

We use the convention where the bias term is called $b$ and the feature weights vector is called $w$.

## Decision Function and Predictions

The linear SVM classifier computes the decision function $w^T x + b$. If the result is positive, the predicted class is the positive class; otherwise it is the negative class.

Training a linear SVM classifier means finding the values of $w$ and $b$ that make the margin as wide as possible while avoiding margin violations (hard margin) or limiting them (soft margin).
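
The decision function can be checked by hand against scikit-learn. This is a minimal sketch, assuming a linear-kernel `SVC` trained on the scaled Iris petal features; the names `X_scaled`, `y_virginica`, `scores` and `manual_pred` are introduced only for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_scaled = StandardScaler().fit_transform(iris["data"][:, (2, 3)])
y_virginica = (iris["target"] == 2).astype(int)  # 1 = positive class

clf = SVC(kernel="linear", C=1).fit(X_scaled, y_virginica)
w, b = clf.coef_[0], clf.intercept_[0]

# Decision function w^T x + b, computed for every instance at once
scores = X_scaled @ w + b
# Positive score -> positive class, otherwise negative class
manual_pred = (scores > 0).astype(int)

print(np.allclose(scores, clf.decision_function(X_scaled)))
print(np.array_equal(manual_pred, clf.predict(X_scaled)))
```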

## Training objective

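We need to minimize $\|w\|$ in order to get a large margin, because the width of the street is inversely proportional to the norm of $w$. If we also want to avoid margin violations (hard margin), we need to add a constraint on the decision function: it must be greater than or equal to 1 for all positive training instances and less than or equal to -1 for all negative ones.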
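
Written out, the hard margin objective can be sketched as follows. The notation $t^{(i)}$ (equal to $-1$ for negative instances and $+1$ for positive ones) and $m$ (the number of training instances) is not defined in these notes and is assumed here. Minimizing $\tfrac{1}{2} w^T w$ gives the same $w$ and $b$ as minimizing $\|w\|$, but it is easier to optimize.

$$
\min_{w,\, b} \; \tfrac{1}{2} w^T w
\quad \text{subject to} \quad
t^{(i)} \left( w^T x^{(i)} + b \right) \ge 1
\quad \text{for } i = 1, \dots, m
$$

For the soft margin version, one slack variable $\zeta^{(i)} \ge 0$ per instance (again an assumed notation) measures how much that instance is allowed to violate the margin, and $C$ sets the trade-off between a wide street and few violations:

$$
\min_{w,\, b,\, \zeta} \; \tfrac{1}{2} w^T w + C \sum_{i=1}^{m} \zeta^{(i)}
\quad \text{subject to} \quad
t^{(i)} \left( w^T x^{(i)} + b \right) \ge 1 - \zeta^{(i)}
\;\text{ and }\;
\zeta^{(i)} \ge 0
\quad \text{for } i = 1, \dots, m
$$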

## Quadratic Programming 

