# Chapter 5. Support Vector Machines
**SVMs are particularly well suited for classification of complex small- or medium-sized datasets**

## Linear SVM Classification
#### Large margin classification: 
* **The decision boundary not only separates the two classes but also stays as far away from the closest training instances as possible**
* **The margin is fully determined (or "supported") by the instances located on the edge of the street; these instances are called *support vectors***
* **SVMs are sensitive to feature scales, so scale the features with** StandardScaler **before fitting the model**
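
A minimal sketch of what the scaling step does, using hypothetical toy values (the array `X_toy` is an assumption for illustration, not data from the chapter). StandardScaler centers each feature and rescales it to unit variance, so a large-scale feature no longer dominates the distances the SVM relies on.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the second feature has a much larger scale than the first.
X_toy = np.array([[1.0, 50.0],
                  [5.0, 20.0],
                  [3.0, 80.0],
                  [5.0, 60.0]])

scaler = StandardScaler()
X_toy_scaled = scaler.fit_transform(X_toy)

# After scaling, every column has mean ~0 and standard deviation ~1.
print("means:", X_toy_scaled.mean(axis=0))
print("stds: ", X_toy_scaled.std(axis=0))
```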

#### Hard Margin Classification:
* **If we strictly impose that all instances must be off the street and on the right side, this is called *hard margin classification***
* **Hard margin classification only works if the data is linearly separable**
* **Hard margin classification is sensitive to outliers**

#### Soft Margin Classification:
* **The objective is to find a good balance between keeping the street as large as possible and limiting the *margin violations***
* **When creating an SVM model using Scikit-Learn, we can specify a number of hyperparameters;** C **is one of those hyperparameters**
* **If your SVM model is overfitting, you can try regularizing it by reducing** C.
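
The effect of C can be sketched by counting support vectors at two settings. This example swaps in SVC(kernel="linear") instead of LinearSVC, because SVC exposes the `n_support_` attribute; the two values of C are arbitrary choices for illustration. A smaller C tolerates more margin violations, so more instances end up on or inside the street.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_petal = iris["data"][:, [2, 3]]                 # petal length, petal width
y_virginica = (iris["target"] == 2).astype(int)   # 1 if Iris virginica, else 0

n_support = {}
for C in (0.01, 100):  # illustrative values: strong vs. weak regularization
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    model.fit(X_petal, y_virginica)
    # n_support_ holds the number of support vectors for each class
    n_support[C] = int(model.named_steps["svc"].n_support_.sum())
    print(f"C={C}: {n_support[C]} support vectors")
```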


**Load the iris dataset, scale the features, and then train a linear SVM model to detect *Iris virginica* flowers**

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

In [2]:
iris = datasets.load_iris()

In [3]:
X = iris["data"][:, (2, 3)]  # petal length, petal width

In [4]:
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

In [5]:
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

In [6]:
svm_clf.fit(X,y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('linear_svc', LinearSVC(C=1, loss='hinge'))])

**Use the model to make predictions**

In [7]:
svm_clf.predict([[5.5,1.7]])

array([1.])

**Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class**
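
What a linear SVM does provide is a signed score for each instance. The sketch below rebuilds the same kind of pipeline under new variable names (`clf`, `X_petal`, `y_virginica` and `random_state=42` are assumptions for illustration) and reads that score with `decision_function`; the predicted class is simply its sign.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X_petal = iris["data"][:, [2, 3]]                        # petal length, petal width
y_virginica = (iris["target"] == 2).astype(np.float64)   # 1.0 if Iris virginica

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=42)),
])
clf.fit(X_petal, y_virginica)

sample = [[5.5, 1.7]]
score = clf.decision_function(sample)   # signed score relative to the decision boundary
pred = clf.predict(sample)              # class 1.0 when the score is positive

# LinearSVC does not define predict_proba at all
has_proba = hasattr(clf.named_steps["linear_svc"], "predict_proba")
print(score, pred, has_proba)
```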

### Similarity Features
* **Another technique to tackle nonlinear problems is to add features computed using a similarity function, which measures how much each instance resembles a particular *landmark***
* **Gaussian Radial Basis Function (RBF): a bell-shaped function varying from 0 (very far away from the landmark) to 1 (at the landmark); after this transformation the dataset can become linearly separable**
* **How to select the landmarks:**
    * **The simplest approach is to create a landmark at the location of each and every instance in the dataset**
    * **Doing that creates many dimensions and thus increases the chances that the transformed training set will be linearly separable**
    * **The downside: if your training set is very large, you end up with an equally large number of features**
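
The similarity features described above can be sketched with plain NumPy. The one-dimensional dataset, the two landmarks and the value of gamma below are illustrative assumptions; the function implements the Gaussian RBF, exp(-gamma * ||x - landmark||^2).

```python
import numpy as np

def gaussian_rbf(X, landmark, gamma):
    """Gaussian RBF similarity between each row of X and a single landmark."""
    return np.exp(-gamma * np.linalg.norm(X - landmark, axis=1) ** 2)

# Illustrative 1D dataset: the integers from -4 to 4, as a column vector
X1D = np.linspace(-4, 4, 9).reshape(-1, 1)
gamma = 0.3

# One new feature per landmark (landmarks at -2 and 1 are assumptions)
X_sim = np.c_[gaussian_rbf(X1D, -2.0, gamma), gaussian_rbf(X1D, 1.0, gamma)]

# The instance x = -2 sits on the first landmark, so its similarity is 1.
print(X_sim[2])
# The instance x = -1 is at distance 1 from the first landmark and 2 from the second.
print(X_sim[3])
```

With one landmark per training instance, `X_sim` would have as many columns as the training set has rows, which is the cost noted in the last bullet.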

### Gaussian RBF Kernel
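**The similarity features method can be useful with any Machine Learning algorithm, but it may be computationally expensive to compute all the additional features, especially on large training sets. The kernel trick makes it possible to obtain a similar result as if you had added many similarity features, without actually having to add them.**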

In [8]:
from sklearn.svm import SVC

In [9]:
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001)),
])

In [10]:
rbf_kernel_svm_clf.fit(X,y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=0.001, gamma=5))])

* **Increasing** gamma **makes the bell-shaped curve narrower. As a result, each instance's range of influence is smaller: the decision boundary ends up being more irregular, wiggling around individual instances**
* **Conversely, a small** gamma **value makes the bell-shaped curve wider: instances have a larger range of influence, and the decision boundary ends up smoother**
* gamma **acts like a regularization hyperparameter: if your model is overfitting, you should reduce it; if it is underfitting, you should increase it**
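
The "range of influence" can be made concrete by evaluating the RBF kernel between two points at a fixed distance. This is a small sketch using `rbf_kernel` from `sklearn.metrics.pairwise`; the two points and the gamma values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

instance = np.array([[0.0, 0.0]])
neighbor = np.array([[1.0, 0.0]])   # at Euclidean distance 1 from the instance

similarity = {}
for gamma in (0.1, 5):
    # rbf_kernel computes exp(-gamma * ||x - y||^2)
    similarity[gamma] = float(rbf_kernel(instance, neighbor, gamma=gamma)[0, 0])
    print(f"gamma={gamma}: similarity at distance 1 = {similarity[gamma]:.4f}")
```

With the larger gamma the neighbor is already almost invisible to the instance, which is why the boundary bends around individual points.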

### Selecting Kernels:
* **You should always try the linear kernel first, especially if the training set is very large or if it has plenty of features; remember that** LinearSVC **is much faster than** SVC(kernel="linear")
* **If the training set is not too large, you should also try the Gaussian RBF kernel.**
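
One possible way to follow this advice is to score both candidates with cross-validation. The sketch below is an assumption-laden illustration: the moons dataset, the hyperparameter values and the 5-fold setup are arbitrary choices, not recommendations from the chapter.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Illustrative nonlinear toy dataset
X_moons, y_moons = make_moons(n_samples=200, noise=0.15, random_state=42)

candidates = {
    "linear": LinearSVC(C=1, max_iter=10_000, random_state=42),
    "rbf": SVC(kernel="rbf", gamma="scale", C=1),
}

cv_scores = {}
for name, model in candidates.items():
    pipeline = make_pipeline(StandardScaler(), model)
    # Mean accuracy over 5 cross-validation folds
    cv_scores[name] = cross_val_score(pipeline, X_moons, y_moons, cv=5).mean()
    print(f"{name}: mean CV accuracy = {cv_scores[name]:.3f}")
```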

### Computational Complexity
* **The** LinearSVC **class is based on the** liblinear **library, which implements an optimized algorithm for linear SVMs. With *m* training instances and *n* features, the training time complexity is roughly** O(m × n)
    * **The algorithm takes longer if you require very high precision, which is controlled by the tolerance hyperparameter ϵ (called** tol **in Scikit-Learn)**
* **The** SVC **class is based on the** libsvm **library, which supports the kernel trick; its training time complexity is usually between** O(m² × n) **and** O(m³ × n)
    * **It scales well with the number of features, especially with *sparse features***
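
A rough arithmetic sketch of what these growth rates imply when the training set becomes ten times larger. The function `relative_cost` and the sizes used are hypothetical; the numbers are proportional operation counts under the stated complexities, not measured running times.

```python
def relative_cost(m, n, power):
    """Proportional cost of training under O(m**power * n)."""
    return (m ** power) * n

m_small, m_large, n = 1_000, 10_000, 20   # hypothetical dataset sizes

growth = {}
for power in (1, 2, 3):   # 1: LinearSVC; 2 and 3: the range quoted for SVC
    growth[power] = relative_cost(m_large, n, power) / relative_cost(m_small, n, power)
    print(f"O(m^{power} * n): 10x more instances -> {growth[power]:.0f}x more work")
```

Under these assumptions the linear algorithm does 10 times more work, while the kernelized one does between 100 and 1,000 times more.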

## SVM Regression
* **Instead of classification, the trick is to reverse the objective: instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM Regression tries to fit as many instances as possible on the street while limiting margin violations**
* **The width of the street is controlled by a hyperparameter, ϵ**
* **Adding more training instances within the margin does not affect the model's predictions; thus, the model is said to be *ϵ-insensitive***
* **You can use Scikit-Learn's** LinearSVR **class to perform linear SVM Regression**

In [11]:
from sklearn.svm import LinearSVR

In [12]:
svm_reg = LinearSVR(epsilon=1.5)

In [13]:
svm_reg.fit(X,y)

LinearSVR(epsilon=1.5)
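
The reason instances inside the street do not influence the model can be sketched with the epsilon-insensitive loss, max(0, |y - ŷ| - ϵ), which is the default loss of LinearSVR. The helper function and the sample values below are assumptions for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the street of half-width epsilon, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Hypothetical targets and predictions
y_true = np.array([3.0, 3.0, 3.0, 3.0])
y_pred = np.array([2.5, 4.0, 1.0, 6.5])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.5)
print(losses)   # the first two errors are smaller than epsilon, so their loss is 0
```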

**To tackle nonlinear regression, you can use a kernelized SVM model.**
* **The** SVR **class is the regression equivalent of the** SVC **class, and the** LinearSVR **class is the regression equivalent of the** LinearSVC **class**
* **The** LinearSVR **class scales linearly with the size of the training set (just like the** LinearSVC **class), while the** SVR **class gets much too slow when the training set grows large**

In [14]:
from sklearn.svm import SVR

In [15]:
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)

In [16]:
svm_poly_reg.fit(X,y)

SVR(C=100, degree=2, kernel='poly')
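
A kernelized regressor is easier to judge on a target that is actually nonlinear. The sketch below fits the same kind of model to hypothetical noisy quadratic data; the dataset, the seed and the names `X_quad`, `y_quad` and `svr_quad` are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Hypothetical data: y = 0.2 + 0.5 * x^2 plus a little Gaussian noise, x in [-1, 1)
X_quad = 2 * rng.random((100, 1)) - 1
y_quad = 0.2 + 0.5 * X_quad[:, 0] ** 2 + rng.normal(scale=0.05, size=100)

svr_quad = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svr_quad.fit(X_quad, y_quad)

# The prediction at x = 1 should be clearly higher than at x = 0
preds = svr_quad.predict([[0.0], [1.0]])
print(preds)
```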

## Under the Hood
