# Support Vector Machines

## Libraries

In [1]:
import numpy as np
import functions

# For importing the iris dataset
from sklearn import datasets

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# For modeling
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler


import warnings
warnings.filterwarnings("ignore")

## Linear SVM Classification

### Large Margin Classification

- Hard margin classification requires every instance to be off the street (outside the margin) and on the correct side of the decision boundary.

In [None]:
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = iris["target"]

setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

# SVM classifier; C=inf forbids margin violations (hard margin)
svm_clf = SVC(kernel="linear", C=float("inf"))
svm_clf.fit(X, y)

In [None]:
functions.large_margin_classification(X,y)

For the graph on the left:
- The two iris classes can easily be separated by a straight line (they are linearly separable).
- The decision boundary shown as a green dashed line is so bad that it does not even separate the classes properly.
- The other two linear classifiers (the red and purple lines) separate the training set perfectly, but their decision boundaries come so close to the training instances that they will probably not perform as well on new instances.

For the graph on the right:
- In contrast, the solid line represents the decision boundary of an SVM classifier. It not only separates the two classes but also stays as far away from the closest training instances as possible.
- An SVM classifier can be thought of as fitting the widest possible street (bounded by the parallel dashed lines) between the classes.
- Adding more training instances off the street will not affect the decision boundary at all: it is fully determined ("supported") by the instances located on the edge of the street. These instances are the support vectors (the highlighted circles).
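The claim that instances off the street do not move the boundary can be checked directly. The following is a minimal sketch, not the code behind `functions.large_margin_classification`: it assumes a large finite `C=1e6` as a stand-in for a hard margin, and the extra point `[1.0, 0.1]` is a made-up setosa-like instance chosen to lie well away from the street.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Same two-feature, two-class subset of iris as above
iris = datasets.load_iris()
X_all = iris["data"][:, [2, 3]]  # petal length, petal width
y_all = iris["target"]
mask = (y_all == 0) | (y_all == 1)
X_demo, y_demo = X_all[mask], y_all[mask]

# Assumption: a very large finite C approximates a hard margin
clf_before = SVC(kernel="linear", C=1e6).fit(X_demo, y_demo)
w_before = clf_before.coef_[0].copy()
b_before = clf_before.intercept_[0]

# The instances lying on the edge of the street
print("Support vectors:\n", clf_before.support_vectors_)

# Hypothetical extra setosa-like instance, far from the street
X_more = np.vstack([X_demo, [[1.0, 0.1]]])
y_more = np.append(y_demo, 0)

clf_after = SVC(kernel="linear", C=1e6).fit(X_more, y_more)
w_after = clf_after.coef_[0]
b_after = clf_after.intercept_[0]

# The boundary should match up to the solver's numerical tolerance
print("w before:", w_before, "b before:", b_before)
print("w after: ", w_after, "b after: ", b_after)
```

Comparing the two sets of coefficients shows whether the boundary moved; any small difference comes from the solver's stopping tolerance rather than from the new instance.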

#### Sensitivity to Feature Scaling

In [None]:
functions.feature_scaling_sensitivity()
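As a rough illustration of why scaling matters, the sketch below fits a linear SVM on a tiny dataset before and after applying `StandardScaler`. The four points, the names `Xs` and `ys`, and `C=100` are assumptions made for this example; the internals of `functions.feature_scaling_sensitivity` are not shown here and may differ.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: the second feature has a much larger range
Xs = np.array([[1, 50], [5, 20], [3, 80], [5, 60]], dtype=float)
ys = np.array([0, 0, 1, 1])

# Fit on the raw features
clf_raw = SVC(kernel="linear", C=100).fit(Xs, ys)

# Standardize each feature to zero mean and unit variance, then refit
scaler = StandardScaler()
Xs_scaled = scaler.fit_transform(Xs)
clf_scaled = SVC(kernel="linear", C=100).fit(Xs_scaled, ys)

# The street width is 2 / ||w||, measured in each model's own feature units
width_raw = 2 / np.linalg.norm(clf_raw.coef_[0])
width_scaled = 2 / np.linalg.norm(clf_scaled.coef_[0])

print("weights (raw):   ", clf_raw.coef_[0], "street width:", width_raw)
print("weights (scaled):", clf_scaled.coef_[0], "street width:", width_scaled)
```

On unscaled data the margin is measured in the raw feature units, so the feature with the larger range tends to dominate where the street can go. After standardization both features contribute on a comparable scale.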

### Soft Margin Classification

#### Sensitivity to Feature Outliers

In [None]:
functions.sensitivity_to_outliers(X,y)

### Large margin vs margin violations