In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Linear SVM Classification

![image.png](attachment:68d94293-6507-47a1-85b2-349f802833cc.png)

SVM classifiers try to fit the widest possible street between the classes, keeping the decision boundary as far as possible from the nearest training instances. <br>This is called <i>Large Margin Classification</i>.

### Soft Margin Classification

If you require that all instances be off the street and on the correct side, this is called <i>Hard Margin Classification</i>.<br>
Hard Margin Classification has two main problems:<br>
1. It only works if the data is linearly separable.
2. It is sensitive to outliers.

Soft Margin Classification seeks a balance between keeping the street as wide as possible and limiting margin violations (instances that end up inside the street or on the wrong side).

In Scikit-Learn, the SVM regularization hyperparameter C controls this trade-off. A low value of C gives a wider street but more margin violations. A high value of C gives a narrower street with fewer margin violations.
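To make the effect of C concrete, here is a minimal sketch that fits two models on the same petal features and compares the street width and the number of margin violations. The values C=0.01 and C=100, the `max_iter` setting, and the helper names (`results`, `y_signed`) are illustrative assumptions, not part of the original notebook. The street width is taken as 2 / ||w|| in the scaled feature space, and an instance counts as a violation when its signed score times its label is below 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = (iris.target == 2).to_numpy()  # Iris virginica vs. the rest

# Scale once so both models see the same feature space
X_scaled = StandardScaler().fit_transform(X)
y_signed = np.where(y, 1, -1)  # labels as +1 / -1 for the margin test

results = {}
for C in (0.01, 100):  # assumed values, chosen only to exaggerate the contrast
    clf = LinearSVC(C=C, max_iter=10_000, random_state=42).fit(X_scaled, y)
    scores = clf.decision_function(X_scaled)
    width = 2 / np.linalg.norm(clf.coef_)              # street width in scaled units
    violations = int(np.sum(y_signed * scores < 1))    # inside the street or misclassified
    results[C] = (width, violations)
    print(f"C={C}: street width={width:.2f}, margin violations={violations}")
```

The lower C should report the wider street. The exact violation counts depend on the solver and Scikit-Learn version, so treat the printed numbers as indicative only.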

In [3]:
# Load Data
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = (iris.target == 2) # iris virginica

In [4]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

In [6]:
svm_clf = make_pipeline(
    StandardScaler(),
    LinearSVC(C=1, random_state=42),
)
svm_clf.fit(X, y)



In [7]:
X_new = [[5.5, 1.7], [5.0, 1.5]]
svm_clf.predict(X_new)

array([ True, False])

In [8]:
svm_clf.decision_function(X_new)

array([ 0.66163411, -0.22036063])
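The scores above are signed: a positive score falls on the Iris virginica side of the decision boundary and a negative score on the other side, which is why the first instance is predicted True and the second False. The sketch below, which refits the same pipeline so that it runs on its own, checks that `predict` agrees with the sign of `decision_function`, and that the score is the linear function w·x + b applied to the scaled inputs. The variable names `scaler`, `linear_svc`, and `manual_scores` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = (iris.target == 2).to_numpy()  # Iris virginica

svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)

X_new = np.array([[5.5, 1.7], [5.0, 1.5]])
scores = svm_clf.decision_function(X_new)
preds = svm_clf.predict(X_new)

# Recompute the scores by hand: scale the inputs, then apply w.x + b
scaler, linear_svc = svm_clf[0], svm_clf[-1]
manual_scores = (scaler.transform(X_new) @ linear_svc.coef_.ravel()
                 + linear_svc.intercept_[0])

print(preds, scores > 0)       # the two boolean arrays should match
print(scores, manual_scores)   # the two score arrays should match
```

Unlike some other classifiers, LinearSVC has no `predict_proba` method, so these scores are the closest thing to a confidence measure it provides directly.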