# SVM

SVM (Support Vector Machine) helps us find the best decision boundary to separate our feature space into classes.

![](svm1.png)


The decision boundary is found by searching for the line with the maximum margin.

![](svm2.png)

The decision boundary is drawn equidistant from the closest red and green points, and the distance between the decision boundary and these points, known as the __margin__, must be maximised. These closest red and green points are known as __Support Vectors__, because they alone determine the boundary; the other red and green points do not contribute to the margin.
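To see support vectors and the margin concretely, here is a minimal sketch using scikit-learn's `SVC` on a tiny hand-made dataset (the points are hypothetical, not taken from the notebook's data). The very large `C` is an assumption used to approximate the strict maximum-margin case described above; `support_vectors_`, `coef_` and `intercept_` are attributes of a fitted linear `SVC`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2-D
X = np.array([[0, 0], [1, 0], [0, 1], [-1, -1],   # class 0 ("red")
              [3, 3], [4, 3], [3, 4], [5, 5]],    # class 1 ("green")
             dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A very large C approximates the hard (maximum) margin SVM
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]          # normal vector of the decision boundary
b = clf.intercept_[0]     # offset of the decision boundary

# Perpendicular distance of each support vector to the boundary w.x + b = 0
sv = clf.support_vectors_
dists = np.abs(sv @ w + b) / np.linalg.norm(w)

print("Support vectors:\n", sv)
print("Distances to boundary:", dists)   # all (nearly) equal: the margin
```

For these points the support vectors should come out as (1, 0), (0, 1) and (3, 3), each about 1.77 away from the boundary, while the other five points play no role in placing it.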

The decision boundary is also known as the __maximum margin hyperplane__ (a hyperplane, because in general the space is multidimensional).

The green dotted line is known as the __Positive hyperplane__ and the red dotted line as the __Negative hyperplane__. The names follow a convention based on the position of these lines: the line to the right of the __maximum margin hyperplane__ is the __Positive hyperplane__ and the line to the left is the __Negative hyperplane__.
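In the usual textbook notation (the normal vector $\mathbf{w}$, offset $b$ and labels $y_i \in \{-1, +1\}$ are assumed notation, not taken from the figures), the three lines can be written as below; this is a sketch of the standard hard-margin formulation, where the positive hyperplane is the one on the side with $\mathbf{w}^\top\mathbf{x} + b > 0$.

$$
\begin{aligned}
\text{maximum margin hyperplane:}\quad & \mathbf{w}^\top \mathbf{x} + b = 0 \\
\text{positive hyperplane:}\quad & \mathbf{w}^\top \mathbf{x} + b = +1 \\
\text{negative hyperplane:}\quad & \mathbf{w}^\top \mathbf{x} + b = -1 \\
\text{margin (boundary to support vectors):}\quad & \frac{1}{\lVert \mathbf{w} \rVert}
\end{aligned}
$$

The two dotted lines are therefore $2/\lVert\mathbf{w}\rVert$ apart, and maximising the margin amounts to minimising $\lVert\mathbf{w}\rVert$ while keeping every training point on the correct side of its dotted line:

$$
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \ \text{ for all } i.
$$

Support vectors satisfy this constraint with equality, i.e. they lie on the dotted lines.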

The essence of (linear) SVM is that we work with a linearly separable dataset, where a hyperplane can be drawn to separate the two categories.

## What is so special about SVMs?


![](svm3.png)


Here we look at the points that are most similar to the other class, i.e. the points closest to it. These are the support vectors, and they lie very close to the decision boundary.

So SVM is a somewhat extreme type of algorithm: it focuses on the points at the extreme ends, right at the boundary between the classes. This is what makes SVM different from most other classification algorithms.
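The claim that only the extreme points matter can be checked directly: fit a linear `SVC`, keep only the rows listed in its `support_` attribute (indices of the support vectors), and refit. The sketch below uses hypothetical toy points and a large `C` (an assumption, approximating a hard margin); the two fits should give the same boundary up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2-D
X = np.array([[0, 0], [1, 0], [0, 1], [-1, -1],
              [3, 3], [4, 3], [3, 4], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit on all points (large C approximates a hard margin)
full = SVC(kernel="linear", C=1000).fit(X, y)

# Refit using only the support vectors found by the full fit
sv_idx = full.support_                      # indices of the support vectors in X
reduced = SVC(kernel="linear", C=1000).fit(X[sv_idx], y[sv_idx])

print("Support vector indices:", sv_idx)
print("Full fit:    w =", full.coef_[0], "b =", full.intercept_[0])
print("SV-only fit: w =", reduced.coef_[0], "b =", reduced.intercept_[0])
```

Dropping the five non-support points leaves `w` and `b` unchanged, whereas moving one of the three support vectors would generally shift the boundary.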

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
df = pd.read_csv("../Social_Network_Ads.csv")
df.head()

|   | Age | EstimatedSalary | Purchased |
|---|-----|-----------------|-----------|
| 0 | 19  | 19000           | 0         |
| 1 | 35  | 20000           | 0         |
| 2 | 26  | 43000           | 0         |
| 3 | 27  | 57000           | 0         |
| 4 | 19  | 76000           | 0         |


In [3]:
X = df.iloc[:,:-1]
X.head()

|   | Age | EstimatedSalary |
|---|-----|-----------------|
| 0 | 19  | 19000           |
| 1 | 35  | 20000           |
| 2 | 26  | 43000           |
| 3 | 27  | 57000           |
| 4 | 19  | 76000           |


In [4]:
y = df.iloc[:,-1]
y.head()

0    0
1    0
2    0
3    0
4    0
Name: Purchased, dtype: int64

## Splitting the dataset into the Training set and Test set

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
X_train.head()

|     | Age | EstimatedSalary |
|-----|-----|-----------------|
| 250 | 44  | 39000           |
| 63  | 32  | 120000          |
| 312 | 38  | 50000           |
| 159 | 32  | 135000          |
| 283 | 52  | 21000           |


## Feature Scaling

In [6]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
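Scaling matters for SVM because the margin is a distance: unscaled, `EstimatedSalary` (tens of thousands) would dominate `Age` (tens). `StandardScaler` replaces each column with its z-score $(x - \mu)/\sigma$, where $\mu$ and $\sigma$ are learned from the training set only, and `transform` reuses those statistics on the test set. The sketch below checks this by hand on the five rows shown earlier, using an arbitrary four/one split chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five (Age, EstimatedSalary) rows from df.head(); the split is illustrative only
X_train = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float)
X_test = np.array([[19, 76000]], dtype=float)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)   # learns mean_ and scale_ from the training rows
X_test_scaled = sc.transform(X_test)         # reuses the training statistics

# Same z-scores by hand; StandardScaler uses the population std (ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_train = (X_train - mean) / std
manual_test = (X_test - mean) / std

print("learned mean:", sc.mean_, "learned std:", sc.scale_)
print("scaled train mean/std:", X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Fitting the scaler on the training set alone keeps information about the test rows out of the model.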

## [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)

In [7]:
from sklearn.svm import SVC
clf = SVC(kernel = 'linear', random_state = 0)

In [8]:
clf.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)
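With a linear kernel, the fitted classifier exposes the maximum margin hyperplane directly: `coef_` holds its normal vector $\mathbf{w}$ and `intercept_` its offset $b$, `decision_function` returns $\mathbf{w}^\top\mathbf{x} + b$, and a positive score means the prediction is `classes_[1]`. The sketch below checks this on synthetic blobs from `make_blobs` (hypothetical data standing in for the scaled features, since the CSV is not loaded here).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical 2-D, two-class data standing in for the scaled (Age, EstimatedSalary) features
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", random_state=0).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
scores = clf.decision_function(X)          # signed score per sample
manual_scores = X @ w + b                  # w . x + b computed by hand

# Positive score -> classes_[1] (positive side), otherwise classes_[0]
manual_pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print("support vectors per class:", clf.n_support_)
print("scores match w.x + b:", np.allclose(scores, manual_scores))
print("predictions match sign rule:", np.array_equal(manual_pred, clf.predict(X)))
```

`n_support_` shows how many training points of each class ended up as support vectors; only those points define the boundary.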

#### Predicting the Test set results

In [9]:
y_pred = clf.predict(X_test)

#### Making the confusion matrix

In [10]:
from sklearn.metrics import confusion_matrix, accuracy_score

In [11]:
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ")
cm

Confusion matrix: 


array([[66,  2],
       [ 8, 24]], dtype=int64)

In [12]:
accuracy_score(y_test, y_pred)

0.9
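In scikit-learn's `confusion_matrix`, rows are the actual classes and columns the predicted ones, so for labels 0/1 the matrix reads `[[TN, FP], [FN, TP]]`. Accuracy is the diagonal over the total, (66 + 24) / 100 = 0.9, which matches `accuracy_score`. The sketch below rebuilds label arrays that reproduce the counts above (synthetic labels, not the actual test rows) and checks the arithmetic.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic labels chosen only to reproduce the counts reported above
y_true = np.array([0] * 68 + [1] * 32)
y_pred = np.array([0] * 66 + [1] * 2 + [0] * 8 + [1] * 24)

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()               # rows = actual class, columns = predicted class

accuracy = (tn + tp) / cm.sum()           # correct predictions / all predictions
print(cm)
print("accuracy:", accuracy, "accuracy_score:", accuracy_score(y_true, y_pred))
```

Of the 10 errors, 8 are false negatives (buyers predicted as non-buyers), which accuracy alone does not reveal.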

![](svm4.png)

![](svm5.png)