### Support vector machine (SVM)

SVM is a supervised machine learning algorithm used for classification and regression. For classification, it looks for the boundary that best separates the data into classes.

A hyperplane is a decision boundary that separates data points.
- In 2D → a line
- In 3D → a plane
- In higher dimensions → a hyperplane
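
To make the idea of a hyperplane concrete, here is a minimal sketch that checks which side of a 2D hyperplane (a line) some points fall on. The weights `w`, the offset `b`, and the sample points are hypothetical values chosen for illustration; they do not come from the dataset used later.

```python
import numpy as np

# Hypothetical hyperplane in 2D: the line x1 + x2 - 3 = 0.
w = np.array([1.0, 1.0])
b = -3.0

# Made-up points: one below the line, one on it, one above it.
points = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 4.0]])

# The sign of w.x + b tells which side of the hyperplane a point lies on.
scores = points @ w + b
sides = np.sign(scores)

# Dividing by ||w|| turns the score into a signed geometric distance.
distances = scores / np.linalg.norm(w)

print("Side of hyperplane:", sides)
print("Signed distances:", distances)
```

A point with a score of exactly zero lies on the hyperplane itself.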

SVM tries to maximize the margin, which is the distance between the hyperplane and the closest data points from each class.
The data points closest to the hyperplane are called support vectors.
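
The margin and the support vectors can be inspected on a fitted model. The sketch below uses a small made-up 2D dataset (not the breast cancer data) and a very large `C` to approximate a hard margin; both choices are assumptions made for illustration. For a linear kernel, the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (made up for illustration): two linearly separable clusters.
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
toy_clf = SVC(kernel="linear", C=1e6)
toy_clf.fit(X_toy, y_toy)

# For a linear kernel, coef_ holds w and intercept_ holds b.
w_toy = toy_clf.coef_[0]
b_toy = toy_clf.intercept_[0]

# Width of the margin between the two class boundaries.
margin_width = 2.0 / np.linalg.norm(w_toy)

print("Support vectors:\n", toy_clf.support_vectors_)
print("Margin width:", margin_width)
```

Only the points listed in `support_vectors_` determine the boundary; moving any other point slightly would leave the hyperplane unchanged.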

 

In [1]:
import sklearn
from sklearn import datasets
import numpy as np
import pandas as pd

In [2]:
cancer = datasets.load_breast_cancer()
cancer.data.shape

(569, 30)

In [6]:
print(cancer.data[:5])     # First 5 rows of features
print(cancer.target[:5])   # First 5 labels (0 = malignant, 1 = benign)

[[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
  1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
  6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
  1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
  4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 1.326e+03 8.474e-02 7.864e-02 8.690e-02
  7.017e-02 1.812e-01 5.667e-02 5.435e-01 7.339e-01 3.398e+00 7.408e+01
  5.225e-03 1.308e-02 1.860e-02 1.340e-02 1.389e-02 3.532e-03 2.499e+01
  2.341e+01 1.588e+02 1.956e+03 1.238e-01 1.866e-01 2.416e-01 1.860e-01
  2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 1.203e+03 1.096e-01 1.599e-01 1.974e-01
  1.279e-01 2.069e-01 5.999e-02 7.456e-01 7.869e-01 4.585e+00 9.403e+01
  6.150e-03 4.006e-02 3.832e-02 2.058e-02 2.250e-02 4.571e-03 2.357e+01
  2.553e+01 1.525e+02 1.709e+03 1.444e-01 4.245e-01 4.504e-01 2.430e-01
  3.613e-01 8.758e-02]
 [1.142e+01 2.038e+01 7.758e+01 3.861e+02 1.425e-01 2.839e-01 2.414

In [7]:
x = cancer.data
y = cancer.target

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=42)

SVM uses kernel functions to implicitly map the data into a higher-dimensional space where the classes may be easier to separate.
- When the data is linearly separable and has a large number of features (e.g. text classification), use kernel='linear'.
- When the decision boundary is curved and interaction between features matters, use kernel='poly'.
- When the data is non-linear, use kernel='rbf' (the default for SVC) or kernel='sigmoid'.
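
One way to choose between these kernels is to compare them empirically. The following is a minimal sketch, assuming 5-fold cross-validation on the same breast cancer dataset. The `StandardScaler` step is an added assumption (it is not part of the notebook cells here), included because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)

kernel_scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    # Assumption: standardize features before the SVM; scaling is refit per fold.
    pipeline = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(pipeline, X_all, y_all, cv=5)
    kernel_scores[kernel] = scores.mean()

for kernel, score in kernel_scores.items():
    print(f"{kernel:8s} mean CV accuracy: {score:.3f}")
```

The ranking of kernels depends on the dataset, so the result here should not be read as a general rule.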

Another option is LinearSVC. It does not support the kernel trick (it only fits linear models), but it is faster on large datasets.

from sklearn.svm import LinearSVC

model = LinearSVC(C=1.0)
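
A minimal end-to-end sketch of LinearSVC on the same dataset and split might look like this. The scaling step and the raised `max_iter` are assumptions added here to help the solver converge; they are not part of the snippet above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42
)

# Assumption: standardize features and allow more iterations for convergence.
linear_model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
linear_model.fit(X_tr, y_tr)

# score() returns mean accuracy on the held-out split.
linear_accuracy = linear_model.score(X_te, y_te)
print("LinearSVC accuracy:", linear_accuracy)
```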


In [9]:
from sklearn import svm
# Create an SVM classifier with a linear kernel
clf = svm.SVC(kernel='linear')
# Train on the training split, then predict labels for the test split
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

In [10]:
from sklearn import metrics
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9649122807017544


In [None]:
print("Precision:",metrics.precision_score(y_test, y_pred))

print("Recall:",metrics.recall_score(y_test, y_pred))

Precision: 0.9636363636363636
Recall: 0.9814814814814815


In [12]:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))

[[ 59   4]
 [  2 106]]
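
The accuracy, precision, and recall reported earlier can be recomputed by hand from this confusion matrix. The sketch below copies the four counts from the output above and relies on scikit-learn's layout, where rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Rows are true classes, columns are predicted classes.
cm = np.array([[59, 4],
               [2, 106]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)        # how many predicted positives are correct
recall = tp / (tp + fn)           # how many actual positives are found

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

Here the positive class is label 1, which is the default for `precision_score` and `recall_score` on binary targets.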


In [13]:
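# SVR (support vector regression): instead of separating classes, it fits a function
# that predicts continuous values while keeping errors within a tolerance margin.
# Note: cancer.target is a binary label, so this only illustrates the API.
from sklearn import svm
clf = svm.SVR()  # Regression
clf.fit(x, y)    # Fitted on the full dataset, including the rows in X_test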

In [14]:
pred = clf.predict(X_test)

In [15]:
# R^2 on the same data the model was fit on, so this is a training score
r2 = clf.score(x, y)
print(r2)

0.7415680138929615
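
The R² printed above is computed on the same rows the model was fit on, which tends to be optimistic. A minimal sketch of a held-out evaluation, assuming the same 70/30 split used for the classifier, fits SVR on the training rows only and scores it on both splits:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42
)

# Fit on the training split only, so the test rows stay unseen.
reg = SVR()
reg.fit(X_tr, y_tr)

# For regressors, score() returns the R^2 coefficient of determination.
r2_train = reg.score(X_tr, y_tr)
r2_test = reg.score(X_te, y_te)

print("Train R^2:", r2_train)
print("Test R^2:", r2_test)
```

Because the target here is a 0/1 label, this remains an API illustration rather than a realistic regression task.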
