# Support Vector Machine
A support vector machine (SVM) is a supervised learning algorithm used for classification, regression, and outlier detection. It learns a decision boundary between labeled data points, optionally after transforming the data into a space where the classes are easier to separate. SVMs are widely used in fields such as healthcare, natural language processing, signal processing, and speech and image recognition.

The primary objective of the SVM algorithm is to find a hyperplane that separates the data points of different classes. Among all separating hyperplanes, it chooses the one with the largest margin, i.e. the greatest distance to the nearest points of each class.
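To make the idea of a maximum-margin hyperplane concrete, here is a minimal sketch on a hand-made 2D toy dataset (the points are an assumption for illustration, not part of this notebook's data). For a linear SVM with weight vector `w`, the margin width is `2 / ||w||`; a very large `C` is used here to approximate a hard margin.

```python
import numpy as np
from sklearn import svm

# Two linearly separable clusters in 2D (toy data chosen for illustration).
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations.
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:")
print(clf.support_vectors_)
```

The support vectors printed at the end are the training points closest to the hyperplane; they alone determine where it lies.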



In [2]:
#@title Imports

import sklearn
from sklearn import datasets
from sklearn import model_selection  # needed so sklearn.model_selection.train_test_split resolves
from sklearn import svm
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt


In [3]:
cancer = datasets.load_breast_cancer()

X = cancer.data
y = cancer.target

print("Feature names: ", cancer.feature_names)
print("Label names: ", cancer.target_names)

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)

Feature names:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Label names:  ['malignant' 'benign']


### Margin
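The margin is the distance between the separating hyperplane and the closest training points of each class (the support vectors). Typically, the larger the margin, the better the classifier generalizes to unseen data.
- hard margin: no points may lie inside the margin, so the training data must be linearly separable
- soft margin: some points (for example outliers) may lie inside the margin or on the wrong side of the hyperplane; in sklearn the `C` parameter controls how strongly such violations are penalized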
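The trade-off between a wide margin and few violations can be sketched by fitting the same linear SVM with different values of `C`. The data below is a synthetic pair of overlapping clusters from `make_blobs`; the centers, spread, and `C` values are arbitrary assumptions chosen only for illustration.

```python
import numpy as np
from sklearn import datasets, svm

# Two overlapping 2D clusters (toy data; centers and spread are arbitrary choices).
X_toy, y_toy = datasets.make_blobs(
    n_samples=200, centers=[[0.0, 0.0], [3.0, 3.0]], cluster_std=1.5, random_state=0
)

n_support = {}
widths = {}
for C in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel="linear", C=C).fit(X_toy, y_toy)
    widths[C] = 2.0 / np.linalg.norm(clf.coef_[0])  # margin width is 2 / ||w||
    n_support[C] = int(clf.n_support_.sum())        # support vectors over both classes
    print(f"C={C:<6} margin width={widths[C]:.3f} support vectors={n_support[C]}")
```

A small `C` tolerates many violations and tends to give a wider margin with more support vectors; a large `C` moves toward hard-margin behaviour.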

### Kernels
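Kernels let an SVM work in a higher-dimensional space where a separating hyperplane is easier to find. Intuitively, a kernel adds dimensions computed from the original features; for example, n features can be lifted to n+1 dimensions by adding one coordinate that is a function of the others. Formally, a kernel is a function that takes two data points and returns their inner product in that higher-dimensional space, without computing the new coordinates explicitly.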
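As a small worked example of this idea, the sketch below checks that a degree-2 polynomial kernel on 2D points gives the same value as an ordinary dot product after an explicit mapping to 3D. The helper `phi` and the two sample points are hypothetical and exist only for this illustration; `polynomial_kernel` is sklearn's pairwise implementation, called with `gamma=1` and `coef0=0` so that it reduces to `(x . z) ** 2`.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(x):
    """Explicit feature map from 2D to 3D matching the kernel (x . z) ** 2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)   # dot product in the 3D feature space
kernel_value = (x @ z) ** 2  # same quantity computed in the original 2D space
sk_value = polynomial_kernel(
    x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1.0, coef0=0.0
)[0, 0]

print(explicit, kernel_value, sk_value)
```

All three values agree, which is why an SVM can use the kernel directly and never has to build the higher-dimensional coordinates.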

## Using sklearn to implement the SVM algorithm

In [4]:
model = svm.SVC(kernel="linear", C=2)
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

acc = metrics.accuracy_score(y_test, y_pred)

print(acc)

0.9473684210526315
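The breast cancer features span very different ranges (areas in the hundreds, smoothness values well below 1), and SVMs are sensitive to feature scale. The following is a hedged sketch of one common variation, not the method used above: it standardizes the features in a pipeline before fitting the same linear SVM, and prints a confusion matrix alongside the accuracy. The names `x_tr`, `x_te`, `y_tr`, `y_te` and `scaled_model` are illustrative, and `random_state=0` is an assumption added only to make the sketch repeatable.

```python
from sklearn import datasets, metrics, model_selection, svm
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()

# random_state is fixed only so that this sketch gives the same split every run.
x_tr, x_te, y_tr, y_te = model_selection.train_test_split(
    cancer.data, cancer.target, test_size=0.1, random_state=0
)

# Standardize each feature to zero mean and unit variance, then fit a linear SVM.
scaled_model = make_pipeline(StandardScaler(), svm.SVC(kernel="linear", C=2))
scaled_model.fit(x_tr, y_tr)
y_hat = scaled_model.predict(x_te)

acc_scaled = metrics.accuracy_score(y_te, y_hat)
cm = metrics.confusion_matrix(y_te, y_hat)

print("accuracy:", acc_scaled)
# Rows are true classes, columns are predicted classes (0 = malignant, 1 = benign).
print(cm)
```

The confusion matrix shows which class the errors fall in, which matters here because missing a malignant case is usually worse than a false alarm.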


## Iris dataset

The Iris dataset is a multivariate dataset introduced by Ronald Fisher in 1936. It consists of 150 samples, 50 from each of 3 Iris species: Iris setosa, Iris versicolor and Iris virginica. Each sample has four features: sepal length, sepal width, petal length and petal width. The class label is 0 for Iris setosa, 1 for Iris versicolor and 2 for Iris virginica.

In [18]:
iris = datasets.load_iris()

X = iris.data    # 4 features
y = iris.target  # 0, 1, or 2

print("Feature names: ", iris.feature_names)
print("Label names: ", iris.target_names)

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.25)

Feature names:  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Label names:  ['setosa' 'versicolor' 'virginica']


In [19]:
model = svm.SVC(kernel="rbf", C=2)
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

acc = metrics.accuracy_score(y_test, y_pred)

print(acc)

1.0
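A single random split of 150 samples leaves only a few dozen test points, so one accuracy figure can vary noticeably from run to run. As a rough sketch of a more stable estimate, the same model can be scored with 5-fold cross-validation; the variable names `cv_model` and `scores` and the choice of 5 folds are assumptions for illustration.

```python
from sklearn import datasets, model_selection, svm

iris = datasets.load_iris()

# Same model configuration as above, evaluated on 5 stratified folds.
cv_model = svm.SVC(kernel="rbf", C=2)
scores = model_selection.cross_val_score(cv_model, iris.data, iris.target, cv=5)

print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Each fold is used once as the test set, so every sample contributes to the estimate.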
