Introduction to the Support Vector Machine (SVM)

Adapted from University of Tokyo

import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
from AuxFunctions import plot_confusion_matrix

## What is an SVM?

An SVM is a supervised learning model that seeks to divide classes of data. In other words, if I want to build an SVM to classify cats and dogs, it will need to figure out a way to draw a line that separates the cat datapoints from the dog datapoints.

In more general terms, we can say that an SVM seeks to draw a hyperplane in an N-dimensional space that separates the classes. What exactly does this mean?

For a 2-D space, the "hyperplane" will just be a line, as shown below:

<img src="https://pimages.toolbox.com/wp-content/uploads/2022/09/02134804/Diagram-depicting-SVM-example-with-hyperplane-for-classification-problem.jpg" width="400px">

For a 3-D space, the "hyperplane" is an actual plane, as shown below:

<img src="https://miro.medium.com/max/784/0*QDy2DTKEtPvoP_n_.png" width="400px">
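The 2-D case can be sketched in a few lines. This is a minimal example, assuming made-up "cat" and "dog" clusters generated with NumPy (the data, the cluster centers, and the variable names are all invented for illustration). It fits a linear SVM and reads off the line $w \cdot x + b = 0$ that it found.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data, invented for illustration: two well-separated clusters.
rng = np.random.default_rng(0)
cats = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
dogs = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
X = np.vstack([cats, dogs])
y = np.array([0] * 20 + [1] * 20)  # 0 = cat, 1 = dog

# A linear kernel means the decision boundary is a straight line in 2-D.
clf = SVC(kernel="linear")
clf.fit(X, y)

# The hyperplane is the set of points where w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
train_accuracy = clf.score(X, y)
print("w =", w, "b =", b, "training accuracy =", train_accuracy)
```

Points with $w \cdot x + b > 0$ are assigned to class 1 and the rest to class 0, so the sign of that expression is the whole classifier.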

However, what happens when problems get messier and no straight line separates the classes? Perhaps we can apply some sort of transformation? Let's first look at a simple transform.

What about going into higher dimensions?

Let's look at this problem child:

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*Wp8tGecatxHqUgHNaVQddg.png" width="400px">

We can go from a 2-D space to a 3-D space:

<img src="https://miro.medium.com/max/720/1*XhXJldwvZ9IpGNts41Mefw.gif" width="400px">

So we had a problem: non-linearly separable data. To solve the problem, we moved into a higher-dimensional space where a separating hyperplane exists.
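Here is a rough sketch of that idea, assuming synthetic ring-shaped data from scikit-learn's `make_circles` and a hand-picked third coordinate $z = x_1^2 + x_2^2$ (both are illustrative choices, not necessarily what the figures above used). A linear SVM is fit once on the original 2-D points and once on the lifted 3-D points.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: a small ring of one class inside a large ring of the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Attempt 1: a straight line in the original 2-D space.
linear_2d = SVC(kernel="linear").fit(X, y)
acc_2d = linear_2d.score(X, y)

# Attempt 2: add a third coordinate z = x1^2 + x2^2 (squared distance
# from the origin), then look for a flat plane in the 3-D space.
z = (X ** 2).sum(axis=1, keepdims=True)
X_3d = np.hstack([X, z])
linear_3d = SVC(kernel="linear").fit(X_3d, y)
acc_3d = linear_3d.score(X_3d, y)

print("2-D training accuracy:", acc_2d)
print("3-D training accuracy:", acc_3d)
```

The inner ring ends up low on the new axis and the outer ring ends up high, so a horizontal plane between them does the job that no line in 2-D could.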

However, explicitly computing such a mapping for every data point can be costly, especially as we move into higher-dimensional spaces. What if, for instance, we seek to move into an infinite-dimensional space? The mapping becomes too unwieldy to work with explicitly. We need to find some sort of computational simplification.

## Kernels



We will not be looking at the precise math behind the SVM, so this next section may be a bit hand-wavy. A bit of trust is required.

Due to how SVMs are constructed, we do not actually need to know exactly how a function transforms a data point. In other words, for any two data points $x$ and $y$, we don't need to know how $x \rightarrow f(x)$ or how $y \rightarrow f(y)$.

The only thing we really care about is how $f(x)$ and $f(y)$ compare. In other words, we only really want to know the result of the inner product of $f(x)$ and $f(y)$. Thus, we will designate some kernel function:

$k(x, y) = f(x) \cdot f(y)$
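To see that an SVM really only needs inner products, here is a minimal sketch, assuming toy data invented for illustration and the identity mapping $f(x) = x$, so that $k(x, y) = x \cdot y$. scikit-learn's `SVC(kernel="precomputed")` is trained on the matrix of inner products alone and never sees the raw coordinates.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data, invented for illustration: two well-separated 2-D clusters.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(-2, 0.5, (15, 2)), rng.normal(2, 0.5, (15, 2))])
y_train = np.array([0] * 15 + [1] * 15)
X_test = np.vstack([rng.normal(-2, 0.5, (5, 2)), rng.normal(2, 0.5, (5, 2))])

# Inner products between every pair of points, with f(x) = x.
gram_train = X_train @ X_train.T  # shape (n_train, n_train)
gram_test = X_test @ X_train.T    # shape (n_test, n_train)

# This model is given only inner products, never the points themselves.
svm_from_gram = SVC(kernel="precomputed").fit(gram_train, y_train)
pred_from_gram = svm_from_gram.predict(gram_test)

# Reference model trained on the raw coordinates with a linear kernel.
svm_from_points = SVC(kernel="linear").fit(X_train, y_train)
pred_from_points = svm_from_points.predict(X_test)

print(pred_from_gram)
print(pred_from_points)
```

Both models make the same predictions here, which is the practical meaning of "we only care about the inner product": swap in a different $k$ and the same training procedure works in a different feature space.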

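Let us see why this can simplify things. We'll examine a commonly used mapping, the degree-2 polynomial feature map. For a 2-D point $x = (x_1, x_2)$, the mapping produces every term up to degree 2:

$f(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2)$

However, the corresponding polynomial kernel is simply:

$k(x, y) = (1 + x \cdot y)^2$

Expanding the square gives exactly $f(x) \cdot f(y)$, so a single 2-D dot product replaces the 6-dimensional one.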
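The claim can be checked numerically. The sketch below assumes 2-D inputs and a degree-2 feature map with $\sqrt{2}$ scaling factors, which are what make the match with $(1 + x \cdot y)^2$ exact. The function names `poly_features` and `poly_kernel` are made up for this example.

```python
import numpy as np

def poly_features(p):
    """Explicit degree-2 feature map for a 2-D point p = (p1, p2)."""
    p1, p2 = p
    s = np.sqrt(2.0)
    return np.array([1.0, s * p1, s * p2, p1 ** 2, p2 ** 2, s * p1 * p2])

def poly_kernel(p, q):
    """Degree-2 polynomial kernel: one dot product in the original space."""
    return (1.0 + np.dot(p, q)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Long way: map both points to 6-D, then take the inner product.
explicit = np.dot(poly_features(x), poly_features(y))

# Short way: never leave 2-D.
shortcut = poly_kernel(x, y)

print(explicit, shortcut)
```

For these two points $x \cdot y = 1$, so both routes give $(1 + 1)^2 = 4$, but the kernel never builds the 6-dimensional vectors.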

This point can be illustrated with another common kernel, the Radial Basis Function (RBF):

$f(x) =$ some infinite-dimensional mapping

$k(x, y) = e^{-\gamma||x - y||^2}$
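Even though the mapping is infinite-dimensional, the kernel itself is a one-liner. The sketch below assumes $\gamma = 1$ and synthetic ring-shaped data from `make_circles` (both arbitrary choices for illustration). It computes the RBF kernel by hand, compares it with scikit-learn's `rbf_kernel`, and then fits an `SVC` with that kernel.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def rbf(p, q, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return np.exp(-gamma * np.sum((p - q) ** 2))

gamma = 1.0  # arbitrary value chosen for this illustration
a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])

# Squared distance between a and b is 2, so this is exp(-2).
by_hand = rbf(a, b, gamma)
by_sklearn = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]
print(by_hand, by_sklearn)

# The same kernel inside an SVM, on data that no straight line can separate.
X, labels = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf", gamma=gamma).fit(X, labels)
rbf_accuracy = clf.score(X, labels)
print("training accuracy:", rbf_accuracy)
```

Nearby points get a kernel value close to 1 and distant points get a value close to 0, with $\gamma$ controlling how quickly the value falls off with distance.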