In [1]:
from IPython.display import HTML
css_file = './custom.css'
HTML(open(css_file, "r").read())

# Support Vector Machine (SVM)

© 2018 Daniel Voigt Godoy

## 1. Definition

Support Vector Machine is a ***large margin classifier***. This means it tries to find a ***decision boundary*** such that the points closest to it, belonging to opposite classes, are separated by the ***largest margin*** possible.

Effectively, it determines a ***line*** (for 1D data), a ***plane*** (for 2D data) or a ***hyper-plane*** (for 3+D data), such that the region where it takes values in the interval ***[-1, 1]*** is ***clear of data points***.

For 1D data:
$$
y' = b + w_1x_1
$$

Therefore:
- points in the ***negative class*** should fall where the line (or plane/hyperplane) is ***-1 or lower***
- points in the ***positive class*** should fall where the line (or plane/hyperplane) is ***1 or higher***

For mathematical convenience, instead of using the traditional $y = 0$ for a ***negative class***, we use $y = -1$:

$$
\hat{y} = 
\begin{cases} -1 &\mbox{if } \boldsymbol{w^Tx} \leq -1 \\
1 & \mbox{if } \boldsymbol{w^Tx} \geq 1
\end{cases}
$$
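A minimal NumPy sketch of this rule for 1D data (the function names are hypothetical). It also shows the conversion from 0/1 labels to the -1/+1 convention. Since the rule above says nothing about outputs strictly between -1 and 1, the sketch returns 0 for those points to flag them as inside the margin; that is an illustrative choice, not part of the SVM definition:

```python
import numpy as np

def to_plus_minus(y01):
    """Convert 0/1 labels to the -1/+1 convention used by the SVM."""
    return 2 * np.asarray(y01) - 1

def decision(x, b, w1):
    """Linear output y' = b + w1 * x for 1D data."""
    return b + w1 * np.asarray(x, dtype=float)

def predict(x, b, w1):
    """-1 if y' <= -1, +1 if y' >= 1, and 0 (hypothetical flag) inside the margin."""
    out = decision(x, b, w1)
    return np.where(out >= 1, 1, np.where(out <= -1, -1, 0))

x = np.array([-2.0, -0.5, 0.3, 1.5])
print(predict(x, b=0.0, w1=1.0).tolist())  # [-1, 0, 0, 1]
print(to_plus_minus([0, 1, 1, 0]).tolist())  # [-1, 1, 1, -1]
```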

### 1.1 Dot Product

Another way of representing $y'$ is by using the ***dot product***:

$$
\boldsymbol{w^Tx} = p ||\boldsymbol{w}||
$$

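In a nutshell, $p$ is the (signed) length of the ***projection*** of a given sample $\boldsymbol{x}$ onto the ***weight vector*** $\boldsymbol{w}$, and $||\boldsymbol{w}||$ is the $\ell_2$ ***norm*** of the ***weights***.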
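A quick numerical check of this identity, using an arbitrary (made-up) weight vector and sample and ignoring the bias: $p$ is computed as $\boldsymbol{w^Tx} / ||\boldsymbol{w}||$, and multiplying it back by $||\boldsymbol{w}||$ recovers the dot product.

```python
import numpy as np

w = np.array([3.0, 4.0])  # made-up weights, ||w|| = 5
x = np.array([2.0, 1.0])  # made-up sample

norm_w = np.linalg.norm(w)
p = w @ x / norm_w        # signed length of the projection of x onto w

print(w @ x)        # 10.0
print(p)            # 2.0
print(p * norm_w)   # 10.0
```

Since $\boldsymbol{w^Tx} = p||\boldsymbol{w}||$, meeting $\boldsymbol{w^Tx} \geq 1$ with a small $||\boldsymbol{w}||$ forces $p$ to be large, that is, the point must lie far from the boundary. This is the link between small weights and a wide margin.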

### 1.2 Hinge Loss

You can think of the [-1, 1] interval as a ***demilitarized zone*** where ***no point should be***.
So, if a given point crosses the border on its own side (-1 for the negative class, 1 for the positive class), it incurs a ***loss*** that grows linearly with ***how far "behind enemy lines"*** it is.

This is called a ***hinge loss***:

$$
\text{loss} = \max(0, 1 - y \cdot y')
$$

![](https://developers.google.com/machine-learning/glossary/images/hinge-loss.svg)
<center>Source: Google's Machine Learning Glossary</center>
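A vectorized sketch of the hinge loss, assuming the labels already use the $\{-1, +1\}$ convention and that $y'$ is the line's output $b + w_1x_1$. The four made-up points cover the three regimes: beyond (or exactly on) its own border (zero loss), inside the zone (loss between 0 and 1), and on the wrong side of the boundary (loss above 1):

```python
import numpy as np

def hinge_loss(y, y_prime):
    """Per-point hinge loss max(0, 1 - y * y'), with labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - np.asarray(y) * np.asarray(y_prime))

y = np.array([1, 1, 1, -1])
y_prime = np.array([2.0, 1.0, 0.25, 0.5])
# beyond border, on border, inside the zone, wrong side of the boundary
print(hinge_loss(y, y_prime).tolist())  # [0.0, 0.0, 0.75, 1.5]
```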

### 1.3 Minimizing the Cost Function

For 1D, the ***largest margin*** corresponds to the ***smallest slope*** of the line (hence, the ***smallest*** $|w_1|$). 

In a multidimensional problem, the ***largest margin*** corresponds to the ***smallest*** (squared) $\ell_2$ ***norm***.

$$
J(\boldsymbol{w}) = \frac{1}{2}||\boldsymbol{w}||^2
$$
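Why does a small norm mean a large margin? The margin edges are where $\boldsymbol{w^Tx} + b = \pm 1$, and the distance between these two parallel hyperplanes is $2/||\boldsymbol{w}||$ (in 1D, $2/|w_1|$ along the $x$ axis). A small sketch with made-up weights, assuming, as is standard, that the bias $b$ is kept separate and is not part of the norm:

```python
import numpy as np

def margin_width(w):
    """Distance between the hyperplanes w.x + b = -1 and w.x + b = +1."""
    return 2.0 / np.linalg.norm(w)

def hard_margin_objective(w):
    """J(w) = 1/2 ||w||^2 (the bias b is assumed not to be penalized)."""
    w = np.asarray(w, dtype=float)
    return 0.5 * float(w @ w)

# Smaller weights -> smaller J(w) -> wider margin
for w in (np.array([2.0]), np.array([1.0]), np.array([0.5])):
    print(w, hard_margin_objective(w), margin_width(w))
```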

But we want to minimize it ***while keeping an empty margin***, so the restrictions are:

$$
\text{subject to}
\begin{cases} \boldsymbol{w^Tx^{(i)}} \leq -1 &\mbox{if } y^{(i)} = -1 \\
\boldsymbol{w^Tx^{(i)}} \geq 1 & \mbox{if } y^{(i)} = 1
\end{cases}
$$

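This is actually a ***different cost function***: it has a ***regularization term*** only.

The ***hinge loss*** appears only implicitly, in the ***constraints***: both cases can be written as $y^{(i)}\boldsymbol{w^Tx^{(i)}} \geq 1$, which is the same as requiring a hinge loss of zero for every point.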
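A sketch of that equivalence on made-up 1D data, with the bias $b$ written explicitly (the helper names are hypothetical). Note that $w_1 = 0.5$ would give a smaller $J(\boldsymbol{w})$ than $w_1 = 1$, but it violates the constraints, which is exactly what they are there to prevent:

```python
import numpy as np

def satisfies_hard_margin(X, y, w, b):
    """True if every point obeys y * (w.x + b) >= 1 (labels in {-1, +1})."""
    return bool(np.all(y * (X @ w + b) >= 1))

def total_hinge_loss(X, y, w, b):
    """Sum of max(0, 1 - y * (w.x + b)); zero exactly when the constraints hold."""
    return float(np.sum(np.maximum(0.0, 1.0 - y * (X @ w + b))))

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])  # hypothetical 1D data
y = np.array([-1, -1, 1, 1])

for w1 in (1.0, 0.5):
    w = np.array([w1])
    print(w1, satisfies_hard_margin(X, y, w, 0.0), total_hinge_loss(X, y, w, 0.0))
```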


### 1.4 Soft Margin

Sometimes it is not possible to have a ***hard margin***, that is, a ***truly empty margin***. We can relax this assumption by cutting some ***slack*** to the points that violate the margin (misclassified or not): instead of forbidding them, we multiply their ***hinge losses*** by a factor ***C*** and ***add them to the cost***.

So, the cost function that allows ***soft margins*** is:

$$
J(\boldsymbol{w}) = \frac{1}{2}||\boldsymbol{w}||^2 + C \sum_{i=1}^{m}\max(0, 1 - y^{(i)} \cdot y'^{(i)})
$$
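A sketch of this cost that returns the weight term and the loss term separately, plus their total. The data are made up, the labels are assumed to be in $\{-1, +1\}$, and the bias $b$ is passed separately and left out of the norm:

```python
import numpy as np

def soft_margin_cost(X, y, w, b, C):
    """Return (weight term, loss term, total) of the soft-margin cost.

    X: (m, n) features, y: (m,) labels in {-1, +1}, w: (n,) weights, b: bias.
    """
    y_prime = X @ w + b
    weight_term = 0.5 * float(w @ w)
    loss_term = C * float(np.sum(np.maximum(0.0, 1.0 - y * y_prime)))
    return weight_term, loss_term, weight_term + loss_term

X = np.array([[-2.0], [-0.5], [0.5], [2.0]])  # hypothetical 1D data
y = np.array([-1, -1, 1, 1])
print(soft_margin_cost(X, y, np.array([1.0]), 0.0, C=1.0))  # (0.5, 1.0, 1.5)
```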

You can think of ***C*** as the ***hardness*** of your margin. Larger values of C correspond to a harder margin.

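What if C is ***zero***? Isn't it the same as the original ***hard margin*** cost function? 

Only on the surface: the objective becomes $\frac{1}{2}||\boldsymbol{w}||^2$ in both cases, BUT in the original formulation the ***hardness*** came from the ***constraints***. Without them, and with C = 0, margin violations cost nothing, so the minimum is simply $\boldsymbol{w} = 0$, which separates nothing.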
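To see the effect of C without a real solver, the sketch below minimizes the soft-margin cost by brute-force grid search over $(w_1, b)$ on made-up 1D data. It is purely illustrative (actual SVM solvers do not work this way), and the data and grid are assumptions. For these points, C = 0 gives $w_1 = 0$, and larger values of C give a larger $|w_1|$, hence a narrower, harder margin:

```python
import numpy as np

# Hypothetical 1D data, linearly separable with a gap between -0.5 and 0.5
x = np.array([-3.0, -2.0, -1.5, -0.5, 0.5, 1.5, 2.0, 3.0])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# Candidate values for w1 and b on an evenly spaced grid (step 0.01)
w_grid = np.arange(-300, 301) / 100.0
b_grid = np.arange(-300, 301) / 100.0

def grid_minimize(C):
    """Brute-force minimizer of 1/2 w1^2 + C * sum(hinge) over the grid."""
    # y_prime has shape (len(b_grid), len(w_grid), len(x))
    y_prime = b_grid[:, None, None] + w_grid[None, :, None] * x[None, None, :]
    losses = np.maximum(0.0, 1.0 - y * y_prime).sum(axis=2)
    cost = 0.5 * w_grid[None, :] ** 2 + C * losses
    i_b, i_w = np.unravel_index(np.argmin(cost), cost.shape)
    return w_grid[i_w], b_grid[i_b]  # b is arbitrary when C = 0

for C in (0.0, 0.1, 1.0, 100.0):
    w1, _ = grid_minimize(C)
    width = 2 / abs(w1) if w1 != 0 else float("inf")
    print(f"C={C:>6}: w1={w1:.2f}, margin width={width:.2f}")
```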

## 2. Experiment

Time to try it yourself!

There are 8 points, 4 ***green*** (positive) and 4 ***red*** (negative) points.

There is only ***one*** feature $x$. So, the ***boundary*** is going to be the ***line*** shown on top of the data.

The red and green ***dashed vertical lines*** define the ***margin***, that is, where the ***boundary*** takes the value -1 or 1.

Points ***between the dashed lines*** are ***violating the margin***.

The upper right plot shows the ***slack or loss*** for each one of the data points.

The lower right plot shows the ***costs*** due to both terms: weights and losses.

The controls below allow you to:
- change ***bias*** and ***weights*** of the SVM line
- use ***soft margin*** for cost computation
- adjust the ***hardness*** of the margin ***C***

Use the controls to play with different configurations and answer the ***questions*** below.

In [2]:
from intuitiveml.supervised.classification.SVM import *

In [3]:
x, y = data()
mysvm = plotSVM(x, y)
vb1 = VBox(build_figure(mysvm), layout={'align_items': 'center'})

In [4]:
vb1

#### Questions

1. Initially, there are ***two points*** violating the margin. Try different values for $b$ and $w_1$ to get a ***clear margin*** (no losses).


2. While clearing the margin, did you pay attention to the ***cost***? Make sure that your solution has the ***minimum cost*** possible.


3. What happens to the ***margin width*** as you ***increase*** $w_1$?


4. Set $b$ back to 0 and $w_1$ back to 1. Check ***Use Soft Margin***. What happened to your cost?


5. Try different values for $b$ and $w_1$ to ***minimize the cost***. This time, you ***may have points inside the margin***. Which value for $w_1$ did you get? Is it bigger or smaller than before? Why? Is the margin ***wider*** or ***narrower***?


6. Change ***C*** to 0.1 and repeat the process. How did $w_1$ and the margin change? What does it mean to the ***hardness*** of the margin?


7. Change ***C*** to 100.0 and repeat the process. How did $w_1$ and the margin change? What does it mean to the ***hardness*** of the margin?

## 3. Kernel Trick

What if the ***two right-most*** points were ***red***? 

How could you possibly draw a ***line*** to separate the colors? You can achieve that by using the ***kernel trick***!

The idea is to ***increase the dimensionality of the data*** (really!) to be able to achieve a linear separation. In ***higher dimensions***, data is more ***sparse***, and it is easier to find a ***hyper-plane*** that separates it ***linearly***.

In our modified example, we can map every 1D point $x$ to a 2D point $(x, x^2)$. This is the idea behind a ***polynomial kernel***.
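A small check of this mapping on the same ten points and labels used in this example's notebook cell (0 = red, 1 = green). The plane's coefficients are hand-picked for illustration, not fitted by an SVM: in 1D no single threshold works (the green points sit between red ones), but after the mapping a linear rule does:

```python
import numpy as np

x = np.array([-2.8, -2.2, -1.8, -1.3, -.4, 0.7, 1.1, 1.3, 1.9, 2.5])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Polynomial feature mapping x -> (x, x^2)
X2 = np.column_stack([x, x ** 2])

# Hand-picked (hypothetical) plane in the 2D feature space:
# 2.1 * x - 1.0 * x^2 - 0.8 > 0 for green points, < 0 for red points
w = np.array([2.1, -1.0])
b = -0.8
pred = (X2 @ w + b > 0).astype(int)
print(np.array_equal(pred, y))  # True
```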

Going from 1D to 2D allows for a ***linear separation***, as you can see in the plot below:

In [5]:
x = np.array([-2.8, -2.2, -1.8, -1.3, -.4, 0.7, 1.1, 1.3, 1.9, 2.5])
x2 = x ** 2
x = np.concatenate([x.reshape(-1, 1), x2.reshape(-1, 1)], axis=1)
y = np.array([0., 0., 0., 0., 0., 1., 1., 1., 0., 0.])

mysvm3 = plotSVM(x=x, y=y)
mysvm3.fit(is_soft=False)
vb3 = VBox(build_3dfigure(mysvm3), layout={'align_items': 'center'})

In [6]:
vb3

### 3.1 Kernels

These are some popular kernels commonly used with SVMs:

1. ***Linear***: the same as ***no*** kernel
2. ***Polynomial***: increases dimensions by generating polynomial features
3. ***Sigmoid***: uses a sigmoid function as kernel
4. ***Gaussian RBF***: works as a ***similarity*** measure based on the ***distance*** between points, and theoretically maps the points to an ***infinite-dimensional*** space!
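A sketch of these four kernels as plain NumPy functions of two samples. The parameter names (`gamma`, `coef0`, `degree`) mirror the ones scikit-learn uses for its kernels, but the default values here are arbitrary choices for illustration:

```python
import numpy as np

def linear_kernel(u, v):
    return u @ v

def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * (u @ v) + coef0) ** degree

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return np.tanh(gamma * (u @ v) + coef0)

def rbf_kernel(u, v, gamma=1.0):
    # Similarity that decays with squared Euclidean distance: 1 for identical points
    return np.exp(-gamma * np.sum((u - v) ** 2))

u = np.array([1.0, 2.0])
v = np.array([2.0, 0.0])
for k in (linear_kernel, polynomial_kernel, sigmoid_kernel, rbf_kernel):
    print(k.__name__, k(u, v))
```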

The last kernel actually raises a question: ***how to map into an infinite-dimensional*** space? It turns out, you ***don't***! 

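Thanks to the mathematical properties of the ***dot product***, it is possible to compute the dot products in the higher-dimensional space (and, therefore, the projections $p$) directly from the original features, without performing the actual mapping. But this is ***beyond our scope*** now.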
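A small illustration of the idea with a degree-2 polynomial kernel $(\boldsymbol{u} \cdot \boldsymbol{v} + 1)^2$ on 2D inputs, chosen here because its explicit feature map is short (6 dimensions). Computing the kernel in the original 2D space gives the same number as mapping both points to 6D and taking the dot product there. For the RBF kernel the corresponding map is infinite-dimensional, so only the kernel side can be computed:

```python
import numpy as np

def phi(s):
    """Explicit 6D feature map for the kernel (u.v + 1)^2 with 2D inputs."""
    x1, x2 = s
    r2 = np.sqrt(2.0)
    return np.array([x1 ** 2, x2 ** 2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0])

def poly2_kernel(u, v):
    """Same dot product, computed directly in the original 2D space."""
    return (u @ v + 1.0) ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
print(phi(u) @ phi(v))     # dot product after the explicit mapping
print(poly2_kernel(u, v))  # same value, without building the 6D vectors
```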

## 4. Scikit-Learn

[SVM](https://scikit-learn.org/stable/modules/svm.html)

Please check Aurélien Géron's "Hands-On Machine Learning with Scikit-Learn and TensorFlow" notebook on Support Vector Machines [here](https://github.com/ageron/handson-ml/blob/master/05_support_vector_machines.ipynb).

You can also find a 3D plot of one of his examples, using the Iris dataset, below:

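```python
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC

iris = datasets.load_iris()

# Keep only setosa (0) and versicolor (1), using petal length and width
y = iris['target']
x = iris['data'][y != 2][:, (2, 3)]
y = y[y != 2]

# C = infinity -> Hard Margin
svc = LinearSVC(fit_intercept=True, C=np.inf, loss='hinge')
svc.fit(x, y)
```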

In [7]:
from sklearn import datasets
iris = datasets.load_iris()

y = iris['target']
x = iris['data'][y != 2][:, (2, 3)]
y = y[y != 2]

In [8]:
mysvm2 = plotSVM(x=x, y=y)
mysvm2.fit(is_soft=False)
vb2 = VBox(build_3dfigure(mysvm2), layout={'align_items': 'center'})

vb2

## 5. More Resources

[InfoGraphic](https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/Info-graphs/Day%2012.jpg)

[SVM Explorer](https://github.com/plotly/dash-svm)

[Support Vector Machines Tutorial](https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93)

#### This material is copyright Daniel Voigt Godoy and made available under the Creative Commons Attribution (CC-BY) license ([link](https://creativecommons.org/licenses/by/4.0/)). 

#### Code is also made available under the MIT License ([link](https://opensource.org/licenses/MIT)).
