<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Why-Are-Support-Vector-Machines-Cool?" data-toc-modified-id="Why-Are-Support-Vector-Machines-Cool?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Why Are Support Vector Machines Cool?</a></span><ul class="toc-item"><li><span><a href="#Q:-When-would-it-be-&quot;better&quot;-to-have-a-defined-boundary-over-accuracy?" data-toc-modified-id="Q:-When-would-it-be-&quot;better&quot;-to-have-a-defined-boundary-over-accuracy?-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Q: When would it be "better" to have a defined boundary over accuracy?</a></span></li></ul></li><li><span><a href="#Motivation" data-toc-modified-id="Motivation-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Motivation</a></span><ul class="toc-item"><li><span><a href="#Q:-Look-at-these-lines,-which-is-a-better-model?" data-toc-modified-id="Q:-Look-at-these-lines,-which-is-a-better-model?-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Q: Look at these lines, which is a better model?</a></span></li><li><span><a href="#Accuracy-isn't-everything" data-toc-modified-id="Accuracy-isn't-everything-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Accuracy isn't everything</a></span><ul class="toc-item"><li><span><a href="#Q:-How-can-we-define-a-&quot;better&quot;-boundary-?" data-toc-modified-id="Q:-How-can-we-define-a-&quot;better&quot;-boundary-?-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Q: How can we define a "better" boundary ?</a></span></li></ul></li><li><span><a href="#Where-do-we-go-from-here?" 
data-toc-modified-id="Where-do-we-go-from-here?-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Where do we go from here?</a></span></li></ul></li><li><span><a href="#Recall-using-a-linear-model" data-toc-modified-id="Recall-using-a-linear-model-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Recall using a linear model</a></span><ul class="toc-item"><li><span><a href="#Classification-error" data-toc-modified-id="Classification-error-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Classification error</a></span></li><li><span><a href="#Margin-error" data-toc-modified-id="Margin-error-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Margin error</a></span></li><li><span><a href="#Gradient-Descent-to-minimize" data-toc-modified-id="Gradient-Descent-to-minimize-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Gradient Descent to minimize</a></span></li></ul></li><li><span><a href="#Hyperparameter-$C$" data-toc-modified-id="Hyperparameter-$C$-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Hyperparameter $C$</a></span><ul class="toc-item"><li><span><a href="#Q:-What-happens-if-$C$-is-very-large?-(What-errors-do-we-care-about-more?)" data-toc-modified-id="Q:-What-happens-if-$C$-is-very-large?-(What-errors-do-we-care-about-more?)-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Q: What happens if $C$ is very large? (What errors do we care about more?)</a></span></li></ul></li></ul></div>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn.datasets as datasets

# Why Are Support Vector Machines Cool?

!["I support vector machines" t-shirt with vector machine boundary and margin depicted with data](images/i_support_vector_machines.jpg)

> Available now for [purchase](https://www.amazon.com/CafePress-Support-Vector-Machines-T-Shirt/dp/B072VLSLNY) 😉

Another **supervised learning** technique for classification

We can sacrifice accuracy to get _better_ boundaries

## Q: When would it be "better" to have a defined boundary over accuracy?

# Motivation

## Q: Look at these lines, which is a better model?



In [None]:
# Loading in an example dataset
iris = datasets.load_iris()
iris_data = iris.data

# Only use two targets/classifications
iris_targets = np.where(iris.target == 0, 0, 1)

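# Plotting the points: petal length (column 2) vs. sepal width (column 1)
plt.scatter(x=iris_data[:,2], y=iris_data[:,1], c=iris_targets)
plt.xlabel(iris.feature_names[2])
plt.ylabel(iris.feature_names[1])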

# Plotting lines to separate points
l1 = np.array([[1,2],[6.5,4.5]])
plt.plot(l1[:,0], l1[:,1], linestyle='--')
l2 = np.array([[2,2],[3.5,4.5]])
plt.plot(l2[:,0], l2[:,1], linestyle='--')

## Accuracy isn't everything

You could say both lines classify the points equally well (same accuracy), but you know there's more to it.

**Boundaries** are also important (think about overfitting)

### Q: How can we define a "better" boundary ?

> Use distances from the line

We can define this as the **margin**
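
As a rough illustration of "distances from the line", the sketch below computes the perpendicular distance from a few points to a line through two endpoints and takes the smallest one as the margin. The helper `distances_to_line` and the sample points are hypothetical (they are not part of the notebook or the iris data); the endpoints match `l2` from the plot above.

```python
import numpy as np

def distances_to_line(points, p0, p1):
    """Perpendicular distance from each row of `points` to the line through p0 and p1."""
    direction = p1 - p0
    # A normal vector is the direction rotated by 90 degrees
    normal = np.array([-direction[1], direction[0]])
    normal = normal / np.linalg.norm(normal)
    return np.abs((points - p0) @ normal)

# Hypothetical points for illustration only
points = np.array([[1.0, 3.0], [1.5, 3.5], [4.0, 3.0], [5.0, 2.5]])
p0, p1 = np.array([2.0, 2.0]), np.array([3.5, 4.5])  # same endpoints as l2

dists = distances_to_line(points, p0, p1)
margin = dists.min()  # the closest point decides how wide the margin can be
print(dists, margin)
```

The closest point sets the margin here, which is why a line that passes right next to a point is a "worse" boundary even if its accuracy is the same.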

In [None]:
# Plotting different points
plt.scatter(x=iris_data[:,2], y=iris_data[:,1], c=iris_targets)

# Plotting lines to separate points
plt.plot(l2[:,0], l2[:,1], linestyle='-')

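# Small margin (line shifted horizontally by 0.2 in each direction)
margin_small = np.array([0.2,0])
l2_margin_pos_small = l2 + margin_small
l2_margin_neg_small = l2 - margin_small

# Larger margin (line shifted horizontally by 0.5 in each direction)
margin_larger = np.array([0.5,0])
l2_margin_pos_big = l2 + margin_larger
l2_margin_neg_big = l2 - margin_larger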

# Plot with margins
plt.plot(l2_margin_pos_small[:,0], l2_margin_pos_small[:,1], linestyle='--', color='orange')
plt.plot(l2_margin_neg_small[:,0], l2_margin_neg_small[:,1], linestyle='--', color='orange')
plt.plot(l2_margin_pos_big[:,0], l2_margin_pos_big[:,1], linestyle='--', color='red')
plt.plot(l2_margin_neg_big[:,0], l2_margin_neg_big[:,1], linestyle='--', color='red')

plt.xlim(1,5)

## Where do we go from here?

We minimize two kinds of error:
 - how many points are "misclassified"
 - how many points sit in a bad spot relative to the boundary (within the margin)
 

This gives us something like this:

$Error_{total} = Error_{classification} + Error_{margin}$
 

# Recall using a linear model

Misclassified points farther from the line get punished more! → the error increases linearly as we move across the parallel lines

+ $Wx + b = 0$
+ $Wx + b = 1$
+ $Wx + b = 2$

## Classification error

- We start counting the error from the margin line (instead of the center line), so points inside the margin also contribute
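
One common way to write "count the error from the margin" is the hinge loss, sketched below. This is an assumption about the exact formula (the notes do not name it), and it also assumes labels are coded as $-1$ and $+1$; the function name `hinge_errors` and the sample values are made up for illustration.

```python
import numpy as np

def hinge_errors(X, y, W, b):
    """Per-point classification error, measured from the margin line.

    Assumes labels in y are -1 or +1 (an assumption, not from the notes).
    """
    scores = X @ W + b
    # Zero error beyond the margin; grows linearly inside it and past the line
    return np.maximum(0.0, 1.0 - y * scores)

# Three points that all belong to class +1
X = np.array([[2.0, 0.0],    # correct side, outside the margin
              [0.5, 0.0],    # correct side, but inside the margin
              [-1.0, 0.0]])  # wrong side of the line
y = np.array([1, 1, 1])
W, b = np.array([1.0, 0.0]), 0.0

errors = hinge_errors(X, y, W, b)
print(errors)
```

The second point is classified correctly but still picks up some error because it falls inside the margin.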

## Margin error

- $E = ||W||^2 = W_1^2 + W_2^2 + \dots$ 
    + big vs small margin (we want a very large margin, so we want $E$ small)
- $M = \frac{2}{||W||}$ 
    + inverse proportion, large margin → small error

Turns out to be the same as L2 regularization!
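
A quick numeric check of the two formulas, using made-up weight vectors (the function names are hypothetical):

```python
import numpy as np

def margin_error(W):
    """Margin error E = ||W||^2, the sum of squared weights."""
    return float(np.sum(W ** 2))

def margin_width(W):
    """Margin width M = 2 / ||W||."""
    return 2.0 / float(np.linalg.norm(W))

# Made-up weights: one large vector, one small vector
for W in (np.array([3.0, 4.0]), np.array([0.3, 0.4])):
    print(W, "error:", margin_error(W), "margin:", margin_width(W))
```

The larger weights give a bigger error and a narrower margin; shrinking the weights by a factor of 10 widens the margin by the same factor.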

## Gradient Descent to minimize
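
A minimal sketch of what this could look like, assuming the classification error is the average hinge loss, labels are $-1$/$+1$, and the total error is $C \cdot Error_{classification} + ||W||^2$. The function `fit_linear_svm`, its default values, and the toy data are all hypothetical, not taken from the notes.

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimize C * (mean hinge error) + ||W||^2 with (sub)gradient descent.

    Assumes labels in y are -1 or +1.
    """
    n, d = X.shape
    W, b = np.zeros(d), 0.0
    for _ in range(epochs):
        scores = X @ W + b
        # Points that are misclassified or inside the margin
        active = y * scores < 1
        # Subgradient of the classification error
        grad_W = -C * (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -C * y[active].sum() / n
        # Gradient of the margin error ||W||^2
        grad_W = grad_W + 2 * W
        W = W - lr * grad_W
        b = b - lr * grad_b
    return W, b

# Toy, linearly separable data for illustration only
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0],
              [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

W, b = fit_linear_svm(X, y, C=10.0)
preds = np.sign(X @ W + b)
print(W, b, preds)
```

The hinge error is not differentiable at the margin line, so this uses a subgradient there; with a fixed step size the weights hover around the minimum rather than settling exactly on it.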

# Hyperparameter $C$ 

Gives us a way to decide on which line is better (even if classification is worse)

$Error_{total} = C \cdot Error_{classification} + Error_{margin}$

## Q: What happens if $C$ is very large? (What errors do we care about more?)

Big $C$ will give us a smaller margin (we accept small margins to avoid classification errors)
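
One way to see this is to fit a linear SVM with a small and a large $C$ and compare the margin widths, using $M = \frac{2}{||W||}$ from above. The sketch uses scikit-learn's `SVC` with `kernel="linear"`, which this notebook has not introduced yet, and the two $C$ values are arbitrary picks for illustration.

```python
import numpy as np
import sklearn.datasets as datasets
from sklearn.svm import SVC

# Same two iris columns and two-class targets as the plots above
iris = datasets.load_iris()
X = iris.data[:, [2, 1]]
y = np.where(iris.target == 0, 0, 1)

margins = {}
for C in (0.01, 1000):  # arbitrary small and large values
    model = SVC(kernel="linear", C=C).fit(X, y)
    # coef_ holds W for a linear kernel; margin width is 2 / ||W||
    margins[C] = 2 / np.linalg.norm(model.coef_)
    print(f"C={C}: margin width = {margins[C]:.2f}")
```

The small $C$ should report the wider margin, since it tolerates more points inside the margin in exchange for smaller weights.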