# Support Vector Machines (SVM)

**SVM Definition (rough)** - given data from two different classes, a support vector machine 
outputs a line which separates the two classes. 

### Example 1:  
<img src="svm_images/example_1.png" alt = "SVM example" style ="width: 500px;"/>


### Margin 
* Maximize the distance from the separating line $M$ to the nearest point $(a,b) \in \mathbb{R}^2$ of either class. 
  This distance is referred to as the **margin**. 
  
### Example 2: 
<img src="svm_images/example_2.png" alt = "Margin" style = "width: 500px;"/> 
Essentially, we want to maximize the robustness of the result. By choosing the line $M$ with the 
maximum margin, we get a classifier that is less sensitive to noise. 
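
As a minimal sketch of the margin idea, the snippet below fits a linear SVM to four hand-picked points (an assumed toy data set, not taken from these notes) and computes the margin width as $2 / \|w\|$, where $w$ is the learned weight vector. The very large `C` is an assumption made so that the fit approximates a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumed for illustration): class 0 at x = 0, class 1 at x = 2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                        # normal vector of the separating line
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin lines

print("w =", w, "b =", clf.intercept_[0])
print("margin width =", margin_width)
```

The gap between the two classes here is 2 units wide, so the reported margin width should come out close to 2.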


## SVMs - Outliers 

### Example 3:
Sometimes you might have a data set, like the one below, where no line separates the two classes. Essentially, 
we want the SVM to do the best it can: find a separator that works for most of the points and tolerate the individual outliers instead of failing. 
<img src="svm_images/example_3.png" alt = "Outliers" style ="width: 250px;"/>
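
A small sketch of this behavior, using an assumed toy data set (two clusters plus one point labeled as class 1 that sits inside the class 0 cluster). No line can classify every point correctly, so the linear SVM accepts one error and still separates the two clusters.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumed for illustration).
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],   # class 0 cluster
    [4, 4], [4, 5], [5, 4], [5, 5],   # class 1 cluster
    [0.5, 0.5],                       # outlier: labeled 1, inside the class 0 cluster
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)                   # below 1.0: the outlier is tolerated
outlier_prediction = clf.predict([[0.5, 0.5]])[0]  # predicted class at the outlier's location
cluster_predictions = clf.predict([[0, 0], [5, 5]])

print("training accuracy:", train_accuracy)
print("prediction at the outlier's location:", outlier_prediction)
print("predictions for one point from each cluster:", cluster_predictions)
```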

In [3]:
# coding with sklearn
# sample code
from sklearn import svm  # import the svm module

X = [[0, 0], [1, 1]]  # training features
Y = [0, 1]  # training labels
clf = svm.SVC()  # classifier
# fit() trains the classifier and returns the fitted estimator itself
print({"training features": clf.fit(X, Y)})

# After being fitted, the model can then be used to predict new values.
print({"your prediction is": clf.predict([[2., 2.]])})  # prediction

{'training features': SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)}
{'your prediction is': array([1])}


In [4]:
#Quiz 
import sys 
from time import time
sys.path.append("../tools/")
from email_preprocess import preprocess

### features_train and features_test are the features for the training
### and testing datasets, respectively
### labels_train and labels_test are the corresponding item labels
features_train, features_test, labels_train, labels_test = preprocess()

########################## SVM #################################
### we handle the import statement and SVC creation for you here
from sklearn.svm import SVC
clf = SVC(kernel="linear")
clf.fit(features_train, labels_train)  # train the classifier

pred = clf.predict(features_test)  # predict labels for the test set

from sklearn.metrics import accuracy_score
acc = accuracy_score(labels_test, pred)  # argument order is (y_true, y_pred)

def submitAccuracy():
    return acc

print(submitAccuracy())  # should return about 98% accuracy

no. of Chris training emails: 7936
no. of Sara training emails: 7884
0.984072810011


# Non-Linear SVM 

### Features 
So far we have assumed that our features $(x,y)$ can be separated by a line. We fed the 
features into an SVM for training and it output a label. When the classes are not linearly separable, 
we can add a new, non-linear feature such as $z = x^2 + y^2$, the squared distance from the origin. 
In the space that includes $z$, we may find a linear separator (a hyperplane).

### Example 1: 
<img src="svm_images/example_4.png" alt = "Linear Separation" style ="width: 500px;"/> 

All the circles ($c$) and exes ($\hat{x}$) lie at some distance from the origin, measured by $z = x^2 + y^2$. 
All the $\hat{x}$ take on a small value of $z$ and all the $c$ take on a large value, so $z(\hat{x}) < z(c)$. This is shown below. 

### Example 2: 

<img src="svm_images/example_5.png" alt = "z-plane" style ="width: 300px;"/> 

Therefore, the classes are linearly separable once $z$ is included as a feature. 
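
A minimal sketch of this idea, assuming a synthetic data set of two concentric rings (not the data from the figures above). A linear SVM cannot separate the rings using only $(x, y)$, but it can once $z = x^2 + y^2$ is added as a third feature.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumed for illustration): inner ring is class 0, outer ring is class 1.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
inner = np.column_stack([1.0 * np.cos(angles), 1.0 * np.sin(angles)])
outer = np.column_stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)])
X = np.vstack([inner, outer])
y = np.array([0] * len(inner) + [1] * len(outer))

# Linear SVM on the raw (x, y) features: no straight line separates the rings.
acc_xy = SVC(kernel="linear").fit(X, y).score(X, y)

# Add the new feature z = x^2 + y^2 and fit the same kind of linear SVM again.
z = (X ** 2).sum(axis=1)
X_with_z = np.column_stack([X, z])
acc_xyz = SVC(kernel="linear").fit(X_with_z, y).score(X_with_z, y)

print("training accuracy with (x, y):", acc_xy)
print("training accuracy with (x, y, z):", acc_xyz)
```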

### Kernel Trick 

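Essentially, we take points $(x,y) \in \mathbb{R}^2$ and map them into a higher-dimensional space $\mathbb{R}^d$, where 
$d$ is the number of dimensions of that space. The points are not linearly separable inside $\mathbb{R}^2$ but are separable 
in the higher-dimensional space $\mathbb{R}^d$. The separator found there corresponds to a non-linear boundary in the original space.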
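
The "trick" is that a kernel function returns the inner product of two mapped points without ever building the mapping. The sketch below checks this numerically for one assumed example, the degree-2 polynomial kernel $K(p, q) = (p \cdot q)^2$ with the explicit map $\phi(x, y) = (x^2, \sqrt{2}xy, y^2)$. The function names `phi` and `poly2_kernel` are made up for this illustration.

```python
import numpy as np

def phi(p):
    """Explicit degree-2 feature map from R^2 to R^3 (assumed example)."""
    x, y = p
    return np.array([x ** 2, np.sqrt(2.0) * x * y, y ** 2])

def poly2_kernel(p, q):
    """Homogeneous polynomial kernel of degree 2: K(p, q) = (p . q)^2."""
    return float(np.dot(p, q)) ** 2

p = np.array([1.0, 2.0])
q = np.array([3.0, -1.0])

explicit = float(np.dot(phi(p), phi(q)))  # inner product computed in R^3
implicit = poly2_kernel(p, q)             # same quantity, computed in R^2

print("explicit mapping:", explicit)
print("kernel function: ", implicit)
```

Both values agree, which is why an SVM can work in the higher-dimensional space while only evaluating the kernel on the original points.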

# Parameters in ML

Definition: Parameters are the arguments you pass when you create your classifier, before fitting. 

These can make a big **difference** in the decision boundary. 

## What are parameters for SVM?

* Kernel 
* C 
* Gamma 

<img src="svm_images/example_6.png" alt = "parameters" style ="width: 500px;"/> 
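
In scikit-learn these three parameters are keyword arguments of the `SVC` constructor. A short sketch, with values chosen arbitrarily for illustration:

```python
from sklearn.svm import SVC

# Parameters are passed to the constructor, before fit() is called.
# The specific values here are arbitrary examples, not recommendations.
clf = SVC(kernel="rbf", C=10.0, gamma=0.1)

params = clf.get_params()  # dictionary of all constructor parameters
print(params["kernel"], params["C"], params["gamma"])
```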

## SVM C Parameter

**C** - controls the tradeoff between a smooth decision boundary and classifying training points correctly. 

### Quiz: 
Does a large C mean you expect a smooth boundary, or that you will get more training points correct? 
**ANSWER:** A low ``C`` makes the decision surface smooth, while a high ``C`` aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors. So the answer is **more training points correct**.
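
A rough sketch of the quiz answer, using an assumed toy data set with one outlier and an RBF kernel with `gamma=1.0` (both choices are mine, for illustration only). The same model is trained with a small and a large `C`, and the training accuracy is compared.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumed for illustration): two clusters plus one outlier labeled 1.
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],
    [4, 4], [4, 5], [5, 4], [5, 5],
    [0.5, 0.5],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

scores = {}
for C in (0.01, 1000.0):
    clf = SVC(kernel="rbf", gamma=1.0, C=C).fit(X, y)
    scores[C] = clf.score(X, y)  # accuracy on the training points themselves
    print("C =", C, "training accuracy =", scores[C])
```

With the large `C` the boundary bends around the outlier so that every training point is classified correctly; with the small `C` the model typically gives up some training points in exchange for a smoother boundary.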

## SVM Gamma Parameter 

**$\gamma$** - defines how far the influence of a single training example reaches
* low values - far reach 
* high values - close reach 

<img src="svm_images/example_7.png" alt = "gamma" style ="width: 500px;"/> 
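
To make "reach" concrete, the sketch below assumes the RBF kernel (the default `kernel='rbf'` of `SVC`), whose value is $K(a, b) = \exp(-\gamma \|a - b\|^2)$. It evaluates the kernel between a training point and a query point 2 units away for several values of $\gamma$. The helper name `rbf_kernel_value` is made up for this example.

```python
import numpy as np

def rbf_kernel_value(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

training_point = [0.0, 0.0]
query_point = [2.0, 0.0]  # 2 units away from the training point

influence = {}
for gamma in (0.01, 1.0, 100.0):
    influence[gamma] = rbf_kernel_value(training_point, query_point, gamma)
    print("gamma =", gamma, "-> kernel value =", influence[gamma])
```

A low $\gamma$ leaves the kernel value close to 1 even at this distance (far reach), while a high $\gamma$ drives it toward 0 (close reach).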


### Quiz: Overfitting

We really want to avoid overfitting. One way to control it is through the parameters of your algorithm: 
for an SVM, the kernel, ``C`` and gamma can each make the model more or less prone to overfitting your data. 
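
One hedged way to see this is to compare training and test accuracy for a moderate setting and an extreme one. The sketch below uses synthetic noisy data from `make_moons` and parameter values picked arbitrarily for illustration; none of this comes from the email data set used earlier.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic noisy data (assumed for illustration).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

settings = [
    ("smooth", {"C": 1.0, "gamma": 1.0}),
    ("overfit", {"C": 1000.0, "gamma": 1000.0}),  # extreme values on purpose
]

results = {}
for name, params in settings:
    clf = SVC(kernel="rbf", **params).fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each setting.
    results[name] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(name, "train/test accuracy:", results[name])
```

A large gap between training and test accuracy is the usual sign that the parameters let the model memorize the training points.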

In [5]:
help(SVC())

Help on SVC in module sklearn.svm.classes object:

class SVC(sklearn.svm.base.BaseSVC)
 |  C-Support Vector Classification.
 |  
 |  The implementation is based on libsvm. The fit time complexity
 |  is more than quadratic with the number of samples which makes it hard
 |  to scale to dataset with more than a couple of 10000 samples.
 |  
 |  The multiclass support is handled according to a one-vs-one scheme.
 |  
 |  For details on the precise mathematical formulation of the provided
 |  kernel functions and how `gamma`, `coef0` and `degree` affect each
 |  other, see the corresponding section in the narrative documentation:
 |  :ref:`svm_kernels`.
 |  
 |  Read more in the :ref:`User Guide <svm_classification>`.
 |  
 |  Parameters
 |  ----------
 |  C : float, optional (default=1.0)
 |      Penalty parameter C of the error term.
 |  
 |  kernel : string, optional (default='rbf')
 |       Specifies the kernel type to be used in the algorithm.
 |       It must be one of 'linear', 'pol