# Support Vector Machines

![2017-09-15-16-23-onionesquereality.wordpress.com.png](attachment:2017-09-15-16-23-onionesquereality.wordpress.com.png)

A linear SVM separates the classes with a hyperplane. Among all separating hyperplanes, we choose the one that maximizes the margin, the distance between the hyperplane and the nearest points of each class. The data points that lie on the margin lines are known as the support vectors, which is where the name support vector machine comes from.
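
To make the margin concrete, here is a minimal sketch on a hypothetical four-point toy dataset (not part of this notebook's data). It fits a linear `SVC` with a large `C` to approximate a hard margin, reads off the support vectors, and computes the margin width as 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two points per class, separable along the first axis.
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 1.0], [4.0, 0.0]])
y_toy = np.array([0, 0, 1, 1])

# A large C approximates a hard margin (no training point may violate it).
toy_model = SVC(kernel="linear", C=1000.0)
toy_model.fit(X_toy, y_toy)

w = toy_model.coef_[0]                   # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

print("support vectors:\n", toy_model.support_vectors_)
print("margin width:", margin_width)
```

Only the closest points of each class should be reported as support vectors; the remaining points could be removed without changing the hyperplane.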

The idea extends to data that are not linearly separable through the kernel trick, which implicitly maps the data into a higher-dimensional space where a separating hyperplane may exist.
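
As a rough illustration of why a kernel helps, the sketch below compares a linear and an RBF kernel on synthetic concentric-circle data. The dataset and its settings are assumptions chosen for the example, not part of this notebook.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration): one class forms a ring
# around the other, so no straight line can separate them.
X_circ, y_circ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X_circ, y_circ).score(X_circ, y_circ)
rbf_acc = SVC(kernel="rbf").fit(X_circ, y_circ).score(X_circ, y_circ)

print("linear kernel training accuracy:", linear_acc)
print("RBF kernel training accuracy:   ", rbf_acc)
```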

A hard-margin SVM seeks a margin that separates all positive and negative examples. However, this can lead to poorly fit models if any examples are mislabeled or extremely unusual. To account for this, a soft margin is used, which allows some examples to fall inside the margin or on the wrong side of it.

**C** is the parameter of the soft-margin cost function. It controls the influence of each individual support vector, trading error penalty against stability.
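
For reference, the standard soft-margin formulation (not spelled out in these notes) makes the role of C explicit. The symbols are assumptions of this sketch: $w$ and $b$ define the hyperplane, $\xi_i$ is the slack allowed for example $i$, and the labels are $y_i \in \{-1, +1\}$.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0 .
```

The first term favors a wide margin, and the second charges C for every unit of margin violation.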

**C** controls the cost of misclassification on the training data. A large C penalizes misclassification heavily, so the model fits the training data closely, giving low bias and high variance. A small C tolerates more misclassified points, giving higher bias and lower variance.
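
One way to watch C at work is to count support vectors as it changes. The sketch below does this on synthetic, overlapping two-class data; the dataset and the C values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (an assumption for illustration).
X_blob, y_blob = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                            cluster_std=1.2, random_state=0)

n_support = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X_blob, y_blob)
    # n_support_ holds the number of support vectors per class.
    n_support[C] = int(clf.n_support_.sum())
    print(f"C={C}: {n_support[C]} support vectors, "
          f"training accuracy {clf.score(X_blob, y_blob):.3f}")
```

A smaller C usually leaves more points inside the margin, so more of them become support vectors.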

![2017-09-17-23-28-www.quora.com.png](attachment:2017-09-17-23-28-www.quora.com.png)

A linear SVM computes relationships between examples with the dot product. In 1992, Boser, Guyon, and Vapnik proposed replacing the dot product with a nonlinear kernel function such as the Gaussian radial basis function (RBF). **Gamma** is the free parameter of the RBF kernel.
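
The RBF kernel used by scikit-learn is K(x, x') = exp(-gamma * ||x - x'||^2). As a small sketch, the hypothetical helper `rbf_manual` below computes it with NumPy on three made-up points and compares the result with `sklearn.metrics.pairwise.rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_manual(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), computed with broadcasting."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

# Three hypothetical 2-D points, for illustration only.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
gamma = 0.5

K_manual = rbf_manual(pts, pts, gamma)
K_sklearn = rbf_kernel(pts, pts, gamma=gamma)
print(K_manual)
```

Each entry lies between 0 and 1 and shrinks as the two points move apart; a larger gamma makes it shrink faster.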

**Gamma** sets how far the influence of a single training example reaches. In scikit-learn's parameterization, gamma is inversely related to the variance of the Gaussian. If gamma is large, the variance is small, so each support vector does not have widespread influence and the decision boundary can bend tightly around individual training points. A large gamma therefore leads to low-bias, high-variance models, and a small gamma leads to high-bias, low-variance models.
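
To see how gamma affects the fit in practice, this sketch compares training and test accuracy for one large and one small gamma on the breast cancer data, with the same 70/30 split used later in this notebook. The two gamma values and `C=1.0` are assumptions picked for the comparison.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

scores = {}
for gamma in [1, 0.0001]:
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_tr, y_tr)
    # Store (training accuracy, test accuracy) for this gamma.
    scores[gamma] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"gamma={gamma}: train accuracy {scores[gamma][0]:.3f}, "
          f"test accuracy {scores[gamma][1]:.3f}")
```

A large gap between training and test accuracy is the usual sign that a model is fitting noise in the training set.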

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [5]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
cancer.keys()
#print(cancer['DESCR'])



dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

In [8]:
df_feat = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])

In [10]:
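# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module has been removed).
from sklearn.model_selection import train_test_split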

In [18]:
X = df_feat
y = cancer['target']

# Hold out 30% of the samples for testing; fix the seed for reproducibility.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [22]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# gamma='auto' was the default when this output was produced;
# newer scikit-learn releases default to gamma='scale'.
model = SVC(gamma='auto')
model.fit(x_train, y_train)
predictions = model.predict(x_test)

print(confusion_matrix(y_test, predictions))
print('\n')
print(classification_report(y_test, predictions))

# The untuned model predicts class 1 for every test sample (see the
# confusion matrix), which triggers a warning about undefined precision
# for class 0. The hyperparameters need tuning; a grid search does this
# by trying every combination from a set of candidate values.

from sklearn.model_selection import GridSearchCV
# GridSearchCV takes a dictionary mapping parameter names to candidate
# values and cross-validates every combination.

param_grid = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}

# verbose defaults to 0; a higher value prints progress, which is useful
# because a grid search can take a long time.
# cv=3 matches the 3-fold output shown here (newer scikit-learn releases
# default to 5 folds).
grid = GridSearchCV(SVC(), param_grid, cv=3, verbose=3)

grid.fit(x_train, y_train)






[[  0  63]
 [  0 108]]


             precision    recall  f1-score   support

          0       0.00      0.00      0.00        63
          1       0.63      1.00      0.77       108

avg / total       0.40      0.63      0.49       171

Fitting 3 folds for each of 25 candidates, totalling 75 fits
[CV] C=0.1, gamma=1 ..................................................
[CV] ......................... C=0.1, gamma=1, score=0.624060 -   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ......................... C=0.1, gamma=1, score=0.624060 -   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ......................... C=0.1, gamma=1, score=0.628788 -   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] ....................... C=0.1, gamma=0.1, score=0.624060 -   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] ....................... C=0.1, gamma=0.1, score=0.62406

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV] ..................... C=0.1, gamma=0.001, score=0.624060 -   0.0s
[CV] C=0.1, gamma=0.001 ..............................................
[CV] ..................... C=0.1, gamma=0.001, score=0.628788 -   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] .................... C=0.1, gamma=0.0001, score=0.902256 -   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] .................... C=0.1, gamma=0.0001, score=0.909774 -   0.0s
[CV] C=0.1, gamma=0.0001 .............................................
[CV] .................... C=0.1, gamma=0.0001, score=0.901515 -   0.0s
[CV] C=1, gamma=1 ....................................................
[CV] ........................... C=1, gamma=1, score=0.624060 -   0.0s
[CV] C=1, gamma=1 ....................................................
[CV] ........................... C=1, gamma=1, score=0.624060 -   0.0s
[CV] C=1, gamma=1 ....................................................
[CV] .

[CV] C=1000, gamma=0.001 .............................................
[CV] .................... C=1000, gamma=0.001, score=0.909774 -   0.0s
[CV] C=1000, gamma=0.001 .............................................
[CV] .................... C=1000, gamma=0.001, score=0.886364 -   0.0s
[CV] C=1000, gamma=0.0001 ............................................
[CV] ................... C=1000, gamma=0.0001, score=0.954887 -   0.0s
[CV] C=1000, gamma=0.0001 ............................................
[CV] ................... C=1000, gamma=0.0001, score=0.909774 -   0.0s
[CV] C=1000, gamma=0.0001 ............................................
[CV] ................... C=1000, gamma=0.0001, score=0.924242 -   0.0s


[Parallel(n_jobs=1)]: Done  75 out of  75 | elapsed:    1.3s finished


GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=3)

In [23]:
print(grid.best_params_)
print(grid.best_estimator_)


{'C': 10, 'gamma': 0.0001}
SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.0001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
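
Conceptually, the grid search amounts to a loop over every parameter combination, scoring each with cross-validation and keeping the best. The sketch below spells that loop out with `itertools.product` and `cross_val_score`. It uses a reduced, hypothetical grid and the full dataset to stay quick, so its result is not expected to match the output above.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)

# A reduced, hypothetical grid to keep the sketch quick.
C_values = [1, 10, 100]
gamma_values = [0.001, 0.0001]

best_score, best_params = -1.0, None
for C, gamma in product(C_values, gamma_values):
    # Mean accuracy over 3 cross-validation folds for this combination.
    mean_score = cross_val_score(SVC(C=C, gamma=gamma), X_all, y_all, cv=3).mean()
    if mean_score > best_score:
        best_score, best_params = mean_score, {"C": C, "gamma": gamma}

print(best_params, round(best_score, 3))
```

`GridSearchCV` adds conveniences on top of this loop, such as refitting the best model on all the training data so that `grid.predict` can be called directly.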


In [26]:
grid_prediction = grid.predict(x_test)
print(confusion_matrix(y_test, grid_prediction))
print('\n')
print(classification_report(y_test, grid_prediction))

[[ 59   4]
 [  2 106]]


             precision    recall  f1-score   support

          0       0.97      0.94      0.95        63
          1       0.96      0.98      0.97       108

avg / total       0.96      0.96      0.96       171

