## Support Vector Machines

SVMs are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns. They are used for classification and regression analysis.

An SVM model represents the training examples as points in space, mapped so that the categories are divided by a gap that is as wide as possible.

New points are then mapped into the same space and assigned to a category according to which side of the gap they fall on.
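To make the "widest gap" idea concrete, here is a minimal sketch on synthetic data. The toy clusters, the variable names and the choice of `C=1000` are assumptions for illustration and are not part of the analysis that follows. For a linear SVM the width of the gap is `2 / ||w||`, where `w` is the learned weight vector.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative data, not the notebook's dataset).
X_toy, y_toy = make_blobs(
    n_samples=60, centers=[[-3, -3], [3, 3]], cluster_std=0.8, random_state=0
)

# A large C approximates a hard-margin linear SVM on separable data.
clf = SVC(kernel="linear", C=1000)
clf.fit(X_toy, y_toy)

# For a linear SVM the margin (the "gap") has width 2 / ||w||.
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors per class:", clf.n_support_)
print("margin width:", round(margin_width, 3))

# New points are classified by the side of the hyperplane they fall on.
new_points = [[2, 2], [-2, -2]]
new_labels = clf.predict(new_points)
print("predicted labels:", new_labels)
```

Only the support vectors (the points closest to the gap) determine the hyperplane; the rest of the training data could be removed without changing it.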


The real strength of SVMs is their ability to use the __kernel trick__. The 'trick' is to implicitly map the data into a higher-dimensional space, where a hyperplane can separate categories that are not linearly separable in the original space.
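A small sketch of why the extra dimension helps, assuming a synthetic two-rings dataset from `make_circles` (not the data used later). A linear SVM fails in 2D, succeeds once we add the squared radius as a third feature by hand, and the RBF kernel gets a similar result without us building that feature explicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in 2D.
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# 1) Linear SVM in the original 2D space.
linear_acc = SVC(kernel="linear").fit(X_c, y_c).score(X_c, y_c)

# 2) Explicit lift: add r^2 = x1^2 + x2^2 as a third feature, then stay linear.
X_lifted = np.column_stack([X_c, (X_c ** 2).sum(axis=1)])
lifted_acc = SVC(kernel="linear").fit(X_lifted, y_c).score(X_lifted, y_c)

# 3) RBF kernel: the mapping is implicit, we never construct the new features.
rbf_acc = SVC(kernel="rbf").fit(X_c, y_c).score(X_c, y_c)

print("linear, 2D      :", round(linear_acc, 3))
print("linear, lifted  :", round(lifted_acc, 3))
print("rbf kernel, 2D  :", round(rbf_acc, 3))
```

The hand-made lift is only there to show the geometry. In practice the kernel computes the inner products of the higher-dimensional space directly, which is what makes it a 'trick'.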



In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

We'll use a dataset built into sklearn to classify whether a tumor is malignant or benign.
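Before building the DataFrame it can help to check what the loaded object contains. The sketch below only uses keys that `load_breast_cancer()` returns; the variable name `counts` is ours.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# The object behaves like a dictionary of arrays and metadata.
print(cancer.keys())

# Feature matrix: one row per sample, one column per feature.
print("data shape:", cancer["data"].shape)

# Target encoding: the index in target_names is the label value.
print("classes:", list(cancer["target_names"]))

# Class balance, useful for judging accuracy numbers later on.
counts = np.bincount(cancer["target"])
for name, n in zip(cancer["target_names"], counts):
    print(name, int(n))
```

The classes are not perfectly balanced, so a model that always predicts the majority class already scores well above 50% accuracy.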


In [10]:
df_feat = pd.DataFrame(cancer['data'], columns = cancer['feature_names'])

In [12]:
df_feat.head(2)

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |


## EDA 

We'll skip this part of the analysis and focus on creating a model through SVM.

In [16]:
from sklearn.model_selection import train_test_split


<function sklearn.model_selection._split.train_test_split(*arrays, **options)>

In [21]:
X = df_feat
y = cancer['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
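The split above is purely random. As an optional variation, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both parts. The sketch below is an assumption about how one might use it here and is not what the rest of this notebook does; the variable names are ours.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X_all, y_all = load_breast_cancer(return_X_y=True)

# Same test size and seed as above, but stratified on the labels.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=101, stratify=y_all
)

# Share of class 1 in each part; with stratification these stay close.
print("overall:", round(y_all.mean(), 3))
print("train  :", round(y_tr.mean(), 3))
print("test   :", round(y_te.mean(), 3))
```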

In [29]:
from sklearn.svm import SVC

model = SVC()
model.fit(X_train, y_train)



SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='rbf', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

In [30]:
predictions = model.predict(X_test)

In [31]:
from sklearn.metrics import classification_report,confusion_matrix

In [32]:
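# Note: scikit-learn's signature is (y_true, y_pred). The arguments here are
# passed as (predictions, y_test), so the rows of the confusion matrix below are
# predicted labels and the per-class precision and recall are interchanged.
print(classification_report(predictions, y_test))
print('\n')
print(confusion_matrix(predictions, y_test))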

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       1.00      0.61      0.76       171

    accuracy                           0.61       171
   macro avg       0.50      0.31      0.38       171
weighted avg       1.00      0.61      0.76       171



[[  0   0]
 [ 66 105]]


  'recall', 'true', average, warn_for)


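Our model gave us a warning because it assigned every test sample to a single class, which leaves some of the per-class metrics undefined. This means that the default hyperparameters (an RBF kernel with C=1.0 and the default gamma) do not produce a useful decision boundary on this data, most likely because the features are on very different scales.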
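One way to see what is going wrong is to standardize the features before fitting. This is a side experiment, not the approach taken in this notebook: `StandardScaler` and `make_pipeline` do not appear in the original, and `gamma="auto"` (1 / n_features) is used on the assumption that it matches the older default shown in the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=101
)

# RBF SVM on raw features: columns such as "mean area" dominate the distances.
unscaled = SVC(kernel="rbf", gamma="auto").fit(X_tr, y_tr)
unscaled_acc = unscaled.score(X_te, y_te)

# Same model, but every feature is first rescaled to zero mean and unit variance.
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="auto"))
scaled.fit(X_tr, y_tr)
scaled_acc = scaled.score(X_te, y_te)

print("accuracy without scaling:", round(unscaled_acc, 3))
print("accuracy with scaling   :", round(scaled_acc, 3))
```

Putting the scaler inside a pipeline means it is fitted on the training data only, so no information from the test set leaks into the preprocessing.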

This is where we'll use __Grid-Search__: it trains the model over a grid of __C__ and __gamma__ values with cross-validation and keeps the combination that scores best.

 - __C__ is the cost of misclassification.
     - A large C gives you low bias and high variance: misclassified training points are penalized heavily, so the boundary bends to fit them.
     - A small C gives you higher bias and lower variance: the margin is wider and some training errors are tolerated.
  
 - __gamma__ defines how far the influence of a single training example reaches.
     - A low gamma means a far reach: distant points also shape the decision boundary, which makes it smoother.
     - A high gamma means a short reach: only nearby points shape the decision boundary, so it follows individual training points more closely.
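The effect of gamma described above can be sketched on a small synthetic dataset. The `make_moons` data, the three gamma values and the variable names are assumptions chosen for illustration. Comparing training and test accuracy gives a rough picture of under- and overfitting.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy, interleaved half-moons (illustrative data only).
X_m, y_m = make_moons(n_samples=300, noise=0.3, random_state=0)
X_mtr, X_mte, y_mtr, y_mte = train_test_split(
    X_m, y_m, test_size=0.3, random_state=0
)

train_acc = {}
test_acc = {}
for gamma in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_mtr, y_mtr)
    train_acc[gamma] = clf.score(X_mtr, y_mtr)
    test_acc[gamma] = clf.score(X_mte, y_mte)
    print(
        f"gamma={gamma:<6} train={train_acc[gamma]:.3f} "
        f"test={test_acc[gamma]:.3f} support vectors={clf.n_support_.sum()}"
    )
```

A very high gamma typically fits the training set closely while the test score lags behind, and a very low gamma tends to give a boundary that is too simple for this data. The same kind of sweep could be run over C.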
     

In [34]:
from sklearn.model_selection import GridSearchCV

In [35]:
# Candidate values: 5 x 5 = 25 combinations of C and gamma
param_grid = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}

In [36]:
grid = GridSearchCV(SVC(),param_grid,verbose=3)

In [37]:
grid.fit(X_train,y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


Fitting 3 folds for each of 25 candidates, totalling 75 fits
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.632, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.632, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.636, total=   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] .................... C=0.1, gamma=0.1, score=0.632, total=   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] .................... C=0.1, gamma=0.1, score=0.632, total=   0.0s
[CV] C=0.1, gamma=0.1 ................................................
[CV] .................... C=0.1, gamma=0.1, score=0.636, total=   0.0s
[CV] C=0.1, gamma=0.01 ...............................................
[CV] ...........

[Parallel(n_jobs=1)]: Done  75 out of  75 | elapsed:    0.8s finished


GridSearchCV(cv='warn', error_score='raise-deprecating',
             estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='auto_deprecated', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='warn', n_jobs=None,
             param_grid={'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=3)

In [42]:
grid.best_params_


{'C': 10, 'gamma': 0.0001}
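Besides `best_params_`, a fitted `GridSearchCV` keeps the score of every candidate in `cv_results_`. The sketch below refits a much smaller grid so that it runs on its own; the reduced grid, `cv=3` and the variable names are assumptions, so its numbers will not match the run above.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=101
)

# A reduced grid (hypothetical) to keep the example fast.
small_grid = {"C": [1, 10], "gamma": [0.001, 0.0001]}
search = GridSearchCV(SVC(), small_grid, cv=3)
search.fit(X_tr, y_tr)

# One row per candidate, with its mean cross-validated score and rank.
results = pd.DataFrame(search.cv_results_)[
    ["param_C", "param_gamma", "mean_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))

# Best combination and its mean cross-validated accuracy.
print(search.best_params_, round(search.best_score_, 3))
```

Looking at the full table shows whether the best value sits at the edge of the grid, which would suggest extending the search range in that direction.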

In [45]:
grid.best_estimator_

SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [46]:
grid_predictions = grid.predict(X_test)

In [50]:
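# Note: scikit-learn's signature is (y_true, y_pred). The arguments here are
# passed as (grid_predictions, y_test), so the matrix rows are predicted labels
# and the per-class precision and recall are interchanged.
print(confusion_matrix(grid_predictions, y_test))
print('\n')
print(classification_report(grid_predictions, y_test))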

[[ 60   3]
 [  6 102]]


              precision    recall  f1-score   support

           0       0.91      0.95      0.93        63
           1       0.97      0.94      0.96       108

    accuracy                           0.95       171
   macro avg       0.94      0.95      0.94       171
weighted avg       0.95      0.95      0.95       171



As seen above, the tuned model performs much better: accuracy rose from 61% to 95%, and precision and recall are above 0.9 for both classes. I'd say we did a pretty good job.
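A note on reading these reports: `confusion_matrix` and `classification_report` take the true labels first and the predictions second. A minimal sketch with made-up labels (the toy lists are assumptions) shows what changes when the order is reversed.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels for illustration only.
y_true_toy = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred_toy = [0, 0, 1, 1, 1, 1, 1, 1]

# Documented order: rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)

# Reversed order: the matrix is transposed, so rows become predicted labels.
cm_swapped = confusion_matrix(y_pred_toy, y_true_toy)
print(cm_swapped)

# In the report, per-class precision and recall trade places when swapped.
print(classification_report(y_true_toy, y_pred_toy))
print(classification_report(y_pred_toy, y_true_toy))
```

Accuracy is the same either way, which is why the headline numbers stay valid regardless of the argument order.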

#### Credit: 
 - https://github.com/cmusatyalab/openface/issues/388
 - https://en.wikipedia.org/wiki/Support-vector_machine
 - https://scikit-learn.org/stable/modules/svm.html
 