# Lesson 6.05 Support Vector Machines

## What are Support Vector Machines (SVMs)?

An SVM is a classifier that finds the optimal separating hyperplane: the one that maximises the margin between two classes.
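
As a rough illustration of what maximising the margin means, the sketch below fits a linear `SVC` on a small, made-up, linearly separable dataset and reads the margin width off the fitted weights as `2 / ||w||`. The toy points and the very large `C` (used here only to approximate a hard margin) are assumptions for illustration and are not part of the Titanic data used later.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations (near hard margin)
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# For a linear SVM the margin width is 2 / ||w||
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Geometric answer for this toy set: the distance between the lines
# x + y = 1 and x + y = 6, which is 5 / sqrt(2)
expected_width = 5.0 / np.sqrt(2.0)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", round(margin_width, 3))
print("geometric width:", round(expected_width, 3))
```

The points reported in `support_vectors_` are the ones that sit on the edge of the margin; moving any other point slightly would not change the fitted hyperplane.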

* SVMs are fantastic models if all you care about is predictive ability
* They are largely black boxes, i.e. the significance of individual predictors is unknown
* You must **scale your data**, since an SVM tries to maximise the distance between the separating plane and the support vectors. If one feature (i.e. one dimension in this space) has very large values, it will dominate the other features when the distance is calculated. Once all features are rescaled, they have comparable influence on the distance metric.
* SVMs with a polynomial kernel of degree 2 have been reported to work really well for NLP data!
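
The scaling point above can be sketched numerically. The snippet below uses a small, made-up array with an age-like column and a fare-like column (both hypothetical) and measures how much each feature contributes to the squared Euclidean distance between two rows, before and after `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 is age-like, column 1 is fare-like
X = np.array([
    [22.0, 7.25],
    [38.0, 512.0],
    [26.0, 8.05],
    [35.0, 53.1],
])

def feature_share_of_distance(data, i, j):
    """Fraction of the squared Euclidean distance between rows i and j
    that comes from each feature."""
    sq_diff = (data[i] - data[j]) ** 2
    return sq_diff / sq_diff.sum()

# Before scaling, the large-valued column accounts for nearly all of the distance
raw_share = feature_share_of_distance(X, 0, 1)

# After standardising, both columns have zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)
scaled_share = feature_share_of_distance(X_scaled, 0, 1)

print("share per feature (raw):   ", raw_share.round(3))
print("share per feature (scaled):", scaled_share.round(3))
```

This is why the scaler is best placed inside the same `Pipeline` as the classifier: the scaling statistics are then learned only from the training folds during cross-validation.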


### Pros
- Exceptional performance (historically widely used)
- Effective in high-dimensional data
- Can work with non-linear boundaries
- Fast to compute with most datasets (the kernel trick avoids explicitly computing the high-dimensional feature mapping)
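
To make the non-linear boundary point concrete, here is a minimal sketch on synthetic data (two concentric rings from `make_circles`, an assumption chosen for illustration). A linear kernel cannot separate the rings, whereas an RBF kernel can.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data: an inner ring (class 1) surrounded by an outer ring (class 0)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Mean 5-fold cross-validated accuracy for each kernel
linear_acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
rbf_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()

print(f"linear kernel accuracy: {linear_acc:.3f}")
print(f"rbf kernel accuracy:    {rbf_acc:.3f}")
```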

### Cons
- Black box method, i.e. the significance of predictors is unknown
- Can be slow on large datasets, i.e. those with a very large number of rows


### Import Library

Import [Support Vector Machines](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) from `sklearn` and explore the hyperparameters.

In [1]:
import joblib
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
import warnings

# if you are keen to remove the warnings in the output
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)

## Fit and evaluate a model

We will be using the Titanic dataset from [this](https://www.kaggle.com/c/titanic/overview) Kaggle competition.

In this section, we will fit and evaluate a simple Support Vector Machines model.

### Read in Data

In [2]:
tr_features = pd.read_csv('data/train_features.csv')
tr_labels = pd.read_csv('data/train_labels.csv', header=None)

In [3]:
tr_features

Unnamed: 0,Pclass,Sex,Age,Fare,Family_cnt,Cabin_ind
0,2,0,62.000000,10.5000,0,0
1,3,0,8.000000,29.1250,5,0
2,3,0,32.000000,56.4958,0,0
3,3,1,20.000000,9.8250,1,0
4,2,1,28.000000,13.0000,0,0
...,...,...,...,...,...,...
529,3,1,21.000000,7.6500,0,0
530,1,0,29.699118,31.0000,0,0
531,3,0,41.000000,14.1083,2,0
532,1,1,14.000000,120.0000,3,1


In [4]:
tr_labels

Unnamed: 0,0
0,1
1,0
2,1
3,0
4,1
...,...
529,1
530,0
531,0
532,1


In [5]:
svc_pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: Scale features
    ('svc', SVC())                 # Step 2: Classifier
])

## Hyperparameter Tuning

![c](img/c.png)
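
In scikit-learn's `SVC`, `C` is the inverse of the regularisation strength: a small `C` tolerates more margin violations (wider margin), while a large `C` penalises them heavily (narrower margin). The sketch below is one way to observe this, using synthetic overlapping blobs (an assumption, not the Titanic data) and counting support vectors for a few values of `C`.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic data: two overlapping clusters, so no perfect separation exists
X, y = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                  cluster_std=1.5, random_state=0)

n_support = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # n_support_ holds the number of support vectors per class
    n_support[C] = int(clf.n_support_.sum())
    print(f"C={C}: {n_support[C]} support vectors, "
          f"training accuracy {clf.score(X, y):.3f}")
```

A wider margin pulls more points inside it, so the count of support vectors tends to fall as `C` grows on data like this.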

In [6]:
# Display the best parameter values found by GridSearchCV, then the score for every combination
def print_results(results):
    print(f'BEST PARAMS: {results.best_params_} BEST SCORE: {results.best_score_:0.3f}')
    
    # mean accuracy of classification
    means = results.cv_results_['mean_test_score']
    
    # std deviation of classification accuracy
    stds = results.cv_results_['std_test_score']
    
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))

In [7]:
parameters = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ["linear", "poly", "rbf"],
}

cv = GridSearchCV(svc_pipeline, parameters, cv=5)
cv.fit(tr_features, tr_labels.values.ravel())

print_results(cv)

BEST PARAMS: {'svc__C': 1, 'svc__kernel': 'rbf'} BEST SCORE: 0.830
0.796 (+/-0.115) for {'svc__C': 0.1, 'svc__kernel': 'linear'}
0.719 (+/-0.049) for {'svc__C': 0.1, 'svc__kernel': 'poly'}
0.803 (+/-0.099) for {'svc__C': 0.1, 'svc__kernel': 'rbf'}
0.796 (+/-0.115) for {'svc__C': 1, 'svc__kernel': 'linear'}
0.822 (+/-0.11) for {'svc__C': 1, 'svc__kernel': 'poly'}
0.83 (+/-0.098) for {'svc__C': 1, 'svc__kernel': 'rbf'}
0.794 (+/-0.11) for {'svc__C': 10, 'svc__kernel': 'linear'}
0.809 (+/-0.099) for {'svc__C': 10, 'svc__kernel': 'poly'}
0.824 (+/-0.071) for {'svc__C': 10, 'svc__kernel': 'rbf'}


### Save model to external file
Save the best fitted model to a `.pkl` file so that it can be reused for evaluation alongside other models, in other Jupyter notebooks, and by stakeholders.

This is useful for projects in which each member focuses on a separate set of models.

In [8]:
# Save model to file
joblib.dump(cv.best_estimator_, 'data/SVM_model.pkl')

['data/SVM_model.pkl']

In [9]:
# Load model from file

loaded_model = joblib.load('data/SVM_model.pkl')
loaded_model.predict(tr_features)

array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1,
       0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1,
       0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1,
       1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1,
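
As a quick sanity check on the save and load steps, the sketch below round-trips a small pipeline through `joblib` and confirms that the restored model gives the same predictions. It uses synthetic data from `make_classification` and a temporary file named `svm_demo.pkl`; both are assumptions for illustration rather than the lesson's own files.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training features and labels
X, y = make_classification(n_samples=100, n_features=6, random_state=0)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
]).fit(X, y)

# Write the fitted pipeline to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svm_demo.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("predictions identical after reload:", same_predictions)
```

Because the whole pipeline is saved, the fitted scaler travels with the classifier, so raw features can be passed straight to `predict`. Pickled models should be loaded with a compatible scikit-learn version and only from sources you trust.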

### Comparison with Gradient Boosting Classifier

In [10]:
from sklearn.ensemble import GradientBoostingClassifier

gb_classifier = GradientBoostingClassifier()

parameters = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1]
}

cv = GridSearchCV(gb_classifier, parameters, cv=5)
cv.fit(tr_features, tr_labels.values.ravel())

print_results(cv)

BEST PARAMS: {'learning_rate': 0.01, 'n_estimators': 300} BEST SCORE: 0.841
0.815 (+/-0.069) for {'learning_rate': 0.01, 'n_estimators': 100}
0.828 (+/-0.076) for {'learning_rate': 0.01, 'n_estimators': 200}
0.841 (+/-0.082) for {'learning_rate': 0.01, 'n_estimators': 300}
0.837 (+/-0.036) for {'learning_rate': 0.1, 'n_estimators': 100}
0.828 (+/-0.034) for {'learning_rate': 0.1, 'n_estimators': 200}
0.824 (+/-0.035) for {'learning_rate': 0.1, 'n_estimators': 300}
