## Support Vector Machines - Part 2

#### Table of Contents

* [Preliminaries](#Preliminaries)
* [Null Model](#Null-Model)
* [Polynomial Kernel](#Polynomial-Kernel)
* [Gaussian Radial Basis Function Kernel](#Gaussian-Radial-Basis-Function-Kernel)
* [Comparison](#Comparison)

Takeaways from this script:

1. complex kernels are computationally expensive to optimize
2. a linear kernel is a polynomial kernel of degree 1

****
# Preliminaries
[TOP](#Support-Vector-Machines---Part-2)

Unlike the SVM1 lecture, we will use `SVC()`, which supports the more flexible kernels.
We will be predicting the multi-class `urate_bin`.

In [None]:
# utilities
import numpy as np
import pandas as pd

# processing
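from sklearn.metrics import ConfusionMatrixDisplay
# plot_confusion_matrix was removed in scikit-learn 1.2; keep the old name as an alias
plot_confusion_matrix = ConfusionMatrixDisplay.from_estimator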
from sklearn.model_selection import GridSearchCV, train_test_split

# algorithms
from sklearn.svm import SVC, LinearSVC

In [None]:
df = pd.read_csv('C:/Users/johnj/Documents/Data/aml in econ 02 spring 2021/class data/class_data.csv')
df.head(1)
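# set_index returns a new DataFrame, so df itself is unchanged here
df.set_index(['fips', 'GeoName'])
df.head(1)
# inplace = True modifies df directly
df.set_index(['fips', 'GeoName'], inplace = True)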
df.head(1)

In [None]:
df = pd.read_csv('C:/Users/johnj/Documents/Data/aml in econ 02 spring 2021/class data/class_data.csv',
           index_col = ['fips', 'GeoName'])

In [None]:
df.shape

In [None]:
df.drop(columns = ['year']).join([
    pd.get_dummies(df['year'], drop_first = True)
]).shape
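# Unexpected shape: join matches on index labels, and (fips, GeoName) repeats across years, so rows can multiply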

In [None]:
left  = df.drop(columns = ['year'])
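# the prefix keeps the dummy column names as strings (newer scikit-learn versions reject mixed str/int names)
right = pd.get_dummies(df['year'], prefix = 'year', drop_first = True)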

In [None]:
pd.merge(left, right, left_index = True, right_index = True).shape
# Same surprise: merge on the index also pairs up every row that shares a label

In [None]:
left.index.equals(right.index)
# Check whether the two indexes are identical (this rules out misaligned labels as the cause)

In [None]:
pd.concat([left, right], axis = 1).shape
# concat places the columns side by side without pairing up duplicate labels

In [None]:
df_prepped = pd.concat([left, right], axis = 1)
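
The shape puzzle above can be reproduced on a toy frame. This is a minimal sketch under the assumption that the cause is a non-unique index (each `(fips, GeoName)` pair appearing once per year); the names `left_toy`, `right_toy`, and `key` are made up for illustration.

```python
import pandas as pd

# Toy panel: each key appears twice (think of one county observed in two years).
idx = pd.Index(['a', 'a', 'b', 'b'], name = 'key')
left_toy  = pd.DataFrame({'x': [1, 2, 3, 4]}, index = idx)
right_toy = pd.DataFrame({'d': [0, 1, 0, 1]}, index = idx)

# join matches on labels: each 'a' row pairs with every 'a' row (2 x 2),
# and likewise for 'b', so 4 rows become 8.
joined = left_toy.join(right_toy)

# With identical indexes, concat simply glues the columns together row by row.
stacked = pd.concat([left_toy, right_toy], axis = 1)

print(joined.shape, stacked.shape)
```

If the real data behave the same way, `join` and `merge` are doing exactly what they are designed to do; `concat` is the right tool when the rows are already in matching order.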

In [None]:
y = df_prepped['urate_bin']
x = df_prepped.drop(columns = 'urate_bin')

x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                   train_size = 1e-2,
                                                   random_state = 490)

# standardize both sets with the training means and standard deviations
mu, sigma = x_train.mean(), x_train.std(ddof = 0)
x_train = (x_train - mu)/sigma
x_test  = (x_test - mu)/sigma

**********
# Null Model 
[TOP](#Support-Vector-Machines---Part-2)

In [None]:
yhat_null = y_train.value_counts().index[0]
acc_null = np.mean(yhat_null == y_test)
acc_null
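
As a cross-check, the same most-frequent-class baseline can be sketched with scikit-learn's `DummyClassifier`. The toy labels below are hypothetical and only illustrate the calculation; they are not taken from the class data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# hypothetical toy labels; the dummy model ignores the features entirely
y_toy_train = np.array(['low', 'low', 'mid', 'high', 'low'])
y_toy_test  = np.array(['low', 'mid', 'low', 'high'])
x_toy_train = np.zeros((len(y_toy_train), 1))
x_toy_test  = np.zeros((len(y_toy_test), 1))

# always predict the most common training label ('low' here)
null_model = DummyClassifier(strategy = 'most_frequent').fit(x_toy_train, y_toy_train)

# accuracy is the share of test labels equal to that most common label
acc_toy = null_model.score(x_toy_test, y_toy_test)
print(acc_toy)
```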

*************
# Polynomial Kernel 
[TOP](#Support-Vector-Machines---Part-2)

Recall that the polynomial kernel adds two hyperparameters:

- `degree` - the degree $d$ of the polynomial kernel
- `coef0` - the constant term $r$ in $(\gamma \langle x, x' \rangle + r)^d$, which sets the balance between the high-degree and low-degree terms

Here we perform a grid search to identify the best values of these hyperparameters, along with `C`.
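
To make the two hyperparameters concrete, here is a minimal NumPy sketch of the polynomial kernel, assuming the form `(gamma * <a, b> + coef0) ** degree` documented for scikit-learn's `SVC`. The helper `poly_kernel` and the points in `pts` are illustrative, and `gamma` is fixed at 1 for simplicity.

```python
import numpy as np

def poly_kernel(a, b, degree = 2, gamma = 1.0, coef0 = 0.0):
    """Polynomial kernel (gamma * <a, b> + coef0) ** degree for the rows of a and b."""
    return (gamma * a @ b.T + coef0) ** degree

pts = np.array([[1.0,  2.0],
                [0.0,  1.0],
                [3.0, -1.0]])

# degree = 1 with coef0 = 0 is exactly the linear kernel (plain dot products)
k_linear = pts @ pts.T
k_deg1   = poly_kernel(pts, pts, degree = 1, coef0 = 0.0)

# with degree = 2, (<a, b> + r)^2 = <a, b>^2 + 2 r <a, b> + r^2,
# so coef0 (r) scales the lower-order terms relative to the squared term
k_deg2 = poly_kernel(pts, pts, degree = 2, coef0 = 1.0)

print(np.allclose(k_deg1, k_linear))
print(k_deg2)
```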

In [None]:
np.linspace(1, 4, num = 4)  # 4 evenly spaced values, endpoint included: 1, 2, 3, 4

In [None]:
np.arange(1, 4, step = 1)  # the stop value is excluded: 1, 2, 3

In [None]:
%%time
param_grid = {
    'C': 10.0**np.arange(-1, 4, step = 1),
    'degree': [1, 2],
    'coef0': 10.0**np.arange(-2, 0, step = 1)
}

svmc = SVC(kernel = 'poly')

grid_search = GridSearchCV(svmc, param_grid,
                          cv = 5,
                          scoring = 'accuracy')

grid_search.fit(x_train, y_train)
best = grid_search.best_params_
best

How many models did we fit?

In [None]:
len(param_grid['C'])*len(param_grid['degree'])*len(param_grid['coef0'])*5
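
The same count can be sketched with scikit-learn's `ParameterGrid`, which enumerates every combination in a grid. The grid is repeated here so the snippet runs on its own; the names `poly_grid`, `n_folds`, and `n_total` are illustrative.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# same grid as in the search above, repeated so the snippet is self-contained
poly_grid = {
    'C': 10.0**np.arange(-1, 4, step = 1),
    'degree': [1, 2],
    'coef0': 10.0**np.arange(-2, 0, step = 1)
}
n_folds = 5

n_candidates = len(ParameterGrid(poly_grid))  # one entry per parameter combination
n_cv_fits = n_candidates * n_folds            # each candidate is fit once per fold
n_total = n_cv_fits + 1                       # plus the final refit (refit = True is the default)

print(n_candidates, n_cv_fits, n_total)
```

The extra fit comes from `GridSearchCV` retraining the best candidate on all of the training data once the search finishes.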

`coef0` is a corner solution. Why is this okay?

In [None]:
%%time
param_grid = {
    'C': 10.0**np.linspace(-5, 2, num = 20)
}

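# LinearSVC fits a linear SVM directly (a degree-1 polynomial kernel is linear) and scales better than SVC
svc_cv = LinearSVC(dual = False)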

grid_search = GridSearchCV(svc_cv, param_grid,
                          cv = 5,
                          scoring = 'accuracy')

grid_search.fit(x_train, y_train)
best_poly = grid_search.best_params_
best_poly

In [None]:
svmc_poly = LinearSVC(C = best_poly['C'],
                        dual = False).fit(x_train, y_train)
svmc_poly.score(x_test, y_test)

********
# Gaussian Radial Basis Function Kernel
[TOP](#Support-Vector-Machines---Part-2)

Find the optimal values of `C` and `gamma` to the nearest order of magnitude ($10^n$).

In [None]:
%%time
param_grid = {
    'C': 10.0**np.arange(-1, 4, step = 1),
    'gamma': 10.0**np.arange(-7, -3, step = 1)
}

svmc = SVC(kernel = 'rbf')

grid_search = GridSearchCV(svmc, param_grid,
                          cv = 5,
                          scoring = 'accuracy')

grid_search.fit(x_train, y_train)
best = grid_search.best_params_
best

How many models did we fit?

In [None]:
len(param_grid['C'])*len(param_grid['gamma'])*5

Refit the model on the full training data using the best hyperparameters.

In [None]:
svmc_rbf = SVC(kernel = 'rbf', C = best['C'], gamma = best['gamma'])
svmc_rbf.fit(x_train, y_train)

Print the model accuracy. 

In [None]:
svmc_rbf.score(x_test, y_test)

Is it better than the polynomial kernel?

Describe the tuned model's flexibility and permitted margin violations.
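
As background for the last question: the RBF kernel is $\exp(-\gamma \lVert x - x' \rVert^2)$, so a larger `gamma` makes similarity fall off faster with distance and gives a more flexible decision boundary, while a smaller `C` tolerates more margin violations. Below is a minimal NumPy sketch of the `gamma` effect; the helper `rbf_kernel_value` and the points `p` and `q` are illustrative only.

```python
import numpy as np

def rbf_kernel_value(a, b, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2) for two vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

p = np.array([0.0, 0.0])
q = np.array([1.0, 1.0])   # squared distance from p is 2

# similarity between the same two points shrinks as gamma grows
gammas = 10.0**np.arange(-2, 2, step = 1)
sims = [rbf_kernel_value(p, q, g) for g in gammas]

for g, s in zip(gammas, sims):
    print(g, s)
```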

****
# Comparison
[TOP](#Support-Vector-Machines---Part-2)

Let's plot things this time.

In [None]:
plot_confusion_matrix(svmc_poly, x_test, y_test)

In [None]:
plot_confusion_matrix(svmc_rbf, x_test, y_test)