# Workshop 10: SVMs and Hyperparameter Tuning

In this lab, you'll be working with Support Vector Machines.

## 0) Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
random_state = 42

In [None]:
from sklearn.model_selection import train_test_split

## 1) Loading the Data

First, load the data `svm_data_2020.csv`.

In [None]:
df = pd.read_csv('./data/svm_data_2020.csv')

In [None]:
df["Class"].value_counts()

## 2) Splitting the Data (Group)

Now, split the data into a training and test set. 75% of the data should be in the training set, and 25% should be in the testing set.

Report the number of positive and negative samples in both training and testing data.
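As a rough sketch of the mechanics, the cell below splits a small made-up DataFrame 75/25 and counts the classes on each side. Only the `Class` column name comes from this workshop's data; the feature columns `x1` and `x2`, the toy values, and the use of `stratify` are assumptions for illustration. Stratifying is optional and keeps the class ratio the same in both splits.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data (not the workshop file): 12 negative and 8 positive samples
toy = pd.DataFrame({
    "x1": range(20),
    "x2": [v % 3 for v in range(20)],
    "Class": [0] * 12 + [1] * 8,
})
X_toy = toy.drop(columns="Class")
y_toy = toy["Class"]

# test_size=0.25 leaves 75% of the rows for training
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

# value_counts() reports how many samples of each class landed in each split
print(y_tr.value_counts())
print(y_te.value_counts())
```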

In [None]:
# Split data

In [None]:
# Report positive and negative samples

## 3) Training the Model

Now, you will use scikit-learn's [support vector classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) to fit a model to this data.

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

### 3.1 Fitting the Model and Getting the Support Vectors (Group)

Fit the SVC to your training data (using the default hyperparameters), and report the number of support vectors. Use `clf.support_vectors_`, which returns an array with one row per support vector.
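To see what this attribute looks like, here is a minimal sketch on made-up data (two Gaussian blobs, not the workshop dataset). The names `X_toy`, `y_toy`, and `toy_clf` are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two 2-D Gaussian blobs, 30 points per class
rng = np.random.default_rng(42)
X_toy = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y_toy = np.array([0] * 30 + [1] * 30)

toy_clf = SVC().fit(X_toy, y_toy)

# support_vectors_ has shape (number of support vectors, number of features)
print(toy_clf.support_vectors_.shape)

# n_support_ gives the number of support vectors per class
print(toy_clf.n_support_)

# The total count is the number of rows
n_sv = len(toy_clf.support_vectors_)
print(n_sv)
```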

In [None]:
# Create model and get the number of support vectors

### 3.2 C hyperparameter vs Support Vector Count (Group)

*C* is the regularization hyperparameter in SVMs, and in this problem you'll be looking at how changing *C* affects the number of support vectors.

Implement the function `plot_support_vectors` below, which will plot a line chart of the number of support vectors vs. the value of *C*.

**Before implementing the function, predict the answers to the following questions**
1. As C increases, how will the number of support vectors change?
2. Why?

In [None]:
"""
Input:
    params: A list of floats, representing the values of C to try
    
Output:
    None
    Plot a line chart of the number of support vectors vs. C
    
Function:
    iterate through params
        create an SVC classifier for each c
        find the length of the support vectors and append to a list
    
    create a plot with c on the X axis and length of support vectors on Y
"""

def plot_support_vectors(params):
    return None

In [None]:
C = [0.1,0.2,0.3,0.5,1,2,3,5,10]
plot_support_vectors(C)

**Now that you have a plot, go back to the questions and explain more using the context of the data. If you were wrong, explain your misconception.**

## 4) Hyperparameter Tuning

Compare the performance of four different kernel functions: linear (`linear`), polynomial (`poly`), radial basis function (`rbf`), and sigmoid (`sigmoid`). You will not only change the kernel function, but also tune the hyperparameters that each kernel uses.

For each kernel function, train an SVM classifier on the training data and evaluate it on the testing data, reporting the accuracy, precision, recall, and F-measure of the resulting predictions.
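As a quick reminder of how these four metrics relate, the sketch below computes them by hand-checkable example. The labels `y_true` and `y_pred` are made up for illustration; they give 2 true positives, 1 false positive, 2 false negatives, and 3 true negatives.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: TP=2, FP=1, FN=2, TN=3
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 5/8
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 2/4
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall = 4/7

print(acc, prec, rec, f1)
```

Note that every scikit-learn metric function takes the true labels first and the predictions second.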

### 4.1 Basic Hyperparameters (Group)

Write a function called `best_hyperparams` that, given a dictionary of params, runs a `GridSearchCV` on an SVC model using the training data only. Keep the test data out of the search so that it remains a fair final evaluation.

Use a `cv` of 5.

This function will return two values, in this order: the optimized classifier `clf`, and the dict of best params (using `clf.best_params_`).

See the documentation for [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) for more info.
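For orientation, here is a minimal sketch of the `GridSearchCV` API on synthetic data from `make_classification`. The data, the small grid `toy_grid`, and the variable names are assumptions for illustration, not the workshop's dataset or the required solution.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical synthetic data standing in for a training set
X_toy, y_toy = make_classification(n_samples=100, n_features=4, random_state=42)

# Each key is an SVC constructor argument; each value is the list of settings to try
toy_grid = {"kernel": ["linear"], "C": [0.1, 1, 10]}

# cv=5 runs 5-fold cross-validation for every combination in the grid
search = GridSearchCV(SVC(), toy_grid, cv=5)
search.fit(X_toy, y_toy)

print(search.best_params_)  # the winning combination, as a dict
print(search.best_score_)   # its mean cross-validated accuracy

# With the default refit=True, the best model is refit on all of X_toy
best_model = search.best_estimator_
```

The fitted `GridSearchCV` object can also be used directly as a classifier: calling `search.predict(...)` delegates to `best_estimator_`.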

In [1]:
"""
Input:
    param_set: A dictionary of params to use for the grid search
Output:
    The classifier with the best hyperparams
    The dict of best params itself
    
Function:
    use GridSearchCV (cv=5) on the training data to find the best SVC classifier
    return the best classifier and the best parameters
"""

def best_hyperparams(param_set):
    return None

In [2]:
# Here are the value ranges for each of the params.
# We will tell you which of these to tune for which kernel.

# C is the regularization parameter we've discussed before
C = [0.1,0.2,0.3,0.5,1,2,3,5,10]

# degree is the degree of the polynomial used for the polynomial kernel
degree = [1,2,3,4,5]

# coef0 is the independent term in the kernel function, and is only used by poly and sigmoid
coef0 = [0.0001,0.001,0.002,0.01,0.02,0.1,0.2,0.3,1,2,5,10]

# gamma is the kernel coefficient used for rbf, poly, and sigmoid
gamma = [0.0001,0.001,0.002,0.01,0.02,0.1,0.2,0.3,1,2,3]

In [None]:
from sklearn.metrics import classification_report

### Linear Kernel (Follow)
For the **linear** kernel, tune `C`.

In [None]:
params = [{
    "kernel":["linear"],
    "C":C
}]

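# best_hyperparams returns the classifier first, then the dict of best params
model, best_params = best_hyperparams(params)
print(best_params)
predictions = model.predict(X_test)
# classification_report expects the true labels first, then the predictions
print(classification_report(y_test, predictions))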

### Poly Kernel (Group)
For the **polynomial** kernel, tune `C`, `degree` and `coef0`.

### RBF Kernel (Group)
For the **rbf** kernel, tune `C` and `gamma`.

### Sigmoid Kernel (Group)

For the **sigmoid** kernel, tune `C`, `coef0`, and `gamma`.

### Results

Consider the following visualization of how an SVM predicts with different kernels on different datasets:

![Different Kernels](https://i.imgur.com/HKTLn35.png)

Given your results, answer the following questions:

1. Which kernel performed best?
2. What criteria are you using to define the best model?
3. Based on the best-performing model(s), what properties do you think the data have (e.g. is it close to linearly separable)?

**Answer here**
1. 
2. 
3. 