In [None]:
import warnings
warnings.filterwarnings('ignore')

# data and plotting
import pandas as pd
import numpy as np
from plotnine import *

# preprocessing
from sklearn.preprocessing import StandardScaler #Z-score variables
from sklearn.model_selection import train_test_split
# metrics
# note: plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import accuracy_score, confusion_matrix, mean_squared_error, ConfusionMatrixDisplay, roc_auc_score, recall_score, precision_score

# models
from sklearn.svm import SVC


# Review
In the last lecture we learned about kernels and the Kernel Trick, which allow us to use a linear classifier like an SVC on data that is not linearly separable.

Remember, a Kernel is a function that calculates the relationship (dot product) between two vectors *as if* they are in a higher dimensional space *without actually having to calculate that higher dimensional space*.

We talked about two main kernels:

## Polynomial Kernel

$$K(x,y) = (x*y + r)^d$$

The hyperparameters $r$ and $d$ can be chosen via hyperparameter tuning. $d$ is the degree of the polynomial, which controls the maximum number of dimensions of the space we're projecting our data to (e.g. for a single variable, when $d=2$ we're at most projecting to a 2D space).

For example:

$$K(x,y) = (x*y + \frac{1}{2})^2$$
$$(x*y + \frac{1}{2})^2 = (x*y + \frac{1}{2})(x*y + \frac{1}{2})$$
$$ = xy + x^2y^2 + \frac{1}{4}$$

Notice that $xy + x^2y^2 + \frac{1}{4}$ is the same value we'd get if we took the dot product of the two vectors: $x' = (x,x^2, \frac{1}{2})$ and $y' = (y,y^2, \frac{1}{2})$. Thus, the kernel $K(x,y)$ calculates the same value as the dot product between two *transformed* points $x'$ and $y'$ which have 2 dimensions. The first dimension is the original variable, the second dimension is the variable squared (we ignore the 3rd dimension because it is the same for all data points, and thus does not give us any useful information).

BUT NOTICE, when we plug $x$ and $y$ into the kernel function $K(x,y)$, we NEVER actually have to calculate $x'$ and $y'$, which saves us a lot of time. This is true no matter which $r$ and $d$ values you use for the polynomial kernel.
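To make this concrete, here is a minimal numeric check of the example above with $r = \frac{1}{2}$ and $d = 2$. The function names `poly_kernel` and `transform` and the example points are made up for illustration; they are not part of sklearn.

```python
import numpy as np

def poly_kernel(x, y, r=0.5, d=2):
    """Polynomial kernel for scalar inputs: (x*y + r)^d."""
    return (x * y + r) ** d

def transform(x):
    """Explicit feature map for r = 1/2, d = 2: x' = (x, x^2, 1/2)."""
    return np.array([x, x ** 2, 0.5])

x, y = 3.0, -2.0  # arbitrary example points

# the kernel works directly on x and y (x' and y' are never built)
kernel_value = poly_kernel(x, y)

# the long way: build x' and y', then take their dot product
explicit_value = transform(x) @ transform(y)

print(kernel_value, explicit_value)  # the two values should match
```

Both routes give the same number, but only the second one ever has to create the transformed points.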


## Radial Kernel (Radial Basis Function)

$$ K(x,y) = e^{-\gamma(x-y)^2}$$

This whole calculates-the-dot-product-without-calculating-the-transformed-data thing is especially useful when the transformation projects the data into *infinite* dimensions, as the Radial Kernel does.

We proved using Taylor Series Expansion that the radial kernel calculates the dot product of two data points *as if* they were projected into infinite dimensions. This causes the SVM to act like a weighted KNN model, and gives us a huge amount of flexibility with what our groups of data look like.
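As a rough numeric illustration of the Taylor Series argument, the sketch below uses one standard form of the expansion for a single variable, $e^{-\gamma(x-y)^2} = e^{-\gamma x^2} e^{-\gamma y^2} \sum_{n=0}^{\infty} \frac{(2\gamma)^n}{n!} x^n y^n$, and keeps only the first few terms. The helper names and example values are assumptions for illustration, and the notation may differ slightly from the proof we did in lecture.

```python
from math import factorial
import numpy as np

def rbf_kernel_1d(x, y, gamma=1.0):
    """Radial kernel for scalar inputs: exp(-gamma * (x - y)^2)."""
    return np.exp(-gamma * (x - y) ** 2)

def truncated_features(x, gamma=1.0, n_terms=10):
    """First n_terms coordinates of the (infinite) Taylor-series feature map."""
    k = np.arange(n_terms)
    scale = np.sqrt(np.array([(2 * gamma) ** i / factorial(i) for i in range(n_terms)]))
    return np.exp(-gamma * x ** 2) * scale * x ** k

x, y, gamma = 0.8, 0.3, 1.0  # arbitrary example values
exact = rbf_kernel_1d(x, y, gamma)

# dot products of longer and longer truncated feature vectors
approx = {}
for n_terms in (1, 2, 5, 10):
    approx[n_terms] = truncated_features(x, gamma, n_terms) @ truncated_features(y, gamma, n_terms)
    print(n_terms, approx[n_terms], exact)
```

The more coordinates we keep, the closer the explicit dot product gets to the kernel value, while the kernel itself gets the exact answer in one step.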

# SVCs in sklearn

We'll use this Pulsar Star Data from [Kaggle](https://www.kaggle.com/datasets/colearninglounge/predicting-pulsar-starintermediate?resource=download).

Download the data, then upload it to Colab using the upload button.


In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV

In [None]:
# change this path to wherever you saved (or uploaded) the data
star = pd.read_csv("/Users/chelseaparlett/Downloads/archive/pulsar_data_train.csv")
star = star.dropna() # drop rows with missing values
star.head()

In [None]:
# organize and split data

In [None]:
# build parts of pipeline

# build pipeline

# parameters dict

# grid search

# fit and check

# Building an SVM with radial kernel

Let's look at that data we tried (and failed) to classify using a linear kernel last time.
- load in the data
- plot the data using a scatter plot
- do an 80/20 TTS
- zscore your data
- build an `SVC()` model with `kernel = "rbf"` and `C = 1`
- grab the train/test accuracy and ROC/AUC
- use the `plotSVM2D()` function I wrote for you to plot the decision boundary for your model for both train and test (just change the first two arguments, no need to refit the model)
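The steps in the list above can be sketched roughly as follows. This is NOT the classwork solution: it uses synthetic data from `make_circles` as a stand-in for the class dataset, and the variable names are just one possible choice.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score

# synthetic stand-in data (assumption: not the class dataset)
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

# 80/20 train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# z-score: fit the scaler on train only, then apply it to both sets
z = StandardScaler()
X_train_z = z.fit_transform(X_train)
X_test_z = z.transform(X_test)

# SVC with radial kernel
svm = SVC(kernel="rbf", C=1)
svm.fit(X_train_z, y_train)

# accuracy uses predicted classes; ROC/AUC uses the continuous decision scores
train_acc = accuracy_score(y_train, svm.predict(X_train_z))
test_acc = accuracy_score(y_test, svm.predict(X_test_z))
train_auc = roc_auc_score(y_train, svm.decision_function(X_train_z))
test_auc = roc_auc_score(y_test, svm.decision_function(X_test_z))

print("Train acc:", train_acc, "Test acc:", test_acc)
print("Train AUC:", train_auc, "Test AUC:", test_auc)
```

Using `decision_function()` for the ROC/AUC avoids having to set `probability = True`; setting `probability = True` and using `predict_proba()` is another common option.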

In [None]:
# load data
d = pd.read_csv("https://raw.githubusercontent.com/cmparlettpelleriti/CPSC393ParlettPelleriti/main/Data/svmcw.csv")
d.head()

| Unnamed: 0 | X1 | X2 | y |
|---|---|---|---|
| 0 | -1.203055 | -1.309099 | a |
| 1 | -1.486382 | -0.159741 | a |
| 2 | -0.832017 | 0.246909 | a |
| 3 | -1.41123 | 1.565836 | a |
| 4 | -1.154422 | -0.144092 | a |


In [None]:
# scatter plot


In [None]:
# Do your train test split, and z score


In [None]:
### YOUR CODE HERE ###


In [None]:
# DON'T CHANGE ANYTHING, JUST RUN

def plotSVM2D(Xdf,y,model):
    x0_name = Xdf.columns[0]
    x1_name = Xdf.columns[1]
    #grab the range of values for each feature (padded by 1 sd on each side)
    x0_range = np.linspace(min(Xdf[x0_name]) - np.std(Xdf[x0_name]),
                           max(Xdf[x0_name]) + np.std(Xdf[x0_name]), num = 100)
    x1_range = np.linspace(min(Xdf[x1_name]) - np.std(Xdf[x1_name]),
                           max(Xdf[x1_name]) + np.std(Xdf[x1_name]), num = 100)

    #get all possible points on graph
    x0 = np.repeat(x0_range,100)
    x1 = np.tile(x1_range,100)
    x_grid = pd.DataFrame({x0_name: x0,x1_name: x1})

    # predict all background points
    p = model.predict(x_grid)
    x_grid["p"] = p #add to dataframe
    
    #build the plot
    bound = (ggplot(x_grid, aes(x = x0_name, y = x1_name, color = "factor(p)")) +
                 geom_point(alpha = 0.2, size = 0.2) + theme_minimal() +
                 scale_color_manual(name = "Class", values = ["#E69F00", "#0072B2"]) +
                 geom_point(data = Xdf, mapping = aes(x = x0_name, y = x1_name, color = "factor(y)"), size = 2))
    print(bound)

In [None]:
# plotSVM2D

### Question
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" alt="Q" width = "200"/>

How does the model do? How does it differ from the model you built using a linear kernel last class?


## The Gamma Hyperparameter
The Gamma parameter scales the amount of influence two data points have on each other. The bigger $\gamma$ is, the faster the kernel value $e^{-\gamma(x-y)^2}$ shrinks as two points get farther apart, so the smaller the influence two data points have on each other.
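A quick way to see this is to plug one fixed pair of points into the radial kernel with a few different $\gamma$ values. The points and the helper name `rbf` below are arbitrary choices for illustration.

```python
import numpy as np

def rbf(x, y, gamma):
    """Radial kernel for scalar inputs: exp(-gamma * (x - y)^2)."""
    return np.exp(-gamma * (x - y) ** 2)

x, y = 0.0, 0.5  # two points that are 0.5 apart

# same two points, three different gammas
values = {gamma: rbf(x, y, gamma) for gamma in [0.1, 1, 25]}
for gamma, k in values.items():
    print("gamma =", gamma, "-> K(x,y) =", k)
```

The kernel values come out to roughly 0.98, 0.78, and 0.002: with a small `gamma` these two points still influence each other a lot, and with `gamma = 25` they barely influence each other at all.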

- Re-run your model above but set `gamma = 25`  

### Question
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" alt="Q" width = "200"/>

How did changing `gamma` change the decision boundary? Did the boundary get smoother or more jagged? Do jagged boundaries more often lead to **overfitting** or **underfitting**?

In [None]:
### YOUR CODE HERE ###




In [None]:
# plotSVM2D

- Now re-run your model with `gamma = 0.1`.

### Question
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" alt="Q" width = "200"/>

How is your model doing?

In [None]:
### YOUR CODE HERE ###


In [None]:
# plotSVM2D

# Building an SVM with a polynomial kernel
Let's see if we can classify this data with a polynomial kernel.

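- build an `SVC()` model with `kernel = "poly"`, `degree = 2`, `gamma = 1`, and `C = 1` (FYI sklearn's polynomial kernel is $(\gamma \cdot x*y + r)^d$: `gamma` scales the dot product, `degree` is $d$, and the $r$ coefficient hyperparameter is called `coef0`, which defaults to 0 in `SVC()`)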
- grab the train/test accuracy and ROC/AUC
- use the `plotSVM2D()` function I wrote for you to plot the decision boundary for your model for both train and test (just change the first two arguments, no need to refit the model)
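If you want to check how sklearn's hyperparameter names line up with the polynomial kernel formula, here is a small sketch comparing `sklearn.metrics.pairwise.polynomial_kernel` to a by-hand calculation of $(\gamma \cdot x*y + r)^d$, where `coef0` plays the role of $r$. The example matrices are made up.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# made-up example points (each row is one data point)
X = np.array([[1.0, 2.0], [0.5, -1.0]])
Y = np.array([[3.0, 0.0], [-2.0, 1.0]])

gamma, coef0, degree = 1.0, 0.5, 2

# sklearn's polynomial kernel for every pair of rows in X and Y
sk_value = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)

# the same thing by hand: (gamma * dot product + coef0)^degree
manual_value = (gamma * X @ Y.T + coef0) ** degree

print(sk_value)
print(manual_value)
```

The two matrices should match, which shows that `gamma` multiplies the dot product and `coef0` is the constant added to it.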


### Question
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" alt="Q" width = "200"/>

How does your model do? What does the decision boundary look like?

In [None]:
### YOUR CODE HERE ###


In [None]:
# plotSVM2D

- Play around with the `C`, `degree` and `gamma` parameters.

### Question
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" alt="Q" width = "200"/>

How do they change the performance and decision boundary of the model?

## Now with Messier Data

While the previous data was CLEARLY non-linear and needed the help of a kernel, it was still fairly separable. Let's try a messier dataset [here](https://raw.githubusercontent.com/cmparlettpelleriti/CPSC393ParlettPelleriti/main/Data/svmcw2.csv).

- load the data
- plot the data with a scatterplot
- do an 80/20 TTS
- use an `rbf` kernel, and use GridSearch to choose `C` (options should be `[0.001, 0.01, 0.1, 0.5, 1]`), and `gamma` (options should be `[0.001, 0.01, 0.1, 0.5, 1,2,5,10,25,50]`)
- (make sure you z score *appropriately*)
- print out the train/test accuracy and ROC/AUC.
- plot the decision boundary for the full model for train and test using `plotSVM2D()`
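One possible shape for the steps in the list above is sketched here on synthetic `make_moons` data, which stands in for the class dataset (an assumption, so your numbers will differ). Putting `StandardScaler()` inside the pipeline means the z scoring is re-fit on the training folds only during cross validation, which is one way to z score *appropriately*. Since every predictor in this stand-in data is continuous, the sketch skips `make_column_transformer`.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score

# synthetic stand-in data (assumption: not the class dataset)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# pipeline: z-score, then SVC with radial kernel
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", so its parameters are "svc__<name>"
params = {"svc__C": [0.001, 0.01, 0.1, 0.5, 1],
          "svc__gamma": [0.001, 0.01, 0.1, 0.5, 1, 2, 5, 10, 25, 50]}

# grid search with 5-fold CV; refit=True refits the best model on all of train
grid = GridSearchCV(pipe, params, scoring="accuracy", cv=5, refit=True)
grid.fit(X_train, y_train)

print("Chosen hyperparameters:", grid.best_params_)

train_acc = accuracy_score(y_train, grid.predict(X_train))
test_acc = accuracy_score(y_test, grid.predict(X_test))
train_auc = roc_auc_score(y_train, grid.decision_function(X_train))
test_auc = roc_auc_score(y_test, grid.decision_function(X_test))

print("Train acc:", train_acc, "Test acc:", test_acc)
print("Train AUC:", train_auc, "Test AUC:", test_auc)
```

The fitted `grid` object can be passed anywhere a fitted model is expected, since `predict()` is forwarded to the best pipeline.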

### Question
<img src="https://drive.google.com/uc?export=view&id=1ghyQPx1N8dmU3MV4TrANvqNhGwnLni72" alt="Q" width = "200"/>

What hyperparameter values did GridSearch choose? How did your model perform? What does the decision boundary look like?

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV


In [None]:
# load data

In [None]:
# Do your train test split, and z score


In [None]:
### YOUR CODE HERE ###




In [None]:
# plotSVM2D