# Applying an SVM Model to the Data

### Goals:
1. Use the insights generated from `exploring_data.ipynb`.
    * In particular, we need to standardize the data and reduce it to 8 features with PCA.
    * Note that this functionality was added to `data_loader.py`.
2. Determine how effective the model is with different kernels.

### Load the Data, Apply PCA, Perform a Train/Test Split

In [5]:
import numpy as np
from models.data_loader import DataLoader
from sklearn.svm import SVC

rs = np.random.RandomState(42069)

dl = DataLoader("../data/winequality-red.csv")

dl.apply_pca_to_dataset()

X_train, X_test, y_train, y_test = dl.train_test_split()
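
The internals of `DataLoader` are not shown in this notebook, so the following is only a minimal sketch of the kind of preprocessing described in the goals: standardize, reduce to 8 principal components, and split. The synthetic data, the `test_size=0.3` value, and the choice to fit the scaler and PCA on the training split only are all assumptions for illustration and may differ from what `data_loader.py` actually does.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption): 200 samples, 11 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 11))
y = rng.randint(3, 9, size=200)  # hypothetical quality labels in 3..8

# Split first so the test set does not influence the scaler or the PCA fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Standardize each feature, then keep the first 8 principal components.
preprocess = make_pipeline(StandardScaler(), PCA(n_components=8))
X_train_reduced = preprocess.fit_transform(X_train)
X_test_reduced = preprocess.transform(X_test)

print(X_train_reduced.shape, X_test_reduced.shape)
```

Standardizing before PCA matters because PCA is driven by variance, so features on larger scales would otherwise dominate the components.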

### Building and Applying the SVM
* The goal here is to determine how effective the model is with different kernels.
    * Kernels to explore: linear (expected to perform poorly), polynomial, sigmoid, and Gaussian, i.e. `rbf` (expected to do reasonably well).
* No hyperparameter tuning is done; the aim is purely to see which kernels work well with the data.

Note: the dataset is very imbalanced (as shown in `exploring_data.ipynb`), and we have to take this into account when
working with an SVM. In particular, we try `class_weight='balanced'` to account for this.
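
To make the effect of `class_weight='balanced'` concrete, here is a small sketch of the weighting rule that scikit-learn documents for this mode: each class gets the weight `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the toy labels are hypothetical and only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy imbalanced labels (assumption): six 5s, three 6s, one 8.
labels = [5] * 6 + [6] * 3 + [8] * 1
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

Rare classes receive larger weights, so misclassifying them is penalized more heavily during training. This can lower overall accuracy, because the model no longer gains as much from favoring the majority classes.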

In [12]:
import itertools  # Provides product(), the Cartesian product of two iterables.

kernels = ['linear', 'poly', 'rbf', 'sigmoid']
class_balance = [None, 'balanced']

print("Kernel\tClass Balance\tAccuracy Score")
for kernel, balance in itertools.product(kernels, class_balance):
    # Create a model with the appropriate kernel.
    svc = SVC(kernel=kernel, class_weight=balance, random_state=rs)

    # Fit the model to the training data.
    svc.fit(X_train, y_train)

    # Determine how well the model performed and report it.
    print(f"{kernel}\t{balance if balance is not None else 'unit weight'}\t{svc.score(X_test, y_test)}")

Kernel	Class Balance	Accuracy Score
linear	unit weight	0.5604166666666667
linear	balanced	0.3854166666666667
poly	unit weight	0.5520833333333334
poly	balanced	0.5625
rbf	unit weight	0.59375
rbf	balanced	0.47291666666666665
sigmoid	unit weight	0.5104166666666666
sigmoid	balanced	0.26666666666666666
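
Because the dataset is imbalanced, plain accuracy can hide how the minority classes are treated. As a hedged illustration (not something computed in this notebook), the sketch below contrasts accuracy with balanced accuracy, the mean of per-class recall, on toy labels. The helper names and the data are hypothetical; `sklearn.metrics.balanced_accuracy_score` is the library counterpart.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy example (assumption): a model that always predicts the majority class.
y_true = [5] * 6 + [6] * 3 + [8] * 1
y_pred = [5] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case the always-majority model looks acceptable on accuracy but poor on balanced accuracy, which is one reason the unit-weight and balanced rows above may not be directly comparable on accuracy alone.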
