### Codio Activity 16.8: Tuning the `SVC` Classifier

**Expected Time = 60 minutes**

**Total Points = 40**

This activity focuses on tuning the parameters of the `SVC` classifier to improve its performance on the wine data.  An `SVC` typically needs some parameter tuning.  In practice, be deliberate about which parameters you tune and avoid overly exhaustive grids, since every additional value multiplies the number of models to fit and grid searches can be computationally (and energy) intensive.  Here, you will compare different kernels and different values of the classifier's `gamma` parameter.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [3]:
import pandas as pd
import seaborn as sns
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_wine

In [7]:
X, y = load_wine(return_X_y=True, as_frame=True)

In [9]:
y.value_counts(normalize = True)

target
1    0.398876
0    0.331461
2    0.269663
Name: proportion, dtype: float64

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

[Back to top](#-Index)

### Problem 1

#### Baseline for Classifier

**10 Points**

Below, determine the baseline score for the classifier by fitting a `DummyClassifier` to the training data `X_train` and `y_train`.  Assign the fitted estimator to the variable `dummy_clf`.

Next, use the `score` function on `dummy_clf` with arguments `X_test` and `y_test` and assign the result to `baseline_score`.


**Note**: The `DummyClassifier` works just like all the other estimators you have encountered and has `.fit` and `.score` methods.

In [19]:
### GRADED
dummy_clf = DummyClassifier().fit(X_train, y_train)
baseline_score = dummy_clf.score(X_test, y_test)

### ANSWER CHECK
print(baseline_score)

0.4
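
To make the baseline concrete, the sketch below recomputes it by hand. It assumes the dummy model always predicts the most frequent class in the training labels; the `strategy='most_frequent'` argument is set explicitly here as an assumption (the graded cell relies on the library default), and `majority_class` and `manual_baseline` are illustrative names.

```python
from sklearn.datasets import load_wine
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Assumption: the strategy is spelled out so the result does not depend
# on the default of the installed scikit-learn version.
dummy_clf = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)

# Most frequent class in the training labels
majority_class = y_train.mode().iloc[0]

# Accuracy obtained by predicting that class for every test sample
manual_baseline = (y_test == majority_class).mean()

print(majority_class, manual_baseline, dummy_clf.score(X_test, y_test))
```

Any real classifier should beat this number; if it does not, it has learned nothing beyond the class frequencies.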


[Back to top](#-Index)

### Problem 2

#### Default Settings with `SVC`

**10 Points**

Now, define an `SVC` estimator with default parameters and fit it to the training data `X_train` and `y_train`. Assign this estimator to `svc` below.

Next, use the function `score` on `svc` with arguments `X_test` and `y_test`. Assign your answer as a float to `svc_defaults` below.

In [21]:
### GRADED
svc = SVC().fit(X_train, y_train)
svc_defaults = svc.score(X_test, y_test)

### ANSWER CHECK
print(svc_defaults)

0.7111111111111111
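
Before tuning, it can help to see which settings "default parameters" actually refers to. The sketch below lists a few of them with `get_params`; the printed values depend on the installed scikit-learn version, and the selection of names is only an illustration.

```python
from sklearn.svm import SVC

# An untrained estimator is enough to inspect its settings
svc_default = SVC()
defaults = svc_default.get_params()

# The kernel and gamma entries are the two settings varied in the grid search
for name in ('kernel', 'C', 'gamma', 'degree'):
    print(name, '=', defaults[name])
```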


[Back to top](#-Index)

### Problem 3

#### Grid Searching with `SVC`

**10 Points**

While your `svc` should improve upon the baseline score, there is likely still room for improvement.  Below, use `GridSearchCV` to search over the kernels available in the `SVC` estimator and the `gamma` values defined in the `params` dictionary below. Fit the grid search to the training data and assign the fitted object to `grid`.

Next, use the function `score` on `grid` with arguments `X_test` and `y_test`. Assign your answer as a float to `grid_score` below.


In [25]:
params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
         'gamma': [0.1, 1.0, 10.0, 100.0],}
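
As a rough check on how much work this grid implies, the sketch below counts the model fits with `ParameterGrid`, assuming five-fold cross-validation and one final refit on the full training set. The leaner grid `lean_params` is a hypothetical alternative, not part of the activity: it relies on the fact that the linear kernel does not use `gamma`, so repeating it for every `gamma` value adds no information.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
          'gamma': [0.1, 1.0, 10.0, 100.0]}
cv_folds = 5  # assumption: five-fold cross-validation

n_candidates = len(ParameterGrid(params))
n_fits = n_candidates * cv_folds + 1  # +1 for the final refit on all training data
print(n_candidates, n_fits)

# Hypothetical leaner grid: fit the linear kernel once instead of once per gamma
lean_params = [{'kernel': ['linear']},
               {'kernel': ['rbf', 'poly', 'sigmoid'],
                'gamma': [0.1, 1.0, 10.0, 100.0]}]
n_lean = len(ParameterGrid(lean_params))
print(n_lean, n_lean * cv_folds + 1)
```

On a data set this small the saving is negligible, but the same counting habit matters once each fit takes minutes rather than milliseconds.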

In [27]:
### GRADED
svc = SVC()

grid = GridSearchCV(estimator=svc, param_grid=params, cv=5).fit(X_train, y_train)

grid_score = grid.score(X_test, y_test)

### ANSWER CHECK
print(grid_score)

1.0
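
A test score on a few dozen samples is a coarse estimate, so it is worth comparing it with the cross-validated score of the winning candidate. The sketch below is one way to do that, assuming the same split and grid as above; it uses the `best_params_` and `best_score_` attributes of the fitted search.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
          'gamma': [0.1, 1.0, 10.0, 100.0]}
grid = GridSearchCV(estimator=SVC(), param_grid=params, cv=5).fit(X_train, y_train)

# Winning combination and its mean accuracy across the five validation folds
print('best parameters:', grid.best_params_)
print('mean CV accuracy:', grid.best_score_)

# Accuracy of the refit best estimator on the held-out test set
print('test accuracy:', grid.score(X_test, y_test))
print('test set size:', len(y_test))
```

If the two scores differ noticeably, the cross-validated one is usually the more stable guide, because it averages over several validation folds.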


[Back to top](#-Index)

### Problem 4

#### Optimal Kernel Function

**10 Points**

Based on your grid search above, what is the best-performing kernel function?  Assign your answer as a string (`linear`, `poly`, `rbf`, or `sigmoid`) to `best_kernel` below.

In [29]:
### GRADED
# Inspect the best parameter combination found by the grid search
best_params = grid.best_params_
print(best_params)

best_kernel = 'poly'

### ANSWER CHECK
print(best_kernel)

{'gamma': 10.0, 'kernel': 'poly'}
poly
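
The best parameters name only the single winning candidate. To see how the kernels compare with one another, a sketch like the following summarizes the best mean cross-validation score reached by each kernel. It assumes the same split and grid as above; `results` and `per_kernel` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {'kernel': ['rbf', 'poly', 'linear', 'sigmoid'],
          'gamma': [0.1, 1.0, 10.0, 100.0]}
grid = GridSearchCV(estimator=SVC(), param_grid=params, cv=5).fit(X_train, y_train)

# One row per parameter combination
results = pd.DataFrame(grid.cv_results_)
results['param_kernel'] = results['param_kernel'].astype(str)

# Best mean validation accuracy achieved by each kernel over all gamma values
per_kernel = (results.groupby('param_kernel')['mean_test_score']
                     .max()
                     .sort_values(ascending=False))
print(per_kernel)
```

When several kernels reach nearly the same score, the choice between them is not well determined by this data alone.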

