## Hyperparameters

When we define the model, we can specify its hyperparameters. As we've seen in this section, the most common ones are:

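* `C`: The regularization parameter. Larger values penalize misclassified training points more heavily, so the model fits the training set more tightly.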
* `kernel`: The kernel. The most common ones are 'linear', 'poly', and 'rbf'.
* `degree`: If the kernel is polynomial, this is the maximum degree of the monomials in the kernel.
* `gamma`: If the kernel is rbf, this is the gamma parameter. Larger values make the boundary follow individual training points more closely.
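
As a minimal sketch of how these hyperparameters are passed, assuming scikit-learn's `SVC` constructor: the variable names and numeric values below are arbitrary illustrations, not recommended settings.

```python
from sklearn.svm import SVC

# One model per kernel; the numeric values are arbitrary examples.
linear_model = SVC(kernel='linear', C=1.0)
poly_model = SVC(kernel='poly', degree=4, C=0.1)
rbf_model = SVC(kernel='rbf', gamma=27)

# get_params() returns the hyperparameters each model was built with,
# including the defaults for anything that was not set explicitly.
for m in (linear_model, poly_model, rbf_model):
    params = m.get_params()
    print(params['kernel'], params['C'], params['degree'], params['gamma'])
```

Every model carries a `degree` value, but it is ignored unless the kernel is 'poly'.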

The data file can be found under the "data.csv" tab in the quiz below. It has three columns: the first two contain the coordinates of the points, and the third contains the label.

The data will be loaded for you, and split into features X and labels y.

You'll need to complete each of the following steps:

1. Build a support vector machine model

   Create a support vector machine classification model using scikit-learn's `SVC` and assign it to the variable `model`.

2. Fit the model to the data

   If necessary, specify some of the hyperparameters. The goal is to obtain an accuracy of 100% on the dataset. Hint: Not every kernel will work well.

3. Predict using the model

   Predict the labels for the training set, and assign this list to the variable `y_pred`.

4. Calculate the accuracy of the model

   For this, use sklearn's `accuracy_score` function.

When you hit Test Run, you'll be able to see the boundary region of your model, which will help you tune the parameters, in case you need to.
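
The four steps above can be sketched end to end as follows. This is only an illustration: the quiz data is not reproduced here, so the sketch assumes a synthetic stand-in dataset from `make_circles`, and the `gamma` value is an arbitrary choice.

```python
from sklearn.datasets import make_circles
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Stand-in data (an assumption, not the quiz file): two concentric rings.
X, y = make_circles(n_samples=100, factor=0.3, noise=0.0, random_state=0)

# 1. Build a support vector machine model
model = SVC(kernel='rbf', gamma=1.0)

# 2. Fit the model to the data
model.fit(X, y)

# 3. Predict the labels for the training set
y_pred = model.predict(X)

# 4. Calculate the accuracy of the model
acc = accuracy_score(y, y_pred)
print(acc)
```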

Note: This quiz requires you to reach an accuracy of 100% on the training set. Of course, this screams overfitting! If you pick very large values for your parameters, you will fit the training set very well, but the result may not be the best model. Try to find the smallest parameter values that do the job, since these are less likely to overfit, although this part won't be graded.
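
One way to see the overfitting risk is to compare training accuracy with cross-validated accuracy as `gamma` grows. The sketch below assumes a noisy synthetic dataset from `make_moons` rather than the quiz data, and the list of `gamma` values is an arbitrary choice.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Noisy stand-in data (an assumption, not the quiz file).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

results = []
for gamma in [0.1, 1, 10, 100, 1000]:
    # Accuracy on the same points the model was trained on.
    train_acc = SVC(kernel='rbf', gamma=gamma).fit(X, y).score(X, y)
    # Mean accuracy over 5 held-out folds.
    cv_acc = cross_val_score(SVC(kernel='rbf', gamma=gamma), X, y, cv=5).mean()
    results.append((gamma, train_acc, cv_acc))
    print(f"gamma={gamma:>6}: train={train_acc:.3f}  cv={cv_acc:.3f}")
```

A widening gap between the two columns is the usual sign that the model is memorizing the training points.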

In [33]:
# Import statements 
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
import pandas as pd
import numpy as np

In [145]:
# Read the data
data = np.asarray(pd.read_csv('testData/svcData.csv', header=None))
X = data[:, 0:2]
y = data[:, 2]
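
To make the column split concrete, here is a small sketch that applies the same slicing to an in-memory CSV. The three rows are hypothetical values made up for illustration, not rows from the quiz file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical rows standing in for the real file: x1, x2, label.
csv_text = "0.25,0.75,1\n0.50,0.10,0\n0.90,0.40,1\n"

# header=None tells pandas the first row is data, not column names.
data = np.asarray(pd.read_csv(io.StringIO(csv_text), header=None))
X = data[:, 0:2]  # first two columns: point coordinates
y = data[:, 2]    # third column: label

print(X.shape, y.shape)
print(y.dtype)
```

Because a NumPy array has a single dtype, the integer labels come out as floats here, which `SVC` accepts as class labels.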

In [146]:
# Search for an rbf gamma that achieves 100% accuracy on the dataset.
param_grid = {'kernel': ['rbf'], 'gamma': np.arange(18, 29)}
# Set up the grid search (the training itself happens when fit is called)
model = GridSearchCV(SVC(), param_grid)
# Alternative: a single model with a fixed gamma
# model = SVC(gamma=27, kernel='rbf')

In [147]:
# Fit the model
model.fit(X, y)

GridSearchCV(cv=None, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'gamma': array([18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]),
                         'kernel': ['rbf']},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [148]:
# Make predictions
y_pred = model.predict(X)

In [149]:
# Calculate the accuracy
acc = accuracy_score(y, y_pred)
print(acc)
print(model.best_params_)

1.0
{'gamma': 27, 'kernel': 'rbf'}
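
The 1.0 printed above is training accuracy, computed by the best estimator after it was refit on all of the data. `GridSearchCV` chooses `gamma` by cross-validated score, which is available as `best_score_`. The sketch below shows both numbers side by side, assuming a synthetic `make_moons` dataset in place of the quiz file; `search` and `train_acc` are names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Noisy stand-in data (an assumption, not the quiz file).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

param_grid = {'kernel': ['rbf'], 'gamma': np.arange(18, 29)}
search = GridSearchCV(SVC(), param_grid)  # cv=None means 5-fold cross-validation
search.fit(X, y)

# predict() delegates to the best estimator, refit on the full data.
train_acc = accuracy_score(y, search.predict(X))

print("best params:      ", search.best_params_)
print("mean CV accuracy: ", search.best_score_)
print("training accuracy:", train_acc)
```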
