# Hyperparameter Tuning
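**Hyperparameters**: parameters that control the learning process of a predictive model and are set before training rather than learned from the data. Examples include `C` (inverse regularization strength in `LogisticRegression` and `SVC`), `l1_ratio` (the L1/L2 penalty mix in `ElasticNet`), and `gamma` (the kernel coefficient in `SVC`).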

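The hyperparameters of any estimator can be listed with the `get_params()` method and changed with the `set_params()` method. A standalone estimator exposes them under their plain names (e.g. `C`). When the estimator is nested inside a composite such as a `Pipeline`, they are named `<model_name>__<hyperparameter_name>`, where `<model_name>` is the name given to that step.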
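The naming rule can be illustrated with a minimal sketch. The step names `scaler` and `classifier` are arbitrary labels chosen for this example, not names required by scikit-learn.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A standalone estimator exposes hyperparameters under their plain names.
svc = SVC()
default_C = svc.get_params()["C"]
print(default_C)

# Inside a pipeline, the step name and a double underscore are prepended.
# "scaler" and "classifier" are hypothetical step names chosen for this sketch.
pipeline = Pipeline([("scaler", StandardScaler()), ("classifier", SVC())])
pipeline.set_params(classifier__C=10.0, classifier__gamma=0.01)

params = pipeline.get_params()
print(params["classifier__C"], params["classifier__gamma"])
```

The same `<model_name>__<hyperparameter_name>` keys are what a search over a pipeline would expect in its parameter grid or distributions.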

## Grid Search
**GridSearchCV**: exhaustive approach that generates hyperparameter candidates from a fixed grid of parameter values. It becomes computationally expensive quickly, because the number of candidates is the product of the number of values listed for each hyperparameter, and it cannot find good values that fall between the grid points.

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

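data, target = load_breast_cancer(return_X_y=True)
# Every combination of these values is evaluated: 3 x 2 = 6 candidates.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}
# Each candidate is scored with 5-fold cross-validation; n_jobs=2 runs fits in parallel.
model = GridSearchCV(SVC(), param_grid=param_grid, cv=KFold(n_splits=5, shuffle=True), n_jobs=2)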
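To make the cost of an exhaustive grid concrete, the candidates can be enumerated with `ParameterGrid`, which expands a grid the same way the search does. This is only a sketch for counting fits; the variable names `candidates`, `n_splits` and `n_fits` are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}

# Expand the grid into the full list of hyperparameter combinations.
candidates = list(ParameterGrid(param_grid))
for candidate in candidates:
    print(candidate)

# Each candidate is fitted once per cross-validation fold
# (the final refit on the full data is not counted here).
n_splits = 5
n_fits = len(candidates) * n_splits
print(len(candidates), "candidates,", n_fits, "fits")
```

Adding a third hyperparameter with four values would multiply both numbers by four, which is why grids grow expensive so quickly.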

## Randomized Search
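**RandomizedSearchCV**: a randomized search in which each hyperparameter is sampled from a distribution of possible values. The number of candidates is fixed by `n_iter` regardless of how many hyperparameters are tuned, and sampled values are not restricted to predefined grid points. For a comparable budget it often finds a good combination faster than a grid search, although it is not guaranteed to find the optimum.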

In [2]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.svm import SVC
from scipy.stats import loguniform

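data, target = load_breast_cancer(return_X_y=True)
# Log-uniform distributions: C is drawn from [0.1, 10] and gamma from [0.1, 1].
param_distributions = {"C": loguniform(1e-1, 1e1), "gamma": loguniform(1e-1, 1e0)}
# n_iter (default 10) sets how many candidates are sampled and cross-validated.
model = RandomizedSearchCV(SVC(), param_distributions=param_distributions, cv=KFold(n_splits=5, shuffle=True), n_jobs=2)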
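The sampling step can be inspected on its own with `ParameterSampler`, which draws candidates from distributions in the same way as the randomized search. The sketch below assumes the same ranges as the cell above and a fixed `random_state` chosen only for reproducibility.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Same ranges as above: C in [0.1, 10], gamma in [0.1, 1].
param_distributions = {"C": loguniform(1e-1, 1e1), "gamma": loguniform(1e-1, 1e0)}

# Draw 10 candidates, matching the default n_iter of RandomizedSearchCV.
# random_state=0 is an arbitrary choice so the draw is repeatable.
sampler = ParameterSampler(param_distributions, n_iter=10, random_state=0)
samples = list(sampler)

for sample in samples:
    print(f"C={sample['C']:.3f}, gamma={sample['gamma']:.3f}")
```

Unlike a grid, the budget stays at 10 candidates no matter how many hyperparameters are added to `param_distributions`.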

## Nested Cross-Validation
Tuning hyperparameters is itself a form of learning from the data, so it needs to be properly cross-validated. The tuning step *only* selects the hyperparameters that performed best in its own cross-validation. Because that score was used to make the selection, it is an optimistic estimate of how the tuned model will generalize.

Nested cross-validation adds an outer cross-validation around the tuning step. The outer cross-validation generates train-test splits and evaluates the model *after* hyperparameters have been selected. For each outer split, the inner cross-validation generates its own train-test splits from the outer training set only and uses them to select the hyperparameters. The outer test set is then used only to score the resulting model.

In [3]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

data, target = load_breast_cancer(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}

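# Inner loop: used by GridSearchCV to select hyperparameters on each outer training set.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
# Outer loop: held-out folds used only to score the tuned model.
outer_cv = KFold(n_splits=3, shuffle=True, random_state=0)

model = GridSearchCV(SVC(), param_grid=param_grid, cv=inner_cv, n_jobs=2)
# cross_val_score refits the whole search on each outer training fold.
test_score = cross_val_score(model, data, target, cv=outer_cv, n_jobs=2)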
test_score

array([0.64210526, 0.63157895, 0.60846561])
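
To show what the nested procedure does at each step, the outer loop can be written out by hand. This is a sketch of the idea rather than a description of how `cross_val_score` is implemented internally; the names `search` and `outer_scores` are introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

data, target = load_breast_cancer(return_X_y=True)
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=3, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(data, target):
    # Inner loop: tune hyperparameters using the outer training set only.
    search = GridSearchCV(SVC(), param_grid=param_grid, cv=inner_cv)
    search.fit(data[train_idx], target[train_idx])

    # Outer loop: score the refitted best model on data it has never seen.
    score = search.score(data[test_idx], target[test_idx])
    outer_scores.append(score)
    print(search.best_params_, round(score, 3))
```

Printing `best_params_` per fold also shows whether the selected hyperparameters are stable across outer splits, which a single array of scores does not reveal.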