# Scikit Learn Tutorial #11 - Hyperparameter Tuning

<table align="left"><td>
  <a target="_blank"  href="https://colab.research.google.com/github/TannerGilbert/Tutorials/blob/master/Scikit-Learn-Tutorial/11.%20Hyperparameter%20Tuning.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab
  </a>
</td><td>
  <a target="_blank"  href="https://github.com/TannerGilbert/Tutorials/blob/master/Scikit-Learn-Tutorial/11.%20Hyperparameter%20Tuning.ipynb">
    <img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
</td></table>

![Scikit Learn Logo](http://scikit-learn.org/stable/_static/scikit-learn-logo-small.png)

## What are Hyperparameters?

Hyperparameters are parameters that are not learned directly by an estimator; instead, they are set before the learning process begins. Each algorithm has its own hyperparameters (for example, the regularization strength `C` and the `kernel` of a support vector machine), and they change how the algorithm behaves and performs.

In Scikit Learn, hyperparameters are passed as arguments to the estimator's constructor. Scikit Learn's estimators have a default value for each of their hyperparameters. These defaults are a good starting point, but they are not necessarily the best choice for your specific problem.
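A minimal sketch of how hyperparameters are set in practice, using the `SVC` class that this tutorial tunes later. The values `C=10` and `C=0.5` are arbitrary choices for illustration; `get_params()` and `set_params()` are scikit-learn's standard methods for reading and changing an estimator's hyperparameters.

```python
from sklearn.svm import SVC

# Hyperparameters go into the constructor; anything omitted keeps its default
default_svc = SVC()
custom_svc = SVC(C=10, kernel='linear')

# get_params() returns all hyperparameters as a dict, so defaults can be inspected
print(default_svc.get_params()['C'], default_svc.get_params()['kernel'])  # 1.0 rbf
print(custom_svc.get_params()['C'], custom_svc.get_params()['kernel'])    # 10 linear

# set_params() changes hyperparameters on an existing estimator
custom_svc.set_params(C=0.5)
print(custom_svc.C)  # 0.5
```

The search classes covered below try each candidate by applying `set_params()` to copies of the estimator you pass in.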

## What is Hyperparameter Tuning?

Hyperparameter tuning is the process of searching for the hyperparameter values that give the best possible result on your specific problem. In real-world projects, hyperparameter tuning usually accounts for only a small part of the accuracy gains; most of the improvement comes from proper feature engineering and preprocessing.

## Hyperparameter Tuning in Scikit Learn

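Scikit Learn provides two kinds of hyperparameter tuning: generic search tools (`GridSearchCV` and `RandomizedSearchCV`, covered below) that work with any estimator, and algorithm-specific cross-validated estimators (such as `RidgeCV` or `LogisticRegressionCV`) that efficiently tune a particular hyperparameter of one model.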
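As an illustration of the algorithm-specific kind, here is a sketch using `RidgeCV`, which selects the regularization strength `alpha` of a ridge regression model. Because ridge regression is a regression model, this sketch swaps in scikit-learn's bundled diabetes dataset instead of Iris; the list of candidate alphas is an arbitrary assumption.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV

X_reg, y_reg = load_diabetes(return_X_y=True)

# Candidate regularization strengths (chosen only for illustration)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# With the default cv=None, RidgeCV scores every alpha using an efficient
# form of leave-one-out cross-validation and keeps the best one
ridge = RidgeCV(alphas=alphas).fit(X_reg, y_reg)
print('best alpha:', ridge.alpha_)
```

Estimator-specific tools like this can exploit the structure of one model to evaluate many values cheaply, which is why they are typically faster than a generic grid search over the same candidates.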

### Loading the Dataset

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the Iris dataset from the UCI repository; the file has no header row
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label'])

# Encode the string class labels (e.g. 'Iris-setosa') as integers 0, 1, 2
le = LabelEncoder()
iris['label'] = le.fit_transform(iris['label'])
iris.head()
```

|   | sepal_length | sepal_width | petal_length | petal_width | label |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


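```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Features are every column except 'label'; the target is the encoded label
X = np.array(iris.drop(['label'], axis=1))
y = np.array(iris['label'])

# Hold out 20% of the samples as a test set for the final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```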

### Exhaustive Grid Search (GridSearchCV)
GridSearchCV is called exhaustive because it evaluates every combination of values in the parameter grid, scoring each combination with cross-validation on the training data.

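```python
%%time
from sklearn.model_selection import GridSearchCV

# 3 kernels x 5 values of C = 15 candidate combinations
parameters = {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 0.5, 1, 5, 10]}

# Each combination is cross-validated on the training set;
# the best one is then refit on the whole training set
clf = GridSearchCV(SVC(), parameters)
clf.fit(X_train, y_train)
print('score', clf.score(X_test, y_test))
print(clf.best_params_)
```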

```
score 1.0
{'C': 1, 'kernel': 'linear'}
Wall time: 105 ms
```
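To make "every combination" concrete, the sketch below expands the same parameter grid with `ParameterGrid` and then inspects `cv_results_`, which holds the cross-validation results of each candidate. It assumes scikit-learn's bundled `load_iris()` in place of the UCI download, so scores may differ slightly from the output above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.svm import SVC

parameters = {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 0.5, 1, 5, 10]}

# ParameterGrid expands the dict into every combination: 3 kernels x 5 C values
grid = list(ParameterGrid(parameters))
print(len(grid))  # 15

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# cv defaults to 5 folds, so this runs 15 x 5 = 75 fits plus one final refit
clf = GridSearchCV(SVC(), parameters)
clf.fit(X_train, y_train)

# One row per candidate; sorting by rank shows the leaders
results = pd.DataFrame(clf.cv_results_)
print(results.sort_values('rank_test_score')[['params', 'mean_test_score', 'std_test_score']].head())
```

On a dataset as small as Iris, several combinations often share the top mean score; `rank_test_score` gives tied candidates the same rank.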


### Randomized Parameter Optimization (RandomizedSearchCV)

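RandomizedSearchCV implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. When a parameter is given as a list, as in the example below, its values are sampled uniformly, and only `n_iter` settings (10 by default) are tried instead of the full grid.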

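```python
%%time
from sklearn.model_selection import RandomizedSearchCV

parameters = {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 0.5, 1, 5, 10]}

# Tries n_iter=10 (the default) of the 15 combinations; without random_state
# the sampled combinations, and therefore best_params_, can vary between runs
clf = RandomizedSearchCV(SVC(), parameters)
clf.fit(X_train, y_train)
print('score', clf.score(X_test, y_test))
print(clf.best_params_)
```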

```
score 1.0
{'kernel': 'linear', 'C': 0.5}
Wall time: 80.1 ms
```
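The example above passes only lists, so the randomized search simply samples 10 of the 15 grid points. Its advantage shows up more clearly with continuous distributions. The following sketch assumes SciPy's `scipy.stats.loguniform` for `C` and `gamma`; the ranges, the `n_iter=20` budget and `random_state=0` are arbitrary illustration choices, and `load_iris()` stands in for the UCI download.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Lists are sampled uniformly; scipy.stats distributions are sampled via .rvs()
param_distributions = {
    'kernel': ['linear', 'rbf'],
    'C': loguniform(1e-2, 1e2),      # C drawn on a log scale between 0.01 and 100
    'gamma': loguniform(1e-3, 1e0),  # ignored by the linear kernel
}

# The budget (n_iter) is fixed, no matter how many values the distributions cover
clf = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, random_state=0)
clf.fit(X_train, y_train)
print('score', clf.score(X_test, y_test))
print(clf.best_params_)
```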


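We can also score each hyperparameter combination with several evaluation metrics at once by passing a list (or dict) of metrics to the `scoring` parameter. In that case, `refit` should be set to the name of the metric that decides which combination is the best.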
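A sketch of multi-metric evaluation with the grid from the GridSearchCV example. It assumes accuracy and macro-averaged F1 (`'f1_macro'`) as the two metrics and uses `refit='f1_macro'` to pick the winner; `load_iris()` again replaces the UCI download.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

parameters = {'kernel': ['linear', 'rbf', 'poly'], 'C': [0.1, 0.5, 1, 5, 10]}

# With several metrics, refit names the one used to choose best_params_
clf = GridSearchCV(SVC(), parameters, scoring=['accuracy', 'f1_macro'], refit='f1_macro')
clf.fit(X_train, y_train)
print(clf.best_params_)

# cv_results_ gets one mean_test_<metric> entry per metric
print(clf.cv_results_['mean_test_accuracy'][:3])
print(clf.cv_results_['mean_test_f1_macro'][:3])
```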

## Resources

<ul>
    <li><a href="http://scikit-learn.org/stable/modules/grid_search.html">Tuning the hyper-parameters of an estimator (Scikit Learn Documentation)</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)">Hyperparameter (Wikipedia)</a></li>
    <li><a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV">GridSearchCV (Scikit Learn Documentation)</a></li>
    <li><a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV">RandomizedSearchCV (Scikit Learn Documentation)</a></li>
</ul>

## Conclusion

That was a quick overview of hyperparameter tuning and how to implement it in Scikit Learn. 
I hope you liked this tutorial. If you did, consider subscribing to my <a href="https://www.youtube.com/channel/UCBOKpYBjPe2kD8FSvGRhJwA">YouTube channel</a> or following me on social media. If you have any questions, feel free to contact me.