# Algorithm Selection and Hyperparameter Tuning with `scikit-learn`

This chapter contains code examples for model selection and hyperparameter tuning with `scikit-learn`.



In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn
import pandas

In [None]:
seaborn.set_style("ticks")
plt.rcParams["axes.grid"] = True

## Model Selection Tools in `scikit-learn`

### Parameter Search

Even a relatively simple ML algorithm, such as the *decision tree algorithm*, has a large number of hyperparameters: settings that we fix before the algorithm sees the training data. Each of them can influence the performance of the learned model. Deciding which ones to tune requires understanding both the algorithm and the data.

Recalling the section on **model complexity**, we conclude that the **depth of a decision tree** (i.e. the maximum number of steps from the root to a leaf) is an important parameter: the shallower the tree, the fewer criteria it can check before arriving at a prediction, which risks _underfitting_. Conversely, the deeper the tree, the higher the risk of _overfitting_.
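The effect of depth can be made visible by comparing training accuracy with cross-validated accuracy for a few depths. The following is a minimal sketch, not part of the original notebook: it assumes the iris data as shipped with `scikit-learn` (`load_iris`), and the names `X_demo`, `y_demo` and `depth_scores` are chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the course dataset.
X_demo, y_demo = load_iris(return_X_y=True)

depth_scores = {}
for depth in (1, 2, 3, 5, 10):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Mean accuracy over 5 cross-validation folds (data the tree has not seen).
    cv_accuracy = cross_val_score(tree, X_demo, y_demo, cv=5).mean()
    # Accuracy on the very data the tree was trained on.
    train_accuracy = tree.fit(X_demo, y_demo).score(X_demo, y_demo)
    depth_scores[depth] = (train_accuracy, cv_accuracy)
    print(f"max_depth={depth}: train={train_accuracy:.3f}, cv={cv_accuracy:.3f}")
```

A very shallow tree scores poorly on both measures, which hints at underfitting. A training accuracy that keeps rising while the cross-validated accuracy stalls or drops hints at overfitting.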



In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
DecisionTreeClassifier?


There is only one way to really know the optimal depth: **experiment with different parameter values and measure performance**. Fortunately, `scikit-learn` has helpful tools that make this possible in a few lines of code:

- [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html): Tries out every combination of parameters from a given "grid" and evaluates them using cross-validation.
- [RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html): Randomly samples a fixed number of parameter combinations, which is necessary when the search space is too large to try exhaustively.

In [None]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

In [None]:
from sklearn.metrics import precision_score, make_scorer

In [None]:
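param_search = GridSearchCV(
    estimator=DecisionTreeClassifier(),
    param_grid={
        "max_depth": range(1, 10)  # candidate depths 1 to 9
    },
    # micro-averaged precision; for single-label multiclass targets this equals accuracy
    scoring=make_scorer(precision_score, average="micro")
)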
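For comparison, a randomized counterpart to the grid search above might look like the following sketch. It is an illustration under assumptions rather than part of the original notebook: the data comes from `load_iris`, the extra parameters `min_samples_leaf` and `criterion` and their ranges are arbitrary choices, and the distributions are taken from `scipy.stats`.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the course dataset.
X_demo, y_demo = load_iris(return_X_y=True)

random_search = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_distributions={
        "max_depth": randint(1, 10),         # integers 1 to 9 (upper bound is exclusive)
        "min_samples_leaf": randint(1, 20),  # integers 1 to 19
        "criterion": ["gini", "entropy"],    # lists are sampled uniformly
    },
    n_iter=20,       # number of sampled combinations, independent of the space size
    cv=5,
    random_state=0,  # makes the sampled combinations reproducible
)
random_search.fit(X_demo, y_demo)

print(random_search.best_params_)
print(random_search.best_score_)
```

The full grid over these ranges would contain 9 × 19 × 2 = 342 combinations; the randomized search evaluates only the 20 requested by `n_iter`.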

In [None]:
import data_science_learning_paths

In [None]:
data = data_science_learning_paths.datasets.read_iris()

In [None]:
data.head()

In [None]:
X, y = data[data.columns.difference(["species"])], data["species"]

In [None]:
param_search.fit(X, y)

The fitted search object will tell you the best parameters found in the experiments:

In [None]:
param_search.best_params_
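Beyond the single best setting, a fitted search object records the scores of every candidate in its `cv_results_` attribute, which converts directly into a `pandas` data frame. The sketch below is self-contained and therefore rebuilds a small search on `load_iris` with plain accuracy as the score; the names `demo_search` and `summary` are illustrative and not taken from the notebook.

```python
import pandas
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the course dataset.
X_demo, y_demo = load_iris(return_X_y=True)

demo_search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": range(1, 10)},
    scoring="accuracy",
    cv=5,
)
demo_search.fit(X_demo, y_demo)

# One row per candidate; keep only the columns that matter for a quick comparison.
results = pandas.DataFrame(demo_search.cv_results_)
summary = results[
    ["param_max_depth", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(summary.sort_values("rank_test_score"))

# best_score_ is the mean cross-validated score of the best candidate.
print(demo_search.best_score_)
```

Looking at the whole table rather than only `best_params_` shows how close the candidates are: if several depths score within one standard deviation of each other, the simpler model is often a reasonable choice.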

Conveniently, the fitted search object can also make predictions directly, using the best model found (by default, that model is refit on all the data passed to `fit`):

In [None]:
y_pred = param_search.predict(X)
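Predictions on the same `X` that was used for the search tend to look better than they would on new data. A common way to get a more honest estimate is to hold out a test set before searching. The following sketch assumes `load_iris` as the data source and a 70/30 split; both are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the course dataset.
X_demo, y_demo = load_iris(return_X_y=True)

# Set aside 30% of the data; the search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

holdout_search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": range(1, 10)},
    cv=5,
)
holdout_search.fit(X_train, y_train)

# With the default refit=True, the best model is retrained on all of X_train
# and exposed as best_estimator_; predict() on the search delegates to it.
best_tree = holdout_search.best_estimator_
test_accuracy = accuracy_score(y_test, holdout_search.predict(X_test))
print(holdout_search.best_params_, test_accuracy)
```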

### Exercise: Algorithm Search

**Rather than tuning the parameters of one algorithm, we can also use the search tools to try out different types of algorithms. This can be done using a `Pipeline`: we treat a pipeline step, addressed by its name, as a parameter whose candidate values are the estimators to compare. Try it out!**

In [None]:
from sklearn.pipeline import Pipeline

In [None]:
# TODO: your code here
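For comparison after attempting the exercise, here is one possible approach. It is a sketch under assumptions, not the reference solution: the step name `"classifier"`, the three candidate algorithms and their parameter values are all arbitrary illustrative choices, and the data again comes from `load_iris`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the course dataset.
X_demo, y_demo = load_iris(return_X_y=True)

# A single-step pipeline; the initial estimator is only a placeholder
# that the search replaces with each candidate.
pipeline = Pipeline(steps=[("classifier", DecisionTreeClassifier())])

algorithm_search = GridSearchCV(
    estimator=pipeline,
    # A list of grids: each dict swaps in one algorithm under the step name
    # and tunes parameters via the "<step name>__<parameter>" syntax.
    param_grid=[
        {
            "classifier": [DecisionTreeClassifier(random_state=0)],
            "classifier__max_depth": [2, 3, 5],
        },
        {
            "classifier": [KNeighborsClassifier()],
            "classifier__n_neighbors": [3, 5, 7],
        },
        {
            "classifier": [SVC()],
            "classifier__C": [0.1, 1.0, 10.0],
        },
    ],
    scoring="accuracy",
    cv=5,
)
algorithm_search.fit(X_demo, y_demo)

print(algorithm_search.best_params_)
print(algorithm_search.best_score_)
```

Using a list of separate grids keeps each algorithm's parameters apart, so the search never tries to set `max_depth` on a nearest-neighbour classifier.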

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_

