# Finding the best hyper-parameters

#### Problem: most ML algorithms need to be set up with parameters. How do we find the "best" ones?

We return to the small "Digits" dataset (8x8 images of handwritten digits, similar in spirit to MNIST) and optimize our clustering approach.

```python
from sklearn.datasets import load_digits
data, labels = load_digits(return_X_y=True)
```

## Task 1: building pipelines in Scikit-Learn
Use Scikit-Learn's [pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) mechanism to organize your clustering workflow.
* build a pipeline with data normalization and a clustering algorithm
* train and evaluate the pipeline
* [visualize](https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_pipeline_display.html#sphx-glr-auto-examples-miscellaneous-plot-pipeline-display-py) the pipeline
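The three steps above can be sketched roughly as follows. This is one possible solution, not the reference one: the choice of `StandardScaler` for normalization, `KMeans` with `n_clusters=10` as the clustering step, and the two evaluation metrics (silhouette score as an internal metric, adjusted Rand index against the true labels as an external one) are all assumptions made for illustration.

```python
from sklearn import set_config
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data, labels = load_digits(return_X_y=True)

# Two-step pipeline (assumed setup): standardize features, then cluster.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("cluster", KMeans(n_clusters=10, n_init=10, random_state=0)),
])

# fit_predict runs the scaler, then KMeans.fit_predict on the scaled data.
predicted = pipe.fit_predict(data)

# Internal metric: silhouette on the scaled features that KMeans actually saw.
scaled = pipe.named_steps["scaler"].transform(data)
sil = silhouette_score(scaled, predicted)

# External metric: agreement with the true digit labels (possible only because labels exist).
ari = adjusted_rand_score(labels, predicted)
print(f"silhouette={sil:.3f}  ARI={ari:.3f}")

# In Jupyter, the last expression of a cell renders the pipeline as a diagram.
set_config(display="diagram")
pipe
```

Evaluating the silhouette on the scaled data rather than the raw pixels keeps the metric consistent with the distances the clustering step optimized.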

## Task 2: optimize the pipeline with ParameterGrid
Now we want to automatically search for the best parameters:
* use a [parameter grid](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ParameterGrid.html) to optimize over 
    * different cluster algorithms
    * and their hyper-parameters (e.g. 'k' in K-means)
* use a cluster metric to evaluate and rank the different combinations
* search over all clustering algorithms we have discussed so far
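A minimal sketch of such a search, under several assumptions: the set of algorithms (`KMeans`, `AgglomerativeClustering`, `DBSCAN`) stands in for "the algorithms discussed so far", the grid values are arbitrary starting points, and the silhouette score is used as the ranking metric. Passing a list of dicts to `ParameterGrid` keeps each algorithm paired only with its own hyper-parameters, and the `"cluster"` key swaps out the whole pipeline step.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data, labels = load_digits(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()), ("cluster", KMeans(n_init=10))])

# Each dict is its own sub-grid; the values below are illustrative assumptions.
param_grid = [
    {"cluster": [KMeans(n_init=10, random_state=0)],
     "cluster__n_clusters": [8, 10, 12]},
    {"cluster": [AgglomerativeClustering()],
     "cluster__n_clusters": [8, 10, 12],
     "cluster__linkage": ["ward", "average"]},
    {"cluster": [DBSCAN()],
     "cluster__eps": [3.0, 4.0, 5.0],
     "cluster__min_samples": [5, 10]},
]

results = []
for params in ParameterGrid(param_grid):
    pipe.set_params(**params)  # ** unpacks the dict into keyword arguments
    predicted = pipe.fit_predict(data)
    n_found = len(set(predicted) - {-1})  # DBSCAN labels noise points as -1
    if n_found < 2:
        continue  # the silhouette score is undefined for fewer than 2 clusters
    scaled = pipe.named_steps["scaler"].transform(data)
    score = silhouette_score(scaled, predicted)
    # Store the algorithm name rather than the estimator object, which is
    # reused (and mutated) across iterations of the grid.
    summary = {k: (type(v).__name__ if k == "cluster" else v) for k, v in params.items()}
    results.append((score, summary))

# Rank all combinations, best silhouette first.
results.sort(key=lambda r: r[0], reverse=True)
for score, summary in results[:5]:
    print(f"{score:.3f}", summary)
```

Since the Digits data comes with labels, an external metric such as the adjusted Rand index could be swapped in for the silhouette score to check whether both metrics agree on the ranking.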

HINT: use the Python [unpacking operator **](https://towardsdatascience.com/unpacking-operators-in-python-306ae44cd480) to pass arguments to your pipeline
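For illustration, a small sketch of how `**` connects a parameter dict to a pipeline. It assumes a two-step pipeline with steps named `"scaler"` and `"cluster"`; the specific parameter values are arbitrary. Scikit-Learn addresses a parameter of a pipeline step as `<step name>__<parameter name>`, which is exactly the key format a parameter grid produces.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler()), ("cluster", KMeans(n_init=10))])

# One combination, in the shape a parameter grid yields: a plain dict.
params = {"cluster__n_clusters": 12, "scaler__with_mean": False}

# ** unpacks the dict, so this is the same call as
# pipe.set_params(cluster__n_clusters=12, scaler__with_mean=False)
pipe.set_params(**params)

print(pipe.named_steps["cluster"].n_clusters)  # 12
print(pipe.named_steps["scaler"].with_mean)    # False
```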