# Grid Optimizer
## How to perform cross-validation and hyperparameter optimization with grid search

If you have seen the [Reuse Data](notebooks/caching_heavy_data.ipynb) tutorial, you may have noticed that we used a standard sklearn optimizer for hyperparameter tuning. This is fine for many use cases, but it might not be the best choice for others.

### We will use a simple pipeline for the iris dataset.

In [1]:
from framework3.utils.patch_type_guard import patch_inspect_for_notebooks

patch_inspect_for_notebooks()

✅ Patched inspect.getsource using dill.


In [2]:
from sklearn import datasets
from framework3.base.base_clases import XYData

iris = datasets.load_iris()


X_train, X_test, y_train, y_test = XYData(
    _hash="Iris ", _path="/dataset", _value=[]
).train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,
    random_state=42,  # type: ignore
)

### Then we will configure Grid Search for hyperparameter tuning and a Sklearn splitter for cross validation.

In [3]:
from framework3 import (
    F1,
    Cached,
    F3Pipeline,
    KnnFilter,
    Precission,
    StandardScalerPlugin,
)
from framework3.plugins.metrics.classification import Recall, XYData
from framework3.plugins.optimizer.grid_optimizer import GridOptimizer
from framework3.plugins.splitter.cross_validation_splitter import KFoldSplitter


wandb_pipeline = (
    F3Pipeline(
        filters=[
            Cached(StandardScalerPlugin()),
            KnnFilter().grid({"n_neighbors": [2, 3, 4, 5, 6]}),
        ],
        metrics=[F1(), Precission(), Recall()],
    )
    .splitter(
        KFoldSplitter(
            n_splits=2,
            shuffle=True,
            random_state=42,
        )
    )
    .optimizer(GridOptimizer(scoring=F1()))
)

In [4]:
wandb_pipeline.fit(X_train, y_train)
_y = wandb_pipeline.predict(x=X_test)

{'KnnFilter': {'n_neighbors': [2, 3, 4, 5, 6]}}
{'KnnFilter': {'n_neighbors': 2}}
{'KnnFilter': {'n_neighbors': 3}}
{'KnnFilter': {'n_neighbors': 4}}
{'KnnFilter': {'n_neighbors': 5}}
{'KnnFilter': {'n_neighbors': 6}}
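
The output above lists the full grid followed by each candidate configuration that was tried. For readers who want to see the idea without the framework, here is a minimal sketch of the same search written with plain scikit-learn. It is not how `GridOptimizer` or `KFoldSplitter` are implemented. The weighted F1 averaging, the refit on the full training set, and all variable names are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split parameters as the tutorial cells above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = [2, 3, 4, 5, 6]  # the n_neighbors grid
folds = KFold(n_splits=2, shuffle=True, random_state=42)

mean_scores = {}
for k in candidates:
    fold_scores = []
    for train_idx, val_idx in folds.split(X_train):
        # Fit scaler + classifier on the training fold only.
        model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
        model.fit(X_train[train_idx], y_train[train_idx])
        pred = model.predict(X_train[val_idx])
        # Assumption: weighted averaging; the framework's F1 may average differently.
        fold_scores.append(f1_score(y_train[val_idx], pred, average="weighted"))
    mean_scores[k] = sum(fold_scores) / len(fold_scores)

# Keep the candidate with the highest mean validation score.
best_k = max(mean_scores, key=mean_scores.get)

# Assumption: refit the winning configuration on the whole training set.
final_model = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=best_k)
).fit(X_train, y_train)
```

Because the averaging mode and fold assignment may differ from the framework's, the scores from this sketch are not expected to match the tutorial's numbers exactly.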


In [5]:
y_test.value

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [6]:
_y.value

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [7]:
wandb_pipeline.evaluate(X_test, y_test, _y)

{'F1': 1.0, 'Precission': 1.0, 'Recall': 1.0}
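
The two arrays printed above are identical, which is why every metric equals 1.0. As a cross-check outside the framework, the same numbers can be reproduced with scikit-learn's metric functions. This is only a sketch: the macro averaging is an assumption (for a perfect prediction every averaging mode gives 1.0), and the dictionary keys are illustrative rather than the framework's metric names.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Test labels copied from the y_test.value output above.
y_true = np.array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0,
                   1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0])
# The predictions printed above match the labels element by element.
y_pred = y_true.copy()

scores = {
    "f1": f1_score(y_true, y_pred, average="macro"),
    "precision": precision_score(y_true, y_pred, average="macro"),
    "recall": recall_score(y_true, y_pred, average="macro"),
}
print(scores)
```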

### Grid results


In [8]:
wandb_pipeline._results

|   | KnnFilter | score |
|---|---|---|
| 2 | {'n_neighbors': 4} | 0.933723 |
| 4 | {'n_neighbors': 6} | 0.932844 |
| 1 | {'n_neighbors': 3} | 0.925411 |
| 3 | {'n_neighbors': 5} | 0.916946 |
| 0 | {'n_neighbors': 2} | 0.90865 |
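
The results are sorted by score, so the best configuration (`n_neighbors=4`) comes first. If `_results` is a pandas DataFrame with `KnnFilter` and `score` columns, as the output suggests but which is an assumption here, the winning row could be extracted as in the sketch below. The frame is rebuilt by hand from the values shown above so that the example runs on its own.

```python
import pandas as pd

# Rebuild the results shown above; in the notebook this would be
# wandb_pipeline._results (assumed to be a DataFrame with these columns).
results = pd.DataFrame(
    {
        "KnnFilter": [{"n_neighbors": k} for k in (4, 6, 3, 5, 2)],
        "score": [0.933723, 0.932844, 0.925411, 0.916946, 0.90865],
    },
    index=[2, 4, 1, 3, 0],
)

# idxmax returns the index label of the highest score; loc fetches that row.
best_row = results.loc[results["score"].idxmax()]
best_params = best_row["KnnFilter"]
best_score = float(best_row["score"])
print(best_params, best_score)
```

With only two folds and 120 training samples, the gap between the top candidates (0.9337 versus 0.9328) is small, so the ranking between them should be read with some caution.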
