# Wandb Optimizer
## How to perform cross-validation and hyperparameter optimization with WandB

If you have gone through the [Reuse Data](notebooks/caching_heavy_data.ipynb) tutorial, you may have noticed that we used a standard scikit-learn optimizer for hyperparameter tuning. This is fine for many use cases, but it might not be the best choice for others. If you need a more advanced optimization strategy, WandB is a great choice: its sweeps support grid, random and Bayesian search, and every run is tracked in a web dashboard.

### We will use a simple pipeline for the Iris dataset.

In [1]:
import wandb

wandb.login()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mmanu-couto1k[0m ([33mcitius-irlab[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

In [2]:
from framework3.utils.patch_type_guard import patch_inspect_for_notebooks

patch_inspect_for_notebooks()

✅ Patched inspect.getsource using dill.


In [3]:
from sklearn import datasets
from framework3.base.base_clases import XYData
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()


X_train, X_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,
    random_state=42,  # type: ignore
)


X_train = XYData(
    _hash="Iris X train data",
    _path="/datasets",
    _value=X_train,
)
y_train = XYData(
    _hash="Iris y train data",
    _path="/datasets",
    _value=y_train,  # type: ignore
)

X_test = XYData(
    _hash="Iris X test data",
    _path="/datasets",
    _value=X_test,
)
y_test = XYData(
    _hash="Iris y test data",
    _path="/datasets",
    _value=y_test,  # type: ignore
)
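Each `XYData` carries a `_hash` that identifies its contents; it is the value that later shows up as `x_dataset` and `y_dataset` in the run configs. If results are cached by that identifier, as in the Reuse Data tutorial, two different arrays must never share a hash. The toy cache below is purely hypothetical (it is not framework3's caching code) and only illustrates what goes wrong on a collision.

```python
from typing import Any, Callable, Dict


class ToyCache:
    """Hypothetical result cache keyed by a data hash (not framework3's implementation)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_compute(self, data_hash: str, compute: Callable[[], Any]) -> Any:
        # Compute only the first time a hash is seen; afterwards reuse the stored value.
        if data_hash not in self._store:
            self._store[data_hash] = compute()
        return self._store[data_hash]


cache = ToyCache()
train_rows = [[5.1, 3.5], [4.9, 3.0]]  # placeholder values
test_rows = [[6.2, 2.9]]               # placeholder values

# Same hash for different data: the "test" lookup silently returns the train result.
a = cache.get_or_compute("Iris X train data", lambda: len(train_rows))
b = cache.get_or_compute("Iris X train data", lambda: len(test_rows))
print(a, b)  # 2 2

# A distinct hash keeps the test entry separate.
c = cache.get_or_compute("Iris X test data", lambda: len(test_rows))
print(c)  # 1
```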

### Then we configure WandB for hyperparameter tuning and a KFold splitter for cross-validation.

WandB provides a dashboard to visualize the results of the experiments. For this to work, you need to define a project name (the `project` argument of `WandbOptimizer`) and log in to the WandB service (the `wandb.login()` call above).
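For comparison, if you drove a WandB sweep directly instead of through `WandbOptimizer`, the same search space could be written as a standard sweep configuration. This is a sketch assuming the public sweep schema (`method`, `metric`, `parameters`); the configuration that `WandbOptimizer` builds internally is not shown in this notebook and probably differs (the run logs below suggest it nests the value under `filters`).

```python
from math import prod

# Standard WandB sweep configuration for the same grid (a sketch; WandbOptimizer
# generates its own configuration, whose keys may differ).
sweep_config = {
    "method": "grid",  # "random" and "bayes" are the other built-in methods
    "metric": {"name": "F1", "goal": "maximize"},
    "parameters": {
        "n_neighbors": {"values": [2, 3, 4, 5, 6]},
    },
}


def grid_size(config: dict) -> int:
    """Number of runs a grid sweep launches: the product of the value-list lengths."""
    return prod(len(p["values"]) for p in config["parameters"].values())


print(grid_size(sweep_config))  # 5

# With the wandb package installed and logged in, the config could be registered with:
# sweep_id = wandb.sweep(sweep_config, project="test_project")
```

A grid over five values yields five runs, which matches the five agent runs logged further down.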

In [4]:
from framework3 import F1, F3Pipeline, KnnFilter, Precission, StandardScalerPlugin
from framework3.plugins.metrics.classification import Recall
from framework3.plugins.optimizer.wandb_optimizer import WandbOptimizer
from framework3.plugins.splitter.cross_validation_splitter import KFoldSplitter


wandb_pipeline = (
    F3Pipeline(
        filters=[
            StandardScalerPlugin(),
            KnnFilter().grid({"n_neighbors": [2, 3, 4, 5, 6]}),
        ],
        metrics=[F1(), Precission(), Recall()],
    )
    .splitter(
        KFoldSplitter(
            n_splits=2,
            shuffle=True,
            random_state=42,
        )
    )
    .optimizer(
        WandbOptimizer(
            project="test_project",
            sweep_id=None,
            scorer=F1(),
        )
    )
)
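Conceptually, the splitter and the optimizer combine like this: every value in the `n_neighbors` grid becomes one sweep run, and inside each run the pipeline is fitted and scored once per fold (`n_splits=2`). The sketch below reproduces that loop in plain Python under stated assumptions: `fit_and_score` is a hypothetical callback standing in for "fit the pipeline on the training indices, compute F1 on the validation indices", we assume a run's score is the mean over folds, and shuffling uses Python's `random` module, so the folds will not match those of scikit-learn's `KFold` with the same `random_state`.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, validation indices)


def kfold_indices(n_samples: int, n_splits: int, shuffle: bool = True, seed: int = 42) -> List[Fold]:
    """Split range(n_samples) into n_splits (train, validation) index pairs."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # As in scikit-learn's KFold, the first n_samples % n_splits folds get one extra sample.
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0) for i in range(n_splits)]
    folds: List[Fold] = []
    start = 0
    for size in sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


def grid_search_cv(
    grid: Sequence[int],
    folds: List[Fold],
    fit_and_score: Callable[[int, List[int], List[int]], float],
) -> Tuple[int, Dict[int, float]]:
    """Score each candidate by its mean fold score and return the best one."""
    scores = {c: mean(fit_and_score(c, tr, va) for tr, va in folds) for c in grid}
    best = max(scores, key=scores.get)
    return best, scores


def fake_score(k: int, train: List[int], val: List[int]) -> float:
    """Dummy scorer that peaks at k=4; it replaces the real pipeline fit/evaluate."""
    return 1.0 - abs(k - 4) / 10


# 120 training samples (80% of Iris's 150) split into 2 folds.
folds = kfold_indices(n_samples=120, n_splits=2)
print([len(v) for _, v in folds])  # [60, 60]
best, scores = grid_search_cv([2, 3, 4, 5, 6], folds, fake_score)
print(best)  # 4
```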

In [5]:
wandb_pipeline.fit(X_train, y_train)
_y = wandb_pipeline.predict(x=X_test)

Create sweep with ID: kr0p2w24
Sweep URL: https://wandb.ai/citius-irlab/test_project/sweeps/kr0p2w24


[34m[1mwandb[0m: Agent Starting Run: qvhp71sa with config:
[34m[1mwandb[0m: 	filters: {'KnnFilter': {'n_neighbors': 2}}
[34m[1mwandb[0m: 	pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
[34m[1mwandb[0m: 	x_dataset: Iris X train data
[34m[1mwandb[0m: 	y_dataset: Iris y train data


0,1
F1,▁

0,1
F1,0.90865


[34m[1mwandb[0m: Agent Starting Run: bv8epurg with config:
[34m[1mwandb[0m: 	filters: {'KnnFilter': {'n_neighbors': 3}}
[34m[1mwandb[0m: 	pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
[34m[1mwandb[0m: 	x_dataset: Iris X train data
[34m[1mwandb[0m: 	y_dataset: Iris y train data


0,1
F1,▁

0,1
F1,0.92541


[34m[1mwandb[0m: Agent Starting Run: y6ebmuh1 with config:
[34m[1mwandb[0m: 	filters: {'KnnFilter': {'n_neighbors': 4}}
[34m[1mwandb[0m: 	pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
[34m[1mwandb[0m: 	x_dataset: Iris X train data
[34m[1mwandb[0m: 	y_dataset: Iris y train data


0,1
F1,▁

0,1
F1,0.93372


[34m[1mwandb[0m: Agent Starting Run: srq9t0tk with config:
[34m[1mwandb[0m: 	filters: {'KnnFilter': {'n_neighbors': 5}}
[34m[1mwandb[0m: 	pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
[34m[1mwandb[0m: 	x_dataset: Iris X train data
[34m[1mwandb[0m: 	y_dataset: Iris y train data


0,1
F1,▁

0,1
F1,0.91695


[34m[1mwandb[0m: Agent Starting Run: za54vh64 with config:
[34m[1mwandb[0m: 	filters: {'KnnFilter': {'n_neighbors': 6}}
[34m[1mwandb[0m: 	pipeline: {'clazz': 'KFoldSplitter', 'params': {'n_splits': 2, 'pipeline': {'clazz': 'F3Pipeline', 'params': {'filters': [{'clazz': 'StandardScalerPlugin', 'params': {}}, {'_grid': {'n_neighbors': [2, 3, 4, 5, 6]}, 'clazz': 'KnnFilter', 'params': {'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': [2, 3, 4, 5, 6], 'p': 2, 'weights': 'uniform'}}], 'log': False, 'metrics': [{'clazz': 'F1', 'params': {'average': 'weighted'}}, {'clazz': 'Precission', 'params': {'average': 'weighted'}}, {'clazz': 'Recall', 'params': {'average': 'weighted'}}], 'overwrite': False, 'store': False}}, 'random_state': 42, 'shuffle': True}}
[34m[1mwandb[0m: 	x_dataset: Iris X train data
[34m[1mwandb[0m: 	y_dataset: Iris y train data


0,1
F1,▁

0,1
F1,0.93284


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.
[34m[1mwandb[0m: Sorting runs by -summary_metrics.F1


In [6]:
wandb_pipeline.evaluate(X_test, y_test, _y)

{'F1': 1.0, 'Precission': 1.0, 'Recall': 1.0}
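The metrics were configured with `average='weighted'` (see the `params` in the run configs). Weighted F1 computes F1 per class and averages it with weights proportional to each class's support, so a score of 1.0 means every test sample was classified correctly. A minimal pure-Python sketch of that computation (not the framework's own implementation):

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # With no true positives both precision and recall are 0, so F1 is defined as 0.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += f1 * n_true / total
    return score


print(weighted_f1([0, 1, 2, 2], [0, 1, 2, 2]))  # 1.0
print(weighted_f1([0, 1, 2, 2], [0, 2, 2, 2]))
```

In the second call class 1 is never predicted, so its F1 is 0 and the weighted score drops to about 0.65.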

### Wandb dashboard

![Wandb Dashboard](img/wandb_sweep.png)

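As with Optuna, the dashboard lets us analyze how each hyperparameter influences the selected metric, for example through its parallel coordinates and parameter importance panels. Unlike Optuna, which is an open-source library, WandB is a hosted service that also offers paid plans with additional, more advanced features.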