## AutoML Sweepable API

This notebook shows how to use the `Sweepable` API to fully customize the pipeline or search space in your AutoML task. In this notebook, you will learn how to:
- use `AutoTrainer` to simplify your work.
- use `AutoML().CreateSweepableEstimator` to create a `SweepableEstimator`.
- create a `SweepablePipeline` with multiple trainer candidates.


### Install NuGet packages and add using statements

In [None]:
// using nightly-build
#i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json"
#r "nuget: Plotly.NET.Interactive, 3.0.2"
#r "nuget: Plotly.NET.CSharp, 0.0.1"
#r "nuget: Microsoft.ML.AutoML, 0.20.0-preview.22470.1"
#r "nuget: Microsoft.Data.Analysis, 0.20.0-preview.22470.1"

In [None]:
using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;
using Microsoft.Data.Analysis;
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.AutoML.CodeGen;
using Microsoft.ML.Trainers.LightGbm;
using Microsoft.ML.Data;
using Plotly.NET;
using Microsoft.ML.Transforms.TimeSeries;
using Microsoft.ML.SearchSpace;
using System.Diagnostics;

#### Use `AutoTrainer` for built-in sweepable estimator candidates.

`AutoTrainer` provides built-in sweepable estimator candidates for binary classification, multiclass classification, and regression. For those scenarios, you can use these candidates directly instead of creating a `SweepableEstimator` from scratch.

In [None]:
var context = new MLContext();
var regressionTrainerCandidates = context.Auto().Regression();
var binaryClassificationTrainerCandidates = context.Auto().BinaryClassification();
var multiclassClassificationTrainerCandidates = context.Auto().MultiClassification();

#### Use `AutoML().CreateSweepableEstimator` to create `SweepableEstimator`

If the built-in `SweepableEstimator` candidates don't meet your requirements, you can call `CreateSweepableEstimator` to create a customized `SweepableEstimator`. A `SweepableEstimator` is simply a normal estimator paired with a `SearchSpace`. The following code shows how to create sweepable `LightGbm` and `SDCA` estimators.

For simplicity, the built-in search spaces for `LightGbm` and `SDCA` are used, but you can customize the search space however you want. For more details, see [Parameter And SearchSpace](./Parameter%20and%20SearchSpace.ipynb).

In [None]:
var lgbmSearchSpace = new SearchSpace<LgbmOption>();
var sweepableLgbm = context.Auto().CreateSweepableEstimator((context, param) => {
    var option = new LightGbmRegressionTrainer.Options()
    {
        NumberOfLeaves = param.NumberOfLeaves,
        NumberOfIterations = param.NumberOfTrees,
        MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,
        LearningRate = param.LearningRate,
        LabelColumnName = "Label",
        FeatureColumnName = "Features",
        Booster = new GradientBooster.Options()
        {
            SubsampleFraction = param.SubsampleFraction,
            FeatureFraction = param.FeatureFraction,
            L1Regularization = param.L1Regularization,
            L2Regularization = param.L2Regularization,
        },
        MaximumBinCountPerFeature = param.MaximumBinCountPerFeature,
    };

    return context.Regression.Trainers.LightGbm(option);
}, lgbmSearchSpace);

var sdcaSearchSpace = new SearchSpace<SdcaOption>();
var sweepableSdca = context.Auto().CreateSweepableEstimator((context, param) => {
    return context.Regression.Trainers.Sdca("Label", "Features", l1Regularization: param.L1Regularization, l2Regularization: param.L2Regularization);
}, sdcaSearchSpace);
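
Conceptually, each call above pairs a factory (parameters in, estimator out) with a search space that supplies those parameters. The following is a minimal, library-free sketch of that pairing, not how ML.NET implements it: the dictionary-based `searchSpace`, the `factory` delegate, and the ranges are all hypothetical stand-ins chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search space: parameter name -> (min, max) range.
// The ranges are illustrative only, not the library defaults.
var searchSpace = new Dictionary<string, (double Min, double Max)>
{
    ["L1Regularization"] = (0.0, 1.0),
    ["L2Regularization"] = (0.0, 1.0),
};

// Hypothetical factory: maps one sampled parameter set to a "trainer".
// Here the trainer is only a description string.
Func<Dictionary<string, double>, string> factory = p =>
{
    double l1 = p["L1Regularization"];
    double l2 = p["L2Regularization"];
    return $"Sdca(l1: {l1:F3}, l2: {l2:F3})";
};

// Draw one point uniformly at random from every range.
var rng = new Random(0);
var trial = searchSpace.ToDictionary(
    kv => kv.Key,
    kv => kv.Value.Min + rng.NextDouble() * (kv.Value.Max - kv.Value.Min));

Console.WriteLine(factory(trial));
```

Each sweep trial amounts to repeating the last two steps: sample parameters, then hand them to the factory.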

#### Create `SweepablePipeline` with multiple trainer candidates.

`SweepablePipeline` allows you to put multiple estimators as candidates at a given stage. During AutoML sweeping, these candidates are evaluated separately and the one with the best metric is picked. Note that a candidate doesn't have to be a trainer: it can be a trainer, a transformer, or even another `SweepablePipeline`, as long as all candidates share the same input and output schema.

The following code shows how to create a `SweepablePipeline` with the `sweepableSdca` and `sweepableLgbm` created above.

In [None]:
var sweepablePipeline = context.Transforms.Concatenate("Features", "X1", "X2")
                            .Append(sweepableSdca, sweepableLgbm);
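
One way to picture the pipeline above: each stage holds one or more candidates, and every trial picks exactly one candidate per stage. The sketch below enumerates those choices with plain strings. The `stages` list and its names are illustrative stand-ins, not the `SweepablePipeline` type itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each stage lists its candidates (names only, for illustration).
var stages = new List<string[]>
{
    new[] { "Concatenate" },        // single transformer, no alternatives
    new[] { "Sdca", "LightGbm" },   // two trainer candidates
};

// Expand stage by stage: every existing path is extended by every candidate.
var paths = new List<string> { "" };
foreach (var stage in stages)
{
    paths = paths
        .SelectMany(prefix => stage.Select(c => prefix.Length == 0 ? c : prefix + " -> " + c))
        .ToList();
}

foreach (var path in paths)
{
    Console.WriteLine(path);
}
// Prints:
// Concatenate -> Sdca
// Concatenate -> LightGbm
```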

#### Configure `AutoMLExperiment` using `sweepablePipeline`
In the next step, we train `sweepablePipeline` on a generated non-linear dataset using `AutoMLExperiment`, which sweeps over both `sdca` and `lightGbm` within their configured search spaces. Because `sdca` is a linear model and the label depends on the product of the two features, the winning model should be `lightGbm`.

In [None]:
var rand = new Random(0);
var context = new MLContext(seed: 1);
var x1 = Enumerable.Range(0, 1000).Select(_x => rand.NextSingle() * 100).ToArray();
var x2 = x1.Select(_x => rand.NextSingle() * 100).ToArray();
var y = Enumerable.Zip(x1, x2).Select(_x => _x.Second * _x.First + (rand.NextSingle() - 0.5f) * 10).ToArray();
var df = new DataFrame();
df["X1"] = DataFrameColumn.Create("X1", x1);
df["X2"] = DataFrameColumn.Create("X2", x2);
df["Label"] = DataFrameColumn.Create("Label", y);
var trainTestSplit = context.Data.TrainTestSplit(df);
df.Head(10)
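
To see why a linear model is at a disadvantage on this data, note that `x1 * x2 = (x1 - 50) * (x2 - 50) + 50 * x1 + 50 * x2 - 2500`. For independent features drawn uniformly from [0, 100], the last three terms are, in expectation, the best least-squares linear fit, and the leftover product term has a standard deviation of about 833 (100² / 12), far above the noise level. The sketch below checks this numerically. It is an assumption-laden illustration: it generates its own data with `NextDouble` and its own seed, so the values differ from the cell above.

```csharp
using System;

var rng = new Random(0);
const int n = 1000;

double linearSquaredError = 0;
double productSquaredError = 0;
for (int i = 0; i < n; i++)
{
    double x1 = rng.NextDouble() * 100;
    double x2 = rng.NextDouble() * 100;
    double y = x1 * x2 + (rng.NextDouble() - 0.5) * 10;

    // Linear part of the identity above: no product term available.
    double linearPrediction = 50 * x1 + 50 * x2 - 2500;
    // A model that can represent the interaction between x1 and x2.
    double productPrediction = x1 * x2;

    linearSquaredError += Math.Pow(y - linearPrediction, 2);
    productSquaredError += Math.Pow(y - productPrediction, 2);
}

double linearRmse = Math.Sqrt(linearSquaredError / n);
double productRmse = Math.Sqrt(productSquaredError / n);
Console.WriteLine($"linear RMSE:  {linearRmse:F1}");
Console.WriteLine($"product RMSE: {productRmse:F1}");
```

A tree-based trainer such as `lightGbm` cannot express the product exactly either, but it can approximate the interaction, which a purely linear model cannot.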

In [None]:
var monitor = new NotebookMonitor(sweepablePipeline);
var experiment = context.Auto().CreateExperiment();
experiment.SetDataset(df, 5)
          .SetPipeline(sweepablePipeline)
          .SetTrainingTimeInSeconds(50)
          .SetRegressionMetric(RegressionMetric.RootMeanSquaredError)
          .SetMonitor(monitor);

// Configure visualizer
monitor.SetUpdate(monitor.Display());

var res = await experiment.RunAsync();

// Check the type of the last transformer in the winning model; it should come from LightGbm
(res.Model as TransformerChain<ITransformer>).Last().GetType()
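
In the experiment above, `SetDataset(df, 5)` asks for each trial to be scored with 5-fold cross-validation rather than a single train/test split. A simplified sketch of the idea follows, assuming a round-robin fold assignment chosen purely for illustration. The library's own splitting strategy may differ.

```csharp
using System;
using System.Linq;

const int rowCount = 1000;
const int foldCount = 5;

// Assumption for illustration: rows are assigned to folds round-robin.
var foldOfRow = Enumerable.Range(0, rowCount).Select(i => i % foldCount).ToArray();

for (int fold = 0; fold < foldCount; fold++)
{
    // The current fold validates; the remaining folds train.
    int validationRows = foldOfRow.Count(f => f == fold);
    int trainingRows = rowCount - validationRows;
    Console.WriteLine($"fold {fold}: train on {trainingRows}, validate on {validationRows}");
}
```

Every row is used for validation exactly once, so the metric reported for a trial depends less on one lucky or unlucky split.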

## Continue learning

> [⏩ Next Module - AutoML Tuner](./06-AutoML%20HPO%20and%20tuner.ipynb)  
> [⏪ Last Module - AutoML Sweepable API](./05%20-%20AutoML%20Sweepable%20API.ipynb)

## See also
- [Training and AutoML](./03-Training%20and%20AutoML.ipynb)
- [Regression with Taxi Dataset](./E2E-Regression%20with%20Taxi%20Dataset.ipynb)
- [Classification with Iris Dataset](./E2E-Classification%20with%20Iris%20Dataset.ipynb)
- [Kaggle with Titanic Dataset](./REF-Kaggle%20with%20Titanic%20Dataset.ipynb)