[Tutorial here](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/model-tuning/)

# Tuning a single hyperparameter

Tuning is implemented as a model wrapper. We wrap a model in a tuning strategy (e.g. grid search, scored with a resampling strategy such as cross-validation) and bind the wrapped model to data in a machine. Fitting the machine searches for optimal model hyperparameters.

In [1]:
using Pkg
Pkg.activate(".")

[32m[1m  Activating[22m[39m project at `~/Repos/mike_scratch/mlj_tutorial/A-model-tuning`


In [2]:
using MLJ
using PrettyPrinting
X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree

┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main /Users/mph/.julia/packages/MLJModels/tMgLW/src/loading.jl:168


import MLJDecisionTreeInterface ✔


MLJDecisionTreeInterface.DecisionTreeClassifier

We specify a range of values using the `range` function.

In [3]:
dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)

NumericRange(1 ≤ max_depth ≤ 5; origin=3.0, unit=2.0)

For hyperparameters of other types (e.g. `Symbol`), use the `values=...` keyword instead of `lower` and `upper`.
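As a minimal sketch of the `values=...` form, the snippet below builds a range over the Boolean `post_prune` hyperparameter of the decision tree. The choice of `post_prune` and the name `r_prune` are illustrative assumptions, not something the tutorial tunes.

```julia
using MLJ

# Load the same classifier as above; verbosity=0 silences the loading message.
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
dtc = DecisionTreeClassifier()

# Non-numeric hyperparameters have no lower/upper bounds,
# so the candidate values are listed explicitly.
r_prune = range(dtc, :post_prune, values=[true, false])
```

A range built this way can be passed to `TunedModel` just like the numeric range `r`.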

We wrap the model in a `TunedModel` specifying the tuning strategy.

In [4]:
tm = TunedModel(model=dtc, ranges=[r, ], measure=cross_entropy)

ProbabilisticTunedModel(
    model = DecisionTreeClassifier(
            max_depth = -1,
            min_samples_leaf = 1,
            min_samples_split = 2,
            min_purity_increase = 0.0,
            n_subfeatures = 0,
            post_prune = false,
            merge_purity_threshold = 1.0,
            pdf_smoothing = 0.0,
            display_depth = 5,
            rng = Random._GLOBAL_RNG()),
    tuning = Grid(
            goal = nothing,
            resolution = 10,
            shuffle = true,
            rng = Random._GLOBAL_RNG()),
    resampling = Holdout(
            fraction_train = 0.7,
            shuffle = false,
            rng = Random._GLOBAL_RNG()),
    measure = LogLoss(tol = 2.220446049250313e-16),
    weights = nothing,
    operation = nothing,
    range = MLJBase.NumericRange{Int64, MLJBase.Bounded, Symbol}[NumericRange(1 ≤ max_depth ≤ 5; origin=3.0, unit=2.0)],
    selection_heuristic = MLJTuning.NaiveSelection(nothing),
    train_best = true,
    repeats =

Wrapping the model in `TunedModel` as above creates a "self-tuning" version of the model, `tuned_model = TunedModel(model=...)`. Further keyword arguments include:
1. the algorithm (i.e. tuning strategy) for searching the hyperparameter space of the model, e.g. `tuning = RandomSearch(rng=123)` or `tuning = Grid(goal=100)`.
2. the resampling strategy used to evaluate performance for each value of the hyperparameters, e.g. `resampling=CV(nfolds=9, rng=123)` or `resampling=Holdout(fraction_train=0.7)`.
3. the measure (or measures) on which to base performance evaluations and reporting, e.g. `measure = rms` or `measures = [rms, mae]`.
4. the range, usually describing the "space" of hyperparameters to be searched (but more generally whatever extra information is required to complete the search specification, e.g. initial values in gradient descent).
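The four keyword arguments above can be spelled out together. This is a sketch, not the configuration used in this notebook: the 5-fold shuffled `CV`, the grid resolution and the name `tm_cv` are assumptions chosen for illustration. It uses the `range` keyword, which is the field name shown in the printed `TunedModel` above.

```julia
using MLJ

DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)

tm_cv = TunedModel(
    model = dtc,
    tuning = Grid(resolution=5),         # 1. search algorithm
    resampling = CV(nfolds=5, rng=123),  # 2. resampling; an integer rng turns on shuffling
    measure = cross_entropy,             # 3. performance measure
    range = r,                           # 4. hyperparameter space
)
```

Shuffled cross-validation is probably a safer default than the unshuffled `Holdout` shown in the printed model, since the iris rows appear to be ordered by class.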

## Fitting and inspecting a tuned model

Fit a tuned model like this.

In [5]:
m = machine(tm, X, y)
fit!(m)

┌ Info: Training Machine{ProbabilisticTunedModel{Grid,…},…}.
└ @ MLJBase /Users/mph/.julia/packages/MLJBase/MuLnJ/src/machines.jl:464
┌ Info: Attempting to evaluate 5 models.
└ @ MLJTuning /Users/mph/.julia/packages/MLJTuning/Al9yX/src/tuned_models.jl:680


Machine{ProbabilisticTunedModel{Grid,…},…} trained 1 time; caches data
  model: MLJTuning.ProbabilisticTunedModel{Grid, MLJDecisionTreeInterface.DecisionTreeClassifier}
  args: 
    1:	Source @450 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @906 ⏎ `AbstractVector{Multiclass{3}}`


Check the cross-entropy (log loss) of the best model. This is the measure we passed to `TunedModel`, not a misclassification rate.

In [6]:
r = report(m)
r.best_history_entry.measurement[1]

2.3978952727983693
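Besides the measurement, the winning model itself can be read off the fitted machine. The sketch below repeats the setup so that it runs on its own; the variable names `best` and `yhat` are illustrative, and which `max_depth` wins will depend on the run.

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)

tm = TunedModel(model=dtc, range=r, measure=cross_entropy)
m = machine(tm, X, y)
fit!(m, verbosity=0)

# The model with the best measurement found during the search.
best = fitted_params(m).best_model
@show best.max_depth

# Predicting with the machine uses that best model
# (retrained on all the data, since train_best = true).
yhat = predict_mode(m, X)
```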

All the plotting vectors you need are in `r.plotting`. I can't get Plots to compile right now though.

## Tuning nested hyperparameters

Start with dummy regression data.

In [8]:
X = (x1=rand(100), x2=rand(100), x3=rand(100))
y = 2X.x1 - X.x2 + 0.05 * randn(100);

Build a simple model with decision tree regressors.

In [9]:
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
forest = EnsembleModel(model=DecisionTreeRegressor())

import MLJDecisionTreeInterface ✔


┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main /Users/mph/.julia/packages/MLJModels/tMgLW/src/loading.jl:168


DeterministicEnsembleModel(
    model = DecisionTreeRegressor(
            max_depth = -1,
            min_samples_leaf = 5,
            min_samples_split = 2,
            min_purity_increase = 0.0,
            n_subfeatures = 0,
            post_prune = false,
            merge_purity_threshold = 1.0,
            rng = Random._GLOBAL_RNG()),
    atomic_weights = Float64[],
    bagging_fraction = 0.8,
    rng = Random._GLOBAL_RNG(),
    n = 100,
    acceleration = CPU1{Nothing}(nothing),
    out_of_bag_measure = Any[])

Here, the ensemble (the bagging wrapper) has its own set of hyperparameters (`atomic_weights`, `bagging_fraction`, `n`, etc.). But so does the atomic decision tree model (`max_depth`, `min_samples_leaf`, etc.).

Just as before, we need to specify the ranges over which to tune the hyperparameters. A nested hyperparameter is addressed with dot syntax inside a quoted expression, e.g. `:(model.n_subfeatures)`.

In [11]:
r1 = range(forest, :(model.n_subfeatures), lower=1, upper=3)
r2 = range(forest, :bagging_fraction, lower=0.4, upper=1.0)
tm = TunedModel(model=forest, tuning=Grid(resolution=12),
                resampling=CV(nfolds=6), ranges=[r1, r2],
                measures=rms)
m = machine(tm, X, y)
fit!(m);

┌ Info: Training Machine{DeterministicTunedModel{Grid,…},…}.
└ @ MLJBase /Users/mph/.julia/packages/MLJBase/MuLnJ/src/machines.jl:464
┌ Info: Attempting to evaluate 36 models.
└ @ MLJTuning /Users/mph/.julia/packages/MLJTuning/Al9yX/src/tuned_models.jl:680


Now inspect the model.

In [12]:
r = report(m)
r.best_history_entry.measurement[1]

0.17825570569802807

In [14]:
r.plotting.parameter_values

36×2 Matrix{Any}:
 2  1.0
 3  0.945455
 1  0.836364
 3  1.0
 1  0.781818
 2  0.563636
 2  0.672727
 1  0.890909
 3  0.781818
 3  0.836364
 ⋮  
 1  0.672727
 2  0.836364
 1  0.618182
 1  0.4
 3  0.454545
 1  0.454545
 3  0.509091
 3  0.890909
 1  0.727273

The first column corresponds to the sub-features parameter (`model.n_subfeatures`). The second corresponds to the bagging fraction. If I could plot this, we'd use color to denote `r.plotting.measurements`.

In [15]:
r.plotting.measurements

36-element Vector{Float64}:
 0.18452788340498058
 0.19088183453012386
 0.2858790075691232
 0.21781686623825347
 0.2896629044540661
 0.20952788650417342
 0.1951252257374061
 0.2845816168040688
 0.18157145313270476
 0.18020907715082793
 ⋮
 0.2949112622907113
 0.17979269055524033
 0.30936970156981486
 0.3274552339296205
 0.20787884281047223
 0.32213146199045795
 0.19784923395691326
 0.18239215809509213
 0.29653275879300794
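Without a plot, the two outputs can still be read together by sorting the parameter rows by their measurement. The helper below is a sketch in plain Julia: the name `ranked_results` is hypothetical, and `demo` just copies the first three rows printed above so the snippet runs without refitting. It assumes that row `i` of `parameter_values` belongs to entry `i` of `measurements`, and that lower is better (true for `rms`).

```julia
# Pair each searched parameter combination with its measurement,
# sorted from best (lowest measurement) to worst.
function ranked_results(plotting)
    order = sortperm(plotting.measurements)
    return [(params = plotting.parameter_values[i, :],
             measurement = plotting.measurements[i]) for i in order]
end

# First three rows of the outputs shown above, copied by hand.
demo = (parameter_values = Any[2 1.0; 3 0.945455; 1 0.836364],
        measurements = [0.18452788340498058, 0.19088183453012386, 0.2858790075691232])

for row in ranked_results(demo)
    println(row.params, " => ", row.measurement)
end
```

With the fitted machine in scope, `first(ranked_results(report(m).plotting))` should give the best combination of `model.n_subfeatures` and `bagging_fraction`.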