[Link to tutorial](https://juliaai.github.io/DataScienceTutorials.jl/getting-started/composing-models/)

In [1]:
using Pkg
Pkg.activate(".")
Pkg.instantiate()

[32m[1m  Activating[22m[39m project at `~/Repos/mike_scratch/mlj_tutorial/A-ensembles`




First, load the packages and generate some dummy data.

In [2]:
using MLJ
using PrettyPrinting

In [3]:
KNNRegressor = @load KNNRegressor
# input
X = (age    = [23, 45, 34, 25, 67],
     gender = categorical(['m','m','f','m','f']))
# target
height = [178, 194, 165, 173, 168];

┌ Info: For silent loading, specify `verbosity=0`. 
└ @ Main /Users/mph/.julia/packages/MLJModels/tMgLW/src/loading.jl:168


import NearestNeighborModels ✔




In [4]:
scitype(X.age)

AbstractVector{Count} (alias for AbstractArray{Count, 1})
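The target needs the same attention. `height` is also an integer vector, so its scitype is `Count`, whereas the KNN regressor expects a `Continuous` target. That mismatch is what the scitype warning printed by `evaluate` further down reports. Here is a minimal base-Julia illustration of the fix, which converts the target to floating point. In MLJ itself, `coerce(height, Continuous)` does the same job. The name `height_continuous` is only for illustration.

```julia
# The target from the tutorial: integers, which MLJ's scientific-type
# convention treats as `Count`.
height = [178, 194, 165, 173, 168]

# Floating-point values are treated as `Continuous`, which is the target
# scitype a KNN regressor expects. In MLJ, `coerce(height, Continuous)`
# performs this conversion.
height_continuous = float.(height)

eltype(height_continuous)
```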

We want a pipeline that preprocesses the data and then fits a model. Specifically, we want to:
- `coerce` `age` to a `Continuous` variable instead of a `Count`
- one-hot encode the categorical features of `X`
- standardize the target and fit a KNN regressor to it

`Pipeline` chains these steps and applies them in order.
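To make the two preprocessing steps concrete, here is a minimal base-Julia sketch of what they do to the toy table, written without MLJ. The helpers `coerce_age` and `one_hot_gender` are hypothetical, the column names only imitate the style of `OneHotEncoder` output, and plain `Char` vectors stand in for categorical arrays. The real transformers work on any table and learn the category levels when they are fitted.

```julia
# Toy input, as in the tutorial (plain vectors instead of CategoricalArrays).
X = (age = [23, 45, 34, 25, 67], gender = ['m', 'm', 'f', 'm', 'f'])

# Step 1 (hypothetical helper): replace integer ages with floating point.
coerce_age(t) = merge(t, (age = float.(t.age),))

# Step 2 (hypothetical helper): keep `age` and replace `gender` with one
# 0/1 column per observed level.
function one_hot_gender(t)
    levels = sort(unique(t.gender))    # ['f', 'm']
    cols = [Symbol("gender__", l) => float.(t.gender .== l) for l in levels]
    # `merge` accepts an iterable of Symbol => value pairs.
    return merge((age = t.age,), cols)
end

# Apply the steps in order, as a pipeline would.
Xt = X |> coerce_age |> one_hot_gender
println(keys(Xt))
println(Xt.gender__f)
```

Setting `drop_last = true`, as done further down, would drop the column for the final level, since it is implied by the others.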

In [5]:
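pipe = Pipeline(
    # convert the integer `age` column to Continuous
    coercer = X -> coerce(X, :age=>Continuous),
    # one-hot encode the categorical columns (here, `gender`)
    one_hot_encoder = OneHotEncoder(),
    # standardize the target, fit KNN on it, and map predictions back
    transformed_target_model = TransformedTargetModel(
        model = KNNRegressor(K=3),
        target = UnivariateStandardizer()
    )
)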

DeterministicPipeline(
    coercer = var"#9#10"(),
    one_hot_encoder = OneHotEncoder(
            features = Symbol[],
            drop_last = false,
            ordered_factor = true,
            ignore = false),
    transformed_target_model = TransformedTargetModelDeterministic(
            model = KNNRegressor,
            target = UnivariateStandardizer,
            inverse = nothing,
            cache = true),
    cache = true)
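The printed pipeline shows `target = UnivariateStandardizer` with `inverse = nothing`. In that case, the target transformer's own inverse transform is used to map predictions back to the original scale. The following is a rough base-Julia sketch of that round trip. It assumes the standardizer learns the sample mean and standard deviation of the training target, involves no MLJ, and omits the KNN step itself.

```julia
using Statistics   # mean, std

height = [178, 194, 165, 173, 168]

# "Fit": learn the location and scale of the training target.
μ, σ = mean(height), std(height)

# "Transform": the regressor would be trained on this standardized target.
z = (height .- μ) ./ σ

# "Inverse transform": values on the standardized scale (e.g. predictions)
# are mapped back to the original units.
height_back = z .* σ .+ μ

(mean(z), std(z))
```

Because the regressor only ever sees `z`, its predictions also come out on the standardized scale, which is why the inverse step is needed.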

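The `TransformedTargetModelDeterministic` learns a `UnivariateStandardizer` from the training target, trains the KNN regressor on the standardized target, and maps its predictions back to the original scale. Hyperparameters of every component can be read and changed through nested property access on the pipeline: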

In [6]:
pipe.transformed_target_model.model.K = 2   # use 2 nearest neighbours instead of 3
pipe.one_hot_encoder.drop_last = true;      # drop the column for the last level
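The nested assignments above work because MLJ models are mutable structs whose fields are their hyperparameters. The sketch below mimics that access pattern with hypothetical stand-in types (`ToyKNN`, `ToyEncoder`, `ToyPipe`). MLJ's `Pipeline` resolves component names itself, so this sketch only illustrates the idea, not how `Pipeline` is implemented.

```julia
# Hypothetical stand-ins for MLJ model types: mutable structs whose
# fields play the role of hyperparameters.
mutable struct ToyKNN
    K::Int
end

mutable struct ToyEncoder
    drop_last::Bool
end

# Hypothetical composite holding its components as named fields.
mutable struct ToyPipe
    one_hot_encoder::ToyEncoder
    model::ToyKNN
end

pipe = ToyPipe(ToyEncoder(false), ToyKNN(3))

# Nested property access reaches a component; assignment mutates it in place,
# so the composite sees the new values without being rebuilt.
pipe.model.K = 2
pipe.one_hot_encoder.drop_last = true
```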

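Calling `evaluate` on the pipeline and the data wraps the model in a machine internally. That machine holds the learned parameters, is trained on the training part of each resampling split, and is scored on the held-out part. Here we use a single `Holdout()` split and root-mean-squared error (`rms`). The related `evaluate!` takes an existing machine instead of a model and data.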

In [7]:
evaluate(
    pipe,
    X,
    height,
    resampling=Holdout(),
    measure=rms
) |> pprint

│ scitype(y) = AbstractVector{Count}
│ target_scitype(model) = AbstractVector{Continuous}.
└ @ MLJBase /Users/mph/.julia/packages/MLJBase/MuLnJ/src/machines.jl:140


PerformanceEvaluation(11.5,)
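The reported value 11.5 is a root-mean-squared error on the held-out rows, so it is in the same units as `height`. The following base-Julia sketch shows what that measure computes. `rms_error` is a hypothetical name, the predictions and targets are made-up values rather than output from this pipeline, and the sketch ignores any extra options MLJ's `rms` measure may support.

```julia
using Statistics   # mean

# Root mean squared error between predictions ŷ and targets y.
rms_error(ŷ, y) = sqrt(mean((ŷ .- y) .^ 2))

# Made-up held-out targets and predictions, for illustration only.
y_test = [170.0, 165.0]
ŷ_test = [172.0, 161.0]

rms_error(ŷ_test, y_test)   # sqrt((2^2 + 4^2) / 2) = sqrt(10)
```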