In [None]:
%matplotlib inline

# Advanced mixture of experts.


In [None]:
from __future__ import annotations

from gemseo import create_benchmark_dataset
from gemseo.mlearning import create_regression_model
from gemseo.mlearning.classification.quality.f1_measure import F1Measure
from gemseo.mlearning.clustering.quality.silhouette_measure import SilhouetteMeasure
from gemseo.mlearning.regression.quality.mse_measure import MSEMeasure

In this example,
we seek to approximate the Rosenbrock function
from a dataset generated with the function [create_benchmark_dataset][gemseo.create_benchmark_dataset].
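
For reference, the following is a minimal sketch of the classical two-dimensional Rosenbrock function, written without GEMSEO. The name `rosenbrock` and the evaluation grid are illustrative assumptions; the sampling actually used by the benchmark dataset may differ.

```python
def rosenbrock(x: float, y: float) -> float:
    """Return the classical two-dimensional Rosenbrock function value."""
    return (1.0 - x) ** 2 + 100.0 * (y - x**2) ** 2


# Evaluate the function on a small illustrative grid over [-2, 2] x [-2, 2].
grid = [-2.0 + i for i in range(5)]
samples = [(x, y, rosenbrock(x, y)) for x in grid for y in grid]

# The global minimum of the function is reached at (1, 1).
best = min(samples, key=lambda sample: sample[2])
```

The narrow curved valley around this minimum is what makes the function a common test case for surrogate models.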



In [None]:
dataset = create_benchmark_dataset("RosenbrockDataset", opt_naming=False)

For that purpose,
we will use an [MOERegressor][gemseo.mlearning.regression.algos.moe.MOERegressor] in an advanced way:
we will not set the clustering, classification and regression algorithms
but select them according to their performance
from several candidates that we will provide.
Moreover,
for a given candidate,
we will propose several settings,
compare their performances
and select the best one.
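
As a rough illustration of what "several settings" means, the sketch below expands lists of option values into one configuration per combination. The helper `expand_candidate` is hypothetical and not part of GEMSEO, and the use of a Cartesian product over the option lists is an assumption about how the candidates are enumerated.

```python
from itertools import product


def expand_candidate(name, **option_lists):
    """Return one (name, settings) pair per combination of option values."""
    keys = list(option_lists)
    return [
        (name, dict(zip(keys, values)))
        for values in product(*option_lists.values())
    ]


# Candidates mirroring the clustering step of this example.
configurations = expand_candidate("KMeans", n_clusters=[2, 3, 4])
configurations += expand_candidate("GaussianMixture", n_clusters=[3, 4, 5])

# Hypothetical algorithm with two option lists: 3 x 2 = 6 combinations.
two_options = expand_candidate("SomeAlgo", first=[1, 2, 3], second=["a", "b"])
```

Each configuration would then be trained and scored with the chosen quality measure, and the best one kept.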

## Initialization

First,
we initialize an [MOERegressor][gemseo.mlearning.regression.algos.moe.MOERegressor] with soft classification
by means of the high-level machine learning function [create_regression_model()][gemseo.mlearning.create_regression_model].
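
To make the difference between soft and hard classification concrete, here is a minimal sketch, assuming that a soft mixture weights the local predictions by the cluster membership probabilities while a hard mixture keeps only the most probable cluster. The function `combine_experts` is hypothetical and only mimics the idea behind the `hard` option.

```python
def combine_experts(probabilities, local_predictions, hard):
    """Combine the local predictions of the experts at a single input point."""
    if hard:
        # Hard classification: use only the expert of the most probable cluster.
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        return local_predictions[best]
    # Soft classification: probability-weighted sum of all the experts.
    return sum(p * y for p, y in zip(probabilities, local_predictions))


probabilities = [0.25, 0.75]  # membership probabilities of two clusters
local_predictions = [10.0, 20.0]  # outputs of the two local regressors

hard_output = combine_experts(probabilities, local_predictions, hard=True)
soft_output = combine_experts(probabilities, local_predictions, hard=False)
```

A soft combination gives smoother transitions between clusters, at the cost of evaluating every local model.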



In [None]:
model = create_regression_model("MOERegressor", dataset, hard=False)

## Clustering

Then,
we add two clustering algorithms
with different numbers of clusters (called *components* for the Gaussian Mixture)
and set the [SilhouetteMeasure][gemseo.mlearning.clustering.quality.silhouette_measure.SilhouetteMeasure] as clustering measure
to be evaluated from the training dataset.
During the learning stage,
the mixture of experts will select the clustering algorithm
and the number of clusters
maximizing this measure,
as a higher silhouette indicates better-separated clusters.
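
The silhouette coefficient can be sketched in a few lines of plain Python. This is an illustrative implementation for one-dimensional points, not the one used by GEMSEO; the helper names and the toy data are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def silhouette(points, labels):
    """Return the mean silhouette of 1-D points; needs at least two clusters."""
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # Distances to the other members of the same cluster.
        same = [
            abs(point - other)
            for j, (other, other_label) in enumerate(zip(points, labels))
            if other_label == label and j != i
        ]
        if not same:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        a = mean(same)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            mean([abs(point - q) for q, l in zip(points, labels) if l == other])
            for other in set(labels) - {label}
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


points = [0.0, 1.0, 10.0, 11.0]
labelings = {"well separated": [0, 0, 1, 1], "interleaved": [0, 1, 0, 1]}
scores = {name: silhouette(points, labels) for name, labels in labelings.items()}

# The silhouette lies in [-1, 1] and the highest value wins.
selected = max(scores, key=scores.get)
```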



In [None]:
model.set_clustering_measure(SilhouetteMeasure)
model.add_clusterer_candidate("KMeans", n_clusters=[2, 3, 4])
model.add_clusterer_candidate("GaussianMixture", n_clusters=[3, 4, 5])

## Classification

We also add classification algorithms
with different settings
and set the [F1Measure][gemseo.mlearning.classification.quality.f1_measure.F1Measure] as classification measure
to be evaluated from the training dataset.
During the learning stage,
the mixture of experts will select the classification algorithm and the settings
maximizing this measure,
as a higher F1 score indicates a better classifier.
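
As a reminder of what the F1 score measures, here is a minimal sketch for a single class of interest. The function `f1_score` below is an illustrative plain-Python version; how the library averages the score over several classes is not covered here.

```python
def f1_score(true_labels, predicted_labels, positive):
    """Return the F1 score of the class ``positive`` (harmonic mean of P and R)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(t == positive and p == positive for t, p in pairs)
    false_positives = sum(t != positive and p == positive for t, p in pairs)
    false_negatives = sum(t == positive and p != positive for t, p in pairs)
    if true_positives == 0:
        # No correct detection: precision or recall is zero, and so is F1.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


true_clusters = [1, 1, 0, 0]
perfect = f1_score(true_clusters, [1, 1, 0, 0], positive=1)
partial = f1_score(true_clusters, [1, 0, 0, 0], positive=1)
```

In the mixture of experts, the true labels are the cluster indices produced by the clustering step, and the classifier learns to predict them from the inputs.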



In [None]:
model.set_classification_measure(F1Measure)
model.add_classifier_candidate("KNNClassifier", n_neighbors=[3, 4, 5])
model.add_classifier_candidate("RandomForestClassifier", n_estimators=[100])

## Regression

We also add regression algorithms
and set the [MSEMeasure][gemseo.mlearning.regression.quality.mse_measure.MSEMeasure] as regression measure
to be evaluated from the training dataset.
During the learning stage, for each cluster,
the mixture of experts will select the regression algorithm minimizing this measure.
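
The per-cluster selection can be sketched as follows. The two candidates are fixed toy predictors rather than trained models, and the names `mse`, `candidates` and `clusters` are illustrative assumptions.

```python
def mse(observations, predictions):
    """Return the mean squared error (lower is better)."""
    return sum((o - p) ** 2 for o, p in zip(observations, predictions)) / len(
        observations
    )


# Toy candidates standing in for trained regression models.
candidates = {"constant": lambda x: 1.0, "linear": lambda x: 2.0 * x}

# Toy training data per cluster: lists of (input, output) pairs.
clusters = {
    0: [(0.0, 0.0), (1.0, 2.0)],  # follows y = 2x
    1: [(5.0, 1.0), (6.0, 1.0)],  # follows y = 1
}

selected = {}
for index, data in clusters.items():
    inputs = [x for x, _ in data]
    outputs = [y for _, y in data]
    errors = {
        name: mse(outputs, [predict(x) for x in inputs])
        for name, predict in candidates.items()
    }
    # Keep the candidate with the smallest error on this cluster.
    selected[index] = min(errors, key=errors.get)
```

Because the selection is made cluster by cluster, different regions of the input space may end up with different regression algorithms.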



In [None]:
model.set_regression_measure(MSEMeasure)
model.add_regressor_candidate("LinearRegressor")
model.add_regressor_candidate("RBFRegressor")

!!! note

    We could also add candidates for some learning stages,
    e.g. clustering and regression,
    and set the machine learning algorithms for the remaining ones,
    e.g. classification.

## Training

Lastly,
we train the model on the data
and select the best machine learning algorithm
for each of the clustering, classification and regression steps.



In [None]:
model.learn()

## Result

We can get information on this model,
on the machine learning sub-models selected among the candidates
and on their selected settings.
We can see that
a [KMeans][gemseo.mlearning.clustering.algos.kmeans.KMeans] with four clusters has been selected for the clustering stage,
as well as a [RandomForestClassifier][gemseo.mlearning.classification.algos.random_forest.RandomForestClassifier] for the classification stage
and an [RBFRegressor][gemseo.mlearning.regression.algos.rbf.RBFRegressor] for each cluster.



In [None]:
model

!!! note

    By adding candidates,
    and depending on the complexity of the function to be approximated,
    one could obtain different regression models according to the clusters.
    For example,
    one could use a [PolynomialRegressor][gemseo.mlearning.regression.algos.polyreg.PolynomialRegressor] with order 2
    on a sub-part of the input space
    and a [GaussianProcessRegressor][gemseo.mlearning.regression.algos.gpr.GaussianProcessRegressor]
    on another sub-part of the input space.

Once built,
this mixture of experts can be used like any [BaseRegressor][gemseo.mlearning.regression.algos.base_regressor.BaseRegressor].

!!! info "See also"

    [Another example][mixture-of-experts]
    proposes a standard use of [MOERegressor][gemseo.mlearning.regression.algos.moe.MOERegressor].

