In [None]:
%matplotlib inline

# Plug a surrogate discipline into a scenario

In this section, we describe the use of surrogate models in GEMSEO,
which is implemented in the
[SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline] class.

A [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
can be used to replace a
[Discipline][gemseo.core.discipline.discipline.Discipline] within a
[BaseScenario][gemseo.scenarios.base_scenario.BaseScenario]. This
[SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
is an approximation of the [Discipline][gemseo.core.discipline.discipline.Discipline]
that is faster to evaluate than the original one. It relies on a
[BaseRegressor][gemseo.mlearning.regression.algos.base_regressor.BaseRegressor].
This comes at the price of computing a DOE
on the original [Discipline][gemseo.core.discipline.discipline.Discipline]
and validating the approximation. The
computations from which the approximation is built may already be available,
or can be generated using GEMSEO's DOE capabilities.
See these [Sobieski][mdf-based-doe-on-the-sobieski-ssbj-test-case]
and [Sellar][a-from-scratch-example-on-the-sellar-problem] examples.
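
Validating the approximation typically means comparing the surrogate predictions with reference outputs at points that were not used for training. The following plain-Python sketch (not GEMSEO's API) computes a root-mean-square error; the functions `reference` and `surrogate` are hypothetical stand-ins for the original discipline and its approximation:

```python
import math


def reference(x: float) -> float:
    """Hypothetical expensive model standing in for the original discipline."""
    return x**2


def surrogate(x: float) -> float:
    """Hypothetical cheap approximation: the line through (0, 0) and (2, 4)."""
    return 2.0 * x


def rmse(model, ref, test_points) -> float:
    """Root-mean-square error of ``model`` against ``ref`` on ``test_points``."""
    squared_errors = [(model(x) - ref(x)) ** 2 for x in test_points]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Test points distinct from the training points (0 and 2) defining the line.
test_points = [0.5, 1.0, 1.5]
error = rmse(surrogate, reference, test_points)
print(f"RMSE on test points: {error:.4f}")
```

A small error on such held-out points gives some confidence in the surrogate; a large one suggests adding training samples or changing the regression model.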

In GEMSEO, the data used to build the surrogate model is taken from a
[Dataset][gemseo.datasets.dataset.Dataset] containing both inputs and
outputs of the DOE. This
[Dataset][gemseo.datasets.dataset.Dataset] may have been generated by
GEMSEO from a cache, using the
[BaseCache.to_dataset()][gemseo.caches.base_cache.BaseCache.to_dataset] method,
from a database, using the
[OptimizationProblem.to_dataset()][gemseo.algos.optimization_problem.OptimizationProblem.to_dataset] method,
or from a NumPy array or
a text file using the
[Dataset.from_array()][gemseo.datasets.dataset.Dataset.from_array] and
[Dataset.from_txt()][gemseo.datasets.dataset.Dataset.from_txt] methods, respectively.

Then, the surrogate discipline can be used as any other discipline in a
[MDOScenario][gemseo.scenarios.mdo_scenario.MDOScenario],
a [DOEScenario][gemseo.scenarios.doe_scenario.DOEScenario],
or a [BaseMDA][gemseo.mda.base_mda.BaseMDA].


In [None]:
from __future__ import annotations

from numpy import array
from numpy import hstack
from numpy import vstack

from gemseo import create_discipline
from gemseo import create_scenario
from gemseo import create_surrogate
from gemseo import sample_disciplines
from gemseo.datasets.io_dataset import IODataset
from gemseo.problems.mdo.sobieski.core.design_space import SobieskiDesignSpace

## Create a surrogate scenario

### Create the training dataset

If data from an externally produced DOE are already available,
you can store them directly in a [Dataset][gemseo.datasets.dataset.Dataset]
and skip the rest of this step.
For example, let us consider a synthetic dataset, with $x$
as input and $y$ as output, described as a NumPy array.
Then, we store these data in an `IODataset`, a [Dataset][gemseo.datasets.dataset.Dataset]
whose variables are sorted into input and output groups:




In [None]:
variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}
data = vstack((
    hstack((array([1.0]), array([1.0]))),
    hstack((array([2.0]), array([2.0]))),
))
synthetic_dataset = IODataset.from_array(data, variables, sizes, groups)
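
The three arguments describing the columns of `data` (variable names, sizes and groups) suggest a simple layout: columns are read in order, each variable spanning as many consecutive columns as its size. The plain-Python sketch below illustrates this assumed bookkeeping with a hypothetical helper `split_columns`; it is not how GEMSEO implements `from_array()`:

```python
def split_columns(rows, variable_names, sizes, groups):
    """Map each (group, variable) pair to its column values, reading columns in order.

    Hypothetical helper: each variable spans ``sizes[name]`` consecutive columns.
    """
    result = {}
    start = 0
    for name in variable_names:
        stop = start + sizes[name]
        # Slice the columns belonging to this variable in every row (sample).
        result[(groups[name], name)] = [row[start:stop] for row in rows]
        start = stop
    return result


rows = [[1.0, 1.0], [2.0, 2.0]]
layout = split_columns(
    rows, ["x", "y"], {"x": 1, "y": 1}, {"x": "inputs", "y": "outputs"}
)
print(layout)
```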

If no data are available, the rest of this step shows how to generate them.

Here, we illustrate the generation of the training data using a
[DOEScenario][gemseo.scenarios.doe_scenario.DOEScenario],
similarly to [this example][mdf-based-doe-on-the-sobieski-ssbj-test-case],
where more details are given.

In this basic example, a [Discipline][gemseo.core.discipline.discipline.Discipline]
computing the mission
performance (range) in [Sobieski's SSBJ problem][sobieskis-ssbj-test-case] is
sampled with a [DOEScenario][gemseo.scenarios.doe_scenario.DOEScenario]. Then,
the generated dataset is used to
build a [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline].

More complex processes can be handled in the same way: complete optimization
processes or MDAs can be replaced by their surrogate counterparts. The appropriate
cache or database must then be used to build the
[SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline],
but the main logic does not differ from this
example.

Firstly, we create the [Discipline][gemseo.core.discipline.discipline.Discipline]
by means of the API function
[create_discipline()][gemseo.create_discipline]:




In [None]:
discipline = create_discipline("SobieskiMission")

Then, we read the [DesignSpace][gemseo.algos.design_space.DesignSpace] of
[Sobieski's SSBJ problem][sobieskis-ssbj-test-case]
and keep only the inputs of the mission discipline as inputs of the DOE,
namely `"x_shared"`, `"y_24"` and `"y_34"`:




In [None]:
design_space = SobieskiDesignSpace()
design_space = design_space.filter(["x_shared", "y_24", "y_34"])

From this [Discipline][gemseo.core.discipline.discipline.Discipline] and this
[DesignSpace][gemseo.algos.design_space.DesignSpace],
we can generate 30 samples by means of the
[sample_disciplines()][gemseo.sample_disciplines] function
with the LHS algorithm:



In [None]:
mission_dataset = sample_disciplines(
    [discipline], design_space, "y_4", algo_name="PYDOE_LHS", n_samples=30
)

!!! info "See also"

    In this tutorial, the DOE is based on [pyDOE](https://pythonhosted.org/pyDOE/).
    However, several other designs are available,
    based on pyDOE or [OpenTURNS](https://openturns.github.io/www/).
    Some examples of these designs are plotted
    on [this page][doe-algorithms]. To list the available DOE algorithms in the
    current GEMSEO configuration, use
    [get_available_doe_algorithms()][gemseo.get_available_doe_algorithms].
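
As a rough illustration of what a Latin hypercube design does, here is a plain-Python sketch (not the pyDOE implementation behind `"PYDOE_LHS"`): each input range is split into as many equal strata as there are samples, and each stratum is hit exactly once per dimension. The function `latin_hypercube` and its `bounds` argument are hypothetical:

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw a Latin hypercube sample in the box defined by ``bounds``.

    Each dimension is split into ``n_samples`` equal-width strata; every
    stratum receives exactly one point, and strata are paired across
    dimensions through independent random permutations.
    """
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n_samples
        # One uniformly drawn point inside each stratum of this dimension.
        points = [lower + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(points)  # decorrelate the strata across dimensions
        columns.append(points)
    # Transpose: one row per sample, one column per dimension.
    return [list(row) for row in zip(*columns)]


samples = latin_hypercube(30, [(0.0, 1.0), (-1.0, 1.0)], seed=1)
```

Compared with plain random sampling, this stratification guarantees that every one-dimensional projection of the 30 samples covers the whole range evenly.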



### Create the [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]

From a [Dataset][gemseo.datasets.dataset.Dataset] such as those created above,
we can build a [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
of the [Discipline][gemseo.core.discipline.discipline.Discipline]
by means of the API function [create_surrogate()][gemseo.create_surrogate].
Since the [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
inherits from [Discipline][gemseo.core.discipline.discipline.Discipline],
it can be executed as any other discipline.

For instance, from the synthetic dataset,
we create a [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
relying on a [LinearRegressor][gemseo.mlearning.regression.algos.linreg.LinearRegressor]:



In [None]:
synthetic_surrogate = create_surrogate("LinearRegressor", synthetic_dataset)

!!! info "See also"

    Note that a subset of the inputs and outputs to be used to build the
    [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
    may be specified by the user if needed,
    mainly to avoid unnecessary computations.

Then, we execute it as any [Discipline][gemseo.core.discipline.discipline.Discipline]:



In [None]:
input_data = {"x": array([2.0])}
out = synthetic_surrogate.execute(input_data)
out["y"]
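
To anticipate the value returned above, the linear fit on the two synthetic samples can be reproduced by hand. This sketch assumes an ordinary least-squares fit with an intercept; the `LinearRegressor` relies on scikit-learn and may also scale the data, so its output may differ by rounding:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Same two training samples as the synthetic dataset: (x, y) = (1, 1) and (2, 2).
intercept, slope = fit_line([1.0, 2.0], [1.0, 2.0])
prediction = intercept + slope * 2.0
print(intercept, slope, prediction)  # 0.0 1.0 2.0
```

With two samples, the line interpolates both points exactly, so the prediction at $x=2$ is the training output $y=2$.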

In our study case, from the DOE built above,
we build a surrogate of the range $y_4$ relying on an
[RBFRegressor][gemseo.mlearning.regression.algos.rbf.RBFRegressor],
as a function of the DOE inputs, notably the lift-to-drag ratio $L/D$ ($y_{24}$):



In [None]:
range_surrogate = create_surrogate("RBFRegressor", mission_dataset)
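
The principle of an RBF model can be sketched in one dimension: the prediction is a weighted sum of radial functions centered on the training points, and the weights are chosen so that the model interpolates the training outputs. This plain-Python sketch assumes a multiquadric kernel with a fixed width `epsilon` and no smoothing; the `RBFRegressor` wraps SciPy, and its actual kernel and settings may differ. All names below are hypothetical:

```python
import math


def multiquadric(r, epsilon=1.0):
    """Multiquadric radial basis function sqrt((r / epsilon)**2 + 1)."""
    return math.sqrt((r / epsilon) ** 2 + 1.0)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_rbf(centers, values, epsilon=1.0):
    """Return a 1D RBF interpolant passing through (centers[i], values[i])."""
    gram = [[multiquadric(abs(ci - cj), epsilon) for cj in centers] for ci in centers]
    weights = solve(gram, values)

    def predict(x):
        return sum(w * multiquadric(abs(x - c), epsilon) for w, c in zip(weights, centers))

    return predict


centers = [0.0, 1.0, 2.0, 3.0]
values = [1.0, 3.0, 2.0, 5.0]
rbf = fit_rbf(centers, values)
print([round(rbf(c), 6) for c in centers])
```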

## Use the [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline] in MDO

The obtained [SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline]
can be used in any
[BaseScenario][gemseo.scenarios.base_scenario.BaseScenario], such as a
[DOEScenario][gemseo.scenarios.doe_scenario.DOEScenario]
or [MDOScenario][gemseo.scenarios.mdo_scenario.MDOScenario].
As for any other discipline, the
[Discipline.execute()][gemseo.core.discipline.discipline.Discipline.execute]
method computes the outputs for given inputs:



In [None]:
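for i in range(5):
    lod = i * 2.0
    # Inputs that are not passed here (x_shared and y_34) take default values.
    y_4_pred = range_surrogate.execute({"y_24": array([lod])})["y_4"]
    print(f"L/D = {lod}: predicted range y_4 = {y_4_pred}")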

We can also build and execute an optimization scenario from it.
The design variable is $y_{24}$. By default, the Jacobian matrix of a surrogate
discipline is computed by finite differences, except for a
[SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline] relying on a
[LinearRegressor][gemseo.mlearning.regression.algos.linreg.LinearRegressor], which has
an analytical (and constant) Jacobian.



In [None]:
design_space = design_space.filter(["y_24"])
scenario = create_scenario(
    range_surrogate,
    "y_4",
    design_space,
    formulation_name="DisciplinaryOpt",
    maximize_objective=True,
)
scenario.execute(algo_name="L-BFGS-B", max_iter=30)
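
The difference between finite-difference and analytical Jacobians can be seen on scalar functions. The sketch below uses a forward finite difference, one common scheme; the exact scheme and step used by default for surrogate disciplines are not assumed here. A linear model recovers its constant slope up to rounding, whereas a nonlinear model incurs a truncation error proportional to the step:

```python
def forward_difference(func, x, step=1e-6):
    """Approximate the derivative of a scalar function by a forward difference."""
    return (func(x + step) - func(x)) / step


def linear_model(x):
    """A linear model: its derivative is the constant 3, known analytically."""
    return 3.0 * x + 1.0


def quadratic_model(x):
    """A nonlinear model whose exact derivative at x is 2 * x."""
    return x**2


approx_linear = forward_difference(linear_model, 2.0)
approx_quadratic = forward_difference(quadratic_model, 2.0)
print(approx_linear, approx_quadratic)  # close to 3 and 4, with O(step) error
```

Each finite-difference derivative costs one extra evaluation per input component, which is cheap for a surrogate but explains why an analytical Jacobian is preferred when available.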

## Available surrogate models

Currently, the following surrogate models are available:

- Linear regression,
  based on the [Scikit-learn](http://scikit-learn.org/stable/) library:
  use the
  [LinearRegressor][gemseo.mlearning.regression.algos.linreg.LinearRegressor] class.
- Polynomial regression,
  based on the [Scikit-learn](http://scikit-learn.org/stable/) library:
  use the
  [PolynomialRegressor][gemseo.mlearning.regression.algos.polyreg.PolynomialRegressor]
  class.
- Gaussian processes (also known as Kriging),
  based on the [Scikit-learn](http://scikit-learn.org/stable/) library:
  use the
  [GaussianProcessRegressor][gemseo.mlearning.regression.algos.gpr.GaussianProcessRegressor]
  class.
- Mixture of experts: use the
  [MOERegressor][gemseo.mlearning.regression.algos.moe.MOERegressor] class.
- Random forest models,
  based on the [Scikit-learn](http://scikit-learn.org/stable/) library:
  use the
  [RandomForestRegressor][gemseo.mlearning.regression.algos.random_forest.RandomForestRegressor]
  class.
- RBF models (Radial Basis Functions),
  based on the [SciPy](http://scipy.org/) library:
  use the
  [RBFRegressor][gemseo.mlearning.regression.algos.rbf.RBFRegressor] class.
- PCE models (Polynomial Chaos Expansion),
  based on the [OpenTURNS](https://openturns.github.io/www/) library:
  use the
  [PCERegressor][gemseo.mlearning.regression.algos.pce.PCERegressor] class.

For details on the behavior of each model, please refer to the
documentation of the underlying package.

## Extending surrogate models

All surrogate models work the same way: to add a new one, extend the
[BaseRegressor][gemseo.mlearning.regression.algos.base_regressor.BaseRegressor] base
class. See [this page][extend-gemseo-features] to learn how to run
GEMSEO with external Python modules. Then, the
[RegressorFactory][gemseo.mlearning.regression.algos.factory.RegressorFactory] can
build the new
[BaseRegressor][gemseo.mlearning.regression.algos.base_regressor.BaseRegressor]
automatically from its regression
algorithm name and options. This factory is called by the constructor of
[SurrogateDiscipline][gemseo.disciplines.surrogate.SurrogateDiscipline].
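
The underlying pattern, a common base class plus a factory that instantiates subclasses by name, can be sketched in plain Python. Everything below (`Regressor`, `NearestNeighborRegressor`, `create_regressor`, and the `fit`/`predict` method names) is hypothetical and does not reproduce the actual `BaseRegressor` or `RegressorFactory` API:

```python
class Regressor:
    """Hypothetical base class: every regressor implements fit() and predict()."""

    def __init__(self, **options):
        # Store the algorithm options passed through by the factory.
        self.options = options

    def fit(self, inputs, outputs):
        raise NotImplementedError

    def predict(self, x):
        raise NotImplementedError


class NearestNeighborRegressor(Regressor):
    """Hypothetical regressor returning the output of the closest training input."""

    def fit(self, inputs, outputs):
        self._inputs = list(inputs)
        self._outputs = list(outputs)
        return self

    def predict(self, x):
        distances = [abs(x - xi) for xi in self._inputs]
        return self._outputs[distances.index(min(distances))]


def create_regressor(name, **options):
    """Instantiate the Regressor subclass whose class name is ``name``."""
    # Discover the available algorithms from the direct subclasses of the base class.
    classes = {cls.__name__: cls for cls in Regressor.__subclasses__()}
    return classes[name](**options)


model = create_regressor("NearestNeighborRegressor").fit([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
print(model.predict(1.2))  # 6.0
```

Because the factory discovers subclasses automatically, a new model becomes available by name as soon as its module is imported, with no change to the calling code.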

!!! info "See also"

    More generally, GEMSEO provides extension mechanisms to integrate external DOE
    and optimization algorithms, disciplines, MDAs and surrogate models.

