AutoML: offer the possibility to specify the order in which training steps will be executed #8793

exalate-issue-sync · 2023-05-12T09:43:10Z

After discussing Epsilon's needs with [~accountid:557058:9328661f-241f-4a0f-9d9a-d4e78ef05ba0] and [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] regarding case 94685, we decided, for now, to let users specify the order in which AutoML executes its training steps.
This can be done at a coarse-grained level (ordering each algo's default models and grids as groups):

XGB_defaults, GBM_defaults, … XGB_grid, …

Or at a fine-grained level (ordering each hardcoded model individually):

XGB_default_1, XGB_default_2, …, GBM_def_1, …

h1. Proposal

The suggested parameter name for this specification is {{modeling_plan}}.

Here is the suggested JSON representation for specifying those steps in order:

{code:json}[
  {"name": "XGBoost", "steps": [{"id": "def_1"}, {"id": "def_2"}, {"id": "def_3"}]},
  {"name": "GLM"},
  {"name": "DRF", "alias": "all"},
  {"name": "GBM", "alias": "defaults"},
  {"name": "XRT"},
  {"name": "XGBoost", "steps": [{"id": "grid_1"}]},
  {"name": "GBM", "alias": "grids"},
  {"name": "StackedEnsemble", "steps": [{"id": "best"}, {"id": "all"}]}
]{code}
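To show that the array form is easy to consume, here is a minimal Python sketch that loads the plan with the standard {{json}} module and prints one summary line per entry. The {{describe}} helper is hypothetical, for illustration only, and it assumes that an entry with neither {{steps}} nor {{alias}} means "all models of that algo", which is how the semantics described below read.

{code:python}
import json

# The JSON modeling plan from the proposal, as a Python string.
PLAN_JSON = """
[
  {"name": "XGBoost", "steps": [{"id": "def_1"}, {"id": "def_2"}, {"id": "def_3"}]},
  {"name": "GLM"},
  {"name": "DRF", "alias": "all"},
  {"name": "GBM", "alias": "defaults"},
  {"name": "XRT"},
  {"name": "XGBoost", "steps": [{"id": "grid_1"}]},
  {"name": "GBM", "alias": "grids"},
  {"name": "StackedEnsemble", "steps": [{"id": "best"}, {"id": "all"}]}
]
"""


def describe(entry: dict) -> str:
    """Summarize one plan entry as 'Algo:selection' (hypothetical helper)."""
    if "steps" in entry:
        # Explicit step ids, kept in the order they were written.
        selection = ",".join(step["id"] for step in entry["steps"])
    else:
        # No explicit steps: use the alias, assuming a missing alias means 'all'.
        selection = entry.get("alias", "all")
    return f"{entry['name']}:{selection}"


plan = json.loads(PLAN_JSON)  # a top-level JSON array becomes an ordered Python list
for entry in plan:
    print(describe(entry))
{code}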

Unfortunately, JSON does not guarantee that the order of object keys is preserved, so we cannot use a JSON object for this and have to use arrays only.
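An object keyed by algo name has a second problem: the example lists {{XGBoost}} and {{GBM}} twice, which such an object cannot express. A small illustration with Python's standard {{json}} module; the object-based encoding below is hypothetical and shown only as the rejected alternative.

{code:python}
import json

# Hypothetical object-based encoding keyed by algo name (NOT the proposed format).
AS_OBJECT = '{"XGBoost": ["def_1", "def_2"], "GLM": "all", "XGBoost": ["grid_1"]}'

# The proposed array-based encoding of the same three entries.
AS_ARRAY = (
    '[{"name": "XGBoost", "steps": [{"id": "def_1"}, {"id": "def_2"}]},'
    ' {"name": "GLM"},'
    ' {"name": "XGBoost", "steps": [{"id": "grid_1"}]}]'
)

as_object = json.loads(AS_OBJECT)
as_array = json.loads(AS_ARRAY)

# Python's json module keeps only the last value for a duplicated key,
# so def_1/def_2 are lost and XGBoost can no longer come after GLM.
print(as_object)  # {'XGBoost': ['grid_1'], 'GLM': 'all'}

# The array keeps every entry, in order, including the repeated algo.
print([entry["name"] for entry in as_array])  # ['XGBoost', 'GLM', 'XGBoost']
{code}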

The semantics of the example above are as follows:

* Start with the {{XGBoost}} algorithm, but train only the hardcoded models with ids {{def_1}}, {{def_2}} and {{def_3}}, in that order.
* Then train all the {{GLM}} models (default models and/or grids), followed by all the {{DRF}} models (using the alias {{all}} in the latter case).
* Then train all the default {{GBM}} models (using the alias {{defaults}} to avoid listing every model id explicitly).
* Then train all the {{XRT}} models.
* Then train the {{XGBoost}} step with id {{grid_1}} (probably a grid…).
* Then train all the {{GBM}} grids (using the alias {{grids}} to avoid listing them explicitly).
* Finally, train the {{StackedEnsemble}} models with ids {{best}} and {{all}}, in that order.

The {{DeepLearning}} algo is not mentioned in this example, so it will be skipped.
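These semantics can be sketched as a function that expands the plan into the ordered list of {{(algo, step_id)}} pairs to train. This is only an illustration under stated assumptions: {{REGISTRY}} and {{expand}} are hypothetical, the step ids in the registry are placeholders (the real ones are defined by the AutoML backend), and a missing alias is treated as {{all}}.

{code:python}
from typing import Dict, List, Tuple

# Hypothetical registry of step ids per algo, split into default models and grids.
# The real ids are defined by the AutoML backend; these are placeholders.
REGISTRY: Dict[str, Dict[str, List[str]]] = {
    "XGBoost": {"defaults": ["def_1", "def_2", "def_3"], "grids": ["grid_1"]},
    "GLM": {"defaults": ["def_1"], "grids": []},
    "DRF": {"defaults": ["def_1"], "grids": []},
    "GBM": {"defaults": ["def_1", "def_2"], "grids": ["grid_1"]},
    "XRT": {"defaults": ["def_1"], "grids": []},
    "DeepLearning": {"defaults": ["def_1"], "grids": ["grid_1"]},
    "StackedEnsemble": {"defaults": ["best", "all"], "grids": []},
}


def expand(plan: List[dict]) -> List[Tuple[str, str]]:
    """Turn a modeling plan into the ordered (algo, step_id) pairs to train."""
    ordered: List[Tuple[str, str]] = []
    for entry in plan:
        algo = entry["name"]
        groups = REGISTRY.get(algo, {})
        if "steps" in entry:
            # Explicit ids: keep exactly the order given by the user.
            ids = [step["id"] for step in entry["steps"]]
        else:
            alias = entry.get("alias", "all")  # no alias: every model of the algo
            if alias == "all":
                ids = groups.get("defaults", []) + groups.get("grids", [])
            else:
                ids = groups.get(alias, [])  # 'defaults' or 'grids'
        ordered.extend((algo, step_id) for step_id in ids)
    # Algos that never appear in the plan (here DeepLearning) are simply skipped.
    return ordered


plan = [
    {"name": "XGBoost", "steps": [{"id": "def_1"}, {"id": "def_2"}, {"id": "def_3"}]},
    {"name": "GLM"},
    {"name": "DRF", "alias": "all"},
    {"name": "GBM", "alias": "defaults"},
    {"name": "XRT"},
    {"name": "XGBoost", "steps": [{"id": "grid_1"}]},
    {"name": "GBM", "alias": "grids"},
    {"name": "StackedEnsemble", "steps": [{"id": "best"}, {"id": "all"}]},
]

for algo, step_id in expand(plan):
    print(algo, step_id)
{code}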

If an algo or a step id (e.g. {{def_3}}) appears in this order specification but no longer exists in a newer {{AutoML}} version, it will be ignored and a warning message will be issued.
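A minimal sketch of this "ignore with a warning" behaviour, using Python's standard {{warnings}} module. The {{AVAILABLE}} table, the {{drop_unknown}} helper and the {{NewAlgo}} name are hypothetical; the real check would run against the steps known to the installed backend.

{code:python}
import warnings
from typing import Dict, List, Tuple

# Hypothetical step ids available in the installed AutoML version.
AVAILABLE: Dict[str, List[str]] = {
    "XGBoost": ["def_1", "def_2", "grid_1"],  # 'def_3' no longer exists here
    "GLM": ["def_1"],
}


def drop_unknown(steps: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep known (algo, step_id) pairs in order; warn about and skip the rest."""
    kept = []
    for algo, step_id in steps:
        if algo not in AVAILABLE:
            warnings.warn(f"Unknown algo '{algo}' in modeling plan: ignored.")
        elif step_id not in AVAILABLE[algo]:
            warnings.warn(f"Unknown step '{step_id}' for algo '{algo}': ignored.")
        else:
            kept.append((algo, step_id))
    return kept


requested = [("XGBoost", "def_1"), ("XGBoost", "def_3"), ("NewAlgo", "def_1"), ("GLM", "def_1")]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure every warning is recorded
    result = drop_unknown(requested)

print(result)  # [('XGBoost', 'def_1'), ('GLM', 'def_1')]
print(len(caught))  # 2
{code}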

The representation is also easily extensible: we can add new algos, new default models, new grids, new hyperparameter search methods…

If the user also specifies the {{exclude_algos}} parameter, it applies on top of the order specification: this lets users keep the specification in a single variable without having to modify it later. For example, {{exclude_algos=["XRT"]}} combined with {{modeling_plan=the_example_above}} will execute the steps defined in the example, except for {{XRT}}. The same applies when using {{include_algos}} instead.
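Applying the algo filters "on top of" the plan can be sketched as a filter over the already ordered steps, so the plan's order is never changed. {{apply_algo_filters}} is a hypothetical helper; how the real backend combines the two parameters is not specified here.

{code:python}
from typing import List, Optional, Tuple


def apply_algo_filters(
    steps: List[Tuple[str, str]],
    include_algos: Optional[List[str]] = None,
    exclude_algos: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Filter an ordered (algo, step_id) list by algo without reordering it."""
    kept = steps
    if include_algos is not None:
        kept = [s for s in kept if s[0] in include_algos]
    if exclude_algos is not None:
        kept = [s for s in kept if s[0] not in exclude_algos]
    return kept


steps = [
    ("XGBoost", "def_1"),
    ("XRT", "def_1"),
    ("GBM", "def_1"),
    ("StackedEnsemble", "best"),
]
print(apply_algo_filters(steps, exclude_algos=["XRT"]))
# [('XGBoost', 'def_1'), ('GBM', 'def_1'), ('StackedEnsemble', 'best')]
print(apply_algo_filters(steps, include_algos=["GBM", "XGBoost"]))
# [('XGBoost', 'def_1'), ('GBM', 'def_1')]
{code}

Note that with {{include_algos}} the plan still decides the order: {{XGBoost}} runs first even though {{GBM}} is listed first in the filter.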

After running {{AutoML}}, the detailed {{modeling_steps}} specification (with all step ids) will be available from the AutoML instance, so that the user can save it for later reuse.
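As a sketch of how that saved specification might be reused, assuming it is a plain list in the same JSON shape as above, it can be written to disk and reloaded with Python's standard {{json}} module. The {{modeling_steps}} value below is made up for illustration, not real AutoML output, and {{save_plan}}/{{load_plan}} are hypothetical helpers.

{code:python}
import json
import os
import tempfile

# Hypothetical value of the detailed steps exposed after a run; the shape
# mirrors the proposed JSON format with every step id spelled out.
modeling_steps = [
    {"name": "XGBoost", "steps": [{"id": "def_1"}, {"id": "def_2"}]},
    {"name": "GLM", "steps": [{"id": "def_1"}]},
    {"name": "StackedEnsemble", "steps": [{"id": "best"}, {"id": "all"}]},
]


def save_plan(plan, path):
    """Write the plan as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(plan, f, indent=2)


def load_plan(path):
    """Read a plan back; JSON arrays come back as lists in the same order."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "modeling_plan.json")
save_plan(modeling_steps, path)
reloaded = load_plan(path)
print(reloaded == modeling_steps)  # True
{code}

The reloaded list could then be passed back as {{modeling_plan}} to reproduce the same training order.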

Python representation examples (entries can be written as lists or tuples):

{code:python}# The JSON example translated to Python using a simpler syntax:
modeling_plan = [
    ('XGBoost', ['def_1', 'def_2', 'def_3']),
    'GLM',  # a bare algo name; ('GLM') without a comma is the same string
    ('DRF', 'all'),
    ('GBM', 'defaults'),
    'XRT',
    ('XGBoost', ['grid_1']),
    ('GBM', 'grids'),
    ('StackedEnsemble', ['best', 'all'])
]

# Specify only the algo ordering: for each algo, this will always execute
# all its default models first (if any),
# immediately followed by its grids (if any):
modeling_plan = ['XGBoost', 'GLM', 'DRF', 'GBM', 'DeepLearning']

# Specify only the algo ordering, but distinguish between default models and grids
# (the order of the individual models in each group is the default one defined by the backend):
modeling_plan = [
    ('XGBoost', 'defaults'),
    ('GLM', 'grids'),
    ('DRF', 'defaults'),
    ('GBM', 'defaults'),
    ('XGBoost', 'grids'),
    ('GBM', 'grids'),
    ('StackedEnsemble', 'all')
]{code}
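Since several short forms are accepted, a client would presumably normalize them to the JSON form before sending the plan to the backend. A minimal Python sketch of such a conversion: {{normalize}} and {{normalize_plan}} are hypothetical helpers, not part of the h2o client, and the accepted shapes are only those used in the Python examples (bare name, {{(name, alias)}}, {{(name, [step ids])}}).

{code:python}
from typing import Dict, List, Sequence, Union

# Aliases mentioned in the proposal.
ALIASES = {"all", "defaults", "grids"}

# An entry is either a bare algo name or a (name, selection) sequence.
Entry = Union[str, Sequence]


def normalize(entry: Entry) -> Dict:
    """Convert one Python-style entry into the JSON-style dict form."""
    if isinstance(entry, str):
        return {"name": entry}  # e.g. 'XRT'
    name, *rest = entry
    if not rest:
        return {"name": name}  # e.g. ('GLM',)
    if len(rest) > 1:
        raise ValueError(f"Too many elements in entry {entry!r}")
    selection = rest[0]
    if isinstance(selection, str):
        if selection not in ALIASES:
            raise ValueError(f"Unknown alias {selection!r} for {name}")
        return {"name": name, "alias": selection}  # e.g. ('DRF', 'all')
    # Otherwise a list/tuple of explicit step ids, e.g. ('XGBoost', ['def_1'])
    return {"name": name, "steps": [{"id": step_id} for step_id in selection]}


def normalize_plan(plan: List[Entry]) -> List[Dict]:
    """Normalize every entry, keeping the plan's order."""
    return [normalize(entry) for entry in plan]


print(normalize_plan([("XGBoost", ["def_1", "def_2"]), "GLM", ("GBM", "grids")]))
{code}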

And an equivalent representation in R:

{code:r}modeling_plan=list(
list(name='XGBoost', steps=c('def_1', 'def_2', 'def_3')),
list(name='GLM'),
list(name='DRF', alias='all'),
list(name='GBM', alias='defaults'),
'XRT',
list(name='XGBoost', steps=c('grid_1')),
list(name='GBM', alias='grids'),
list(name='StackedEnsemble', steps=c('best', 'all'))
){code}

exalate-issue-sync · 2023-05-12T09:43:11Z

Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] [~accountid:557058:9328661f-241f-4a0f-9d9a-d4e78ef05ba0] this is the new ticket for the training order specification.

Please have a look at the detailed proposal.

exalate-issue-sync · 2023-05-12T09:43:13Z

Ruslan Dautkhanov commented: Thank you, Erin and Sebastien.

h2o-ops · 2023-05-14T23:41:31Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-6840
Assignee: Sebastien Poirier
Reporter: Sebastien Poirier
State: Resolved
Fix Version: 3.28.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#3867

h2o-ops closed this as completed May 14, 2023

h2o-ops added the fixVersion/3.28.0.1 label May 14, 2023
