Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] [~accountid:557058:9328661f-241f-4a0f-9d9a-d4e78ef05ba0] this is the new ticket for training order specification.
After discussing Epsilon's needs with [~accountid:557058:9328661f-241f-4a0f-9d9a-d4e78ef05ba0] and [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] regarding case 94685, we decided for now to make it possible to specify the order in which AutoML executes its training steps.
This can be done at a higher, coarse-grained level (order of default algos and default grids):
XGB_defaults, GBM_defaults, … XGB_grid, …
Or at a more fine-grained level (order of each hardcoded model):
XGB_default_1, XGB_default_2, …, GBM_def_1, …
h1. Proposal
The suggested parameter name for this specification is {{modeling_plan}}.
Here is the suggested JSON representation to specify those steps in an ordered way:
{code:json}[
{"name":"XGBoost", "steps":[{"id":"def_1"}, {"id":"def_2"}, {"id":"def_3"}]},
{"name":"GLM"},
{"name":"DRF", "alias":"all"},
{"name":"GBM", "alias":"defaults"},
{"name":"XRT"},
{"name":"XGBoost", "steps":[{"id":"grid_1"}]},
{"name":"GBM", "alias":"grids"},
{"name":"StackedEnsemble", "steps":[{"id":"best"}, {"id":"all"}]}
]{code}
Unfortunately, JSON doesn't guarantee that the order of object keys is preserved, so we can't use a JSON object for this and have to use arrays only.
The semantics of the example above are as follows:
* start with the {{XGBoost}} algorithm, but train only the hardcoded models with ids {{def_1}}, {{def_2}}, {{def_3}}, in the given order.
* then train all the {{GLM}} models (default models and/or grids), followed by all {{DRF}} models (using alias {{all}} in the latter case).
* then train all the default {{GBM}} models (using alias {{defaults}} to avoid typing all the model ids explicitly).
* then train all the {{XRT}} models.
* then train the {{XGBoost}} step with id {{grid_1}} (probably a grid…).
* then train all the {{GBM}} grids (using alias {{grids}} to avoid listing them explicitly).
* then train the {{StackedEnsemble}} models with ids {{best}} and {{all}}, in this order.
* the {{DeepLearning}} algo isn't mentioned in this example, so it will be skipped.
If an algo or a model id (e.g. {{def_3}}) is present in this order specification but no longer exists in the new {{AutoML}} version, it will be ignored with a warning message.
The representation is also easily extensible: we can add new algos, new default models, new grids, new hyperparameter search methods…
If the user also specifies the {{exclude_algos}} parameter, it is applied on top of the order specification: this lets the user keep the specification in one variable without having to change it later. For example, {{exclude_algos=["XRT"]}} in combination with {{modeling_plan=the_example_above}} will execute the steps defined in the example except {{XRT}}. The same applies when using {{include_algos}} instead.
After running {{AutoML}}, the detailed {{modeling_steps}} specification (with all step ids) will be available from the automl instance so that the user can save it for later use.
Python representation examples (lists or tuples can be used):
{code:python}# the JSON example translated to Python using simple syntax:
modeling_plan=[
('XGBoost', ['def_1', 'def_2', 'def_3']),
('GLM'),
('DRF', 'all'),
('GBM', 'defaults'),
'XRT',
('XGBoost', ['grid_1']),
('GBM', 'grids'),
('StackedEnsemble', ['best', 'all'])
]
# specify only algos ordering: in this case it will always execute
#  - all default models first (if any)
#  - immediately followed by the algo grids (if any):
modeling_plan=['XGBoost', 'GLM', 'DRF', 'GBM', 'DeepLearning']
# only specify algos order, distinguishing between default models and grids
# (the order of each individual model is the default one defined by the backend):
modeling_plan=[
('XGBoost', 'defaults'),
('GLM', 'grids'),
('DRF', 'defaults'),
('GBM', 'defaults'),
('XGBoost', 'grids'),
('GBM', 'grids'),
('StackedEnsemble', 'all')
]{code}
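The shorthand Python forms above all map onto the array-of-objects JSON form. Below is a minimal sketch of that mapping, not the actual client implementation: the helper name {{normalize_plan}} and the choice to reject unknown aliases with a {{ValueError}} are assumptions made for illustration only.

```python
# Hypothetical helper (not part of the proposal): expands the shorthand
# Python syntax into the array-of-objects JSON form shown in the ticket.
ALIASES = {"all", "defaults", "grids"}


def normalize_plan(plan):
    normalized = []
    for entry in plan:
        if isinstance(entry, str):
            # Bare algo name such as 'XRT'. Note that ('GLM') is also a
            # plain string in Python, so it ends up in this branch too.
            normalized.append({"name": entry})
            continue
        name, *rest = entry
        if not rest:
            # One-element tuple or list, e.g. ('GLM',)
            normalized.append({"name": name})
        elif isinstance(rest[0], str):
            # (algo, alias), e.g. ('GBM', 'defaults')
            if rest[0] not in ALIASES:
                raise ValueError(f"unknown alias {rest[0]!r} for {name}")
            normalized.append({"name": name, "alias": rest[0]})
        else:
            # (algo, [step ids]), e.g. ('XGBoost', ['def_1', 'def_2'])
            steps = [{"id": step_id} for step_id in rest[0]]
            normalized.append({"name": name, "steps": steps})
    return normalized
```

Because the result is a list of plain dicts, passing it to {{json.dumps}} yields the JSON array form, and the order of the entries is preserved.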
And an equivalent representation in R:
{code:r}modeling_plan=list(
list(name='XGBoost', steps=c('def_1', 'def_2', 'def_3')),
list(name='GLM'),
list(name='DRF', alias='all'),
list(name='GBM', alias='defaults'),
'XRT',
list(name='XGBoost', steps=c('grid_1')),
list(name='GBM', alias='grids'),
list(name='StackedEnsemble', steps=c('best', 'all'))
){code}
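Two rules stated in this ticket are that algos or step ids that no longer exist are ignored with a warning, and that {{exclude_algos}} applies on top of the plan. The sketch below shows one way they could combine, working on the JSON form of the plan. It is only an illustration under assumptions: the function {{resolve_plan}} and the {{known_steps}} registry (algo name to list of step ids) are hypothetical and do not describe the backend.

```python
import warnings


def resolve_plan(plan, known_steps, exclude_algos=()):
    """Filter a JSON-form plan, keeping the user-specified order.

    plan:          list of {"name": ..., "alias"/"steps": ...} dicts
    known_steps:   hypothetical registry, algo name -> list of valid step ids
    exclude_algos: algo names to drop, applied on top of the plan
    """
    excluded = set(exclude_algos)
    resolved = []
    for entry in plan:
        name = entry["name"]
        if name in excluded:
            continue  # silently skipped: the user asked for this
        if name not in known_steps:
            warnings.warn(f"algo {name!r} is unknown and will be ignored")
            continue
        if "steps" not in entry:
            # Bare algo or alias entry: nothing to validate per step.
            resolved.append(dict(entry))
            continue
        kept = []
        for step in entry["steps"]:
            if step["id"] in known_steps[name]:
                kept.append(step)
            else:
                warnings.warn(
                    f"step {step['id']!r} of {name} is unknown and will be ignored"
                )
        if kept:
            resolved.append({"name": name, "steps": kept})
    return resolved
```

Filtering at resolution time, rather than editing the plan itself, matches the stated goal of keeping one saved specification that can be reused across runs and versions.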