PUBDEV-5975: Proposal for a consistent behaviour of AutoML reruns #3907
Merged
ledell
approved these changes
Oct 14, 2019
Great improvement. LGTM.
…ing fields that should not change during lifetime of an automl object
…sic Java API to be able to read leaderboard
…trics, fixing remaining NPEs with empty leaderboards
… column for better leaderboard readability
https://0xdata.atlassian.net/browse/PUBDEV-5975
Due to inherent issues with the current Python API for AutoML, users can get surprising results when looking at the leaderboard after reruns in AutoML, especially in the following cases:
- repeated calls to `aml.train` on a single `aml` (AutoML) instance.
- `AutoML` instances without any `project_name`.

On the backend side, it seems that it was also designed with a feature in mind that was never completed: the notion of a `project` distinct from the `automl_id`. But it's very difficult to guess what the designers had in mind, as there is no true support for the concept of a project that would include multiple `AutoML` instances, for example.

The contract for this proposal is detailed in the `pyunit_automl_reruns.py` test suite, so I encourage you to look at it first.

To sum it up, the idea is:
- a new `project` (and therefore leaderboard) each time the user creates an `AutoML` instance without specifying the `project_name`.
- `train` can be called multiple times on the same project name, with compatible data (same training_frame, same response column).
- … or the training_frame (previous leaderboard is still accessible by id).

More issues
https://0xdata.atlassian.net/browse/PUBDEV-6708
This issue is due to reruns as well.
Current behaviour allows changing the `leaderboard_frame`, and the new models will still be appended to the existing leaderboard (ignoring the new frame); this is just plain wrong!

The leaderboard should be identified uniquely by `project_name` + `leaderboard_frame`, which makes this rerun logic even more complicated.

Rerun behaviour contract
see https://github.com/h2oai/h2o-3/pull/3907/files#diff-1281d5db9141adc08c3047885255f970
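
For readers who want the gist without opening the test suite, here is a rough, self-contained Python sketch of the contract as I read it from the description above. It is a toy model, not H2O code: `ToyRegistry`, `Leaderboard`, the `toy_project_N` naming scheme and the `archived` list are all hypothetical names made up for illustration. It also assumes that any change of training frame, response column or leaderboard frame starts a fresh leaderboard while the previous one stays reachable; the actual rules are the ones in `pyunit_automl_reruns.py`.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Leaderboard:
    """Toy leaderboard: remembers the data it was built on and its models."""
    project_name: str
    training_frame: str
    response_column: str
    leaderboard_frame: Optional[str]
    models: List[str] = field(default_factory=list)

    def accepts(self, training_frame: str, response_column: str,
                leaderboard_frame: Optional[str]) -> bool:
        # "Compatible" here means: same training frame, same response
        # column, same leaderboard frame (an assumption of this sketch).
        return (self.training_frame, self.response_column,
                self.leaderboard_frame) == (
                    training_frame, response_column, leaderboard_frame)


class ToyRegistry:
    """Hypothetical stand-in for the backend bookkeeping (not H2O's API)."""

    def __init__(self) -> None:
        self._active: Dict[str, Leaderboard] = {}  # project_name -> leaderboard
        self.archived: List[Leaderboard] = []      # replaced leaderboards
        self._ids = count(1)

    def train(self, model: str, training_frame: str, response_column: str,
              project_name: Optional[str] = None,
              leaderboard_frame: Optional[str] = None) -> Leaderboard:
        # No project name given: generate one, hence a new leaderboard.
        if project_name is None:
            project_name = f"toy_project_{next(self._ids)}"

        current = self._active.get(project_name)
        if current is not None and current.accepts(
                training_frame, response_column, leaderboard_frame):
            # Rerun on the same project with compatible data: append.
            current.models.append(model)
            return current

        # Incompatible rerun: keep the old leaderboard reachable, start anew.
        if current is not None:
            self.archived.append(current)
        fresh = Leaderboard(project_name, training_frame, response_column,
                            leaderboard_frame, [model])
        self._active[project_name] = fresh
        return fresh
```

In this sketch, two `train` calls with `project_name="p"` and identical frames return the same `Leaderboard` object with both models on it, whereas a third call that passes a different `leaderboard_frame` returns a new leaderboard and moves the old one to `archived`.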