AutoML reruns (same project name, no project name...) #12823
Comments
Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] do we want this one fixed for 3.22.0.x?
Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] Maybe instead of Since we already use the date + (seconds?) for auto-naming the models in the leaderboard, it seems like it would make sense to use the same timestamp, e.g. {{StackedEnsemble_AllModels_AutoML_20181127_075221}}. Currently project name looks like this: Also, since we use {{"AutoML_{date}_{seconds}"}} with a capital "AutoML", maybe we should also change from the lower case "automl". Let me know your thoughts.
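The timestamp-based default name suggested above could be sketched as follows. This is a standalone Python illustration, not H2O's implementation: `default_project_name` is a hypothetical helper, and it assumes the two numeric fields in names like `AutoML_20181127_075221` are `YYYYMMDD` and `HHMMSS`.

```python
from datetime import datetime
from typing import Optional


def default_project_name(now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build a default project name from a timestamp."""
    now = now or datetime.now()
    # Assumed layout: capital "AutoML", then date (YYYYMMDD), then time (HHMMSS).
    return now.strftime("AutoML_%Y%m%d_%H%M%S")


# Example with a fixed timestamp so the result is reproducible.
name = default_project_name(datetime(2018, 11, 27, 7, 52, 21))
print(name)
```

With one-second resolution, two calls made within the same second would still share a name, so a scheme like this only separates runs that start at least a second apart.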
Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] after having a second look at this, we could use a simple default project_name (based on a timestamp, as you suggest) or a more complex one (based on hashes), depending on which constraints we want to impose on the user... but I don't think that relying on project_name is enough anyway. I'll try to sum up the current behaviour and propose alternatives for how it should work.
Today: no project name given
aml <- h2o.automl(x=x, y=y1, training_frame=train, max_models=3) # 3+2 models visible in leaderboard
If we add a timestamp when generating the project_name (when not provided by the user, or even appended to the user's provided name), then it will fix the issue, but at the cost of preventing the user from doing reruns on the same AutoML instance...
We have multiple alternatives to fix this:
The last one looks the most complete and predictable imo. The whole API looks broken for reruns in my opinion... the only parameters I would allow for reruns are things like max_models, max_runtime_secs, exclude_algos... basically those parameters that have no or minimal impact on how the individual models are built. If reruns are popular, then I think the API should be properly repaired. It's a bigger task, however. Sorry for being so long :)
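The idea of only allowing reruns when nothing but "budget-like" parameters changed could be sketched like this. It is a standalone Python illustration, not the H2O API: `RERUN_SAFE_PARAMS`, `changed_unsafe_params` and `can_rerun` are hypothetical names, and the allowed set contains just the three parameters named in the comment above.

```python
from typing import Any, Dict, List

# Assumption: only these parameters may differ between a run and its rerun.
RERUN_SAFE_PARAMS = frozenset({"max_models", "max_runtime_secs", "exclude_algos"})


def changed_unsafe_params(previous: Dict[str, Any], requested: Dict[str, Any]) -> List[str]:
    """Return the parameters that changed and are not considered rerun-safe."""
    keys = set(previous) | set(requested)
    return sorted(
        key
        for key in keys
        if key not in RERUN_SAFE_PARAMS and previous.get(key) != requested.get(key)
    )


def can_rerun(previous: Dict[str, Any], requested: Dict[str, Any]) -> bool:
    """A rerun is accepted only if no model-affecting parameter changed."""
    return not changed_unsafe_params(previous, requested)


first = {"y": "y1", "training_frame": "train", "max_models": 3}
more_models = {"y": "y1", "training_frame": "train", "max_models": 5}
new_target = {"y": "y2", "training_frame": "train", "max_models": 3}

print(can_rerun(first, more_models))             # only max_models changed
print(changed_unsafe_params(first, new_target))  # the target changed
```

With a check like this, a call that changes the target would be rejected (or routed to a fresh AutoML instance) instead of silently reusing the existing one.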
Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c], I'm having third thoughts on this:
That's why, for now, I think we should just fix the automatically generated project name to
This would fix issue PUBDEV-5791, as it's really a bug in that case: the AutoML instance was reused in spite of the user changing the target.
JIRA Issue Migration Info
Jira Issue: PUBDEV-5975
Linked PRs from JIRA
multiple issues currently:
{{project_name = "{training_frame_id}_{hash(y)}_{hash(x)}"}}
For an example of an issue related to this project_name auto-generation, see https://groups.google.com/forum/#!topic/h2ostream/3KQSY4BNdvY
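A name built from the `{training_frame_id}_{hash(y)}_{hash(x)}` template could be sketched as below. This is a standalone Python illustration, not H2O's code: `hashed_project_name` and `_short_hash` are hypothetical helpers, and the truncated SHA-1 digest and the sorting of `x` are assumptions made only for the example.

```python
import hashlib
from typing import Sequence


def _short_hash(text: str) -> str:
    """Stable 8-character hash (assumption: any deterministic hash would do)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def hashed_project_name(training_frame_id: str, y: str, x: Sequence[str]) -> str:
    """Hypothetical helper following the {training_frame_id}_{hash(y)}_{hash(x)} template."""
    # Assumption: predictor order should not matter, so the columns are sorted first.
    x_key = ",".join(sorted(x))
    return f"{training_frame_id}_{_short_hash(y)}_{_short_hash(x_key)}"


same_a = hashed_project_name("train", "y1", ["a", "b"])
same_b = hashed_project_name("train", "y1", ["b", "a"])
other = hashed_project_name("train", "y2", ["a", "b"])

print(same_a == same_b)  # identical frame, target and predictors give the same name
print(same_a == other)   # a different target gives a different name
```

Under such a scheme, two calls with the same frame, target and predictors always map to the same project, whatever their other parameters are, while changing the target yields a separate project.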