Add experiment heuristics to automl module (variant of Avanika PR 1362) #1507
Conversation
Love the new heuristics. Added some comments, let me know what you think.
encoder_defaults = {
    "text": {
        "bert": os.path.join(CONFIG_DIR, "text/bert_config.yaml"),
While BERT gets very good performance, I feel like its cost/performance ratio is pretty bad (running on a single V100, for example, the batch size ends up being so small you'd be better off using something else like DistilBERT). Finding a batch size small enough to work for BERT in such cases is also a challenge. But curious if you've observed something different.
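For what it's worth, the tradeoff described here could eventually be encoded as a heuristic. A minimal sketch, assuming a GPU-memory cutoff; the `pick_text_encoder` helper, the 16 GB threshold, and the distilbert config path are all hypothetical, not part of this PR:

```python
import os

CONFIG_DIR = "ludwig/automl/defaults"  # hypothetical location

encoder_defaults = {
    "text": {
        "bert": os.path.join(CONFIG_DIR, "text/bert_config.yaml"),
        # hypothetical lighter-weight alternative for constrained GPUs
        "distilbert": os.path.join(CONFIG_DIR, "text/distilbert_config.yaml"),
    }
}


def pick_text_encoder(gpu_memory_gb: float) -> str:
    # rough guess: BERT batch sizes become impractically small on smaller GPUs,
    # so fall back to a distilled model below the threshold
    return "bert" if gpu_memory_gb >= 16 else "distilbert"
```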
Interesting. It looks like Avanika put this code here as a placeholder; her comment indicates BERT is the default
and that more robust heuristics should be added later. I personally have not done any text dataset testing.
Given that it is the default, maybe it can stay until there are some updates in this area?
Sounds good, we can try it out and see how it goes.
num_samples = base_automl_config["hyperopt"]["sampler"]["num_samples"]
if num_samples is not None:
    # allow trials to get 2x even division, since some trials perform better than avg
    base_automl_config["hyperopt"]["sampler"]["scheduler"][
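The "2x even division" comment above can be read as the following arithmetic; a sketch under that assumption (the function name is hypothetical, not from the PR):

```python
def scheduler_max_t(time_limit_s: int, num_samples: int) -> int:
    # give each trial up to twice its even share of the total time budget,
    # since above-average trials should be allowed to run longer than average
    return max(1, (2 * time_limit_s) // num_samples)
```

For example, a 3600 s budget with 20 samples yields a 360 s per-trial `max_t`.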
I'm curious what happens if we let the per-trial max_t equal the total training time. Would this result in some of the num_samples
never getting tried? I would hope that hyperband would be fair in its scheduling, trying all the samples for a small amount of time before letting the best trials run indefinitely.
Well, I thought I tried this and observed starvation of trials, but let me try it again to double-check.
Okay, I just confirmed that for an example Ray Tune async hyperband run, if I specify num_samples=20
and set max_t == time_limit_s, then the first 3 trials each run for max_t time and no other trials are run.
We can talk with the Ray Tune folks about this, but it is the behavior I observe.
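A back-of-the-envelope model of why only 3 trials ran: with a fixed number of concurrent trial slots, each slot can host roughly `time_limit_s // max_t` trials back to back, so `max_t == time_limit_s` leaves room for exactly one trial per slot. A sketch of that arithmetic (the 3-slot assumption and function name are mine, not from the run):

```python
def trials_started(num_samples: int, time_limit_s: int, max_t: int, n_slots: int) -> int:
    # each concurrent slot runs trials sequentially, up to max_t each,
    # until the overall time budget is exhausted
    per_slot = max(1, time_limit_s // max_t)
    return min(num_samples, per_slot * n_slots)


# with max_t == time_limit_s, only one trial fits per slot,
# so only n_slots of the num_samples trials ever start
trials_started(20, 3600, 3600, 3)
```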
Thanks for confirming, sounds like this is the right thing to do for now, then. Would be great if we could find a way to both explore all trials and let trials run as long as they can.
Just posted this question to the Ray Tune Slack channel.
ludwig/automl/automl.py
# TODO (ASN): add image heuristics

# override and constrain automl config based on user specified values
if user_specified_config is not None:
I wonder if something like OmegaConf would help simplify the merging process, particularly now that we also have a base_config and combiner config separation in addition to the user_config here.
https://omegaconf.readthedocs.io/en/2.1_branch/usage.html#omegaconf-merge
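For anyone unfamiliar with OmegaConf's merge semantics: later configs win key by key, with nested mappings merged recursively. A plain-Python sketch of that behavior (stdlib only; `OmegaConf.merge` additionally provides typing, interpolation, and structured-config validation):

```python
def deep_merge(base: dict, override: dict) -> dict:
    # later configs win; nested dicts are merged recursively,
    # roughly mirroring OmegaConf.merge's behavior for plain mappings
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out
```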
Thanks for the reference to OmegaConf, looks nice!
My understanding of the complexity of this code is that it removes entries from
the hyperopt parameters section that overlap with the config_section.key entries
included in the user_config. I don't see a particular feature of OmegaConf
that handles this specialized case. Apologies if I am missing something here.
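The specialized case described above, dropping search parameters the user has already pinned, might look like this as a standalone helper (the function and key names are hypothetical; the real code operates on the full automl config):

```python
def prune_fixed_params(hyperopt_params: dict, user_config: dict) -> dict:
    # drop any "section.key" search parameter whose value the user fixed
    # explicitly, so hyperopt does not search over a pinned value
    kept = {}
    for dotted_key, search_space in hyperopt_params.items():
        section, _, key = dotted_key.partition(".")
        if key and key in user_config.get(section, {}):
            continue  # user pinned this value in user_config
        kept[dotted_key] = search_space
    return kept
```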
I think it will end up being a refactor of some things. I can take a look in a follow-up. But my goal would be to just be able to eventually reduce the complexity to "merge a bunch of independent configs".
LGTM!
…2) (#1507) Co-authored-by: Anne Holler <anne@vmware.com>
This PR is a variant of Avanika's PR 1362, generated against the tf-legacy branch.
The changes relative to PR 1362 are:
This PR has been run successfully on the following datasets: