Add experiment heuristics to automl module (variant of Avanika PR 1362) #1507

Merged: 7 commits into ludwig-ai:tf-legacy on Nov 26, 2021

Conversation

amholler (Collaborator) commented on Nov 21, 2021:

This PR is a variant of Avanika's PR 1362, generated against the tf-legacy branch.
The changes relative to PR 1362 are:

  1. Corrected an issue when generating the config for the concat model type (missing merge_dict; noted in a comment on PR 1362)
  2. Updated to use tune.with_parameters to support Ray Tune on large datasets
  3. Ensured the trial limit is consistently time-based (not epoch-based), as intended
  4. Omitted the Python style changes that the reviewer wanted removed
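For reference, change (2) typically follows the standard Ray Tune pattern below. This is a hedged sketch, not the actual Ludwig code: train_fn and big_dataset are hypothetical stand-ins; the point is that tune.with_parameters ships the large object through the Ray object store instead of serializing it into every trial spec.

```python
from ray import tune

def train_fn(config, dataset=None):
    # `config` holds the sampled hyperparameters; `dataset` arrives via the
    # Ray object store rather than being pickled into each trial's spec,
    # which is what makes large datasets workable.
    score = config["lr"] * len(dataset)  # stand-in for real training
    tune.report(score=score)

big_dataset = list(range(1_000_000))  # placeholder for a large dataset
trainable = tune.with_parameters(train_fn, dataset=big_dataset)
tune.run(trainable, config={"lr": tune.loguniform(1e-4, 1e-1)}, num_samples=2)
```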

This PR has been run successfully on the following datasets:

  • ames_housing, forest_cover, mushroom_edibility, synthetic_fraud [tabnet]
  • higgs [concat, large]
  • otto_group_product [concat]

tgaddair (Collaborator) left a comment:

Love the new heuristics. Added some comments, let me know what you think.

ludwig/automl/automl.py (outdated; resolved)

encoder_defaults = {
"text": {
"bert": os.path.join(CONFIG_DIR, "text/bert_config.yaml"),

tgaddair (Collaborator) commented:

While BERT gets very good performance, I feel like its cost/performance ratio is pretty bad (running on a single V100, for example, the batch size ends up being so small that you'd be better off using something else like DistilBERT). Finding a batch size small enough to work for BERT in such cases is also a challenge. But I'm curious if you've observed something different.

amholler (Collaborator, author) commented on Nov 25, 2021:

Interesting. It looks like Avanika put this code here as a placeholder; her comment indicates that BERT is the default
and that more robust heuristics should be added later. I personally have not done any text-dataset testing.
Given that it is the default, maybe it can stay until there are some updates in this area?

tgaddair (Collaborator) commented:

Sounds good, we can try it out and see how it goes.

num_samples = base_automl_config["hyperopt"]["sampler"]["num_samples"]
if num_samples is not None:
# allow trials to get 2x even division, since some trials perform better than avg
base_automl_config["hyperopt"]["sampler"]["scheduler"][

tgaddair (Collaborator) commented:

I'm curious what happens if we let the per trial max_t equal the total training time. Would this result in some of the num_samples never getting tried? I would hope that hyperband would be fair in its scheduling so that it would try all the samples for a small amount of time before letting the best trials run indefinitely.

amholler (Collaborator, author) commented:

Well, I thought I tried this and observed starvation of trials, but let me try it again to double-check.

amholler (Collaborator, author) commented on Nov 25, 2021:

Okay, I just confirmed that for an example Ray Tune async hyperband run, if I specify num_samples=20
and set max_t == time_limit_s, then the first 3 trials each run for max_t time and no other trials are run.
We can talk with the Ray Tune folks about this, but that is the behavior I observe.
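The "2x even division" heuristic from the snippet above can be illustrated with hypothetical numbers. The key names mirror the quoted snippet, but the values and the "async_hyperband" type string here are illustrative assumptions, not the merged defaults:

```python
# Hypothetical budget: 20 trials sharing a 2-hour time limit.
time_limit_s = 7200
num_samples = 20

# An even split would give each trial time_limit_s / num_samples seconds.
# Allowing 2x that lets above-average trials run longer, while still
# avoiding the starvation observed when max_t == time_limit_s.
max_t = 2 * time_limit_s / num_samples

hyperopt = {
    "sampler": {
        "num_samples": num_samples,
        "scheduler": {
            "type": "async_hyperband",
            "max_t": max_t,  # per-trial cap, in seconds
        },
    },
}
print(hyperopt["sampler"]["scheduler"]["max_t"])  # 720.0
```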

tgaddair (Collaborator) commented:

Thanks for confirming, sounds like this is the right thing to do for now, then. Would be great if we could find a way to both explore all trials and let trials run as long as they can.

amholler (Collaborator, author) commented:

Just posted this question to the Ray Tune Slack channel.

# TODO (ASN): add image heuristics

# override and constrain automl config based on user specified values
if user_specified_config is not None:

tgaddair (Collaborator) commented:

I wonder if something like OmegaConf would help simplify the merging process, particularly now that we also have a base_config and combiner config separation in addition to the user_config here.

https://omegaconf.readthedocs.io/en/2.1_branch/usage.html#omegaconf-merge
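For comparison, OmegaConf.merge performs a right-biased recursive merge. A minimal plain-dict sketch of the same semantics (deep_merge and the sample configs here are hypothetical illustrations, not Ludwig code):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; later values win,
    mirroring the right-bias of OmegaConf.merge."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base_config = {"combiner": {"type": "concat", "num_fc_layers": 1}}
user_config = {"combiner": {"num_fc_layers": 3}}
print(deep_merge(base_config, user_config))
# {'combiner': {'type': 'concat', 'num_fc_layers': 3}}
```

Note that a plain merge like this does not cover the special case discussed below (pruning overlapping hyperopt parameters), which is why the existing code is more involved.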

amholler (Collaborator, author) commented:

Thanks for the reference to OmegaConf, looks nice!

My understanding of the complexity of this code is that it removes entries from the hyperopt parameters
section that overlap with the config_section.key entries included in the user_config. I don't see a
particular feature of OmegaConf that handles this specialized case. Apologies if I am missing something here.

tgaddair (Collaborator) commented:

I think it will end up being a refactor of some things. I can take a look in a follow-up, but my goal would be to eventually reduce the complexity to "merge a bunch of independent configs".

tgaddair (Collaborator) left a comment:

LGTM!

@tgaddair tgaddair merged commit e26bdce into ludwig-ai:tf-legacy Nov 26, 2021
tgaddair pushed a commit that referenced this pull request Nov 29, 2021
…2) (#1507)

Co-authored-by: Anne Holler <anne@vmware.com>
tgaddair added a commit that referenced this pull request Nov 29, 2021
…2) (#1507) (#1527)

Co-authored-by: Anne Holler <anne@vmware.com>

Co-authored-by: amholler <86269492+amholler@users.noreply.github.com>
Co-authored-by: Anne Holler <anne@vmware.com>