[tune] Automatically detect/activate proper SyncConfig given autoscaler #11867

Closed
richardliaw opened this issue Nov 6, 2020 · 4 comments
Labels
enhancement (Request for new feature and/or capability), P2 (Important issue, but not time-critical), tune (Tune-related issues)

@richardliaw
Contributor

Describe your feature request

Checkpoint syncing is a common failure mode for training functions like this one:

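from ray import tune

def train_func(config, checkpoint_dir=None):
    # checkpoint_dir is None on a fresh start; Tune passes a path when restoring.
    ...
    with tune.checkpoint_dir(...) as ckpt:
        # Checkpoint files written into ckpt have to be synced back to the driver.
        ...
    tune.report()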

A frequent cause is that the user runs on K8s or Docker with the autoscaler and forgets to set DockerSyncer or KubernetesSyncer in the SyncConfig.

Instead, we should automatically detect the presence of an autoscaler configuration and activate the matching syncer, without requiring any action from the user.
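A rough sketch of what the selection step could look like, assuming the autoscaler config has already been parsed into a dict. The helper name `pick_syncer_name` is hypothetical, and the keys it reads (`provider.type` and `docker.container_name`) are assumptions based on the usual layout of a Ray cluster YAML, not something this issue specifies. It returns a name only; wiring that name to the actual syncer class is left out.

```python
from typing import Any, Dict


def pick_syncer_name(cluster_config: Dict[str, Any]) -> str:
    """Hypothetical helper: map a parsed cluster config to a syncer name.

    Assumes the config carries a `provider` section with a `type` key and,
    for Docker setups, a `docker` section with a `container_name` key.
    """
    provider = cluster_config.get("provider") or {}
    docker = cluster_config.get("docker") or {}

    # Check Kubernetes first: nodes are pods there, so a Docker-style
    # syncer would not apply even if a `docker` section is present.
    if provider.get("type") == "kubernetes":
        return "KubernetesSyncer"

    # A non-empty container name is taken as the signal for a Docker cluster.
    if docker.get("container_name"):
        return "DockerSyncer"

    # No special environment detected: keep whatever the default syncer is.
    return "default"
```

Returning a plain string keeps the sketch free of any dependency on Tune itself; a real implementation would presumably return or instantiate the syncer class directly.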

@krfricke

cc @mkoh-asapp @richardrl

@richardliaw added the enhancement, triage, autoscaler, P2, and tune labels and removed the triage label on Nov 6, 2020
@richardliaw
Contributor Author

This requires a utility function that identifies whether Tune is running on an autoscaling cluster.
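One possible shape for such a utility, as a minimal sketch: treat the presence of a cluster config file on the node as the signal. Both the function name `is_autoscaling_cluster` and the default path are assumptions made for illustration; this thread does not say where the autoscaler writes its config or how the check should work.

```python
import os

# Assumed location of the cluster config on the head node.
# This path is a guess for illustration, not confirmed by this issue.
DEFAULT_BOOTSTRAP_CONFIG = "~/ray_bootstrap_config.yaml"


def is_autoscaling_cluster(config_path: str = DEFAULT_BOOTSTRAP_CONFIG) -> bool:
    """Hypothetical check: True if a cluster config file exists at config_path."""
    return os.path.isfile(os.path.expanduser(config_path))
```

A file check like this is cheap and needs no connection to the cluster, but it only works on nodes where the config file is actually present.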

@mkoh-asapp

I missed the part of the docs that explains this, so Richard had to explain it to me. Even after I set the syncer in tune.run, it still didn't work. I realized this was because we pass Experiment objects directly to tune.run instead of passing a function or class. Once I set the syncer on the Experiments themselves it worked, but that was not very clear from the docs.

Maybe it isn't a standard way to use tune (creating Experiments manually), but it might be nice to have that explained somewhere.

Just wanted to bring up a point to consider. Thanks 🎉

@bllchmbrs
Contributor

bllchmbrs commented Nov 17, 2020

Does setting sync_to_driver=False in tune.run(...) silence the error?

The answer to 👆 is no.

@krfricke
Contributor

Closed via #12108


5 participants