
[Dask-on-Ray] Propagate Dask-on-Ray scheduler config to (rest of) cluster #17943

Open

clarkzinzow opened this issue Aug 19, 2021 · 3 comments

Labels: core, core-util, enhancement, P2, size:small, usability
Milestone: Core Backlog


clarkzinzow commented Aug 19, 2021

When setting the Dask-on-Ray scheduler as the Dask scheduler via a global config within the driver:

```python
import dask
from ray.util.dask import ray_dask_get

dask.config.set(scheduler=ray_dask_get)
# The following will use the Dask-on-Ray scheduler
# (df is a Dask collection, e.g. a Dask DataFrame).
df.compute()
```

this config won't be propagated to any workers. Therefore, if you do some Dask computation within a Ray task or actor, it won't use the Dask-on-Ray scheduler:

```python
import dask
import ray
from ray.util.dask import ray_dask_get

dask.config.set(scheduler=ray_dask_get)

@ray.remote
def foo(df):
    # The following will NOT use the Dask-on-Ray scheduler: the driver's
    # dask.config.set() is not visible in this worker process.
    df.compute()
```

This is extremely counterintuitive: to users, setting a global config implies that the config is set everywhere, including on the other workers in their cluster.

We should either automatically propagate the Dask-on-Ray scheduler config to downstream tasks and actors, or at least provide an API that users can invoke to perform that propagation manually.

  • [Gold] Automatic config propagation.
  • [Silver] API for config propagation that the user can use.
  • [Bronze] Documentation on the need to set the global config within tasks and actors.
clarkzinzow added the `bug` and `triage` labels on Aug 19, 2021
clarkzinzow (author) commented:

Somewhat related to #17927 and #16743, where we wish to propagate the Dask-on-Ray scheduler config to the Ray Client server.

clarkzinzow added the `P1` label and removed the `triage` label on Aug 19, 2021
clarkzinzow (author) commented:

Note that Dask supports configuration via environment variables and serialized configs, but neither will suffice here, since we need to set the scheduler to a callable (`ray_dask_get`), which cannot be expressed as a string value.
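To illustrate the limitation (a sketch, not from the issue): Dask's serialized-config mechanism (`dask.config.serialize` / `dask.config.deserialize`, the same machinery behind environment-variable config inheritance) round-trips plain JSON-compatible values, so a string scheduler survives but a callable does not:

```python
import dask

# A string-valued scheduler round-trips through the serialized config.
blob = dask.config.serialize({"scheduler": "synchronous"})
assert dask.config.deserialize(blob) == {"scheduler": "synchronous"}

# A callable scheduler (like ray_dask_get) is not JSON-serializable,
# so it cannot be propagated this way.
try:
    dask.config.serialize({"scheduler": lambda dsk, keys: None})
except TypeError:
    print("callable schedulers cannot be serialized")
```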


amogkam commented Aug 20, 2021

One option could be to have Dask on Ray be its own importable library that's just a wrapper around dask, but sets all the necessary configurations during import.

ericl added this to the Core Backlog milestone on Aug 23, 2021
ericl added the `enhancement`, `size:small`, and `usability` labels and removed the `bug` label on Sep 7, 2021
ericl added the `P2` and `usability` labels and removed the `usability` and `P1` labels on Sep 30, 2021
rkooo567 added the `core` label on Dec 9, 2022