
tree-reduce the combine for open_mfdataset(..., parallel=True, combine="nested") #8523

Open
dcherian opened this issue Dec 5, 2023 · 4 comments

@dcherian (Contributor) commented Dec 5, 2023

Is your feature request related to a problem?

When parallel=True and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file (with indexed coordinates loaded), and then ships all of those Datasets back to the "head node" for the combine.

Instead, we can tree-reduce the combine (example) by switching from dask.delayed to dask.bag, and skip the overhead of shipping thousands of copies of an indexed coordinate back to the head node (a rough sketch follows the list below).

  1. The downside is that the dask graph is "worse", but perhaps that shouldn't stop us.
  2. I think this is only feasible for combine="nested".
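
A rough, untested sketch of what the dask.bag version could look like. The `combine` helper, the file list, the "time" concat dimension, and the fan-in of 8 are all illustrative assumptions, not settled API; a real implementation would live inside open_mfdataset:

```python
# Sketch only: tree-reduce the nested combine with dask.bag instead of
# dask.delayed. The paths and the "time" dimension are hypothetical.
import dask.bag as db
import xarray as xr

def combine(datasets):
    # Runs remotely on the workers, at every level of the reduction tree.
    return xr.combine_nested(list(datasets), concat_dim="time")

paths = ["file_000.nc", "file_001.nc"]  # placeholder file list

# One file per partition, opened on the workers rather than the head node.
bag = db.from_sequence(paths, npartitions=len(paths)).map(xr.open_dataset)

# split_every=8 combines 8 datasets at a time, then 8 partial results at a
# time, and so on; only the final combined Dataset ships to the head node.
combined = bag.reduction(combine, combine, split_every=8).compute()
```

The graph this builds is a multi-level reduction tree rather than a single fan-in node, which is presumably the "worse" graph referred to in point 1 above.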

cc @TomNicholas

@TomNicholas (Contributor)

Oh this is an interesting idea...

How much faster is this? What does the graph look like? (The notebook in the gist doesn't seem to show either)

> skip the overhead of shipping thousands of copies of an indexed coordinate back to the head node

What is this proposal doing instead? Don't the coordinates still ultimately get shipped to be on the same node in order to do the alignment?

@dcherian (Contributor, Author) commented Dec 5, 2023

> How much faster is this?

Haven't tested; happy to say I don't use open_mfdataset any more :). I am just posting this experiment so someone else can pursue it if they want.

> Don't the coordinates still ultimately get shipped to be on the same node in order to do the alignment?

No, it'll execute the combine 8 datasets at a time, then combine the results of that step 8 at a time, and so on, all remotely, and ship only the final combined dataset back to the head node.
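
To make the fan-in concrete (illustrative numbers only), the number of remote combine rounds grows logarithmically in the file count:

```python
# Illustrative only: rounds of tree reduction for 4096 files, fan-in of 8.
n, rounds = 4096, 0
while n > 1:
    n = -(-n // 8)  # ceiling division: each round combines up to 8 inputs
    rounds += 1
print(rounds)  # 4 rounds: 4096 -> 512 -> 64 -> 8 -> 1
```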

@TomNicholas (Contributor)

> Haven't tested; happy to say I don't use open_mfdataset any more :)

I used it today for the first time in a while 😅, mostly because of fsspec/kerchunk#386.

> No, it'll execute the combine 8 datasets at a time, then combine the results of that step 8 at a time, and so on, all remotely, and ship only the final combined dataset back to the head node.

I'm definitely missing something, but won't the same amount of data still need to be moved around in the end? Is this potentially faster just because the communication doesn't all clobber the lone head node at once?

@dcherian (Contributor, Author)

> but won't the same amount of data still need to be moved around in the end?

In the Coiled pattern, where you orchestrate remote workers but download results to the user's machine, that is a lot of copies moving to the user's machine. I agree this is less of a concern in remote JupyterHub deployments or HPC environments, but I bet you'll still see an improvement when opening O(10,000) files.
