Combine2 #122
Conversation
martindurant
commented
Feb 3, 2022
- combine on multiple dimensions at once
- allow for files with a variable number of chunks (so long as the chunk size is consistent)
- allow merging into multiple variable names
- derive the coordinate variable in different ways, including from the file name or a dataset/variable attribute
- cftime decoding (again!)
- pre-processing JSONs
- post-processing output
- Don't concatenate arrays that are the same in every dataset and don't depend on the dimensions being concatenated.
- coords that should be int are being converted to float somewhere
- Get inlining to work (broke when running after a previous test due to state)
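One of the tasks above is deriving the concat coordinate from the file name. A minimal sketch of that idea, with a made-up file-name pattern and helper (not the actual code in this PR):

```python
import re

# Hypothetical pattern: reference files named like "data_YYYY-MM.json";
# the captured group becomes the coordinate value for that file.
PATTERN = re.compile(r"data_(\d{4}-\d{2})\.json")

def coord_from_filename(path):
    """Extract a time-like coordinate value from a reference-file name."""
    match = PATTERN.search(path)
    if match is None:
        raise ValueError(f"no coordinate found in {path!r}")
    return match.group(1)

files = ["data_2022-01.json", "data_2022-02.json", "data_2022-03.json"]
coords = [coord_from_filename(f) for f in files]
print(coords)  # ['2022-01', '2022-02', '2022-03']
```

The same shape of mapping could instead read a dataset or variable attribute, which is the other coordinate source mentioned above.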
This is awesome Martin, good work! The docs didn't render well for me; I moved the selector list under the coo_map parameter section and made a couple of changes that helped the docs build properly.
Yeah, we have some mix of numpydoc and whatever my IDE did automatically. Should have marked this WIP...
I don't seem to know how to use GitHub properly 😆 Do you mind me committing directly to the PR? All that changed was the docstring.
Go ahead. Interestingly, I got a notice that you requested changes, but I saw no changes :)
Coordinates are to be stored internally as sets, since duplicates are always fine and make no difference for the coord of interest.
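An illustrative sketch (not the PR's actual code) of why sets work here: coordinate values repeated across many input files collapse automatically, and only the sorted distinct values matter for the coord of interest.

```python
# Accumulate coordinate values from several input files; duplicates
# across files are harmless and disappear in the set.
coords = set()
for values in (["2022-01", "2022-02"], ["2022-02", "2022-03"], ["2022-01"]):
    coords.update(values)

# The ordered distinct values form the final coordinate.
ordered = sorted(coords)
print(ordered)  # ['2022-01', '2022-02', '2022-03']
```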
Question:
Should we avoid guessing this and require the user to specify which arrays not to concatenate? Then we can avoid all those xarray compat arguments.
I'd very much support this. However, is there a reason why you want to specify which arrays to skip? At least for the datasets I'm working with, it is most common to concatenate variables that already have Otherwise, what if that parameter, in addition to (or instead of) a sequence, can also be a callable that decides based on the
I didn't mean to skip, but let's say there is a 1xN variable in each dataset which is the same in every dataset; there would be no good reason to produce an MxN 2d array in the output. Sometimes you can know this, and xarray has a number of ways of guessing, including the "minimal" compat options, but my idea is to require the user to say rather than rely on obscure flags. Note that combine2 is designed to be able to get coordinates for the concat dimension(s) from a number of different sources, so you don't necessarily know which variables depend on those dimensions and which don't.
Right. My only concern was that this would make the workflow that is currently supported using Edit: in case it helps,
I am wondering whether it isn't a good thing to force users to pick the terms for each of their data variables. In some cases there will be no way to guess without downloading whole arrays' worth of data (or even then); actually xarray gets it wrong sometimes too.
(to be sure, this is for cases where the concat coordinate is not already an array in the data, so that its relationship with the other variables is not clear)
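A toy illustration of the 1xN point above, with made-up values: a variable identical in every input dataset, naively concatenated along the new dimension, yields a redundant MxN array, whereas recognising it as identical keeps the original 1xN shape.

```python
n_files = 5                    # M input datasets
lat = [10.0, 20.0, 30.0]       # the same 1xN variable appears in each file

# Naive concatenation: M redundant copies stacked into an MxN array.
naive = [lat for _ in range(n_files)]

# Treating the variable as identical: stored once, 1xN.
deduplicated = lat

print(len(naive), len(naive[0]))  # 5 3
print(len(deduplicated))          # 3
```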
I like the idea of at least allowing users to specify explicitly what to do with variables, avoiding obscure flags. I think it will be easy for most use cases, and this would make the workflows more transparent.
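A hedged sketch of the "make the user say" idea discussed above: instead of compat-style guessing, the caller passes an explicit list of variables that are identical across datasets, and everything else is concatenated. The function name and structure are illustrative only, not the PR's API.

```python
def partition_variables(all_vars, identical):
    """Split variable names into (to_concatenate, to_copy_once),
    based on an explicit user-supplied list rather than guessing."""
    identical = set(identical)
    unknown = identical - set(all_vars)
    if unknown:
        # Fail loudly on typos instead of silently guessing.
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    concat = [v for v in all_vars if v not in identical]
    return concat, sorted(identical)

concat, copy_once = partition_variables(
    ["temperature", "pressure", "lat", "lon"], identical=["lat", "lon"]
)
print(concat, copy_once)  # ['temperature', 'pressure'] ['lat', 'lon']
```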
@rabernat, all tasks here are done, so I would like to merge. It will break pangeo-forge recipes calling combine. We could add some version check until I get to fixing the other side, or just ignore it.
I think we are ok as long as you don't make a release. If you are going to make a release that is not backwards compatible, it would be great if you could add a version check to the imports in Pangeo Forge Recipes (or even better, just update PFR to use the new code path 😁).