Combine2 #122

Merged: 19 commits, Mar 11, 2022

Conversation

@martindurant (Member) commented Feb 3, 2022

  • combine on multiple dimensions at once
  • allow for files with a variable number of chunks (so long as the chunk size is consistent)
  • allow merging into multiple variable names
  • derive the coordinate variable in different ways, including from the file name or a dataset/variable attribute
  • cftime decoding (again!)
  • pre-processing of the input JSONs
  • post-processing of the output
  • Don't concatenate arrays that are the same in every dataset and don't depend on the dimensions being concatenated.
  • Fix coords that should be int being converted to float somewhere.
  • Get inlining to work

(inlining broke when running after the previous test, due to leftover state)
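
Taken together, the list above suggests a call roughly like the following. This is a non-authoritative sketch: the file names, the "realization" attribute, and the identity pre/post hooks are placeholders, and the keyword names are inferred from the bullet list rather than quoted from the code.

```python
from kerchunk.combine import MultiZarrToZarr

mzz = MultiZarrToZarr(
    ["run1_jan.json", "run1_feb.json",   # hypothetical per-file reference JSONs
     "run2_jan.json", "run2_feb.json"],
    concat_dims=["member", "time"],      # combine on multiple dimensions at once
    coo_map={
        "time": "cf:time",               # decode the time coordinate with cftime
        "member": "attr:realization",    # derive a coord from a dataset attribute
    },
    coo_dtypes={"member": "int32"},      # keep integer coords from becoming float
    preprocess=lambda refs: refs,        # placeholder: pre-process each input's references
    postprocess=lambda refs: refs,       # placeholder: post-process the combined output
    inline_threshold=100,                # inline small chunks into the output JSON
)
combined = mzz.translate()
```
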
@lsterzinger (Collaborator) left a comment

This is awesome Martin, good work! The docs didn't render well for me; I moved the selector list under the coo_map parameter section and made a couple of changes that helped the docs build properly.

@martindurant (Member, Author)

Yeah, we have some mix of numpydoc and whatever my IDE did automatically. Should have marked this WIP...

@lsterzinger (Collaborator)

I don't seem to know how to use GitHub properly 😆 do you mind if I commit directly to the PR? All that changed was the docstring.

@martindurant (Member, Author)

Go ahead. Interestingly, I got a notice that you requested changes, but I saw no changes :)

Coordinates to be stored internally as sets, since duplicates are always fine and make no difference for the coord of interest
@martindurant mentioned this pull request on Feb 21, 2022
@martindurant (Member, Author)

Question:

Don't concatenate arrays that are the same in every dataset and don't depend on the dimensions being concatenated.

Should we avoid guessing this, and require the user to specify which arrays not to concatenate? Then we can avoid all those xarray compat arguments.

@keewis (Contributor) commented Mar 4, 2022

Should we avoid guessing this, and require the user to specify which arrays not to concatenate?

I'd very much support this. However, is there a reason why you want to specify which arrays to skip? At least for the datasets I'm working with, it is most common to concatenate variables that already have concat_dim (so something like xarray's data_vars="minimal" works pretty well).

Otherwise, what if that parameter, in addition to (or instead of) a sequence, could also be a callable that decides, based on the name, attrs, and dims (or .zattrs and .zarray), whether to skip that variable?
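
For concreteness, such a callable might look like the sketch below; the signature and the hard-coded "time" dim are assumptions, not an API this PR defines:

```python
def concat_this(name, zattrs, zarray):
    """Return True if the variable should be concatenated along the concat dim."""
    # In the spirit of xarray's data_vars="minimal": only stack variables
    # that already carry the concat dimension in their metadata.
    return "time" in zattrs.get("_ARRAY_DIMENSIONS", [])
```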

@martindurant (Member, Author)

is there a reason why you want to specify which arrays to skip?

I didn't mean to skip. But let's say there is a 1xN variable that is the same in every dataset: there would be no good reason to produce an MxN 2-d array in the output. Sometimes you can know this, and xarray has a number of ways of guessing, including the "minimal" compat options, but my idea is to require the user to say so rather than rely on obscure flags.

Note that combine2 is designed to be able to get coordinates for the concat dimension(s) from a number of different sources, so you don't necessarily know which variables depend on those dimensions and which don't.
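
For illustration, the explicit form could look like this sketch; identical_dims is an assumed keyword name for "copy these once, never stack them", not something this PR necessarily defines:

```python
from kerchunk.combine import MultiZarrToZarr

mzz = MultiZarrToZarr(
    ["jan.json", "feb.json"],       # hypothetical inputs
    concat_dims=["time"],
    coo_map={"time": "cf:time"},
    identical_dims=["lat", "lon"],  # assumed keyword: these arrays are the same everywhere
)
```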

@keewis (Contributor) commented Mar 5, 2022

combine2 is designed to be able to get coordinates for the concat dimension(s) from a number of different sources, so you don't necessarily know which variables depend on those dimensions and which don't.

Right. My only concern was that this would make the workflow currently supported by "minimal" more difficult. I guess I can extract that information from the metadata / zarr objects, but I wonder if this could be a helper function provided by kerchunk?
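
Such a helper might just walk a reference set's .zattrs entries, something like this sketch (vars_with_dim is a hypothetical name; it assumes the usual {"version": 1, "refs": {...}} layout and the _ARRAY_DIMENSIONS convention):

```python
import json

def vars_with_dim(refs, dim):
    """Names of variables whose .zattrs lists `dim` under _ARRAY_DIMENSIONS."""
    found = []
    for key, val in refs.items():
        if key.endswith("/.zattrs"):
            attrs = json.loads(val) if isinstance(val, str) else val
            if dim in attrs.get("_ARRAY_DIMENSIONS", []):
                found.append(key.rsplit("/", 1)[0])
    return found

# usage: vars_with_dim(json.load(open("jan.json"))["refs"], "time")
```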

Edit: in case it helps, import pdb; pdb.set_trace() can be replaced by the builtin breakpoint(), and there's the debug-statements hook in pre-commit/pre-commit-hooks to make sure those don't get committed.

@martindurant (Member, Author)

I am wondering whether it isn't a good thing to force users to choose the treatment of each of their data variables. In some cases there will be no way to guess without downloading whole arrays' worth of data (or even then); xarray actually gets it wrong sometimes too.

@martindurant (Member, Author)

(to be sure, this is for cases where the concat coordinate is not already an array in the data, so that its relationship with the other variables is not clear)

@rsignell-usgs (Collaborator)

I like the idea of at least allowing users to specify explicitly what to do with variables, avoiding obscure flags. I think it will be easy for most use cases, and it would make the workflows more transparent.

@martindurant (Member, Author)

@rabernat, all tasks here are done, so I would like to merge. It will break pangeo-forge-recipes' calls to combine. We could add a version check or similar until I get to fixing the other side, or just ignore it.

@rabernat (Contributor)

I think we are ok as long as you don't make a release. If you are going to make a release that is not backwards compatible, it would be great if you could add a version check to the imports in Pangeo Forge Recipes (or even better, just update PFR to use the new code path 😁 ).
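
A sketch of such a version check (the threshold "0.0.6" is a placeholder, since the thread doesn't name the breaking release):

```python
from packaging.version import Version

import kerchunk

if Version(kerchunk.__version__) >= Version("0.0.6"):  # placeholder threshold
    raise ImportError(
        "this pangeo-forge-recipes version predates kerchunk's new combine; "
        "pin kerchunk to an older release or upgrade pangeo-forge-recipes"
    )
```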
