Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scratch code #83

Closed
wants to merge 1 commit into from
Closed

Add scratch code #83

wants to merge 1 commit into from

Conversation

martindurant
Copy link
Member

No description provided.

@ashalper-usgs
Copy link

Hi Martin,

I'm working with Rich Signell on the possibility of implementing this feature. I was wondering if it is already being developed on main? Thanks.

@martindurant
Copy link
Member Author

There has been no development on this since I pushed to this branch, although I do list it as one of the things I mean to work on in #106 . What specific feature are you thinking to work on?

@ashalper-usgs
Copy link

I think it's the "combine2 and multi-axis concat?" feature that Rich asks about here. Initially we were going to write a separate package for it, but then decided it makes more sense to contribute to the project if we can.

@martindurant
Copy link
Member Author

Sounds good!
It is already possible if you were prepared to concat on each axis in turn (having grouped the input data for the earlier stage(s)); but a one-stop multi-combine would be great. What I have here did work for at least some cases.

We get concatenation coordinates purely from the file names; we align wavelength images,
which were taken in sequence, to the nearest instance of the shortest wavelength.
"""
import fsspec_reference_maker.fits

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is fits() (now kerchunk.fits()?) a function that is TBD?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in this PR: https://github.com/fsspec/kerchunk/pull/83/files#diff-e2d10fd56b4a80740b843f8bc0752225f72c82713b5c1364785613058fa68b3b

The SDO example doesn't really belong here, though, should be removed (along with any of the WCS stuff in fits.py, probably).

@@ -274,7 +235,7 @@ def _determine_dims(self):
self.extra_dims = set(ds.dims) - set(ds0.dims)
self.concat_dims = set(
k for k, v in ds.dims.items()
if k in ds0.dims and v / ds0.dims[k] == 2
if k in ds0.dims and v / ds0.dims[k] == 2
)
self.same_dims = set(ds.dims) - self.extra_dims - self.concat_dims
return ds, ds0, fss

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is make_coord() being preserved in v2, or is it deprecated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The v2 module is a complete rewrite of everything here, and doesn't depend on anything that went before. Clearly I didn't leave the PR in a good state to have mixed changed in combine and combine_v2 - but you can scavenge anything that might be useful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. Was the plan to replace combine.py with combine_v2.py, or supplement it? Since make_coord() was defined public, I didn't want to remove it in the event it was considered part of the kerchunk API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as there was a plan, the new code was supposed to replace the previous version, so long as it could handle all the existing use cases. It is fine to change the API, and this method in particular would not have been called by users anyway. Yes, it would be a good idea at this point to define what we do expect the users to call exactly. But the whole repo is experimental and not used by many people so far, so we can make as many breaking changes as we need.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Thanks for your help.

@martindurant
Copy link
Member Author

@ashalper-usgs , do you have any code to contribute for me to collaborate on? I think we should close this PR and start one with combine2 only (and we can re-introduce fits at another time). @rsignell-usgs 's ERA5 example on gitter recently showed the need fo a better combine: the data is chunked per day in the time dimension, but each file contains a different number of chunks according to the month of the year. Kerchunk should be able to cope with this.

@ashalper-usgs
Copy link

No @martindurant , no code to contribute right now. I agree about closing this PR. We're following you.

@rsignell-usgs
Copy link
Collaborator

@martindurant , very much looking forward to getting this resolved, as ERA5 would indeed be a very high profile dataset to get kerchunked! Okay by me to close this in favor of a new take on addressing the problem...

@martindurant martindurant mentioned this pull request Jan 28, 2022
@martindurant martindurant deleted the temp branch July 11, 2023 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants