In infographic, separate legacy and domain specific formats, STAC is a spec #4

Open
rbavery opened this issue Sep 18, 2023 · 1 comment

rbavery commented Sep 18, 2023

STAC is a metadata spec, and COG is a thriving format; both should probably be separated from HDF, GRIB, and NetCDF to reduce confusion.

See https://docs.google.com/presentation/d/1k1bHj-H92LPXCKealUzqYUiOnqsCP4u27H57f6k2BcQ/edit#slide=id.g1e90ac8b347_0_31
and https://guide.cloudnativegeo.org/ which covers COG and Zarr as different cloud optimized format options.

COGs are great for 3-dimensional (x, y, band), sparse data (satellite scenes) that needs to maintain geographic projection information.

@rbavery rbavery changed the title Separate legacy and domain specific formats, STAC is a spec In infographic, separate legacy and domain specific formats, STAC is a spec Sep 18, 2023
Contributor

jbednar commented Sep 18, 2023

Totally! What I'm hoping is that we can add a full website to this repo that talks about issues like that and unpacks a lot of the details to guide users working in this area. That's definitely important context to convey. I tried (obviously not very successfully) to format the grey box to show that COG and STAC are the "domain specific" bits while the others are the "legacy" bits. Pandata can't build directly on STAC or COG, since those are geoscience-specific and don't address the need for arbitrary domain-independent cataloging (which Intake covers) or arbitrary n-D arrays (which Zarr covers). They are still fully valid, though, and can be used safely with the Pandata stack without sacrificing scalability or efficiency.

image

The "legacy" bits of that box are also fine to use with Pandata, when combined with Kerchunk so that they become suitable for distributed access. So everything in that box is fine to use with Pandata, but is not part of Pandata, because by explicit definition only domain-independent approaches can be in Pandata itself.
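To make the Kerchunk point a bit more concrete, here is a rough sketch of the general idea as I understand it: scan a monolithic file once, record where each chunk lives as a (path, offset, length) reference, and then let readers fetch individual chunks by byte range. This is a toy in plain Python, not the Kerchunk API; the file layout, the `data/<i>` key naming, and all function names are hypothetical and only meant to illustrate why a reference index makes a legacy file usable for independent, parallel reads.

```python
import os
import tempfile

# Toy stand-in for a monolithic "legacy" file: a header followed by
# three fixed-size chunks stored back to back. (Hypothetical layout.)
HEADER = b"HDR!"
CHUNKS = [b"aaaa", b"bbbb", b"cccc"]


def write_legacy_file(path):
    """Write the header and all chunks into one contiguous file."""
    with open(path, "wb") as f:
        f.write(HEADER)
        for chunk in CHUNKS:
            f.write(chunk)


def build_references(path, header_size, chunk_size, n_chunks):
    """Record where each chunk lives: key -> (path, offset, length)."""
    return {
        f"data/{i}": (path, header_size + i * chunk_size, chunk_size)
        for i in range(n_chunks)
    }


def read_chunk(refs, key):
    """Fetch one chunk by byte range, without reading the rest of the file."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo():
    """Build the index once, then read every chunk independently."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "legacy.bin")
        write_legacy_file(path)
        refs = build_references(path, len(HEADER), 4, len(CHUNKS))
        return [read_chunk(refs, key) for key in sorted(refs)]
```

Each `read_chunk` call only needs the small reference table plus one byte range, so separate workers could read different chunks without coordinating or parsing the whole file. With a real file on object storage, the byte-range read would be an HTTP range request rather than a local `seek`.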

Lots of other things about that diagram need similar unpacking! If there's a better word than "format" to cover what's in that box, happy to hear it! There are similar issues elsewhere, e.g. Parquet and Zarr are each both a format and an associated set of libraries for reading that format, so while I blithely say that the things on this slide are "packages" or "libraries", they aren't really, which is confusing. But trying to be fully accurate here will lose precision; at some point everything is just an "entity" and it's hard to say much! Any concrete ideas for specific improvements would be gratefully accepted.
