In infographic, separate legacy and domain specific formats, STAC is a spec #4

rbavery · 2023-09-18T15:50:15Z

STAC is a metadata spec, not a file format, and COG is a thriving format; both should probably be separated from HDF, GRIB, and NetCDF to reduce confusion.

See https://docs.google.com/presentation/d/1k1bHj-H92LPXCKealUzqYUiOnqsCP4u27H57f6k2BcQ/edit#slide=id.g1e90ac8b347_0_31
and https://guide.cloudnativegeo.org/ which covers COG and Zarr as different cloud optimized format options.

COGs are great for three-dimensional (x, y, band), sparse data (e.g., satellite scenes) that needs to retain geographic projection information.

jbednar · 2023-09-18T19:50:17Z

Totally! What I'm hoping is that we can add a full website to this repo that talks about issues like that and unpacks a lot of the details to guide users working in this area. That's definitely important context to convey. I tried (obviously not very successfully) to format the grey box to show that COG and STAC are the "domain-specific" bits while the others are the "legacy" bits. Pandata can't build directly on STAC or COG, since those are geoscience-specific and don't address the need for arbitrary domain-independent cataloging (as Intake covers) or arbitrary n-D arrays (as Zarr covers). But they are fully valid and can be used safely with the Pandata stack without sacrificing scalability or efficiency.

The "legacy" bits of that box are also fine to use with Pandata, when combined with Kerchunk so that they become suitable for distributed access. So everything in that box is fine to use with Pandata, but is not part of Pandata, because by explicit definition only domain-independent approaches can be in Pandata itself.

Lots of other things about that diagram need similar unpacking! If there's a better word than "format" to cover what's in that box, I'm happy to hear it! There are similar issues elsewhere; e.g., Parquet and Zarr are each both a format and an associated set of libraries for reading that format, so while I blithely say that the things on this slide are "packages" or "libraries", they aren't really, which is confusing. But trying to be fully accurate here will lose precision; at some point everything is just an "entity" and it's hard to say much! Any concrete ideas for specific improvements would be gratefully accepted.

rbavery changed the title ~~Separate legacy and domain specific formats, STAC is a spec~~ In infographic, separate legacy and domain specific formats, STAC is a spec Sep 18, 2023
