Pangeo STAC Catalog

This repository contains a copy of Pangeo's cloud data catalog, formatted to follow the SpatioTemporal Asset Catalog (STAC) specification. The root STAC catalog can be found at:

https://raw.githubusercontent.com/pangeo-data/pangeo-datastore-stac/master/master/catalog.json
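A STAC catalog is plain JSON, so it can be walked with nothing more than a JSON parser. The following is a minimal sketch in Python using only the standard library; the `child_links` helper, the sample catalog contents, and the `ocean` and `atmosphere` sub-catalog names are hypothetical and only illustrate the general shape of a STAC catalog, not the actual contents of this repository.

```python
from urllib.parse import urljoin

# Root catalog URL as given in this README.
ROOT_URL = (
    "https://raw.githubusercontent.com/pangeo-data/"
    "pangeo-datastore-stac/master/master/catalog.json"
)


def child_links(catalog: dict, base_url: str) -> list:
    """Return absolute URLs for every link with rel == "child".

    Absolute hrefs pass through unchanged; relative hrefs are
    resolved against base_url.
    """
    return [
        urljoin(base_url, link["href"])
        for link in catalog.get("links", [])
        if link.get("rel") == "child"
    ]


# Hypothetical catalog document, for illustration only.
sample_catalog = {
    "stac_version": "1.0.0-beta.2",
    "id": "example-root",
    "description": "Illustrative root catalog",
    "links": [
        {"rel": "root", "href": ROOT_URL},
        {"rel": "child", "href": "https://example.com/ocean/catalog.json"},
        {"rel": "child", "href": "./atmosphere/catalog.json"},
    ],
}

children = child_links(sample_catalog, ROOT_URL)
for url in children:
    print(url)
```

To run the same helper against the live catalog, the sample dictionary could be replaced with the parsed result of fetching `ROOT_URL`, for example via `json.load(urllib.request.urlopen(ROOT_URL))`.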

Currently the catalogs contain:

  • Consolidated metadata Zarr group/arrays (represented through STAC Collections with assets)

In time they should be able to hold:

Motivations

The motivation behind this project is to have a version of the current cloud data catalog that can be searched and browsed from any programming language. At the moment, the YAML-based catalogs are only accessible through Python using intake. This means that any server-side code accessing these catalogs must be written in Python, which has historically shaped how we generate the website containing previews of all cataloged data:

  • Originally, the website was created using a static site generator; however, this approach ran into issues once we began cataloging data that required authentication to be accessed, which could not be done through GitHub.
  • We later moved to a dynamic Flask-based website, powered by Google App Engine; this allowed us to get the proper authentication to load dataset previews on demand, but it frequently ran into memory issues that made many datasets impossible to view.

With the introduction of intake-stac, an intake extension which allows Python users to browse STAC catalogs, there is no longer a need for the catalogs themselves to be tied to intake. Thus, a move to JSON-based STAC catalogs gives a variety of new languages (in particular JavaScript, Ruby, and PHP) access to the catalogs, without leaving behind the initial Python users.

Guidelines (subject to change)

All of the Pangeo STAC catalogs use version 1.0.0-beta.2 of the STAC specification.

Currently, the Pangeo STAC catalog follows the STAC specification's conventions for an absolute published catalog. Each preexisting intake catalog corresponds to a STAC catalog, while datasets and data collections correspond to STAC collections. Any extensions required to access the data are listed under the collection's stac_extensions field.
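As a rough illustration of the mapping described above, the sketch below builds a collection as a Python dictionary and reads back its extensions and assets. The collection ID, asset key, `gs://` path, and the `describe_collection` helper are all hypothetical; only the `stac_version`, `stac_extensions`, and `assets` field names come from the STAC specification and the collection-assets extension.

```python
def describe_collection(collection: dict) -> dict:
    """Summarize the fields a client needs to locate a dataset."""
    return {
        "id": collection["id"],
        "extensions": list(collection.get("stac_extensions", [])),
        "asset_hrefs": {
            key: asset["href"]
            for key, asset in collection.get("assets", {}).items()
        },
    }


# Hypothetical collection describing one Zarr store; the values are
# placeholders, not entries taken from the Pangeo catalog.
sample_collection = {
    "stac_version": "1.0.0-beta.2",
    "stac_extensions": ["collection-assets"],
    "id": "example-dataset",
    "description": "Illustrative dataset stored as a Zarr group",
    "license": "proprietary",
    "extent": {
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        "temporal": {"interval": [[None, None]]},
    },
    "links": [],
    "assets": {
        "zmetadata": {
            "href": "gs://example-bucket/example-dataset.zarr/.zmetadata",
            "title": "Consolidated metadata",
        }
    },
}

summary = describe_collection(sample_collection)
print(summary["id"], summary["extensions"])
for key, href in summary["asset_hrefs"].items():
    print(key, href)
```

Keeping the access information in `assets` and declaring the extension in `stac_extensions` means a client in any language can find the data location without Python-specific tooling.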

Progress

There is still a lot of work to be done before this catalog can be considered equivalent to the current cloud data catalog. In particular:

  • Representing Zarr stores using the collection-assets extension
  • Finishing the specifications for the ESM extension to allow ESM collections to be represented
  • Filling in metadata fields in the catalog/collections with relevant information (such as description, extent, providers, and license)
  • Making sure the catalogs validate using stac-validator in conjunction with continuous integration
  • Making sure that validated catalogs can be accessed using intake-stac
