Skip to content
This repository has been archived by the owner on Nov 21, 2023. It is now read-only.
/ pangeo-smithy Public archive

The tool for managing pangeo-forge feedstocks.

Notifications You must be signed in to change notification settings

pangeo-forge/pangeo-smithy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Pangeo Forge

The idea of Pangeo Forge is to copy the very successful pattern of Conda Forge for crowdsourcing the curation of an analysis-ready data library.

In Conda Forge, a maintainer contributes a recipe which is used to generate a conda package from a source code tarball. Behind the scenes, CI downloads the source code, builds the package, and uploads it to a repository.

In Pangeo Forge, a maintainer contributes a recipe which is used to generate an analysis-ready cloud-based copy of a dataset in a cloud-optimized format like Zarr. Behind the scenes, CI downloads the original files from their source (e.g. FTP, HTTP, or OpenDAP), combines them using xarray, writes out the Zarr file, and uploads to cloud storage.

There are many details to work out. It is useful to think about what ingredients might go into a recipe.

Sample meta.yaml recipe.

dataset:
  # should be a unique name and valid python identifier
  name: noaa_oisst_avhrr_only
  # how should we version datasets?
  version: 0.1.0

# where do we find the files
source:
  format: netcdf
  # example: https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/access/avhrr-only/198109/avhrr-only-v2.19810901.nc
  url_pattern: https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/access/avhrr-only/{{ yyyymm }}/avhrr-only-v2.{{ yyyymmdd }}.nc
  # mirror pandas syntax
  # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
  date_range:
    start: 1981-09-01
    # the end date can
    end: 2019-08-11
    freq: D

# tell xarray how to open and combine the files
combine:
  # options passed to xarray.open_dataset
  open_options:
    decode_cf: False
  combine_method: combine_by_coords
  # options passed to combine method
  combine_options:
    compat: no_conflicts

# where / how to save the data
output:
  format: zarr
  chunks:
    time: 5
  consolidated: True
  target:
    urlpath: gcs://pangeo-noaa/
    # how to deal with credentials?
    credentials: ???

about:
  home: https://www.ncdc.noaa.gov/oisst
  # how to deal with licensing?
  license: ???
  summary: 'NOAA Optimally Interpolated Sea Surface Temperature (AVHRR Only)'
  description: |
    The NOAA 1/4° daily Optimum Interpolation Sea Surface Temperature (or daily OISST) is an analysis constructed by combining observations from different platforms (satellites, ships, buoys) on a regular global grid. A spatially complete SST map is produced by interpolating to fill in gaps.
    The methodology includes bias adjustment of satellite and ship observations (referenced to buoys) to compensate for platform differences and sensor biases. This proved critical during the Mt. Pinatubo eruption in 1991, when the widespread presence of volcanic aerosols resulted in infrared satellite temperatures that were much cooler than actual ocean temperatures (Reynolds 1993).
  support_email: oisst-help@noaa.gov

About

The tool for managing pangeo-forge feedstocks.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages