Design analysis ready Zarr to allow updating with preliminary ERA5 data. #23

alxmrs · 2023-01-27T23:31:20Z

For phase 2, we'd like to produce surface and atmospheric Zarrs that can be updated with preliminary data. Specifically, we intend to backfill the raw data covering 1959 to 1978. It's possible that ECMWF will later produce an even earlier backfill. As I understand it, the standard structure of Zarr only allows appending to the end.
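To make the append-only constraint concrete, here is a minimal pure-Python model of how a chunked array is addressed along one dimension. It assumes a single time dimension, fixed-length chunks, and chunk keys that are integer chunk positions, as in Zarr's default key scheme. The function names are hypothetical and not part of the Zarr API.

```python
def chunks_touched_by_append(n_existing: int, n_new: int, chunk: int) -> range:
    """Chunk indices written when n_new steps are added after n_existing steps."""
    # First chunk that contains any new index (may be a partial existing chunk).
    first = n_existing // chunk
    # Chunk that contains the last new index.
    last = (n_existing + n_new - 1) // chunk
    return range(first, last + 1)  # empty when n_new == 0


def chunks_touched_by_prepend(n_existing: int, n_new: int, chunk: int) -> range:
    """Chunk indices written when n_new steps are inserted before index 0.

    Every existing element moves from index i to i + n_new, so every chunk key
    of the enlarged array changes. Even when n_new is a multiple of chunk, object
    stores have no cheap rename, so in practice each chunk is rewritten.
    """
    if n_new == 0:
        return range(0)
    total = n_existing + n_new
    return range(0, (total - 1) // chunk + 1)
```

For 10 existing time steps stored in chunks of 4, appending 3 steps writes only chunks 2 and 3, while prepending 3 steps rewrites all four chunks. That asymmetry is why adding earlier data amounts to recomputing the whole store.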

The aim for this issue would be to devise a means of avoiding recomputing our Zarr datasets whenever we want to include new data at earlier times.

For the overall pipeline structure, I have the following sketch in mind:

1. For each epoch of preliminary data we ingest from ECMWF, we manually produce a new cloud-optimized dataset. This can use the scripts we've already developed in this project.
2. For recent data, we use Pangeo components to update the cloud-optimized datasets (e.g. pangeo-forge/pangeo-forge-recipes#447, "Making appending work in the beam refactor"). These download-and-append steps can be automated on a regular cadence (monthly or quarterly).
3. For the analysis-ready version, we create Xarray-Beam pipelines to transform the Zarr datasets.
4. After a new cloud-optimized preliminary dataset is produced, we invoke the same Xarray-Beam pipelines to update the analysis-ready versions.
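One layout that could fit these steps, offered as an assumption rather than a settled design, is to allocate the time axis once, starting from the earliest date we expect to ingest, and then write each epoch into its own region instead of appending. The hypothetical helpers below map a date range onto an integer slice along that pre-allocated axis and check that the slice lines up with chunk boundaries. The hourly step and the 1959-01-01 origin in the usage note are illustrative only.

```python
from datetime import datetime, timedelta


def time_region(start: datetime, end: datetime, origin: datetime,
                step: timedelta = timedelta(hours=1)) -> slice:
    """Integer slice covering [start, end) on a time axis that begins at origin."""
    if start < origin:
        # Data earlier than the allocated axis cannot be written in place.
        raise ValueError("start precedes the allocated time axis; the store must be rebuilt")
    if end <= start:
        raise ValueError("end must be after start")
    offset = start - origin
    length = end - start
    if offset % step or length % step:
        raise ValueError("dates must fall exactly on the time step")
    return slice(offset // step, (offset + length) // step)


def is_chunk_aligned(region: slice, chunk: int, total: int) -> bool:
    """True if the region starts on a chunk boundary and ends on one (or at the array end)."""
    return region.start % chunk == 0 and (region.stop % chunk == 0 or region.stop == total)
```

For example, with `origin = datetime(1959, 1, 1)`, the day 1959-01-02 maps to `slice(24, 48)`. With xarray, one would roughly write a lazy template once via `Dataset.to_zarr(store, compute=False)` and then fill each epoch with `Dataset.to_zarr(store, region={"time": ...})`. Keeping regions chunk-aligned avoids two writers touching the same chunk. This only defers the problem: a backfill earlier than the chosen origin would still require rebuilding the store.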

@shoyer @rabernat: Do either of you have any thoughts on how we could structure our Zarr to these ends?

alxmrs · 2023-08-29T18:50:21Z

Fixed by @DarshanSP19 and @dabhicusp; will mark closed.

alxmrs added the phase-2 label Jan 27, 2023

alxmrs closed this as completed Aug 29, 2023
