Design analysis ready Zarr to allow updating with preliminary ERA5 data. #23

Closed
alxmrs opened this issue Jan 27, 2023 · 1 comment

alxmrs commented Jan 27, 2023

For phase 2, we'd like to produce surface and atmospheric Zarrs that can be updated with preliminary data. Specifically, we intend to backfill the raw data covering 1959 to 1978. It's possible that in the future, ECMWF will produce an even earlier backfill. As I understand it, the standard structure of Zarr only allows appending to the end.

The aim of this issue is to devise a way to avoid recomputing our Zarr datasets whenever we want to include new data at earlier times.
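One possible direction, sketched here only as an illustration and not as the approach this project adopted: reserve the full time axis up front, starting at an origin earlier than any backfill we expect, so that earlier data becomes a write into an already-allocated region rather than a prepend. The origin date, the hourly step, and the `time_region` helper below are all hypothetical. With xarray, a slice like the one computed here could be passed as `region={"time": ...}` to `Dataset.to_zarr`.

```python
from datetime import datetime, timedelta

# Hypothetical choices for illustration: an hourly time axis whose origin is
# early enough to cover any backfill we expect to receive.
AXIS_ORIGIN = datetime(1940, 1, 1)
STEP = timedelta(hours=1)


def time_region(start: datetime, end: datetime) -> slice:
    """Return the index slice on the pre-allocated time axis for [start, end)."""
    if start < AXIS_ORIGIN:
        raise ValueError("start precedes the pre-allocated axis origin")
    if end <= start:
        raise ValueError("end must be after start")
    # divmod on timedeltas yields (whole steps, leftover timedelta).
    first, rem_start = divmod(start - AXIS_ORIGIN, STEP)
    stop, rem_end = divmod(end - AXIS_ORIGIN, STEP)
    if rem_start or rem_end:
        raise ValueError("bounds must fall on the hourly grid")
    return slice(first, stop)


# The 1959 to 1978 backfill and a later month land in adjacent,
# non-overlapping regions of the same axis.
backfill = time_region(datetime(1959, 1, 1), datetime(1979, 1, 1))
recent = time_region(datetime(1979, 1, 1), datetime(1979, 2, 1))
```

The trade-off is that the origin has to be chosen before the first write; a backfill earlier than the origin would still force a rewrite.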

For overall pipeline structure, I have the following sketch in mind:

  • For each epoch of preliminary data we ingest from ECMWF, we manually produce a new cloud-optimized dataset. This can use the scripts we've already developed in this project.
  • For recent data, we use Pangeo components to update the cloud-optimized datasets (e.g. "Making appending work in the beam refactor", pangeo-forge/pangeo-forge-recipes#447). These download and append steps can be automated to run on a regular cadence (monthly or quarterly).
  • For the analysis-ready version, we create XArray-Beam pipelines to transform the Zarr datasets.
  • After a new cloud-optimized preliminary dataset is produced, we invoke the same XArray-Beam pipelines to update the analysis-ready versions.

@shoyer @rabernat: Do either of you have any thoughts on how we could structure our Zarr to these ends?

alxmrs added the phase-2 label Jan 27, 2023

alxmrs commented Aug 29, 2023

Fixed by @DarshanSP19 and @dabhicusp; will mark closed.

alxmrs closed this as completed Aug 29, 2023