Refactoring the recipe into dict-object #9

Merged (92 commits) on Sep 1, 2023


jbusecke (Collaborator) commented Aug 7, 2023

This PR is the result of another great pair-programming session with @cisaacstern just now.
I think it would be better to wait for some upstream changes to be cleaned up and merged, and then merge this PR in a cleaner form.

Nice-to-haves (not directly blocking this PR):

Won't close #11 anymore. See pangeo-forge/pangeo-forge-recipes#559 for an upstream implementation suggestion.

jbusecke (Collaborator, Author) commented Aug 7, 2023

To close out the night with a success, @cisaacstern:
[image]
Both iids in the recipe submission passed.

jbusecke (Collaborator, Author) commented Aug 7, 2023

But of course I could not stop myself from adding a few more iids for testing!
[image]

jbusecke (Collaborator, Author) commented Aug 7, 2023

[image]

jbusecke (Collaborator, Author) commented Aug 7, 2023

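```python
import xarray as xr

# Open one of the test stores lazily; chunks={} uses the backend's preferred (on-disk Zarr) chunks.
# Reading gs:// paths requires gcsfs and read access to the bucket.
path = 'gs://leap-persistent-ro/data-library/cmip6-testing/a618127503-5790271747-1/CMIP6.ScenarioMIP.NIMS-KMA.UKESM1-0-LL.ssp245.r15i1p1f2.day.psl.gn.v20210427.zarr'
ds = xr.open_dataset(path, engine='zarr', chunks={})
print(ds)
```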

This gives
[image]
with nice ~2-year chunks and a good chunk size.
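To put numbers on "~2-year chunks and a good chunk size", one can compute the time span and memory footprint of a single chunk. This is a minimal sketch with a hypothetical helper (`describe_chunks` is not part of xarray or pangeo-forge); it assumes time is the first dimension and a 360-day calendar for UKESM1-0-LL, and the chunk shape `(720, 144, 192)` below is illustrative rather than read from the store.

```python
import math


def describe_chunks(chunk_shape, itemsize, steps_per_year, time_axis=0):
    """Hypothetical helper: return (years covered, megabytes) for a single chunk."""
    years = chunk_shape[time_axis] / steps_per_year
    megabytes = math.prod(chunk_shape) * itemsize / 1e6  # decimal MB
    return years, megabytes


# Illustrative shape only: 720 daily steps on a 144 x 192 grid, float32 (4 bytes)
years, mb = describe_chunks((720, 144, 192), itemsize=4, steps_per_year=360)
print(f"{years:.1f} years per chunk, {mb:.1f} MB per chunk")  # 2.0 years per chunk, 79.6 MB per chunk
```

On the opened dataset it could be called as `describe_chunks(ds["psl"].data.chunksize, ds["psl"].dtype.itemsize, steps_per_year=360)`, since with `chunks={}` the dask chunks follow the Zarr chunks on disk.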

@Timh37, the raw data for this batch is in gs://leap-persistent-ro/data-library/cmip6-testing/a618127503-5790271747-1. If you could check these few stores, that would definitely help me move along. I will keep pushing on this and also make you a combined catalog, but I am not quite there yet.

jbusecke (Collaborator, Author) commented Aug 7, 2023

This is awesome! I am beyond happy. @cisaacstern, thank you so much for your help today.

cisaacstern (Contributor) commented

@jbusecke this is incredible. Your gif selections had me actually lol'ing at my desk by myself. I'm so happy to see this. You and I have been iterating for a while to find the right design for this, and it's so rewarding to see that we're finally on a tractable path.

Timh37 commented Aug 8, 2023

Fantastic, will do, thank you. By the looks of it, though, that time series is too short: 21600 daily-mean timesteps is only roughly 60 years. Shouldn't a complete 2015-2100 run be longer?
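For reference, the arithmetic behind this, assuming UKESM1-0-LL uses a 360-day model calendar (an assumption here, though the file names ending in `...1230` later in this thread are consistent with it), with a 365-day year shown for comparison:

$$
\frac{21600\ \text{days}}{360\ \text{days/yr}} = 60\ \text{yr}, \qquad
\frac{21600\ \text{days}}{365\ \text{days/yr}} \approx 59.2\ \text{yr}, \qquad
(2100 - 2015 + 1)\ \text{yr} \times 360\ \text{days/yr} = 30960\ \text{days}.
$$

So under either calendar the store holds about 60 of the 86 years a full 2015-2100 run would contain.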

Timh37 commented Aug 8, 2023

@jbusecke I checked the UKESM1-0-LL data and found that most of these datasets are incomplete (in terms of timesteps). As far as I can see, only the historical and SSP2-4.5 stores for variant r14i1p1f2 are complete; the rest are not.

jbusecke (Collaborator, Author) commented Aug 8, 2023

Thanks for catching that, @Timh37.

I just picked a random example here:

```
"CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r15i1p1f2.day.psl.gn.v20210426": [
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18500101-18511230.nc",
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18520101-18531230.nc",
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18540101-18551230.nc",
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18600101-18611230.nc",
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18620101-18631230.nc",
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18640101-18651230.nc",
  "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NIMS-KMA/UKESM1-0-LL/historical/r15i1p1f2/day/psl/gn/v20210426/psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18660101-18671230.nc"
],
```

and it seems like the list of URLs is incomplete.
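Missing stretches like this can also be spotted mechanically from the file names. The sketch below uses a hypothetical helper (`year_gaps` is not part of pangeo-forge-esgf); it assumes each URL ends in the usual CMIP6 `_YYYYMMDD-YYYYMMDD.nc` time-range suffix and that files are split on whole-year boundaries, and it only compares years, so calendar details are ignored.

```python
import re

# CMIP6 file names end in a time range such as _18500101-18511230.nc
_RANGE = re.compile(r"_(\d{8})-(\d{8})\.nc$")


def year_gaps(urls):
    """Hypothetical helper: list (end_year, next_start_year) pairs where whole years are missing."""
    spans = []
    for url in urls:
        match = _RANGE.search(url)
        if match is None:
            raise ValueError(f"no YYYYMMDD-YYYYMMDD range in {url!r}")
        start, end = match.groups()
        spans.append((int(start[:4]), int(end[:4])))
    spans.sort()
    gaps = []
    # Compare each file's end year with the next file's start year
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end, next_start))
    return gaps


files = [
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18500101-18511230.nc",
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18520101-18531230.nc",
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18540101-18551230.nc",
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18600101-18611230.nc",
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18620101-18631230.nc",
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18640101-18651230.nc",
    "psl_day_UKESM1-0-LL_historical_r15i1p1f2_gn_18660101-18671230.nc",
]
print(year_gaps(files))  # [(1855, 1860)]
```

Applied to the seven file names listed above, it reports that no file covers 1856 to 1859. It cannot tell whether a run should continue past its last file; that needs the expected end date.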

The good news is that this is probably an easy fix upstream (I made these with a notebook and a dev version of pangeo-forge-esgf), and the actual methodology seems to work here. I don't necessarily expect this to fail once we get all the files.

I'll investigate what's going on. Thanks for helping here! I would have missed this for a while.

jbusecke (Collaborator, Author) commented Aug 8, 2023

OK, I think this is just a short run. I looked it up on the ESGF search website and got this:
[image]
AFAICT these are the same files we picked up.
@Timh37, I am not aware of any attribute we could use to identify incomplete runs; if you could investigate that, it would be helpful.
Also note that these members have f2 in their variant label, which indicates different forcings (not sure that matters for your science, just wanted to point it out).
I think a number of experiments were aborted (for whatever reason; each probably needs individual investigation) and still uploaded, and I think that is what happened in this case.

Timh37 commented Aug 8, 2023

@jbusecke Great, sounds like it's working as it should then. I don't think such an attribute exists, but in principle it's possible to work out what length a simulation should have given its calendar and its start and end dates. If incomplete files are picked up as they are on ESGF, that's fine with me, as my processing chain should be able to handle them and eventually filter them out for operations that require simulations to fully cover certain time periods. I guess something similar could be (or may already be) implemented in xmip? We may also want to come up with a way of extending otherwise complete simulations that run to 2099-12 instead of 2100-12 by a year.
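Working out the expected length from the calendar and the start and end years could look roughly like this. It is a minimal sketch under stated assumptions, not xmip functionality: `expected_daily_steps` and `covers_period` are hypothetical names, only whole years of daily data are handled, the series is assumed to start on January 1 of the first year, and Gregorian-type calendars use Python's `calendar.isleap`.

```python
import calendar


def expected_daily_steps(cal, first_year, last_year):
    """Hypothetical helper: daily steps in whole years first_year..last_year (inclusive)."""
    years = range(first_year, last_year + 1)
    fixed = {"360_day": 360, "noleap": 365, "365_day": 365, "all_leap": 366, "366_day": 366}
    if cal in fixed:
        return fixed[cal] * len(years)
    if cal in ("standard", "gregorian", "proleptic_gregorian"):
        # calendar.isleap uses the Gregorian rule; CF "standard" only differs before 1582
        return sum(366 if calendar.isleap(y) else 365 for y in years)
    raise ValueError(f"unsupported calendar: {cal!r}")


def covers_period(n_steps, cal, first_year, last_year):
    """True if a daily series starting on first_year-01-01 has enough steps to reach the end of last_year."""
    return n_steps >= expected_daily_steps(cal, first_year, last_year)


print(expected_daily_steps("360_day", 2015, 2100))  # 30960
print(covers_period(21600, "360_day", 2015, 2100))  # False
```

A run ending in 2099-12 would also be flagged by this check. Counting steps does not catch internal gaps, so inspecting the time coordinate itself would be the stricter test.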

Some models seem to have f!=1 as their 'main' ScenarioMIP setting, including the UK one. Generally, I search for the 'ipf' with the most 'r' for each model, so that the 'ipf' may differ between models but never between variants or experiments of the same model. So that shouldn't be an issue.
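The selection rule described here (per model, pick the `ipf` combination with the most distinct `r` values) could be sketched as follows. This is an illustration, not Timh37's actual processing chain: `preferred_ipf` is a hypothetical name, it assumes dot-separated CMIP6 instance ids whose sixth facet is the variant label (as in the iids in this thread), it skips labels with a sub-experiment prefix, and ties go to whichever `ipf` was seen first.

```python
import re
from collections import defaultdict

# Variant labels look like r15i1p1f2: realization, initialization, physics, forcing
_VARIANT = re.compile(r"^r(\d+)(i\d+p\d+f\d+)$")


def preferred_ipf(iids):
    """Hypothetical helper: for each source_id, return the ipf with the most distinct r values."""
    # source_id -> ipf -> set of realization numbers
    members = defaultdict(lambda: defaultdict(set))
    for iid in iids:
        # mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id...
        facets = iid.split(".")
        source_id, variant = facets[3], facets[5]
        match = _VARIANT.match(variant)
        if match is None:
            continue  # e.g. sub-experiment labels such as s1960-r1i1p1f1 are skipped
        r, ipf = match.groups()
        members[source_id][ipf].add(int(r))
    # max() keeps the first ipf encountered when counts tie
    return {
        source_id: max(by_ipf, key=lambda ipf: len(by_ipf[ipf]))
        for source_id, by_ipf in members.items()
    }
```

Filtering iids to the chosen `ipf` per model then gives the property described above: the `ipf` may differ between models but never between variants or experiments of the same model.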

jbusecke (Collaborator, Author) commented

Closes #30 with jbusecke/pangeo-forge-esgf@d19e800

jbusecke (Collaborator, Author) commented

Just diagnosed some issues with "CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917"
[image: dataflow_job]

This was affected by jbusecke/pangeo-forge-esgf#17. I will again delete the noqc catalog and rerun.

jbusecke (Collaborator, Author) commented Sep 1, 2023

As discussed with @cisaacstern, I have moved the requirements from the first comment into separate issues and will merge this to make reviewing changes easier.

Successfully merging this pull request may close these issues:
- Add provenance metadata (urls, timestamp) to zarr store
- Label based runs not compatible with CMIP6 iids

4 participants