[REQUEST]: HighResMIP HadGEM vars for ocean sound speed calculation #72

Closed
rsignell opened this issue Nov 26, 2023 · 19 comments
Labels: cant find urls, in progress, request (Requests for new data to be ingested to the cloud)

Comments

@rsignell

rsignell commented Nov 26, 2023

List of requested iids

'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514',
 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.so.gn.v20200514',
 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.thetao.gn.v20200514',
 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.thetao.gn.v20200514'

Description

We are working on climate-change impacts on ocean sound speed, and this dataset is believed to be the best suited for that purpose. It amounts to roughly 3 TB of data files.
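For context, a minimal sketch of the intended sound-speed calculation from so and thetao using the GSW (TEOS-10) Python package; the coordinate names (lev, latitude, longitude) are assumptions about the HadGEM3-GC31-HH Omon grid, not verified against the files:

import gsw
import xarray as xr

def _sound_speed(so, thetao, depth, lat, lon):
    # plain-numpy core: TEOS-10 conversions, then speed of sound [m/s]
    p = gsw.p_from_z(-depth, lat)            # sea pressure [dbar] from depth [m]
    sa = gsw.SA_from_SP(so, p, lon, lat)     # absolute from practical salinity
    ct = gsw.CT_from_pt(sa, thetao)          # conservative from potential temperature
    return gsw.sound_speed(sa, ct, p)

def sound_speed(ds):
    # broadcast by dimension name so the 1D lev and 2D latitude/longitude align with so/thetao
    return xr.apply_ufunc(
        _sound_speed,
        ds['so'], ds['thetao'], ds['lev'], ds['latitude'], ds['longitude'],
        dask='parallelized', output_dtypes=[float],
    )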

@rsignell rsignell added the request label (Requests for new data to be ingested to the cloud) on Nov 26, 2023
@rsignell
Author

ping @gzt5142

@jbusecke
Collaborator

Ok, I merged #73, let's see how that goes... fingers crossed

@rsignell
Author

rsignell commented Nov 29, 2023

When we processed these with Kerchunk, we treated them as two datasets: highres-future and hist-1950. Each had two variables, so and thetao. For each dataset we used Kerchunk to combine each variable along the time dimension and then merge_vars to create a single reference JSON per dataset; a rough sketch is below.
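Roughly, the Kerchunk step looked like this (a sketch, not the exact notebook; the file lists and output path are placeholders):

import json
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr, merge_vars

def combine_variable(netcdf_urls):
    # one reference set per source file, then concatenate along time
    refs = [SingleHdf5ToZarr(u, inline_threshold=300).translate() for u in netcdf_urls]
    return MultiZarrToZarr(refs, concat_dims=['time']).translate()

# per dataset (e.g. hist-1950): combine each variable along time, then merge the variables
so_refs = combine_variable(so_files)          # so_files / thetao_files are placeholder lists of netCDF urls
thetao_refs = combine_variable(thetao_files)
merged = merge_vars([so_refs, thetao_refs])

with open('hist-1950_combined.json', 'w') as f:
    json.dump(merged, f)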

Should we have split the issue into two different issues, with:

'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514',
 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.thetao.gn.v20200514'

in one and

 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.so.gn.v20200514',
 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.thetao.gn.v20200514',

in the other?

@jbusecke
Collaborator

@rsignell no, that should not be an issue. We process every dataset separately anyway. There is something else going on; I will look into it after the talk 😆

@cisaacstern
Contributor

The "regular hypercube" error appears to be the same issue discussed in pangeo-forge/pangeo-forge-recipes#520. If so, this would be a corner-case bug related to certain chunking scenarios. As documented on the linked issue, the next step there is for @tom-the-hill to open a PR with a minimal failing test as a reproducer.

@rsignell
Author

rsignell commented Jan 5, 2024

@cisaacstern or @jbusecke Any updates here?

@jbusecke
Collaborator

Unfortunately we are still blocked by the above bug in PGF-recipes. The good news is that once that is fixed we might be able to get the data straight into the public bucket! Is there a concrete deadline?

@rsignell
Author

Nope. No deadline. I was just curious about the status. Thanks!

@jbusecke
Collaborator

Still working on this @rsignell. It seems like there was a bug in my legacy notebook and the current versions of your requested datasets only have 2 timesteps (#76 describes the culprit), but I just resubmitted them manually (#98) and they are currently running...

@jbusecke
Collaborator

Ok, I think we have temporarily removed the main roadblock here (gnarly, gnarly stuff really).
So far it seems we were not getting any URLs for your iids from the API, but I'll retry frequently. You can check the progress like this:

def zstore_to_iid(zstore: str):
    # a bit wacky, to account for the different layouts of old/new stores
    return '.'.join(zstore.replace('gs://', '').replace('.zarr', '').replace('.', '/').split('/')[-11:-1])

iids_requested = [
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514',
 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.so.gn.v20200514',
 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.thetao.gn.v20200514',
 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.thetao.gn.v20200514',
]

import intake
# uncomment/comment lines to swap catalogs
url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json"
col = intake.open_esm_datastore(url)

iids_all = [zstore_to_iid(z) for z in col.df['zstore'].tolist()]
iids_uploaded = [iid for iid in iids_all if iid in iids_requested]
iids_uploaded

Since we got some of the URLs during #73, I am currently assuming this will resolve itself with time, but there might be bugs in either pangeo-forge-esgf or the ESGF API itself that prevent all of the supposedly available URLs from being returned (I suspect #119 is similar).

Let's keep an eye on this for now, and I'll try to investigate more deeply later.
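If it helps with debugging, here is a sketch of how one might cross-check what the ESGF search API currently reports for one of these iids; the index node URL and facet mapping are assumptions, and pangeo-forge-esgf may do this differently internally:

import requests

iid = 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514'
(mip_era, activity, institution, source, experiment,
 member, table, variable, grid, version) = iid.split('.')

params = {
    'type': 'File',
    'format': 'application/solr+json',
    'limit': 500,
    'mip_era': mip_era,
    'activity_id': activity,
    'institution_id': institution,
    'source_id': source,
    'experiment_id': experiment,
    'variant_label': member,
    'table_id': table,
    'variable_id': variable,
    'grid_label': grid,
}
# the index node is an assumption; any ESGF node exposing the esg-search API should work
r = requests.get('https://esgf-node.llnl.gov/esg-search/search', params=params)
r.raise_for_status()
docs = r.json()['response']['docs']
urls = [u.split('|')[0] for doc in docs for u in doc['url'] if u.endswith('HTTPServer')]
print(len(urls), 'HTTPServer file urls reported')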

@jbusecke
Collaborator

jbusecke commented May 8, 2024

I just checked, and they are all ingested! Please reopen if there are issues on your end.

@jbusecke jbusecke closed this as completed May 8, 2024
@jbusecke jbusecke reopened this May 9, 2024
@jbusecke
Collaborator

jbusecke commented May 9, 2024

Meeep. That is not looking great...

[screenshot]

I hope I can fix these soon (#76 is relevant).

@rsignell
Author

rsignell commented May 9, 2024

Darn. Very much still interested in this @jbusecke. Thanks for continuing to push!

@jbusecke
Collaborator

jbusecke commented May 9, 2024

I'll get there...

@jbusecke
Collaborator

jbusecke commented May 12, 2024

gs://cmip6/cmip6-pgf-ingestion-test/zarr_stores/9044158586_1/CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514.zarr
[screenshot of the dataset]

gs://cmip6/cmip6-pgf-ingestion-test/zarr_stores/9044158586_1/CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.thetao.gn.v20200514.zarr
[screenshot of the dataset]
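For reference, a minimal sketch for opening one of these test stores with xarray and gcsfs, assuming the bucket allows anonymous reads:

import gcsfs
import xarray as xr

fs = gcsfs.GCSFileSystem(token='anon')  # anonymous read access assumed
store = fs.get_mapper(
    'cmip6/cmip6-pgf-ingestion-test/zarr_stores/9044158586_1/'
    'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514.zarr'
)
ds = xr.open_zarr(store, consolidated=True)
print(ds)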

@rsignell
Author

Check this out @andreall!!

@jbusecke
Collaborator

Just as a heads-up, these jobs are absolutely massive, and for now I have to babysit them manually. For whatever reason the other experiment_id seemed unavailable (on the ESGF side) at the time I ran these, so please feel free to ping me in a few days.

Eventually I think we will be able to handle such large datasets better with more efficient downloading upstream.

@rsignell
Author

Yes, the HighResMIP data is massive, and this is only two variables!
I'm happy we have contributed a nice stress test. 😸

@rsignell rsignell closed this as completed Jun 5, 2024