Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine_staggered_grid error with CESM2 #240

Open
emaroon opened this issue Jun 27, 2022 · 8 comments
Open

combine_staggered_grid error with CESM2 #240

emaroon opened this issue Jun 27, 2022 · 8 comments

Comments

@emaroon
Copy link
Contributor

emaroon commented Jun 27, 2022

I'm working on cheyenne with a minimal set of CMIP6-OMIP output and trying to combine thetao and uo on a staggered grid. I get an error with only the CESM2 output that I think might be traceable to the staggered_grid_config.yaml.

Here's what I'm doing:

catalog_file = '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json'
col = intake.open_esm_datastore(catalog_file)

ds_cesm2 = {}

variable_id = ['thetao','uo']

for vv in variable_id:
    cat = col.search(experiment_id = ['omip1'],
            variable_id = [vv], 
            table_id = 'Omon',
            grid_label = ['gn'],
            source_id = 'CESM2',
            member_id = 'r1i1p1f1', 
            version = 'v20190802')

    ds_cesm2[vv] = cat.to_dataset_dict( preprocess=combined_preprocessing,
        aggregate=True)

key = 'OMIP.NCAR.CESM2.omip1.Omon.gn'
grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
                 other_ds=[ds_cesm2['uo'][key]])

and I get this error:

RuntimeError                              Traceback (most recent call last)
Input In [9], in <cell line: 3>()
      1 key = 'OMIP.NCAR.CESM2.omip1.Omon.gn'
----> 3 grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
      4                  other_ds=[ds_cesm2['uo'][key]])

File ~/miniconda3/envs/xmoc_test/lib/python3.10/site-packages/cmip6_preprocessing/grids.py:465, in combine_staggered_grid(ds_base, other_ds, recalculate_metrics, grid_dict, **kwargs)
    463             additional_dims = [di for di in ds_new.dims if di not in ds_g.dims]
    464             if len(additional_dims) > 0:
--> 465                 raise RuntimeError(
    466                     f"While trying to parse `{ds_new.name}`, detected dims that are not in the base dataset:[{additional_dims}]"
    467                 )
    468             ds_g[ds_new.name] = ds_new
    470 # Restore dims attrs from the beginning

RuntimeError: While trying to parse `uo`, detected dims that are not in the base dataset:[['x_right']]

However, if I manually set the grid_dict, everything works:

cesm2_gdict = {'CESM2':{'gn':{'axis_shift':{'X':'right', 'Y':'right'}}}}
grid,ds_comb = combine_staggered_grid(ds_cesm2['thetao'][key], \
                 other_ds=[ds_cesm2['uo'][key]], grid_dict = cesm2_gdict)

The grid_dict for native-grid CESM2 in the yaml is different for the X axis-shift:

CESM2:
  gn:
    axis_shift:
      X: left
      Y: right

I'm still wrapping my head around xgcm, so I'm not sure if this workaround is getting me to a right answer for the wrong reason or if the staggered_grid_config.yaml needs updating. I thought I'd raise this issue here in case it is something that needs fixing. Thanks for your help!

@jbusecke
Copy link
Owner

Hi @emaroon I would say the likely culprit is that the yaml file is plain wrong. I hacked that together using the lon/lat information from the velocities published on ESGF. I have now learned that this is very error prone actually (@jdldeauna had similar experiences a while back, but I cant find that issue right now).
I think there are several ways forward, all of which should rely on the native grid information as much as possible (instead of using the lon/lat of the published datasets).

  1. The most immediate fix would be that users like you submit corrections to the existing yaml after properly ensuring that the information in the existing yaml is wrong.
  2. The more longterm but my preferred method would move all of that information over to https://github.com/pangeo-forge/CMIP6_static_grids-feedstock. I would want to encode the grid position information in the metadata of the native grids themselves.

Doing 1 would however be very helpful for 2, since we need to parse that information somehow manually for each model 😜

@jbusecke
Copy link
Owner

Also happy to have a quick call about these things, since I realize I havent really made up my mind on most of the details and that is probably quite confusing here 😜

@emaroon
Copy link
Contributor Author

emaroon commented Jun 28, 2022

I'm still getting comfortable using github, so submitting a fix for the CESM-grids in the yaml is a good exercise for me. Give me a little while, but I can put in a pull request for #1.

I double-checked the CESM2 via CMIP grid: x_u is to the right of x_t. Since all of the CESM contributions to CMIP likely used the same POP-grid (there's only one 1-degree supported POP configuration that I'm aware of), I'll update all of the CESM grids.

Phone call sometime would be great. I'm only a little confused (I think? I could be confused about how confused I really am 🤪) but it would be helpful for me to see where this is headed. I'm also involved with the current phase of the NOAA MDTF, and with more ocean-focused diagnostics now in development, I have a feeling that this staggered grid issue may be on our horizon soon.

@dcherian
Copy link

This bit of code originally from Ryan should be useful for adding the attributes:
https://github.com/NCAR/pop-tools/blob/f8dc20b9962e0fdc7b9a1995503e368ff326e3e7/pop_tools/xgcm_util.py#L35-L44

@jbusecke
Copy link
Owner

Yes! It would be nice if all model codes had code like that available haha.

@jbusecke
Copy link
Owner

I'll update all of the CESM grids.

This would be a great quick PR, that would patch this issue for now, and until we have a better solution.

@jbusecke
Copy link
Owner

Phone call sometime would be great. I'm only a little confused (I think? I could be confused about how confused I really am 🤪) but it would be helpful for me to see where this is headed. I'm also involved with the current phase of the NOAA MDTF, and with more ocean-focused diagnostics now in development, I have a feeling that this staggered grid issue may be on our horizon soon.

I think the best is to try to schedule a meeting on my calendly. Some time next week would be great.

@emaroon
Copy link
Contributor Author

emaroon commented Jul 1, 2022

Sounds great - scheduled a meeting next week on your calendly and will have that PR done by then. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants