Incorrect grid output when combined_preprocessing is used. #93

jetesdal · 2021-02-15T12:34:01Z

Thanks for putting the package together! This is a great tool to deal with all the differences among CMIP6 models.

I noticed that in some cases the output does not look right when combined_preprocessing is used. Below are two examples, 'HadGEM3-GC31-MM' and 'CMCC-CM2-HR4', where I compare the grid cell area with and without combined_preprocessing. I can look into this further, but I am not sure how much time I can spend on it right now, so I thought I would share it here already. Any idea what's going on?

jbusecke · 2021-02-15T18:14:54Z

Yikes, that looks like a nasty bug. Could you tell me a bit more about the version you are using? Did you install from conda/pip or from source?

jetesdal · 2021-02-16T10:48:43Z

Thanks, @jbusecke, for the quick response! I think I am using the latest version from GitHub. I used the command
pip install git+https://github.com/jbusecke/cmip6_preprocessing.git --upgrade

Also, I checked the following:

In [1]: import cmip6_preprocessing
   ...: cmip6_preprocessing.__version__
Out[1]: '0.1.5.dev319+g20e3868.d20210215'
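For anyone who wants to report the same information from a script rather than an IPython session, here is a minimal sketch using only the standard library. The helper name `installed_version` is made up for illustration, and it assumes the distribution is registered under the name `cmip6_preprocessing`; a version string containing `.dev`, like the one above, usually points to an install from source rather than a tagged release.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    `installed_version` is a hypothetical helper, not part of cmip6_preprocessing.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the import name.
print(installed_version("cmip6_preprocessing"))
```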

jbusecke · 2021-02-16T20:06:16Z

I assume this is on the pangeo google cloud deployment?

Could you paste the full code (including the catalog URL you used) here? I'll see what is going on there.

jetesdal · 2021-02-17T09:44:32Z

Sure. I followed the steps described in the intake-esm tutorial and used the same URL:

import intake

url = 'https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json'
col = intake.open_esm_datastore(url)

You can find a notebook with the relevant code here. Currently, I have found three models with the issue, but I only looked at ~10 models (out of 53).

jbusecke · 2021-02-17T12:58:05Z

Awesome. I think this is caused by the reordering of longitudes, which has caused me all kinds of trouble. I am actually thinking of getting rid of that functionality altogether (#94). Checking this now.
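For readers unfamiliar with the step in question, here is a toy sketch of what "reordering longitudes" involves: shifting values onto 0-360 and then sorting the columns, with the same permutation applied to the data. This is plain Python written for illustration, not the cmip6_preprocessing implementation, and the function names are made up. It does not claim to reproduce the bug; it only shows why coordinates and data have to stay in step.

```python
def shift_lon_0_360(lons):
    """Map longitudes in degrees onto the 0-360 range."""
    return [lon % 360 for lon in lons]


def reorder_by_lon(lons, rows):
    """Sort columns by longitude and apply the same permutation to every data row."""
    order = sorted(range(len(lons)), key=lambda i: lons[i])
    sorted_lons = [lons[i] for i in order]
    sorted_rows = [[row[i] for i in order] for row in rows]
    return sorted_lons, sorted_rows


# Toy grid: 4 longitudes on a -180..180 convention and 2 rows of data.
lons = shift_lon_0_360([-180.0, -90.0, 0.0, 90.0])
data = [[1, 2, 3, 4], [5, 6, 7, 8]]
new_lons, new_data = reorder_by_lon(lons, data)
```

If only the coordinate were sorted and the data left as is (or the other way around), each value would end up labelled with the wrong longitude.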

jbusecke · 2021-02-17T13:09:36Z

OK, I was able to reproduce the error, and it does indeed seem related to the longitude ordering.

Here is a quick workaround while I try to fix that bug:

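# Workaround: a copy of the combined_preprocessing steps with
# replace_x_y_nominal_lat_lon disabled.
# Assumes `col` is the intake-esm catalog opened earlier in this thread.
import matplotlib.pyplot as plt

from cmip6_preprocessing.preprocessing import (
    rename_cmip6,
    promote_empty_dims,
    correct_coordinates,
    correct_lon,
    correct_units,
    broadcast_lonlat,
    parse_lon_lat_bounds,
    sort_vertex_order,
    maybe_convert_bounds_to_vertex,
    maybe_convert_vertex_to_bounds,
)
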
def modified_preprocessing(ds):
    ds = ds.copy()
    # fix naming
    ds = rename_cmip6(ds)
    # promote empty dims to actual coordinates
    ds = promote_empty_dims(ds)
    # demote coordinates from data_variables
    ds = correct_coordinates(ds)
    # broadcast lon/lat
    ds = broadcast_lonlat(ds)
    # shift all lons to consistent 0-360
    ds = correct_lon(ds)
    # fix the units
    ds = correct_units(ds)
    # replace x,y with nominal lon,lat
    # (disabled for this workaround, since the bug appears related to longitude ordering)
    # ds = replace_x_y_nominal_lat_lon(ds)
    # rename the `bounds` according to their style (bound or vertex)
    ds = parse_lon_lat_bounds(ds)
    # sort vertices in a consistent manner
    ds = sort_vertex_order(ds)
    # convert vertex into bounds and vice versa, so both are available
    ds = maybe_convert_bounds_to_vertex(ds)
    ds = maybe_convert_vertex_to_bounds(ds)
    return ds

for si in ['HadGEM3-GC31-MM', 'CMCC-ESM2', 'CMCC-CM2-HR4']:
    cat = col.search(activity_id='CMIP', grid_label='gn', source_id=si, variable_id=['areacello'])

    fig, axs = plt.subplots(ncols=2, constrained_layout=True, figsize=(20,6))

    # without combined_preprocessing
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True})
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[0])

    # with modified_preprocessing (the workaround defined above)
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True},
                                preprocess=modified_preprocessing)
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[1])

    plt.show()

Let me know if that works for you.
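To check the result numerically rather than by eye, one option is to compare the cell areas before and after preprocessing while ignoring their order. The sketch below is a hypothetical helper, not part of cmip6_preprocessing, and it rests on an assumption: preprocessing only renames and reorders cells for `areacello` and does not change the values themselves.

```python
import math


def same_cells_up_to_reordering(raw, processed, rel_tol=1e-9):
    """Return True if both inputs hold the same values, ignoring order and NaNs.

    Both arguments are flat iterables of floats (e.g. flattened cell areas).
    """
    a = sorted(v for v in raw if not math.isnan(v))
    b = sorted(v for v in processed if not math.isnan(v))
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))
```

With xarray datasets, the flattened values could be passed as `ds.areacello.values.ravel()` for the raw and the preprocessed dataset. A `False` result would suggest that cells were dropped, duplicated, or altered rather than just reordered.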

jetesdal · 2021-02-17T14:54:59Z

I think that works for me. Reordering of longitudes is indeed very useful but might not be essential for my analysis. Thanks a lot for looking into it so quickly!

jetesdal · 2021-02-18T11:04:38Z

A follow-up question that I'm just going to ask here (even though it is probably not the right place): I am seeing faulty data across various CMIP6 datasets obtained from
'https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json'
Those erroneous data are not related to using combined_preprocessing but must be in the underlying dataset or introduced when downloading the data. I'm not sure where I should report these issues. Is cmip6_preprocessing the right place?

jbusecke · 2021-02-18T14:15:39Z

I am actually not sure that is the most up-to-date catalog. @naomi-henderson has recently refactored a lot of the cloud data.

Can you try:

import intake
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
col

and see if the problems persist?

Otherwise, this repository is always a good place to report, but https://github.com/pangeo-forge/cmip6-pipeline might be even more appropriate. @naomi-henderson, are there official guidelines for reporting on the new catalog?

naomi-henderson · 2021-02-18T14:30:30Z

Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to https://storage.googleapis.com/cmip6/pangeo-cmip6.json is correct.

The re-organization of the GC version is now complete. If you are still having trouble, please report here: https://github.com/pangeo-forge/cmip6-pipeline

The AWS copy might still be out of sync for a few more days.

jbusecke · 2021-02-18T14:59:06Z

> Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to https://storage.googleapis.com/cmip6/pangeo-cmip6.json is correct.

Probably partially my fault, since I put that one into the cmip6-preprocessing readme back at the cmip6-hackathon. I have to thoroughly refactor the docs and make it really clear that people need to switch!
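Until the docs are updated, a script could guard against the old URL itself. The following is only a sketch: `resolve_catalog_url` and the two constant names are made up for illustration, and the URLs are the ones quoted in this thread.

```python
import warnings

# URLs as quoted in this thread.
OLD_CATALOG_URL = (
    "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/"
    "master/catalogs/pangeo-cmip6.json"
)
CURRENT_CATALOG_URL = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"


def resolve_catalog_url(url):
    """Return the Google Cloud catalog URL if the outdated NCAR copy is requested."""
    if url == OLD_CATALOG_URL:
        warnings.warn(
            "The NCAR copy of the Pangeo CMIP6 catalog is outdated; "
            f"using {CURRENT_CATALOG_URL} instead.",
            UserWarning,
            stacklevel=2,
        )
        return CURRENT_CATALOG_URL
    return url
```

The returned URL can then be passed to `intake.open_esm_datastore` as in the snippets above.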

jetesdal · 2021-02-18T15:09:07Z

Ah, that's good to know. I will switch to https://storage.googleapis.com/cmip6/pangeo-cmip6.json and I will report any issues in https://github.com/pangeo-forge/cmip6-pipeline. Thanks, @jbusecke and @naomi-henderson!

jbusecke mentioned this issue Feb 16, 2021

Drop the 'reordering' of longitude values. #94

Closed

jbusecke mentioned this issue Feb 18, 2021

DOCS: Note on old catalog #96

Closed

jbusecke mentioned this issue Feb 18, 2021

Confusion about 'official' catalog for pangeo cmip6 intake/intake-esm#323

Closed

jdldeauna mentioned this issue Apr 6, 2021

Lat/lon output error when processing CMCC-CESM2 #105

Closed

JoranAngevaare mentioned this issue May 30, 2023

replace_x_y_nominal_lat_lon does not work for > 360 lon coordinates #295

Closed
