Incorrect grid output when combined_preprocessing is used. #93

Open
jetesdal opened this issue Feb 15, 2021 · 12 comments

Comments

@jetesdal

Thanks for putting the package together! This is a great tool to deal with all the differences among CMIP6 models.

I noticed that in some cases the output does not look right when combined_preprocessing is used. Below are two examples, 'HadGEM3-GC31-MM' and 'CMCC-CM2-HR4', comparing the grid cell area with and without combined_preprocessing. I can look into this further, but I am not sure how much time I can spend on it right now, so I thought I would share it here already. Any idea what's going on?

[Screenshots: grid cell area for 'HadGEM3-GC31-MM' and 'CMCC-CM2-HR4', with and without combined_preprocessing]
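
The comparison above is visual. As a complementary numeric check, the sketch below tests whether a preprocessed field still holds the same set of values as the raw one (a pure reordering) or whether values were changed or lost. It is a minimal, library-free illustration: `valid_values`, `is_pure_reordering` and `total` are hypothetical helper names, and the small nested lists stand in for real `areacello` arrays.

```python
import math


def valid_values(field):
    """Flatten a 2D nested list to a 1D list of floats, dropping NaNs (e.g. land cells)."""
    return [float(v) for row in field for v in row if not math.isnan(float(v))]


def is_pure_reordering(raw, processed, rel_tol=1e-9):
    """Return True if both fields contain the same values, in any order."""
    a = sorted(valid_values(raw))
    b = sorted(valid_values(processed))
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


def total(field):
    """Sum of all valid values, e.g. the total ocean area."""
    return math.fsum(valid_values(field))


# Toy data (assumption: stands in for a real grid cell area field).
raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
rolled = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]      # columns shifted by one
corrupted = [[3.0, 1.0, 2.0], [6.0, 4.0, 0.0]]   # one value overwritten

print(is_pure_reordering(raw, rolled))     # values preserved, order changed
print(is_pure_reordering(raw, corrupted))  # values not preserved
print(total(raw), total(rolled))
```

With xarray data, something like `ds.areacello[0].values.tolist()` could supply the nested lists. A field that passes this check but still plots wrongly would point at the coordinates or the ordering rather than the values themselves.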

@jbusecke
Owner

Yikes, that looks like a nasty bug. Could you tell me a bit more about the version you are using? Did you install from conda/pip or from source?

@jetesdal
Author

Thanks, @jbusecke, for the quick response! I think I am using the latest version from GitHub. I installed it with:
pip install git+https://github.com/jbusecke/cmip6_preprocessing.git --upgrade

Also, I checked the following:

In [1]: import cmip6_preprocessing
   ...: cmip6_preprocessing.__version__
Out[1]: '0.1.5.dev319+g20e3868.d20210215'
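
For readers unsure what this version string says, the sketch below splits it into its parts. It assumes the string follows the setuptools_scm-style layout `<base>.dev<distance>+g<commit>[.d<date>]`; the helper name `describe_version` is hypothetical and not part of cmip6_preprocessing.

```python
import re

# Assumed layout: <base>.dev<distance>+g<commit>[.d<YYYYMMDD>]
DEV_VERSION = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)\.dev(?P<distance>\d+)"
    r"\+g(?P<commit>[0-9a-f]+)(?:\.d(?P<date>\d{8}))?$"
)


def describe_version(version):
    """Split a development version string into its parts, or return None if it does not match."""
    match = DEV_VERSION.match(version)
    if match is None:
        return None
    return {
        "base": match.group("base"),               # version the build counts towards
        "distance": int(match.group("distance")),  # commits since the last tag
        "commit": match.group("commit"),           # abbreviated git commit hash
        "date": match.group("date"),               # date stamp, if present
    }


info = describe_version("0.1.5.dev319+g20e3868.d20210215")
print(info)
```

A match here indicates an install from a git checkout rather than a tagged release, which is consistent with the `pip install git+...` command above. The `commit` field is the piece to quote when reporting a bug.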

@jbusecke
Owner

I assume this is on the pangeo google cloud deployment?

Could you paste the full code (including the catalog URL you used) here? I'll see what is going on there.

@jetesdal
Author

Sure. I followed the steps described in the intake-esm tutorial and used the same URL:

import intake

url = 'https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json'
col = intake.open_esm_datastore(url)

You can find a notebook with the relevant code here. Currently, I have found three models with the issue, but I only looked at ~10 models (out of 53).

@jbusecke
Owner

Awesome. I think this is caused by the reordering of longitudes, which has caused me all kinds of trouble. I am actually thinking of getting rid of that functionality altogether (#94). Checking this now.
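
To illustrate what "reordering of longitudes" involves, here is a toy sketch on a single row of data. It is not how cmip6_preprocessing implements the step; the function names are made up for this example. The point is only that wrapping longitudes to 0-360 can leave them out of order, and that any sort applied to the longitudes has to be applied to the data with the same permutation.

```python
def wrap_lon(lon):
    """Map a longitude in degrees to the half-open range [0, 360)."""
    return lon % 360.0


def sort_by_lon(lons, values):
    """Sort one row by wrapped longitude, permuting the data the same way."""
    wrapped = [wrap_lon(lon) for lon in lons]
    order = sorted(range(len(wrapped)), key=wrapped.__getitem__)
    return [wrapped[i] for i in order], [values[i] for i in order]


# Toy row on a -180..180 grid (assumption: real grids are 2D and much larger).
lons = [-180.0, -90.0, 0.0, 90.0]
values = ["a", "b", "c", "d"]

sorted_lons, sorted_values = sort_by_lon(lons, values)
print(sorted_lons)    # [0.0, 90.0, 180.0, 270.0]
print(sorted_values)  # ['c', 'd', 'a', 'b']

# If only the coordinate were sorted, values would sit under the wrong longitudes.
mismatched = (sorted(wrap_lon(lon) for lon in lons), values)
print(mismatched)
```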

@jbusecke
Owner

OK, I was able to reproduce the error, and it does indeed seem to be related to the longitude ordering.

Here is a quick workaround while I try to fix that bug:

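# Workaround, applied below to 'HadGEM3-GC31-MM', 'CMCC-ESM2' and 'CMCC-CM2-HR4'.
# `col` is the catalog opened with intake.open_esm_datastore(url) earlier in this thread.
import matplotlib.pyplot as plt

from cmip6_preprocessing.preprocessing import (
    rename_cmip6,
    promote_empty_dims,
    correct_coordinates,
    correct_lon,
    correct_units,
    broadcast_lonlat,
    parse_lon_lat_bounds,
    sort_vertex_order,
    maybe_convert_bounds_to_vertex,
    maybe_convert_vertex_to_bounds,
)
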
def modified_preprocessing(ds):
    ds = ds.copy()
    # fix naming
    ds = rename_cmip6(ds)
    # promote empty dims to actual coordinates
    ds = promote_empty_dims(ds)
    # demote coordinates from data_variables
    ds = correct_coordinates(ds)
    # broadcast lon/lat
    ds = broadcast_lonlat(ds)
    # shift all lons to consistent 0-360
    ds = correct_lon(ds)
    # fix the units
    ds = correct_units(ds)
    # replace x,y with nominal lon,lat
    # (disabled for this workaround: this is the step that reorders longitudes)
    # ds = replace_x_y_nominal_lat_lon(ds)
    # rename the `bounds` according to their style (bound or vertex)
    ds = parse_lon_lat_bounds(ds)
    # sort vertices in a consistent manner
    ds = sort_vertex_order(ds)
    # convert vertex into bounds and vice versa, so both are available
    ds = maybe_convert_bounds_to_vertex(ds)
    ds = maybe_convert_vertex_to_bounds(ds)
    return ds

for si in ['HadGEM3-GC31-MM', 'CMCC-ESM2', 'CMCC-CM2-HR4']:
    cat = col.search(activity_id='CMIP', grid_label='gn', source_id=si, variable_id=['areacello'])

    fig, axs = plt.subplots(ncols=2, constrained_layout=True, figsize=(20,6))

    # without combined_preprocessing
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True})
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[0])

    # with the modified preprocessing (workaround)
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True},
                                preprocess=modified_preprocessing)
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[1])

    plt.show()

Let me know if that works for you.
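
The workaround above is the usual sequence of preprocessing steps with one step left out. As an alternative to copying the function body, the same idea can be sketched as a small helper that chains steps and skips some by name. `build_pipeline` is a hypothetical helper, not part of cmip6_preprocessing, and the toy functions below only stand in for real preprocessing steps.

```python
def build_pipeline(steps, skip=()):
    """Chain `steps` (callables that take and return a dataset), omitting those named in `skip`."""
    skipped = set(skip)
    active = [step for step in steps if step.__name__ not in skipped]

    def pipeline(ds):
        for step in active:
            ds = step(ds)
        return ds

    return pipeline


# Toy stand-ins for preprocessing steps (assumption: real steps operate on xarray datasets).
def add_one(x):
    return x + 1


def double(x):
    return x * 2


def negate(x):
    return -x


full = build_pipeline([add_one, double, negate])
partial = build_pipeline([add_one, double, negate], skip=["negate"])

print(full(3))     # -8
print(partial(3))  # 8
```

With the real package, the list would hold the imported functions in the order used in `modified_preprocessing`, with `skip=["replace_x_y_nominal_lat_lon"]`. Copying the dataset first, as the workaround does with `ds.copy()`, keeps the input untouched.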

@jetesdal
Author

I think that works for me. Reordering of longitudes is indeed very useful but might not be essential for my analysis. Thanks a lot for looking into it so quickly!

@jetesdal
Author

jetesdal commented Feb 18, 2021

A follow-up question that I'm just going to ask here (even though it is probably not the right place): I am seeing faulty data across various CMIP6 datasets obtained from
'https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json'
Those erroneous data are not related to using combined_preprocessing but must be in the underlying dataset or introduced when downloading the data. I'm not sure where I should report these issues. Is cmip6_preprocessing the right place?

@jbusecke
Owner

I am actually not sure that is the most up-to-date catalog. @naomi-henderson has recently refactored a lot of the cloud data.

Can you try:

import intake
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
col

and see if the problems persist?

Otherwise, I think this repo is always a good spot to report, but https://github.com/pangeo-forge/cmip6-pipeline might be an even more appropriate one. @naomi-henderson, are there official guidelines for reporting on the new catalog?

@naomi-henderson

Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to https://storage.googleapis.com/cmip6/pangeo-cmip6.json is correct.

The re-organization of the GC version is now complete. If you are still having trouble, please report here: https://github.com/pangeo-forge/cmip6-pipeline

The AWS copy might still be out of sync for a few more days.

@jbusecke
Owner

> Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to https://storage.googleapis.com/cmip6/pangeo-cmip6.json is correct.

Probably partially my fault, since I put that one into the cmip6-preprocessing readme back at the cmip6-hackathon. I have to thoroughly refactor the docs and make it really clear that people need to switch!

@jetesdal
Author

Ah, that's good to know. I will switch to https://storage.googleapis.com/cmip6/pangeo-cmip6.json and I will report any issues in https://github.com/pangeo-forge/cmip6-pipeline. Thanks, @jbusecke and @naomi-henderson!
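
For notebooks that still carry the old catalog URL, a small guard along these lines could make the switch explicit. This is a sketch under stated assumptions: `resolve_catalog_url` is a hypothetical helper, and the only facts it encodes are the two URLs quoted in this thread.

```python
import warnings

# Both URLs are taken from the discussion above.
OUTDATED_CATALOG = (
    "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/"
    "master/catalogs/pangeo-cmip6.json"
)
CURRENT_CATALOG = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"


def resolve_catalog_url(url):
    """Return the Google Cloud catalog URL if `url` is the outdated NCAR copy, else `url` unchanged."""
    if url.strip() == OUTDATED_CATALOG:
        warnings.warn(
            "Outdated Pangeo CMIP6 catalog URL; using the Google Cloud catalog instead.",
            UserWarning,
        )
        return CURRENT_CATALOG
    return url


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = resolve_catalog_url(OUTDATED_CATALOG)

print(resolved)
print(len(caught))
```

The result can then be passed on as `col = intake.open_esm_datastore(resolve_catalog_url(url))`.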
