Incorrect grid output when combined_preprocessing is used. #93
Comments
Yikes, that looks like a nasty bug. Could you tell me a bit more about the version you are using? Did you install from conda/pip or from source? |
Thanks, @jbusecke for the quick response! I think I am using the latest version from Github. I used the command Also, I checked the following:
|
I assume this is on the pangeo google cloud deployment? Could you paste the full code (including the catalog URL you used) here? I'll see what is going on there. |
Sure. I followed the steps described in intake-esm tutorial and I use the same URL:
You can find a notebook with the relevant code here. Currently, I have found three models with the issue, but I only looked at ~10 models (out of 53). |
Awesome. I think this is caused by the reordering of longitudes, which has caused me all kinds of trouble. I am actually thinking of getting rid of that functionality altogether (#94). Checking this now. |
Ok, I was able to reproduce the error, and it does indeed seem related to the longitude ordering. Here is a quick workaround while I try to fix that bug:
### 'HadGEM3-GC31-MM'
import matplotlib.pyplot as plt
from cmip6_preprocessing.preprocessing import (
    rename_cmip6,
    promote_empty_dims,
    correct_coordinates,
    correct_lon,
    correct_units,
    broadcast_lonlat,
    parse_lon_lat_bounds,
    sort_vertex_order,
    maybe_convert_bounds_to_vertex,
    maybe_convert_vertex_to_bounds,
)

def modified_preprocessing(ds):
    ds = ds.copy()
    # fix naming
    ds = rename_cmip6(ds)
    # promote empty dims to actual coordinates
    ds = promote_empty_dims(ds)
    # demote coordinates from data_variables
    ds = correct_coordinates(ds)
    # broadcast lon/lat
    ds = broadcast_lonlat(ds)
    # shift all lons to consistent 0-360
    ds = correct_lon(ds)
    # fix the units
    ds = correct_units(ds)
    # replace x,y with nominal lon,lat (skipped in this workaround)
    # ds = replace_x_y_nominal_lat_lon(ds)
    # rename the `bounds` according to their style (bound or vertex)
    ds = parse_lon_lat_bounds(ds)
    # sort vertices in a consistent manner
    ds = sort_vertex_order(ds)
    # convert vertex into bounds and vice versa, so both are available
    ds = maybe_convert_bounds_to_vertex(ds)
    ds = maybe_convert_vertex_to_bounds(ds)
    return ds

# `col` is the intake-esm collection you already opened in your notebook
for si in ['HadGEM3-GC31-MM', 'CMCC-ESM2', 'CMCC-CM2-HR4']:
    cat = col.search(activity_id='CMIP', grid_label='gn', source_id=si, variable_id=['areacello'])
    fig, axs = plt.subplots(ncols=2, constrained_layout=True, figsize=(20,6))
    # without any preprocessing
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True})
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[0])
    # with the modified preprocessing defined above
    ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':True},
                                preprocess=modified_preprocessing)
    ddict[next(iter(ddict))].areacello[0].plot(ax=axs[1])
    plt.show()
Let me know if that works for you. |
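Besides comparing the two plots by eye, a numerical check can help when scanning many models. The following is only a minimal sketch, not part of cmip6_preprocessing: the helper name `is_pure_reordering` and the toy arrays are made up for illustration. It assumes that a preprocessing step which merely reorders grid cells should leave the set of cell-area values unchanged, so the sorted values before and after should match.

```python
import numpy as np

def is_pure_reordering(raw, processed, rtol=1e-6):
    """Return True if `processed` holds the same values as `raw`, in any order.

    Hypothetical helper for illustration; not part of cmip6_preprocessing.
    """
    raw = np.asarray(raw, dtype=float).ravel()
    processed = np.asarray(processed, dtype=float).ravel()
    if raw.size != processed.size:
        return False
    # Sorting removes the dependence on cell order; NaNs (e.g. land) sort last.
    return bool(np.allclose(np.sort(raw), np.sort(processed),
                            rtol=rtol, equal_nan=True))

# Toy 2x4 "areacello" field with one missing cell
area = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, np.nan]])
rolled = np.roll(area, 2, axis=1)  # a clean shift along the x dimension
broken = rolled.copy()
broken[0, 0] = 99.0                # simulate one corrupted cell

print(is_pure_reordering(area, rolled))
print(is_pure_reordering(area, broken))
```

With real data, one would pass the `.values` of the `areacello` variable from the raw and the preprocessed dataset. Note that this check only detects changed values; it would not catch a field whose values are intact but misaligned with its lon/lat coordinates.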
I think that works for me. Reordering of longitudes is indeed very useful but might not be essential for my analysis. Thanks a lot for looking into it so quickly! |
A follow-up question that I'm just going to ask here (even though it is probably not the right place): I am seeing faulty data across various CMIP6 datasets obtained from |
I am actually not sure that is the most up to date catalog. @naomi-henderson has recently refactored a lot of the cloud data. Can you try:
import intake
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
col
and see if the problems persist? Otherwise I think here is always a good spot to report but https://github.com/pangeo-forge/cmip6-pipeline might be the even more appropriate spot? @naomi-henderson, are there official guidelines for reporting on the new catalog? |
Hmmm, I am still trying to understand why the very old NCAR version of the Pangeo CMIP6 Google Cloud's JSON file is still being used. They have a JSON file for their own collection at NCAR, but anyone using the GC collection should use the JSON file in GC. Yes, @jbusecke, your link to The re-organization of the GC version is now complete. If you are still having trouble, please report here: https://github.com/pangeo-forge/cmip6-pipeline The AWS copy might still be out of sync for a few more days. |
Probably partially my fault, since I put that one into the cmip6-preprocessing readme back at the cmip6-hackathon. I have to thoroughly refactor the docs and make it really clear that people need to switch! |
Ah, that's good to know. I will switch to |
Thanks for putting the package together! This is a great tool to deal with all the differences among CMIP6 models.
I noticed that in some cases the output does not look right when combined_preprocessing is used. Please see the two examples below, 'HadGEM3-GC31-MM' and 'CMCC-CM2-HR4', where I compare the grid cell area with and without combined_preprocessing. I can look into this further, but I am not sure how much time I can spend on it right now, so I thought I would share it here already. Any idea what's going on?