Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace_x_y_nominal_lat_lon does not work for > 360 lon coordinates #295

Closed
JoranAngevaare opened this issue May 30, 2023 · 2 comments · Fixed by #365
Closed

replace_x_y_nominal_lat_lon does not work for > 360 lon coordinates #295

JoranAngevaare opened this issue May 30, 2023 · 2 comments · Fixed by #365

Comments

@JoranAngevaare
Copy link
Contributor

JoranAngevaare commented May 30, 2023

Thanks a lot for this great tool that comes in super handy for harmonizing CMIP data

The bug

When using data with more than 360 lon coordinates, the replace_x_y_nominal_lat_lon will not work as a 360 periodicity is assumed in _interp_nominal_lon. I noticed this behavior in EC-Earth-Consortium EC-Earth3 ssp585 r1i1p1f1 Amon tas gr v20200310, which has this subdegree gridding (lon being 512 datapoints long).

from xmip.preprocessing import promote_empty_dims, replace_x_y_nominal_lat_lon, rename_cmip6, correct_coordinates, broadcast_lonlat

ds = xr.open_mfdataset( ... )

def recast(data_set):
    ds = rename_cmip6(data_set)
    ds = promote_empty_dims(ds)
    ds = broadcast_lonlat(ds)
    ds = replace_x_y_nominal_lat_lon(ds)
    return ds

ds.isel(time=0)['tas'].plot()
plt.show()
recast(ds).isel(time=0)['tas'].plot()
plt.show()

image
image

Expected behavior

Don't suffle the data when interpolating with sub-degree longitude gridding. This can be solved by not assuming a 360 periodicity.

Underlying cause:

numpy.interp takes the period argument, but this is in the x-coordinate, however, in _interp_nominal_lon, the period is set to 360, this causes funky behavior in this function when you feed it an array with >360 entries, as they are sorted by their arg index.

MWE:

Maybe a small example might be helpful here:

# Create some dummy data
import numpy as np
import xarray as xr
from xmip.preprocessing import promote_empty_dims, replace_x_y_nominal_lat_lon, rename_cmip6, correct_coordinates, broadcast_lonlat
import matplotlib.pyplot as plt

lon=np.linspace(0,360, 513)[:-1]
lat=np.linspace(-90,90, 181)[:-1]
time=np.arange(1)
# Totally arbitrary data
data=np.arange(len(lat)*len(lon)*len(time)).reshape(len(time), len(lat), len(lon))*lon

# Add some NaN values just as an example
data[:, :, len(lon)//2+30: len(lon)//2+50] = np.nan

ds_dummy = xr.Dataset(
    data_vars = dict(var=(('time','lat', 'lon'), data,)),
    coords = dict(time=time, lon=lon, lat=lat,),
    attrs=dict(source_id='bla'))
ds_dummy['var'].isel(time=0).plot(cbar_kwargs=dict(orientation='horizontal'))

image

#  Now apply the recast using replace_x_y_nominal_lat_lon
def recast(data_set):
    ds = rename_cmip6(data_set)
    ds = promote_empty_dims(ds)
    ds = broadcast_lonlat(ds)
    ds = replace_x_y_nominal_lat_lon(ds)
    return ds

ds_dum_recast = recast(ds_dummy)
replace_x_y_nominal_lat_lon(ds_dum_recast)['var'].isel(time=0).plot(cbar_kwargs=dict(orientation='horizontal'),)

image

This second imagine is clearly not what we'd hoped for.

Versions

import sys
import numpy, xarray, xmip
message = f'{sys.platform}\n{sys.version}'
for b in 'numpy xarray xmip'.split():
    message += f'\n\t{b}=={locals().get(b).__version__}'
print(message)

linux
3.8.16 (default, Mar  2 2023, 03:21:46) 
[GCC 11.2.0]
	numpy==1.24.3
	xarray==2023.1.0
	xmip==0.7.1

Related PRs/Issues

I believe #93 is also related to this and #79 might be where the bug was introduced.

One way this might be solved is by replacing the above mentioned line as is done in this PR:
#296

@bguillod
Copy link

I am also encountering some of the issues described here (and similar once related to coordinates). For some models the longitude fixing seems to screw the data, as also experienced by @Quentin-Bourgeois. Is a fix for this issue in the pipeline @jbusecke?

@jbusecke
Copy link
Owner

Hi @JoranAngevaare and @bguillod. Very sorry for the long radio silence here. Trying to catch up with a lot of maintenance after working on the new Pangeo/ESGF CMIP6 Zarr data ingestion.

First of all, thanks a lot for the carefully crafted report here @JoranAngevaare. And even more for putting in a fix for this. Ill reopen the PR and continue discussion over there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants