
concat result not correct for particular dataset #3681

Closed
lvankampenhout opened this issue Jan 10, 2020 · 7 comments · Fixed by #3839

@lvankampenhout

MCVE Code Sample

data here: https://www.dropbox.com/sh/8eist9mmlf41mpc/AAB8yp6ERz-b4VYozL8tsj-ma?dl=0

import xarray as xr
import numpy as np

print(xr.__version__)  # 0.14.1

ds1 = xr.open_dataset('tas_Amon_NorESM1-ME_rcp26_r1i1p1_200601-206012.nc')
ds2 = xr.open_dataset('tas_Amon_NorESM1-ME_rcp26_r1i1p1_206101-210112.nc')

print(np.allclose(ds1.lat , ds2.lat), np.allclose(ds1.lat_bnds , ds2.lat_bnds)) # True, True

ds3 = xr.concat((ds1,ds2), dim='time')

print(ds3.lat.shape == ds1.lat.shape) # False

Expected Output

0.14.1
True True
True

since ds3.lat should be identical to ds1.lat

Problem Description

I've encountered a particular NetCDF dataset which is not handled correctly by the xarray concat operation. It's a climate dataset with 96 latitude points which has been split into two time segments. After concatenation (dim='time') there are suddenly 142 latitude points, even though the latitude arrays are completely identical AFAIK.

As a workaround, I've tried to reindex the result (ds3) as follows:

ds4 = ds3.reindex_like(ds1.drop_dims('time'))

but that yields an incomplete field after the year 2061. This can be seen by issuing:

ds4.tas.isel(time=-1).plot()

with white areas indicating missing data. There is no missing data in the source files.
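The missing data after reindexing is consistent with the coordinates not being exactly equal: reindex_like matches coordinate labels exactly, so float coordinates that differ by even a tiny rounding amount come back as NaN. A minimal illustration with invented values:

```python
import numpy as np
import xarray as xr

# Two "lat" coordinates that differ by a tiny float amount.
da = xr.DataArray([1.0, 2.0], coords={"lat": [10.0, 20.0 + 1e-12]}, dims="lat")
target = xr.DataArray(np.zeros(2), coords={"lat": [10.0, 20.0]}, dims="lat")

# Exact label matching: 10.0 matches, 20.0 + 1e-12 does not.
print(da.reindex_like(target).values)  # [ 1. nan]
```

This is the same mechanism producing the white areas in the plot: wherever a latitude label in ds1 has no bitwise-identical counterpart in ds3, the reindexed field is filled with NaN.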

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 13:42:17) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 17.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: C LANG: None LOCALE: None.None libhdf5: 1.10.4 libnetcdf: 4.6.1

xarray: 0.14.1
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.0b1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.2.2
conda: 4.8.1
pytest: 5.0.1
IPython: 7.7.0
sphinx: 2.1.2

@TomNicholas (Member)

> even though the latitude arrays are completely identical AFAIK.

Are they completely identical? You've used np.allclose, which only checks approximate equality, to assert that they are identical, rather than np.array_equal. You might also want to try concat with compat='override'.
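The distinction matters here: np.allclose passes within floating-point tolerances, while np.array_equal requires exact equality. A small self-contained sketch with made-up values:

```python
import numpy as np

# Two latitude-like arrays differing only at the 1e-12 level.
lat1 = np.linspace(-90.0, 90.0, 96)
lat2 = lat1 + 1e-12

print(np.allclose(lat1, lat2))     # True: within default rtol/atol
print(np.array_equal(lat1, lat2))  # False: not exactly equal
```

Since xarray's alignment machinery compares index labels exactly, arrays that are merely "allclose" are treated as different coordinates.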

@lvankampenhout (Author)

Good point: np.array_equal(ds1.lat, ds2.lat) yields False, whereas np.allclose() reported True.

How to use compat='override' in this case? I tried

ds3 = xr.concat((ds1,ds2), dim='time',compat='override',coords='minimal')

But this didn't work.

@TomNicholas (Member)

Looks like instead of using compat you can tell concat to use the indexes (so lat, lon, etc.) from the leftmost object with join='left', so it will ignore the (small) differences and return:

ds3 = xr.concat([ds1,ds2], dim='time', join='left')
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 96, lon: 144, time: 1152)
Coordinates:
    height     float64 2.0
  * lon        (lon) float64 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * lat        (lat) float64 -90.0 -88.11 -86.21 -84.32 ... 86.21 88.11 90.0
  * time       (time) object 2006-01-16 12:00:00 ... 2101-12-16 12:00:00
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object 2006-01-01 00:00:00 ... 2102-01-01 00:00:00
    lat_bnds   (time, lat, bnds) float64 -90.0 -89.05 -89.05 ... nan nan nan
    lon_bnds   (time, lon, bnds) float64 -1.25 1.25 1.25 ... 356.2 356.2 358.8
    tas        (time, lat, lon) float32 244.75986 244.83588 ... nan nan
Attributes:
    institution:            Norwegian Climate Centre
    institute_id:           NCC
    experiment_id:          rcp26
    source:                 NorESM1-ME 2011  atmosphere: CAM-Oslo (CAM4-Oslo-...
    model_id:               NorESM1-ME
    forcing:                GHG, SA, Oz, Sl, BC, OC
    parent_experiment_id:   historical
    parent_experiment_rip:  r1i1p1
    branch_time:            56940.0
    contact:                Please send any requests or bug reports to noresm...
    initialization_method:  1
    physics_version:        1
    tracking_id:            fd266701-a253-4b74-91e1-5c0213483ba2
    product:                output
    experiment:             RCP2.6
    frequency:              mon
    creation_date:          2012-05-16T19:40:05Z
    history:                2012-05-16T19:40:05Z CMOR rewrote data to comply ...
    Conventions:            CF-1.4
    project_id:             CMIP5
    table_id:               Table Amon (01 February 2012) 81f919710c21dca8a17...
    title:                  NorESM1-ME model output prepared for CMIP5 RCP2.6
    parent_experiment:      historical
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.7.1

Is that what you're after? Best to check through the data explicitly.

@lvankampenhout (Author)

lvankampenhout commented Jan 10, 2020

Thanks Tom. This indeed gives a dataset with the correct dimensions, but there is missing data:

[Screenshot, 2020-01-10 14:42: plot showing white (missing) areas in the field]

I've also tried join='override' but this raises an IndexError.

@dcherian (Contributor)

Since lat is a dimension coordinate (an "index variable"), you want join="override" instead of compat="override".

A PR to make the docs clearer on this would be appreciated!

@lvankampenhout (Author)

lvankampenhout commented Jan 10, 2020

Unfortunately, join='override' raises an IndexError.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-90-462267a327eb> in <module>
----> 1 ds6 = xr.concat((ds1,ds2), dim='time',join='override')

~/anaconda3/lib/python3.6/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134 
    135 

~/anaconda3/lib/python3.6/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    299     datasets = [ds.copy() for ds in datasets]
    300     datasets = align(
--> 301         *datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value
    302     )
    303 

~/anaconda3/lib/python3.6/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    269 
    270     if join == "override":
--> 271         objects = _override_indexes(objects, all_indexes, exclude)
    272 
    273     # We don't reindex over dimensions with all equal indexes for two reasons:

~/anaconda3/lib/python3.6/site-packages/xarray/core/alignment.py in _override_indexes(objects, all_indexes, exclude)
     53         for dim in obj.dims:
     54             if dim not in exclude:
---> 55                 new_indexes[dim] = all_indexes[dim][0]
     56         objects[idx + 1] = obj._overwrite_indexes(new_indexes)
     57 

IndexError: list index out of range

@dcherian (Contributor)

Ah thanks, there is a bug here.

xarray is trying to overwrite indexes for the bnds dimension, but there are no indexes associated with that dimension. As a workaround, you can assign values for bnds (ds1["bnds"] = [0, 1] and ds2["bnds"] = [0, 1]) and then use join="override".
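A minimal self-contained sketch of that workaround, using synthetic stand-ins for the two time segments (the variable names and values are invented for illustration; the real files come from the Dropbox link above):

```python
import numpy as np
import xarray as xr

def make_segment(t0, lat_offset=0.0):
    # A small dataset mimicking one time segment: a "lat" index coordinate,
    # a bounds variable with an index-less "bnds" dimension, and a data field.
    lat = np.linspace(-90.0, 90.0, 96) + lat_offset
    time = np.arange(t0, t0 + 12)
    return xr.Dataset(
        {
            "tas": (("time", "lat"), np.zeros((12, 96), dtype=np.float32)),
            "lat_bnds": (("lat", "bnds"), np.zeros((96, 2))),
        },
        coords={"time": time, "lat": lat},
    )

ds1 = make_segment(0)
ds2 = make_segment(12, lat_offset=1e-12)  # tiny float mismatch in lat

# Workaround: give the index-less "bnds" dimension explicit coordinate
# values, so align(join="override") has an index to overwrite.
ds1["bnds"] = [0, 1]
ds2["bnds"] = [0, 1]

ds3 = xr.concat((ds1, ds2), dim="time", join="override")
print(ds3.sizes["lat"])  # 96: lat is not inflated by the mismatch
```

With join="override", the second dataset's lat index is replaced by the first dataset's, so concatenation proceeds without reindexing and without introducing NaNs.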
