Indexing preserves outdated attrs which cause trouble downstream #2247

Open
leouieda opened this issue Jun 22, 2018 · 3 comments
Labels
topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)

Comments

@leouieda

Code Sample, a copy-pastable example if possible

import xarray as xr

# Load a global grid with 1 degree spacing distributed by GMT
whole = xr.open_dataarray('earth_relief_60m.grd')
print("Global grid:\n", whole)
# The grid coordinates have metadata regarding the range (actual_range).
# This is used to detect if the grid is pixel or node registered.
print("\nMetadata for global coordinates:", whole.lat.attrs)

# Select only between latitudes -40 and 40
part = whole.sel(lat=slice(-40, 40))
print("\nSliced grid:\n", part)
# Slicing preserves the coordinate metadata and now the actual_range is incorrect.
# This is preserved when saving to netCDF and causes errors that are difficult to
# diagnose and fix when passing it along to GMT for plotting.
print("\nMetadata for sliced coordinates:", part.lat.attrs)

Output:

Global grid:
 <xarray.DataArray 'z' (lat: 181, lon: 361)>
array([[ 2762.,  2762.,  2762., ...,  2762.,  2762.,  2762.],
       [ 2983.,  2980.,  2977., ...,  2989.,  2986.,  2983.],
       [ 3074.,  3074.,  3074., ...,  3072.,  3073.,  3074.],
       ...,
       [-3727., -3715., -3706., ..., -3759., -3742., -3727.],
       [-2294., -2282., -2271., ..., -2322., -2308., -2294.],
       [-4181., -4181., -4181., ..., -4181., -4181., -4181.]], dtype=float32)
Coordinates:
  * lon      (lon) float64 -180.0 -179.0 -178.0 -177.0 -176.0 -175.0 -174.0 ...
  * lat      (lat) float64 -90.0 -89.0 -88.0 -87.0 -86.0 -85.0 -84.0 -83.0 ...
Attributes:
    long_name:     z
    actual_range:  [-8425.  5551.]

Metadata for global coordinates: OrderedDict([('long_name', 'latitude'), ('units', 'degrees_north'), ('actual_range', array([-90.,  90.]))])

Sliced grid:
 <xarray.DataArray 'z' (lat: 81, lon: 361)>
array([[-3062., -3451., -3695., ..., -1504., -3226., -3062.],
       [-3559., -3515., -3773., ...,  -128., -2991., -3559.],
       [-3552., -3498., -3459., ...,   519., -1149., -3552.],
       ...,
       [-5425., -5389., -5268., ..., -5162., -5399., -5425.],
       [-5385., -5499., -5580., ..., -5407., -5497., -5385.],
       [-5325., -5937., -5557., ..., -5602., -5572., -5325.]], dtype=float32)
Coordinates:
  * lon      (lon) float64 -180.0 -179.0 -178.0 -177.0 -176.0 -175.0 -174.0 ...
  * lat      (lat) float64 -40.0 -39.0 -38.0 -37.0 -36.0 -35.0 -34.0 -33.0 ...
Attributes:
    long_name:     z
    actual_range:  [-8425.  5551.]

Metadata for sliced coordinates: OrderedDict([('long_name', 'latitude'), ('units', 'degrees_north'), ('actual_range', array([-90.,  90.]))])

Problem description

Indexing seems to preserve the attrs. If they contain information about the values (such as actual_range), that information becomes outdated after indexing. Some software, such as GMT, relies on this information for certain operations. It can cope with missing metadata, but there is no way to guard against incorrect metadata.
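The behaviour can also be reproduced without the GMT file. The following is a minimal sketch that builds a synthetic 1 degree grid. The random values and the hand-written actual_range attributes are assumptions standing in for what GMT records, not the contents of earth_relief_60m.grd.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the GMT grid (assumption: random values, and
# actual_range written by hand the way GMT would record it).
lat = np.arange(-90.0, 91.0)
lon = np.arange(-180.0, 181.0)
whole = xr.DataArray(
    np.random.default_rng(0).normal(size=(lat.size, lon.size)),
    dims=("lat", "lon"),
    coords={
        "lat": ("lat", lat, {"units": "degrees_north",
                             "actual_range": np.array([-90.0, 90.0])}),
        "lon": ("lon", lon, {"units": "degrees_east",
                             "actual_range": np.array([-180.0, 180.0])}),
    },
    name="z",
)

# Select only between latitudes -40 and 40
part = whole.sel(lat=slice(-40, 40))

# The coordinate values now span -40 to 40 ...
print(float(part.lat.min()), float(part.lat.max()))
# ... while the attribute still describes the global grid.
print(part.lat.attrs["actual_range"])
```

Comparing the coordinate's minimum and maximum against its actual_range attribute, as in the last two lines, is one way to spot stale metadata before writing the result to netCDF.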

Expected Output

I would expect indexing to drop attrs unless keep_attrs is specified. It's better to have no metadata than to have incorrect metadata.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.18-3-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.4
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: 3.5.0
IPython: 6.2.1
sphinx: 1.7.2

@leouieda
Author

Here is the earth_relief_60m.grd grid if anyone wants to try the code: earth_relief_60m.grd.zip

@shoyer
Member

shoyer commented Jun 22, 2018

See #1614 for a large discussion about these sorts of issues (propagating metadata).

@leouieda
Author

@shoyer thanks, I missed that issue. I appreciate the complexity of the problem. For now, I'm dropping the actual_range attribute manually, but I imagine other people will run into this when using grids made with GMT. We're working on the GMT side to issue warnings if we detect that the metadata might be stale.
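For anyone needing the same workaround, dropping the attribute manually could look roughly like the sketch below. The helper name drop_stale_attrs and its default list of attribute names are hypothetical (they are not part of xarray or GMT), and the small synthetic grid is an assumption standing in for a real GMT file.

```python
import numpy as np
import xarray as xr


def drop_stale_attrs(da, names=("actual_range",)):
    """Return a copy of da without the named attrs on the array and its coordinates.

    Hypothetical helper, not part of xarray or GMT.
    """
    out = da.copy()  # work on a copy so the input is left untouched
    for name in names:
        out.attrs.pop(name, None)
        for coord in out.coords:
            out[coord].attrs.pop(name, None)
    return out


# Small synthetic grid standing in for a GMT file (assumption).
lat = np.arange(-90.0, 91.0)
grid = xr.DataArray(
    np.zeros(lat.size),
    dims="lat",
    coords={"lat": ("lat", lat, {"units": "degrees_north",
                                 "actual_range": np.array([-90.0, 90.0])})},
    attrs={"long_name": "z", "actual_range": np.array([0.0, 0.0])},
)

part = drop_stale_attrs(grid.sel(lat=slice(-40, 40)))
print(part.lat.attrs)  # units is kept, actual_range is gone
```

Removing the attribute rather than recomputing it follows the reasoning above that missing metadata is easier for downstream tools to handle than incorrect metadata.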

@TomNicholas TomNicholas added the topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) label Apr 5, 2020