Improved inference of names when concatenating arrays #2775

Closed
Zac-HD opened this issue Feb 19, 2019 · 1 comment

Comments

@Zac-HD
Contributor

Zac-HD commented Feb 19, 2019

Problem description

Using the name of the first element to concatenate as the name of the concatenated array is only correct if all names are identical. When names vary, using a clear placeholder name or the name of the new dimension would avoid misleading data users.

This came up for me recently when stacking several bands of a satellite image to produce a faceted plot - the resulting colorbar was labelled "blue", even though that was clearly incorrect.

A similar process is probably also desirable for aggregation of units across concatenated arrays - use first if identical, otherwise discard or error depending on the compat argument.
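The units rule suggested above could be sketched roughly as follows. This is only an illustration in plain Python: `combine_units` is a hypothetical helper, not part of xarray, and treating `compat="identical"` as the "error" case is an assumption made for the example.

```python
from typing import Any, Mapping, Optional, Sequence


def combine_units(
    attrs_list: Sequence[Mapping[str, Any]], compat: str = "equals"
) -> Optional[Any]:
    """Return the shared 'units' value, or None when the inputs disagree.

    Hypothetical helper: with compat="identical" (an assumed mapping),
    disagreeing units raise an error instead of being discarded.
    """
    units = [attrs.get("units") for attrs in attrs_list]
    if not units:
        return None
    if all(u == units[0] for u in units):
        # All inputs agree, so the first value is safe to keep.
        return units[0]
    if compat == "identical":
        raise ValueError(f"units are not identical: {units!r}")
    # Units differ and the caller did not ask for strict checking: discard.
    return None


same = combine_units([{"units": "m"}, {"units": "m"}])
mixed = combine_units([{"units": "m"}, {"units": "km"}])
```

Here `same` keeps the shared value while `mixed` is dropped, which mirrors the "use first if identical, otherwise discard" behaviour described above.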

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr

ds = xr.Dataset({
    k: xr.DataArray(np.random.random((2, 2)), dims="x y".split(), name=k)
    for k in "blue green red".split()
})
# arr.name == "blue", could be "band" or "concat_dim"
arr = xr.concat([ds.blue, ds.green, ds.red], dim="band")
# label of colorbar is "blue", which is meaningless
arr.plot.imshow(col="band")

(Image: screenshot of the resulting faceted plot.)

One implementation that would certainly be nice for this use case (though perhaps not in general): when DataArrays with unique names are concatenated along an entirely new dimension and dim is passed as a string, concat could also create a new Index for that dimension, as pd.Index([a.name for a in objs], name=dim).
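Until something like that exists, the same result can be approximated by hand, since `xr.concat` accepts a `pandas.Index` for `dim`. The following is a minimal sketch of that workaround, not the proposed implementation; the variable names and the zero-filled data are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Placeholder inputs: three named 2x2 arrays standing in for image bands.
names = ["blue", "green", "red"]
bands = [
    xr.DataArray(np.zeros((2, 2)), dims=["x", "y"], name=n) for n in names
]

# Build the index from the array names, as suggested in the issue text.
band_index = pd.Index([a.name for a in bands], name="band")

# Passing the index as `dim` names the new dimension and labels its entries.
arr = xr.concat(bands, dim=band_index)

# Clear the name explicitly so no single band name is attached to the result.
arr.name = None
```

With this, `arr["band"]` carries the band names as coordinate labels, so a faceted plot can title each panel by band rather than relying on the array name.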

INSTALLED VERSIONS

commit: None
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.3
libnetcdf: 4.4.1.1

xarray: 0.11.2
pandas: 0.23.1
numpy: 1.14.5
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.6.2
pip: 10.0.1
conda: None
pytest: 4.2.0
IPython: 6.4.0
sphinx: 1.8.0

I'd be happy to write a PR for this if it would be accepted.

@shoyer
Member

shoyer commented Feb 19, 2019

Indeed, this seems broken to me. I think we should use the same heuristic we use for naming the result of operations with apply_ufunc:

def result_name(objects: list) -> Any:
    # use the same naming heuristics as pandas:
    # https://github.com/blaze/blaze/issues/458#issuecomment-51936356
    names = {getattr(obj, 'name', _DEFAULT_NAME) for obj in objects}
    names.discard(_DEFAULT_NAME)
    if len(names) == 1:
        name, = names
    else:
        name = None
    return name
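To see how this heuristic behaves on the case from the issue, here is a self-contained sketch in plain Python. `common_name`, `Named`, and `_MISSING` are hypothetical stand-ins introduced for illustration (`_MISSING` plays the role of xarray's private `_DEFAULT_NAME`); the logic follows the snippet above.

```python
from typing import Any, Hashable, Iterable, Optional

# Hypothetical sentinel marking objects that have no `name` attribute at all.
_MISSING = object()


class Named:
    """Tiny stand-in for an object with a `name`, such as a DataArray."""

    def __init__(self, name: Optional[Hashable]) -> None:
        self.name = name


def common_name(objects: Iterable[Any]) -> Optional[Hashable]:
    """Return the single shared name, or None if the names differ."""
    names = {getattr(obj, "name", _MISSING) for obj in objects}
    # Objects without a `name` attribute (e.g. plain numbers) are ignored.
    names.discard(_MISSING)
    if len(names) == 1:
        (name,) = names
        return name
    return None


same = common_name([Named("blue"), Named("blue")])
mixed = common_name([Named("blue"), Named("green"), Named("red")])
with_scalar = common_name([Named("blue"), 2.0])
with_unnamed = common_name([Named("blue"), Named(None)])
```

Under this rule the three differently named bands yield `None` instead of "blue", while identical names are preserved. Note that an explicitly unnamed object (`name=None`) counts as a distinct name, so `with_unnamed` is also `None`.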
