open_mfdataset() on a single file drops the concat_dim #1988

Closed
WeatherGod opened this issue Mar 14, 2018 · 6 comments · Fixed by #2048
@WeatherGod
Contributor

When calling xr.open_mfdataset() on a one-element list of filenames, the concat dimension is never added, even though concat_dim is passed explicitly.

This isn't an MWE yet (I'll make one soon); I just wanted to get my thoughts down.

from datetime import datetime
import xarray as xr

time_coord = xr.DataArray([datetime.utcnow()], name='time', dims='time')
radmax_ds = xr.open_mfdataset(['foobar.nc'], concat_dim=time_coord)
print(radmax_ds)
<xarray.Dataset>
Dimensions:    (latitude: 5650, longitude: 12050)
Coordinates:
  * latitude   (latitude) float32 13.505002 13.515002 13.525002 13.535002 ...
  * longitude  (longitude) float32 -170.495 -170.485 -170.475 -170.465 ...
Data variables:
    RadarMax   (latitude, longitude) float32 dask.array<shape=(5650, 12050), chunksize=(5650, 12050)>
Attributes:
    start_date:   03/07/2017 01:00
    end_date:     03/07/2017 01:55
    elapsed:      60

Problem description

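With two files, the result does get a time coordinate and RadarMax becomes 3-D (time, latitude, longitude). With a single file, as above, the time dimension is silently dropped and RadarMax stays 2-D.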
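Until this is fixed, the expected single-file result can be sketched by concatenating explicitly, since xr.concat() on a one-element list does add the new dimension. The toy dataset below (a tiny grid of zeros, with the timestamp taken from the start_date attribute) is an assumption standing in for foobar.nc, not the real data.

```python
from datetime import datetime

import numpy as np
import xarray as xr

# Hypothetical stand-in for the contents of the single file (toy grid, not the real RadarMax data).
ds = xr.Dataset(
    {"RadarMax": (("latitude", "longitude"), np.zeros((3, 4), dtype="float32"))},
    coords={
        "latitude": [13.505, 13.515, 13.525],
        "longitude": [-170.495, -170.485, -170.475, -170.465],
    },
)

# Same kind of concat_dim as in the report: a 1-element DataArray along "time".
time_coord = xr.DataArray([datetime(2017, 3, 7, 1, 0)], name="time", dims="time")

# Concatenating a one-element list explicitly adds the "time" dimension
# that open_mfdataset() drops in the single-file case.
combined = xr.concat([ds], dim=time_coord, data_vars="all", coords="different")
print(combined["RadarMax"].dims)  # ('time', 'latitude', 'longitude')
```

The latitude/longitude index coordinates stay 1-D; only the data variable gains the leading time axis.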

Output of xr.show_versions()

I am currently on a recent-ish master of xarray.

@jhamman
Member

jhamman commented Mar 23, 2018

This does seem inconsistent to me. Note that concat already behaves this way with a single object:

In [1]: import xarray as xr

In [2]: da = xr.DataArray([1, 2], dims='x', name='foo')

In [3]: xr.concat([da], dim='y')
Out[3]:
<xarray.DataArray 'foo' (y: 1, x: 2)>
array([[1, 2]])
Dimensions without coordinates: y, x

The offending line is here:

def _auto_concat(datasets, dim=None, data_vars='all', coords='different'):
    if len(datasets) == 1:
        return datasets[0]

Based on this, I'm surprised my little example with concat works the way it does. In either event, it would be great if someone could spend some time normalizing the behavior here.

@shoyer
Member

shoyer commented Mar 23, 2018

Yes, this seems like a bug. open_mfdataset() should always concatenate if a dim argument is provided explicitly.

@WeatherGod
Contributor Author

Could the fix be as simple as changing that check to `if len(datasets) == 1 and dim is None:`?
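A minimal sketch of the proposed guard, written as a standalone function. The name `auto_concat_sketch` is hypothetical and this is not xarray's actual `_auto_concat`; in particular, the real helper infers a dimension when several datasets are passed without `dim`, which the sketch replaces with an error to keep the focus on the single-dataset case.

```python
import xarray as xr


def auto_concat_sketch(datasets, dim=None, data_vars="all", coords="different"):
    """Hypothetical stand-in for _auto_concat showing only the proposed guard."""
    if len(datasets) == 1 and dim is None:
        # No dimension requested: a lone dataset is returned unchanged.
        return datasets[0]
    if dim is None:
        # The real helper would infer a dimension here; this sketch does not.
        raise ValueError("this sketch needs an explicit dim for multiple datasets")
    # An explicit dim is always honoured, even for a single dataset,
    # matching what xr.concat([da], dim='y') already does.
    return xr.concat(datasets, dim=dim, data_vars=data_vars, coords=coords)


ds = xr.Dataset({"foo": ("x", [1, 2])})
print(auto_concat_sketch([ds]) is ds)                 # True
print(auto_concat_sketch([ds], dim="y")["foo"].dims)  # ('y', 'x')
```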

@shoyer
Member

shoyer commented Apr 9, 2018

@WeatherGod Possibly! As usual, tests are the hard part :)

@WeatherGod
Contributor Author

I'll give it a go tomorrow. My work has gotten to this point now, and I have some unit tests that happen to exercise this edge case.

On a somewhat related note, would an allow_missing feature be welcome in open_mfdataset()? I have written some code that takes a concat_dim and a list of filenames. It passes to open_mfdataset() only the files that exist (together with their corresponding concat_dim values), then calls reindex() with the original concat_dim so that there is a NaN-filled slab wherever a file was missing.
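A rough sketch of the allow_missing idea described above. `open_allow_missing` and its `opener`/`exists` parameters are hypothetical (they are injected so the logic can be exercised without real files), and the sketch opens files one by one and uses xr.concat() instead of open_mfdataset(), then reindex() to restore NaN-filled slabs for the missing labels.

```python
import os

import pandas as pd
import xarray as xr


def open_allow_missing(paths, labels, dim="time", opener=xr.open_dataset, exists=os.path.exists):
    """Hypothetical helper: open only existing files, then reindex onto all labels."""
    if len(paths) != len(labels):
        raise ValueError("paths and labels must have the same length")
    # Keep only the (path, label) pairs whose file is present.
    present = [(p, lab) for p, lab in zip(paths, labels) if exists(p)]
    if not present:
        raise FileNotFoundError("none of the requested files exist")
    found_paths, found_labels = zip(*present)
    datasets = [opener(p) for p in found_paths]
    # Concatenate along a named index holding only the labels that were found.
    combined = xr.concat(
        datasets,
        dim=pd.Index(list(found_labels), name=dim),
        data_vars="all",
        coords="different",
    )
    # Reindex onto the full label list; missing files become NaN-filled slabs.
    return combined.reindex({dim: list(labels)})


# Usage with a fake opener and a fake existence check (no files touched).
result = open_allow_missing(
    ["a.nc", "b.nc", "c.nc"],
    [0, 1, 2],
    opener=lambda p: xr.Dataset({"v": ("x", [1.0, 2.0])}),
    exists=lambda p: p != "b.nc",
)
print(result["v"].sel(time=1).values)  # [nan nan]
```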

Any interest?

@WeatherGod
Copy link
Contributor Author

Yup... looks like that did the trick (for auto_combine and open_mfdataset). I even have a simple test to demonstrate it. PR coming shortly.

WeatherGod added a commit to WeatherGod/xray that referenced this issue Apr 10, 2018
shoyer pushed a commit that referenced this issue Apr 10, 2018
* concat_dim for auto_combine for a single object is now respected

Closes #1988

* Added what's new entry for the bugfix.
WeatherGod added a commit to WeatherGod/xray that referenced this issue Jan 4, 2019
shoyer pushed a commit that referenced this issue Jan 5, 2019
…#2648)

* Change an `==` to an `is`. Fix tests so that this won't happen again.

Closes #2647 and re-affirms #1988.

* Reuse the same _CONCAT_DIM_DEFAULT object
shoyer pushed a commit that referenced this issue Jan 24, 2019
…#2648)

* Change an `==` to an `is`. Fix tests so that this won't happen again.

Closes #2647 and re-affirms #1988.

* Reuse the same _CONCAT_DIM_DEFAULT object
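The follow-up commits switch a comparison against the `_CONCAT_DIM_DEFAULT` sentinel from `==` to `is`. The sketch below illustrates only the general sentinel pattern; `_INFER_DIM` and `resolve_concat_dim` are hypothetical names, not xarray code. An identity check never calls the argument's `__eq__`, which matters when the argument can be an array-like such as a DataArray, whose `==` is element-wise.

```python
import xarray as xr

# Hypothetical sentinel meaning "infer the concat dimension" (not xarray's actual object).
_INFER_DIM = object()


def resolve_concat_dim(concat_dim=_INFER_DIM):
    """Classify the concat_dim argument using identity checks only."""
    # `is` compares object identity and never calls concat_dim.__eq__,
    # so an array-like argument cannot trigger an element-wise comparison here.
    if concat_dim is _INFER_DIM:
        return "infer"
    if concat_dim is None:
        return "no concatenation"
    return "explicit"


# With `concat_dim == _INFER_DIM` instead, a multi-element DataArray would
# typically be compared element-wise, and an `if` on that result raises ValueError.
print(resolve_concat_dim())                                      # infer
print(resolve_concat_dim(xr.DataArray([1, 2, 3], dims="time")))  # explicit
```

Because `is` compares identity, every module performing the check must share one sentinel object, which is presumably why the commit also reuses the same `_CONCAT_DIM_DEFAULT`.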