Ordered Groupby Keys #757

Open
jhamman opened this issue Feb 10, 2016 · 6 comments


@jhamman
Member

jhamman commented Feb 10, 2016

Currently, xarray's Groupby.groups property returns a standard (unordered) dictionary. This is fine for most cases, but it leads to odd orderings in use cases like this one, where I am using xarray's FacetGrid plotting:

plot_kwargs = dict(col='season', vmin=15, vmax=35, levels=12, extend='both')

da_obs = ds_obs.SALT.isel(depth=0).groupby('time.season').mean('time')
da_obs.plot(**plot_kwargs)

[Image: index]
Note that MAM and JJA are out of order.

I think this could be easily fixed by using an OrderedDict in xarray.core.Groupby.groups.

@shoyer
Member

shoyer commented Feb 11, 2016

I agree this is annoying, but I don't think your diagnosis is correct here. The groups property isn't used by any internal routines AFAICT. The issue is that groups are sorted, but as text rather than ordered categorical -- notice that the labels are ordered alphabetically.
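To illustrate the point about text sorting, here is a minimal sketch in plain Python (not xarray's actual code path): sorting the four season labels as strings yields exactly the order seen in the plot, with JJA ahead of MAM.

```python
# Season labels in calendar order (meteorological seasons).
seasons = ["DJF", "MAM", "JJA", "SON"]

# Sorting them as plain text compares characters, not calendar position,
# so "JJA" sorts before "MAM".
alphabetical = sorted(seasons)
print(alphabetical)
```

An ordered categorical would carry the calendar order along with the labels, which is what plain string labels lack.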

@jhamman
Member Author

jhamman commented Feb 11, 2016

Hmmm, a mystery. I'll look into this a bit more.

@shoyer
Member

shoyer commented Feb 11, 2016

For what it's worth, I don't think we have any good solutions short of adding our own array type to do Categorical in xarray. We could set sort=False in some cases when we call pd.factorize, but that's not a great alternative.
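As a rough sketch of what the sort flag changes, the snippet below calls pd.factorize directly on a made-up label array (the `labels` data is an assumption for illustration, and this is not how xarray wires the call internally).

```python
import numpy as np
import pandas as pd

# Hypothetical season labels, in the order they appear along a time axis.
labels = np.array(["DJF", "DJF", "MAM", "JJA", "SON", "DJF"], dtype=object)

# sort=True: unique values come back sorted as text.
codes_sorted, uniques_sorted = pd.factorize(labels, sort=True)

# sort=False: unique values come back in order of first appearance.
codes_seen, uniques_seen = pd.factorize(labels, sort=False)

print(list(uniques_sorted))
print(list(uniques_seen))
```

With sort=False the result depends on where the data happen to start, so a series beginning in June would put JJA first. That may be one reason it is not a general fix.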

@tdihp

tdihp commented Mar 3, 2018

Ahh, so it's sorted, instead of keeping the original order.

I was expecting DataArray.groupby().reduce would work like np.apply_along_axis, and used the data of the result directly.

@jbusecke
Contributor

Just stumbled across this issue. Is there a recommended workaround?

I am usually doing this (specific to seasons):

import xarray as xr
ds = xr.tutorial.open_dataset('air_temperature')
season_order = xr.DataArray(['DJF', 'MAM', 'JJA', 'SON'], dims=['season'])
airtemp_seasonal = ds.groupby('time.season').mean('time').sortby(season_order)

Thought this might help some folks who need to solve this problem.

@dcherian
Contributor

I use reindex instead of sortby
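A minimal sketch of the reindex approach, using a synthetic daily series rather than the tutorial dataset so that it runs offline (the `da` array and its values are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic one-year daily series standing in for real data.
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    coords={"time": time},
    dims="time",
    name="air",
)

# Seasonal means, then reorder the season labels explicitly.
seasonal = da.groupby("time.season").mean("time")
ordered = seasonal.reindex(season=["DJF", "MAM", "JJA", "SON"])

print(list(ordered.season.values))
```

reindex selects by label, so the requested order is applied no matter how the groupby result was ordered to begin with.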
