Dataset.groupby summary methods #122

Closed
jhamman opened this issue May 11, 2014 · 3 comments

jhamman commented May 11, 2014

This may just be a documentation issue, but the summary (apply-and-combine) methods on the object returned by Dataset.groupby seem to be missing.

In [146]:

import numpy as np
import pandas as pd
import xray

foo_values = np.random.RandomState(0).rand(3, 4)
times = pd.date_range('2000-01-01', periods=3)
ds = xray.Dataset({'time': ('time', times),
                   'foo': (['time', 'space'], foo_values)})

ds.groupby('time').mean()  # replace time with time.month after #121 is addressed
# ds.groupby('time').apply(np.mean)  # also errors here

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-146-eec1e73cff23> in <module>()
      3 ds = xray.Dataset({'time': ('time', times),
      4                    'foo': (['time', 'space'], foo_values)})
----> 5 ds.groupby('time').mean()
      6 ds.groupby('time').apply(np.mean)

AttributeError: 'DatasetGroupBy' object has no attribute 'mean'

If this functionality is not already present, it would be a really nice addition to the package.

shoyer commented May 12, 2014

The problem is that it's ambiguous which array you would like to summarize. I suppose we could default to using all non-coordinate variables? That would be similar to how pandas handles its DataFrame.groupby methods.
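For comparison, here is a minimal sketch of the pandas behavior referred to above. The DataFrame and its column names (`month`, `foo`, `bar`) are made up for illustration and do not come from this issue. It shows that a groupby summary method aggregates every non-key column at once:

```python
import pandas as pd

# Hypothetical data: "month" is the grouping key, "foo" and "bar" are data columns.
df = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "foo": [1.0, 3.0, 5.0, 7.0],
    "bar": [10.0, 20.0, 30.0, 40.0],
})

# The key becomes the index; every remaining column is averaged per group.
means = df.groupby("month").mean()
print(means)
```

The result has one row per group and keeps both `foo` and `bar`, which is the default being proposed for a dataset's non-coordinate variables.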

jhamman commented May 12, 2014

Right, although it doesn't have to be ambiguous. Why not just summarize each group along the group_dim?
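A rough sketch of what "summarize each group along the group dimension" could mean, using a plain dict of NumPy arrays as a stand-in for a dataset. The helper `groupby_mean` and the `months` labels are hypothetical and are not part of the xray API:

```python
import numpy as np

def groupby_mean(variables, labels, axis=0):
    """Average each array within groups of equal `labels` along `axis`.

    `variables` maps names to arrays that all share the grouped axis.
    This is an illustrative helper, not an xray function.
    """
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)
    result = {}
    for name, values in variables.items():
        values = np.asarray(values)
        group_means = [
            # Select the members of one group, then reduce along the grouped axis.
            np.take(values, np.flatnonzero(labels == label), axis=axis).mean(axis=axis)
            for label in unique_labels
        ]
        # Stack the per-group results back along the grouped axis.
        result[name] = np.stack(group_means, axis=axis)
    return unique_labels, result

foo_values = np.random.RandomState(0).rand(3, 4)
months = [1, 1, 2]  # assumed group label for each of the 3 time steps
unique_months, grouped = groupby_mean({"foo": foo_values}, months)
print(grouped["foo"].shape)
```

Each variable keeps its other dimensions (here `space`), and the grouped dimension shrinks to one entry per group, so no choice of a single array is needed.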

shoyer commented May 12, 2014

We don't currently implement methods like ds.mean(), and it would make sense to add those before the groupby versions. But I do think it would be perfectly consistent to say that the mean of a dataset is the dataset obtained by taking the mean of every variable in it.
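The definition above can be sketched with a dict of arrays standing in for a dataset. The function `dataset_mean` and the second variable `bar` are assumptions made for illustration only, not the implementation that closed this issue:

```python
import numpy as np

def dataset_mean(variables):
    """Return a new mapping holding the mean of every variable.

    Illustrative stand-in for a dataset-wide mean; not an xray function.
    """
    return {name: np.asarray(values).mean() for name, values in variables.items()}

foo_values = np.random.RandomState(0).rand(3, 4)
bar_values = np.arange(6.0).reshape(3, 2)  # hypothetical second variable

summary = dataset_mean({"foo": foo_values, "bar": bar_values})
print(summary)
```

Because the result has the same variable names as the input, the operation is unambiguous even when the dataset holds several arrays.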

@shoyer shoyer closed this as completed in 18c7b9d Jun 23, 2014
keewis pushed a commit to keewis/xarray that referenced this issue Jan 17, 2024