Never used Pandas, so I am not sure how it's seen, but as a suggestion, maybe it would be an idea to have pandas throw a warning (or even an Exception) if the result is broadcast to all columns? That numpy functions work very differently depending on whether or not they are ndarray attributes (due to a funny back and forth when trying to execute the corresponding pandas attribute with axis=None), and generally the silent switching between passing 2D and 1D array-likes, seems a bit dangerous.
You're right, the issue is that axis=0 isn't passed in by default (and perhaps it should be... although it may be difficult to know what to pass in as default for arbitrary functions?).
As median flattens the array (and produces a number), and since:
test_g.aggregate(lambda x: 8)
makes everything 8, this behaviour is "expected" in some sense (and sometimes might be what you want) so we probably don't want an exception...?
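To make the "median flattens the array" point concrete, here is a minimal NumPy-only sketch. The array `block` is a made-up stand-in for the rows of one group, not data from this issue.

```python
import numpy as np

# Hypothetical 3x2 block standing in for the rows of a single group.
block = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

# axis=None (the default) flattens to [1, 10, 2, 20, 3, 30] first,
# so the result is a single number for the whole block.
flat_median = np.median(block)

# axis=0 reduces down the rows, giving one median per column.
column_medians = np.median(block, axis=0)

print(flat_median)      # 6.5
print(column_medians)   # [ 2. 20.]
```

A single number like `flat_median` is what then gets broadcast to every column, in the same way the constant 8 is.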
test_g.aggregate(np.median) should now give the correct result. np.mean was different originally because certain numpy functions are special-cased in the pandas groupby machinery for speed, which also changed the default behavior to be pandas-like (df.mean()) rather than numpy-like (np.mean(arr)).
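The difference between "pandas-like" and "numpy-like" can be sketched outside of groupby. The frame `df` below is invented for illustration, and the array is extracted with `to_numpy()` so that NumPy sees a plain ndarray rather than a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, not the data from the original report.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                   "b": [10.0, 20.0, 30.0]})

# pandas-like: DataFrame.mean() reduces each column separately.
pandas_like = df.mean()

# numpy-like: np.mean on a plain 2D array averages every element.
numpy_like = np.mean(df.to_numpy())

print(pandas_like["a"], pandas_like["b"])  # 2.0 20.0
print(numpy_like)                          # 11.0
```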
Migrated from http://stackoverflow.com/questions/12651618/inconsisitency-in-results-of-aggregating-pandas-groupby-object-using-numpy-media
Aggregating using np.median (unexpectedly) produces DataFrame-wise aggregation within groups.
It works perfectly when axis=0 is passed in.
For np.mean (also sum, min, max) this doesn't average over the entire array.
Perhaps worth noting that when passing in as a list this behaviour is not seen.
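As a rough illustration of the two outcomes described in this report (not the original poster's data or output), the sketch below computes both by hand for each group. The frame, the column names, and the variables `per_column`, `whole_block` and `with_axis0` are all assumptions made up for the example; it deliberately avoids passing np.median to aggregate, since how that is dispatched is exactly what the thread is about.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two groups and two value columns.
df = pd.DataFrame({
    "key": ["x", "x", "x", "y", "y", "y"],
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
grouped = df.groupby("key")

# Column-wise result: one median per column per group.
per_column = grouped.median()

whole_block = {}  # np.median with the default axis=None, per group
with_axis0 = {}   # np.median with axis=0, per group
for key, group in grouped:
    values = group[["a", "b"]].to_numpy()
    whole_block[key] = float(np.median(values))
    with_axis0[key] = np.median(values, axis=0).tolist()

print(per_column)
print(whole_block)  # {'x': 6.5, 'y': 23.0}
print(with_axis0)   # {'x': [2.0, 20.0], 'y': [5.0, 50.0]}
```

The `whole_block` numbers correspond to the DataFrame-wise aggregation reported above, while `with_axis0` matches the column-wise medians in `per_column`.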