BUG: .groupby on Series' values without reindexing #15340

kernc · 2017-02-08T00:29:46Z

- closes #15338 (".groupby by should indicate it aligns the passed in Series")
- tests added / passed
- passes `git diff upstream/master | flake8 --diff`
- whatsnew entry

jreback · 2017-02-08T00:35:52Z

see my comments on the issue.

codecov-io · 2017-02-08T04:40:24Z

Codecov Report

Merging #15340 into master will not impact coverage.

@@           Coverage Diff           @@
##           master   #15340   +/-   ##
=======================================
  Coverage   86.32%   86.32%           
=======================================
  Files         141      141           
  Lines       51165    51165           
=======================================
  Hits        44169    44169           
  Misses       6996     6996

Impacted Files	Coverage Δ
pandas/core/groupby.py	`95.15% <100%> (ø)`	✅

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 542c916...87d6170.

jreback · 2017-02-08T14:37:38Z

so this changes the actual behavior. why is this warranted?

kernc · 2017-02-09T04:24:08Z

I don't know if it is. The docs say that when a Series is passed, its values will be used as grouping keys. In this instance I don't see a Series as much different from any other sequence of values, such as a list or a NumPy array. I don't see why aligning is necessary, particularly if the Series of keys comes from another source (if not, why would it be passed as a Series explicitly instead of by a column reference?), in which case its index can be arbitrary and the result of alignment just as arbitrary.

In other words, if the grouping Series is from the same source (i.e. the same frame) as the grouped object, then it already shares the same index, so aligning is not necessary.
If the Series is from a different source, then automatically aligning might be dangerous. I appreciate that an indexed Series makes a great random-access key-value store, but I assume most users don't treat it as such, preferring to see it as a somewhat simpler ordered array of values that happens to have an additional index.
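To make the distinction in this comment concrete, here is a minimal sketch of the two behaviors being compared: grouping by a Series (aligned on index labels) versus grouping by the same values as a plain list (matched by position). The frame, the column name `value`, and the `keys` Series are made up for illustration, and the sketch assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical data: four rows labelled 0..3.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[0, 1, 2, 3])

# Grouping keys "from another source": same length, but the index
# labels are in reverse order relative to df.
keys = pd.Series(["a", "a", "b", "b"], index=[3, 2, 1, 0])

# Passing the Series: keys are aligned to df.index by label, so
# rows 2 and 3 get "a" and rows 0 and 1 get "b".
aligned = df.groupby(keys)["value"].sum()

# Passing a plain list of the same values: keys are matched by
# position, so rows 0 and 1 get "a" and rows 2 and 3 get "b".
positional = df.groupby(keys.tolist())["value"].sum()

print(aligned.to_dict())     # {'a': 7, 'b': 3}
print(positional.to_dict())  # {'a': 3, 'b': 7}
```

When the key Series shares the frame's index, both calls give the same result; they only diverge when the indexes differ, which is the case under discussion.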

Ideally, I'd have .groupby() fail if the passed grouping is a Sequence and its length doesn't match. Otherwise, it should behave as it does when passed a list.

Of course, Series is not a Sequence?? 😳
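The length-mismatch point can be illustrated with a small sketch, again on made-up data and assuming a recent pandas: a NumPy array of the wrong length is rejected, while a too-short Series is reindexed against the frame, and the rows left without a key are dropped from the result by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four rows labelled 0..3.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[0, 1, 2, 3])

# An array of keys with the wrong length is rejected outright.
raised = False
try:
    df.groupby(np.array(["a", "a"]))["value"].sum()
except ValueError:
    raised = True

# A Series with only two labels is aligned to df.index instead;
# rows 2 and 3 receive a missing key and are excluded by default.
short_keys = pd.Series(["a", "a"], index=[0, 1])
result = df.groupby(short_keys)["value"].sum()

print(raised)            # True
print(result.to_dict())  # {'a': 3}
```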

kernc · 2017-02-09T04:27:43Z

> so this changes the actual behavior. why is this warranted?

It's wholly compatible with the docs, and it doesn't break any previous tests. 😆

jreback · 2017-02-09T16:30:54Z

@kernc this certainly changes behavior. The entire point is to align. What you changed is actually very confusing. Virtually every operation in pandas aligns. Why would this not?
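As an illustration of the alignment convention referred to here, a minimal sketch with two made-up Series shows that ordinary arithmetic also matches values by index label and not by position.

```python
import pandas as pd

# Two hypothetical Series with the same labels in different order.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])

# Addition pairs values by label: x -> 1 + 30, y -> 2 + 20, z -> 3 + 10.
total = a + b

print(total.to_dict())  # {'x': 31, 'y': 22, 'z': 13}
```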

kernc · 2017-02-10T15:00:52Z

Thanks for treating me like a pandas expert I'm not. ❤️

In that case, I agree a remark in the docs (docstring) would be welcome.

jreback · 2017-02-10T18:38:17Z

@kernc hahh, np.

If you'd like to do a PR for the docstring, that would be great.

BUG: .groupby on Series' values without reindexing

87d6170

jreback added the Groupby label Feb 8, 2017

kernc closed this Feb 10, 2017
