ENH: Allow the groupby by param to handle columns and index levels (GH5677) #14432

jonmmease · 2016-10-15T23:13:33Z

closes #5677 (ENH/API: clarify groupby by to handle columns/index names)
tests added
rebase after #14342 (Bug: Error when key-only Grouper is passed to groupby in a list (GH14334)) is merged, which fixes #14334 (Error when key-only Grouper is passed to groupby in a list)
tests pass (requires rebase above)
passes `git diff upstream/master | flake8 --diff`
whatsnew entry
updates existing documentation
add example to whatsnew 0.20 and groupby.rst

This change allows strings passed as the by parameter of df.groupby to reference either columns (the existing behavior) or, if no column matches, index level names. Columns take precedence in the case of ambiguity to maintain backward compatibility.
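A minimal sketch of the behavior described above, assuming a pandas version that includes this change. The frame, the index level name "inner", and the column names "A" and "B" are made up for illustration and do not come from the PR's tests.

```python
import pandas as pd

# Hypothetical data: "inner" is an index level name, "A" and "B" are columns.
df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [10, 20, 30, 40]},
    index=pd.Index(["x", "y", "x", "y"], name="inner"),
)

# A string that matches no column is looked up among the index level names.
by_level = df.groupby("inner")["B"].sum()

# Index level names and column names can be mixed in one list.
by_both = df.groupby(["inner", "A"])["B"].sum()

# The explicit spelling that was already available before this change.
explicit = df.groupby(level="inner")["B"].sum()
```

Here `by_level` and `explicit` should hold the same sums, since "inner" only exists as an index level in this frame.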

jreback · 2016-10-19T12:45:43Z

doc/source/whatsnew/v0.20.0.txt

@@ -29,8 +29,7 @@ New features

 Other enhancements
 ^^^^^^^^^^^^^^^^^^
-
-
+- Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names (:issue:`5677`)


this will need an example here. put the same one in groupby.rst (may also need to add to the groupby doc-string)

@jreback Example added here and in groupby.rst. I didn't add anything to the groupby-docstring yet as I wasn't quite sure where it would fit in (there are only two examples there right now). Let me know what you think.

jreback · 2016-10-19T12:46:28Z

pandas/tests/test_groupby.py

+        expected = df_multi_both.groupby(pd.Grouper(key='inner')).mean()
+        assert_frame_equal(result, expected)
+        not_expected = df_multi_both.groupby(pd.Grouper(level='inner')).mean()
+        assert not result.index.equals(not_expected.index)


self.assertFalse

jreback · 2016-10-19T12:49:36Z

I think that we should raise in the ambiguous case, which currently just uses the column (this PR relies on that to establish precedence), e.g.

df.groupby('inner') when inner is both a column and a level name.

This is almost always an error by the user, or if it's actually wanted, then the user should be more specific (by using pd.Grouper).

Currently, AFAIK, this just takes the column.

jonmmease · 2016-10-19T19:25:54Z

@jreback I have no problem with raising an exception in the ambiguous case. As you noted, the only reason to have columns take precedence was to reproduce the behavior of previous versions if ambiguity was present.

@shoyer expressed a preference for the precedence approach in #14355 (comment) for the df.merge operation. And I do agree that it makes sense to handle ambiguity in a consistent way across #5677 (groupby), #14353 (sort_values), and #14355 (merge).

@TomAugspurger @jorisvandenbossche @shoyer Do any of you object to raising an exception in the ambiguous case for each of these 3 enhancements?

shoyer · 2016-10-19T21:42:25Z

Yes, we could error in the ambiguous case, but only eventually, after a deprecation cycle.

jonmmease · 2016-10-20T00:35:22Z

@shoyer
If we went the deprecation cycle route, would that mean that we'd keep the column precedence behavior but add a FutureWarning in the ambiguous case? Something like the following?

warnings.warn(("'%s' is both a column name and an index level, defaulting to column. "
               "This will raise an ambiguity error in a future version") % grp,
              FutureWarning, stacklevel=2)

And then I assume we'd also add a deprecation note to the whatsnew file.

@jreback @jorisvandenbossche Thoughts?
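The deprecation route discussed above (column wins, with a FutureWarning when the name is also an index level) could be sketched roughly as follows. This is not the code in pandas/core/groupby.py: `resolve_group_key` and its arguments are hypothetical names, and the lookup is done on plain lists so the sketch runs with the standard library alone.

```python
import warnings


def resolve_group_key(name, columns, index_names):
    """Hypothetical helper: decide whether ``name`` means a column or an index level."""
    in_columns = name in columns
    in_index = name in index_names
    if in_columns and in_index:
        # Ambiguous: keep the pre-existing behavior (column wins) but warn.
        warnings.warn(
            "'%s' is both a column name and an index level, defaulting to column. "
            "This will raise an ambiguity error in a future version" % name,
            FutureWarning,
            stacklevel=2,
        )
        return ("column", name)
    if in_columns:
        return ("column", name)
    if in_index:
        return ("level", name)
    raise KeyError(name)


# Record warnings so the ambiguous case can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ambiguous = resolve_group_key("inner", ["inner", "B"], ["outer", "inner"])
    level_only = resolve_group_key("outer", ["inner", "B"], ["outer", "inner"])
```

Only the ambiguous lookup emits a warning; the name that matches just an index level resolves silently.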

shoyer · 2016-10-20T00:40:16Z

@jmmease Yes, that's right. We should probably be a little reluctant to add deprecated behavior right now, though, because there may be a long wait between the next pandas feature release (1.0?) and pandas 2.0.

jreback · 2016-10-20T00:44:13Z

there is no problem with deprecating things
we certainly are not going to just 'wait' for pandas 2.0

jonmmease · 2016-10-27T00:20:50Z

@jreback @shoyer @jorisvandenbossche
Rebased now that #14342 has landed. Added FutureWarning for the case where a string matches both a column and an index level.

Example for whatsnew, groupby.rst, and groupby doc-string still to come

jreback · 2016-10-27T00:27:58Z

doc/source/groupby.rst

@@ -94,6 +94,9 @@ The mapping can be specified many different ways:
  - For DataFrame objects, a string indicating a column to be used to group. Of
    course ``df.groupby('A')`` is just syntactic sugar for
    ``df.groupby(df['A'])``, but it makes life simpler
+  - For DataFrame objects, a string indicating an index level to be used to group.


versionadded tag

i would also make this into a note section

@jreback I wasn't quite sure how to handle versionadded and keep the description in the list. I left the description in the list (without ambiguity explanation) and added a note section below with versionadded tag that describes the change and the ambiguity behavior.

codecov-io · 2016-10-27T06:20:22Z

Current coverage is 85.30% (diff: 100%)

Merging #14432 into master will increase coverage by 0.01%

@@             master     #14432   diff @@
==========================================
  Files           140        144     +4   
  Lines         50719      51004   +285   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43259      43510   +251   
- Misses         7460       7494    +34   
  Partials          0          0

Powered by Codecov. Last update dca0185...2d35461

jorisvandenbossche · 2016-12-14T20:39:58Z

@jmmease Thanks a lot!

jreback · 2016-12-17T15:08:30Z

@jmmease can you do a quick followup to catch this warning that's appearing (you may need to run all groupby tests)

test_groupby_multi_categorical_as_index (pandas.tests.test_groupby.TestGroupBy) ... /Users/jreback/pandas/pandas/core/groupby.py:370: FutureWarning: 'cat' is both a column name and an index level.
Defaulting to column but this will raise an ambiguity error in a future version
  mutated=self.mutated)

jonmmease · 2016-12-17T15:48:14Z

Sure. See #14902

Follow on to #14432 to catch the newly introduced `FutureWarning` in the `test_groupby_multi_categorical_as_index` test case.

Author: Jon M. Mease <jon.mease@jhuapl.edu>

Closes #14902 from jmmease/GH14432_follow_on and squashes the following commits:

c30fa2b [Jon M. Mease] Trap warning introduced by GH14432 in test_groupby_multi_categorical_as_index
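For readers unfamiliar with trapping a warning inside a test, here is a generic standard-library sketch. It is not the change made in #14902; `group_with_ambiguous_name` is a hypothetical stand-in for a groupby call that hits the ambiguous case, and the message text is copied from the test output quoted earlier in the thread.

```python
import warnings


def group_with_ambiguous_name():
    # Hypothetical stand-in for a groupby call that hits the ambiguous case.
    warnings.warn(
        "'cat' is both a column name and an index level.\n"
        "Defaulting to column but this will raise an ambiguity error "
        "in a future version",
        FutureWarning,
        stacklevel=2,
    )
    return "grouped by column"


def run_and_trap():
    # Record warnings instead of letting them leak into the test output.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = group_with_ambiguous_name()
    return result, caught


result, caught = run_and_trap()
```

A test can then assert on `caught` (its length, category, and message) so the warning is both expected and kept out of the console.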

goldenbull · 2017-06-05T11:18:14Z

So what's the next step of the deprecation cycle? What shall I do if I want to enforce the column to be used in grouping?

jreback · 2017-06-05T11:28:15Z

@goldenbull this will be changed to an error in 1.0 (after 0.21)
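On the question of forcing the column to be used: earlier in the thread jreback suggests being more specific with pd.Grouper, and the PR's tests use `pd.Grouper(key='inner')` and `pd.Grouper(level='inner')` for the two meanings. A small sketch, assuming a recent pandas version; the frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame where "inner" is both a column and the index level name.
df = pd.DataFrame(
    {"inner": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]},
    index=pd.Index(["p", "q", "p", "q"], name="inner"),
)

# key= selects the column named "inner".
by_column = df.groupby(pd.Grouper(key="inner"))["val"].sum()

# level= selects the index level named "inner".
by_index_level = df.groupby(pd.Grouper(level="inner"))["val"].sum()
```

With this data the two results have different group labels ("a"/"b" versus "p"/"q"), which makes it easy to check which meaning was applied.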

jreback reviewed Oct 19, 2016

jreback added Groupby API Design labels Oct 19, 2016

jonmmease force-pushed the enh_5677 branch from 224ba7d to 010e66c Compare October 27, 2016 00:12

jreback requested changes Oct 27, 2016

jorisvandenbossche added this to the 0.20.0 milestone Oct 27, 2016

jonmmease force-pushed the enh_5677 branch from dea98f1 to da7b406 Compare November 5, 2016 15:52

Jon M. Mease added 7 commits November 18, 2016 08:35

Added test cases for GH 5677

eb1e63f

Implemented GH 5677

355d709

Documentation updates for GH 5677

b050aca

Added future warning on ambiguous case (GH 5677)

5f93ddd

Indentation fix

5325ee6

Added note explaining version 0.20 change and ambiguity resolution / warning

b68af16

Added example for grouping by combination of index level and column

98af4d7

jonmmease force-pushed the enh_5677 branch from da7b406 to 98af4d7 Compare November 18, 2016 13:38

shorten test names

2d35461

jorisvandenbossche merged commit a8cabb8 into pandas-dev:master Dec 14, 2016

jonmmease mentioned this pull request Dec 17, 2016

Catch warning introduced by GH14432 in test case #14902

Closed

ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016

ENH: Allow the groupby by param to handle columns and index levels (GH5677) (pandas-dev#14432)

ac18401

jreback mentioned this pull request Mar 5, 2017

rolling( window='10D') does not work for df with MultiIndex #15584

Closed

jsexauer mentioned this pull request Jun 5, 2017

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jonmmease mentioned this pull request Aug 29, 2017

ENH: Allow the groupby by param to handle columns and index levels dask/dask#2635

Closed

TomAugspurger mentioned this pull request Aug 30, 2017

Groupby with matching column and index name emits spurious warning #17383

Closed

jonmmease mentioned this pull request Sep 11, 2017

Support merging DataFrames on a combo of columns and index levels (GH 14355) #17484

Merged

5 tasks

gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 19, 2018

DEPR: Error with ambiguous groupby strings

5374ba4

xref pandas-devgh-14432.

gfyoung mentioned this pull request Aug 19, 2018

DEPR: Error with ambiguous groupby strings #22415

Merged

jreback mentioned this pull request Aug 19, 2018

DEPR: deprecations log for removed issues #13777

Closed

jreback pushed a commit that referenced this pull request Aug 22, 2018

DEPR: Error with ambiguous groupby strings (#22415)

25e6a21

xref gh-14432.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DEPR: Error with ambiguous groupby strings (pandas-dev#22415)

48f57f8

xref pandas-devgh-14432.

WillAyd mentioned this pull request Mar 13, 2019

GroupBy Not Throwing KeyError When Names Exist in MultiIndex #25704

Closed
