COMPAT: Emit warning when groupby by a tuple #18731

TomAugspurger · 2017-12-11T19:05:39Z

TomAugspurger · 2017-12-11T19:05:56Z

codecov · 2017-12-11T23:04:26Z

Codecov Report

Merging #18731 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18731      +/-   ##
==========================================
- Coverage   91.64%   91.62%   -0.02%     
==========================================
  Files         154      154              
  Lines       51401    51408       +7     
==========================================
- Hits        47106    47104       -2     
- Misses       4295     4304       +9

Flag	Coverage Δ
#multiple	`89.49% <100%> (ø)`	⬆️
#single	`40.83% <0%> (-0.13%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.07% <100%> (+0.02%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.68% <0%> (-0.11%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fb178fc...3057aab.

toobaz · 2017-12-11T23:09:28Z

pandas/core/groupby.py

@@ -2850,7 +2850,15 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
    elif isinstance(key, BaseGrouper):
        return key, [], obj

-    # Everything which is not a list is a key (including tuples):
+    tuple_as_list = isinstance(key, tuple) and key not in obj


The only disadvantage I see with this approach is that

pd.DataFrame(1, index=range(3), columns=pd.MultiIndex.from_product([[1, 2], [3,4]])).groupby((7, 8)).mean()

will raise KeyError: 7 while KeyError: (7,8) would be more correct. Do you think

isinstance(key, tuple) and key not in obj and set(key).issubset(obj)

is too expensive?

this change would be ok

(Except that, as @jorisvandenbossche reminded below, set(key) could contain non-hashable objects, so this possibility should be caught)
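
A minimal sketch of the concern raised in this thread, assuming a plain Python set of labels as a stand-in for `obj`. The helper name `all_elements_are_keys` is hypothetical and not part of pandas. `set(key)` hashes every element, so it raises `TypeError` as soon as the tuple holds something unhashable such as a list or an array, and the subset test needs a guard:

```python
def all_elements_are_keys(key, labels):
    """Return True if every element of the tuple `key` is in `labels`.

    `labels` is a plain set standing in for the grouped object's labels
    (an assumption made for this sketch).
    """
    try:
        # set(key) hashes each element and raises TypeError on e.g. a list.
        return set(key).issubset(labels)
    except TypeError:
        # An unhashable element (such as an array) can never be a label.
        return False


labels = {"a", "b", "c"}
both_present = all_elements_are_keys(("a", "b"), labels)
one_missing = all_elements_are_keys(("a", "z"), labels)
unhashable = all_elements_are_keys(("a", [1, 2, 3]), labels)
print(both_present, one_missing, unhashable)
```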

toobaz · 2017-12-11T23:09:34Z

pandas/tests/groupby/test_groupby.py

+        with tm.assert_produces_warning(FutureWarning) as w:
+            df[['a', 'b', 'c']].groupby(('a', 'b')).c.mean()
+
+        assert "Interpreting tuple 'by' as a list" in str(w[0].message)


Isn't this the same as above?

jorisvandenbossche

The example from the docs (see top post #18314 (comment) for the code to reproduce) is still failing with this branch.

jorisvandenbossche · 2017-12-12T09:36:36Z

doc/source/whatsnew/v0.22.0.txt

@@ -202,6 +202,9 @@ Deprecations
 - ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
 - ``DataFrame.as_matrix`` is deprecated. Use ``DataFrame.values`` instead (:issue:`18458`).
 - ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use ``.astype(object)`` instead (:issue:`18572`)
+- Grouping by a tuple of keys now emits a ``FutureWarning`` and is deprecated.
+  In the future, a tuple passed to ``'by'`` will always refer to a single key
+  that is the actual tuple, instead of treating the tuple as multiple keys (:issue:`18314`)


Mention that you can simply replace the tuple with a list.
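
For illustration, the replacement suggested here looks roughly like this. The frame and its column names are made up for the example; only `DataFrame.groupby` with a list of column labels is used:

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 1, 2], "c": [1.0, 3.0, 5.0]})

# A list is unambiguous: group by the two columns "a" and "b".
result = df.groupby(["a", "b"])["c"].mean()
print(result)
```

Passing `["a", "b"]` rather than `("a", "b")` keeps the meaning stable once a tuple always refers to a single key.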

jorisvandenbossche · 2017-12-12T09:41:03Z

pandas/core/groupby.py

+        msg = ("Interpreting tuple 'by' as a list of keys, rather than "
+               "a single key. Use 'by={!r}' instead of 'by={!r}'. In the "
+               "future, a tuple will always mean a single key.".format(
+                   list(key), key))


the key can contain a long array or column, so not sure it is a good idea to format it like this into the message.

I thought NumPy's short repr kicked in sooner than it does. I'll fix this
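
A rough sketch of why formatting the key into the message is a problem, and one way around it. The helper `warn_tuple_as_list` and the placeholder wording `by=[...]` are assumptions for illustration, not necessarily what the PR ended up using:

```python
import warnings


def warn_tuple_as_list(key):
    """Hypothetical helper: warn without embedding repr(key) in the message."""
    msg = ("Interpreting tuple 'by' as a list of keys, rather than "
           "a single key. Use 'by=[...]' instead of 'by=(...)'. In the "
           "future, a tuple will always mean a single key.")
    warnings.warn(msg, FutureWarning, stacklevel=2)
    return list(key)


big = list(range(10_000))

# Formatting the key itself makes the message grow with its contents.
formatted_length = len("Use 'by={!r}'".format([big, big]))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    keys = warn_tuple_as_list((big, big))

fixed_length = len(str(caught[0].message))
print(formatted_length > fixed_length)
```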

pep8speaks · 2017-12-15T20:14:56Z

Hello @TomAugspurger! Thanks for updating the PR.

In the file pandas/core/groupby.py, following are the PEP8 issues :

Line 2866:17: W503 line break before binary operator

Comment last updated on December 18, 2017 at 12:53 Hours UTC

toobaz · 2017-12-15T20:37:28Z

pandas/core/groupby.py

+    all_hashable = is_tuple and all(is_hashable(x) for x in key)
+
+    if is_tuple:
+        if not all_hashable or key not in obj:


I'm lost. Why do you check that elements are not hashable? I would have done instead

if all_hashable and key not in obj and set(key).issubset(obj):

or (if we want to account for the to-be-deprecated possibility to index with missing keys):

if all_hashable and key not in obj and set(key) & (obj):

Or better - performance-wise:

if key not in obj and all(is_hashable(x) for x in key) and set(key).issubset(obj):

This is for the case where you're grouping by non-hashable arrays like in #18314 (comment)

In that case, don't we know that they're certainly relying on groupby((a, b)) to be groupby([a, b]), so we want to warn and listify?

I do still need to handle your KeyError example.

Ahh, I see what you're doing now. Yes, that's probably better, and will make handling the KeyError easier.

TomAugspurger · 2017-12-15T21:12:52Z

Huh about the example:

df = pd.DataFrame(1, index=range(3), columns=pd.MultiIndex.from_product([[1, 2], [3,4]]))
df.groupby((7, 8)).mean()

On master that gives me

Out[4]:
   1     2
   3  4  3  4
7  1  1  1  1
8  1  1  1  1

Is that correct? That seems like it should throw a KeyError, right?

Opened #18798 for that.

TomAugspurger · 2017-12-15T21:21:08Z

OK, updated to use your suggestion @toobaz, with a slight modification so that we warn when either

1. the tuple isn't a valid key and elements are hashable and all of the elements are valid keys
2. any of the elements are not hashable
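
The two cases above can be sketched as a small decision function. This is an illustration of the rule as described in this comment, not the merged pandas code. `labels` is a plain set standing in for `obj`, and both function names are hypothetical:

```python
def is_hashable(obj):
    """Local stand-in for a hashability check: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def should_warn_and_listify(key, labels):
    """Decide whether a tuple `key` should be treated as a list of keys."""
    if not isinstance(key, tuple):
        return False
    if not all(is_hashable(x) for x in key):
        # Case 2: an unhashable element (e.g. an array) cannot be a label,
        # so the caller must have meant a list of keys.
        return True
    # Case 1: the tuple itself is not a label, but every element is.
    return key not in labels and set(key).issubset(labels)


labels = {"a", "b", "c"}
case_1 = should_warn_and_listify(("a", "b"), labels)
case_2 = should_warn_and_listify(("a", [1, 2]), labels)
tuple_is_label = should_warn_and_listify(("a", "b"), labels | {("a", "b")})
missing_keys = should_warn_and_listify(("x", "y"), labels)
print(case_1, case_2, tuple_is_label, missing_keys)
```

When neither case applies, the tuple is left alone and looked up as a single key, so a missing tuple can be reported as a whole.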

toobaz · 2017-12-15T21:50:09Z

Yeah, I think that's perfect (I had forgotten case 2). Two small comments:

- you could replace all(is_hashable(x) for x in key) with is_hashable(key), but without a significant performance gain, so I'm not sure it's an improvement
- the duplicated test is still there
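
A quick check of the first point, using a local `is_hashable` stand-in (an assumption for this sketch) based on `hash()`. Hashing a built-in tuple hashes each of its elements, so for tuple keys the two expressions agree:

```python
def is_hashable(obj):
    """Local stand-in: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


keys = [("a", "b"), ("a", [1, 2]), ((1, 2), "b"), ((1, [2]), "b"), ()]

# Compare the whole-tuple check with the element-wise check for each key.
whole = [is_hashable(key) for key in keys]
elementwise = [all(is_hashable(x) for x in key) for key in keys]
print(whole == elementwise)
```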

TomAugspurger · 2017-12-15T23:09:24Z

Thanks, fixed. Should be good to go hopefully.

jreback · 2017-12-18T12:48:11Z

lgtm. needs a rebase to fix conflict. merge on green.

COMPAT: Emit warning when groupby by a tuple

209ffce

Closes pandas-dev#18314

TomAugspurger added this to the 0.22.0 milestone Dec 11, 2017

TomAugspurger added the Groupby label Dec 11, 2017

TomAugspurger mentioned this pull request Dec 11, 2017

DOC: fix options table #18730

Merged

jsexauer mentioned this pull request Dec 11, 2017

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

DOC: avoid future warning

ad09ade

toobaz reviewed Dec 11, 2017

jreback added the Deprecate (Functionality to remove in pandas) and MultiIndex labels Dec 12, 2017

jorisvandenbossche reviewed Dec 12, 2017

TomAugspurger added 2 commits December 15, 2017 13:56

Merge remote-tracking branch 'upstream/master' into warn-groupby-tuple

a489b20

Cleanup, test unhashable

ade2b2b

PEP8

6050226

toobaz reviewed Dec 15, 2017

TomAugspurger added 2 commits December 15, 2017 15:01

Correct KeyError

d8c20e8

update

d2a2372

TomAugspurger added 3 commits December 15, 2017 15:17

xfail

38ef818

remove old comments

a27f449

pep8

4e5ae9f

toobaz mentioned this pull request Dec 15, 2017

Groupby missing tuple doesn't throw #18798

Closed

Fixups

a8b4383

Merge remote-tracking branch 'upstream/master' into warn-groupby-tuple

3057aab

TomAugspurger merged commit b6a7cc9 into pandas-dev:master Dec 18, 2017

TomAugspurger deleted the warn-groupby-tuple branch December 18, 2017 18:37

jreback mentioned this pull request Nov 29, 2019

DEPR: deprecations log for removed issues #13777

Closed
