GroupBy enhancement unifies the return of iterating over GroupBy #42795 #47719

ahmedibrhm · 2022-07-14T14:48:04Z

closes ENH: consistent types in output of df.groupby #42795
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Applied the deprecation in DEPR: returning tuple when grouping by a list containing single element #47761

pep8speaks · 2022-07-14T19:55:01Z

Hello @ahmedibrhm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-26 22:07:40 UTC


rhshadrach

It can be useful to put up a proof of concept to demonstrate what a behavior change would look like once a deprecation is enforced. However, when doing so it would be helpful to mark the PR as a draft since we do not want to merge it yet.

There are also many things changing here that I would not expect. I think if you aren't iterating over the group, then there should be no change in behavior. I highlighted a few examples below.

rhshadrach · 2022-07-23T16:21:49Z

pandas/tests/groupby/test_groupby.py

@@ -806,7 +806,7 @@ def test_groupby_as_index_cython(df):
    msg = "The default value of numeric_only"
    with tm.assert_produces_warning(FutureWarning, match=msg):
        result = grouped.mean()
-        expected = data.groupby(["A"]).mean()
+        expected = data.groupby("A").mean()


Why is this changing?

As I mentioned in #47761
I thought it's better to generalize the rule of not using a list when grouping by a single key, because groupby is iterated over in other functions as well.

I thought it's better to generalize the rule of...

I do not understand what this means. Can you expand on it? Also, it's not clear to me - is the result of data.groupby(["A"]).mean() different from what the main branch currently produces?
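For reference, a minimal sketch of the comparison asked about here. It uses a made-up frame (the `data` below is illustrative, not the test fixture) and only checks that aggregating with `"A"` and with `["A"]` gives equal results on the installed pandas version; it does not say anything about what this PR's branch produces.

```python
import pandas as pd

# Illustrative data only; the real test uses its own fixture.
data = pd.DataFrame({"A": ["x", "y", "x", "y"], "B": [1.0, 2.0, 3.0, 4.0]})

# Aggregate once with a scalar key and once with a single-element list.
by_scalar = data.groupby("A").mean()
by_list = data.groupby(["A"]).mean()

# DataFrame.equals compares values, dtypes, and axes.
same_result = by_scalar.equals(by_list)
print(same_result)
```

If the two spellings agree for aggregations, the test change in the diff above should not be needed for `mean()` itself.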

rhshadrach · 2022-07-23T16:22:13Z

pandas/plotting/_matplotlib/hist.py

+                bymodi = fix_groupby_singlelist_input(self.by)
+                grouped = self.data.groupby(bymodi)[self.columns]


Why is this necessary?

Because .hist and .box use groupby internally in the same way. For example, if I did hist by ['a','b','c','d'] the results will be like (a,), (b,), (c,), (d,).
Some plotting functions and the pivot table actually iterate over the groupby.

But here, grouped is only being used in L66 immediately below, right?

self.bins = [self._calculate_bins(group) for key, group in grouped]

In this usage, only the group is being used and the key is ignored. So why is this needed if it's only the key changing?
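A small sketch of this point, assuming an illustrative frame `df` (not the plotting data): when only the group is consumed, the scalar and single-element-list spellings yield the same groups, whatever form the key takes. The form of the key for `["A"]` depends on the pandas version, so it is collected but not asserted here.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"A": ["x", "y", "x"], "B": [1, 2, 3]})

# Consume only the group and discard the key, as in the list comprehension above.
scalar_groups = [group for _, group in df.groupby("A")]
list_groups = [group for _, group in df.groupby(["A"])]

# The keys are the only part whose form may differ between the two spellings.
list_keys = [key for key, _ in df.groupby(["A"])]

groups_match = len(scalar_groups) == len(list_groups) and all(
    a.equals(b) for a, b in zip(scalar_groups, list_groups)
)
print(groups_match)
```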


ahmedibrhm · 2022-07-27T03:13:22Z

@rhshadrach
Do you think that, in changing the behaviour, we should change it at its root or only in the final result?

What I mean is that when iterating over groupby the group_keys_seq variable appears, so do you think we should change the variable itself when the user passes a single element in a list, or just change the iter function to return the required behavior?
I am asking because group_keys_seq appears in several other functions, so changing it may affect them as well.
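To make the second option concrete, here is a rough sketch of changing only what iteration returns. `iter_groups` is a hypothetical user-level wrapper, not pandas internals and not the code in this PR; it leaves the grouping itself untouched and only normalizes the key at the point where it is yielded. It assumes the grouped values are not themselves tuples.

```python
import pandas as pd


def iter_groups(df, by):
    """Hypothetical helper: yield (key, group) with tuple keys whenever `by` is a list."""
    wrap = isinstance(by, list)
    for key, group in df.groupby(by):
        # Only the yielded key is adjusted; the grouping logic is unchanged.
        if wrap and not isinstance(key, tuple):
            key = (key,)
        yield key, group


# Illustrative data only.
df = pd.DataFrame({"A": ["x", "y", "x"], "B": [1, 2, 3]})

list_keys = [key for key, _ in iter_groups(df, ["A"])]
scalar_keys = [key for key, _ in iter_groups(df, "A")]
print(list_keys, scalar_keys)
```

With this shape, other code that reads the group keys directly would see no change, which is the trade-off the question is about.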

github-actions · 2022-08-27T00:06:16Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2022-11-18T19:58:57Z

@ahmedibrhm are you interested in continuing this PR and applying the deprecation? We are looking to enforce this for the next release

mroeschke · 2022-12-05T18:01:19Z

Thanks for the pull request, but I believe this was handled by #50064 so closing. If I misunderstood happy to reopen

ahmedibrhm added 9 commits July 8, 2022 00:29

DOC pandas-dev#45443 edited the documentation of where/mask functions

54c5068

DOC pandas-dev#45443 edited the documentation of where/mask functions

2951fb1

Merge branch 'main' into main

6335204

Update generic.py

8afd6a1

Merge branch 'pandas-dev:main' into main

6a7ede4

Merge branch 'pandas-dev:main' into main

eb0ed28

groupby enahn

83ca209

fixing pivot

a0b3a59

fixing ops

153bbe5

ahmedibrhm added 20 commits July 14, 2022 12:55

syntax

43d1f92

editting test apply

242468c

removing testing lines

9717f5d

edit pivot

5d7331e

Merge branch 'main' into issue6

a6829a7

Merge branch 'issue6' of https://github.com/ahmedibrhm/pandas into is…

b0abd59

…sue6

edit pivot lib

c709311

edit pivot

4e14c87

pivot

09dec70

editting ops

d1e9525

skipping tests for changing the inputs

6b5d26b

adding tests

b7c797a

adding test groupby

600cdd9

editing tests

1cb253d

tests

c0ef8b6

tests

ef05b5b

tests

bd157c0

testing

0e10d13

Merge branch 'main' into issue6

a4d58c8

pivot editing

7efc7ef

ahmedibrhm added 8 commits July 18, 2022 07:49

editting plotting

6464033

hist and box

d745717

box

b788c0e

hist fix

143466e

boxplot

56f5aa3

bp

93889dd

box and hist

a12a6fb

plotting

fdea4a8

ahmedibrhm marked this pull request as ready for review July 19, 2022 22:53

Merge branch 'main' into issue6

871fec8

ahmedibrhm marked this pull request as draft July 19, 2022 23:38

ahmedibrhm added 2 commits July 19, 2022 16:44

Update _core.py

ebf7b92

Merge branch 'main' into issue6

6ea317e

ahmedibrhm mentioned this pull request Jul 20, 2022

DEPR: returning tuple when grouping by a list containing single element #47761

Merged

5 tasks

ahmedibrhm marked this pull request as ready for review July 20, 2022 20:45

ahmedibrhm mentioned this pull request Jul 20, 2022

ENH: consistent types in output of df.groupby #42795

Closed

mroeschke requested a review from rhshadrach July 22, 2022 17:38

mroeschke added Enhancement Groupby labels Jul 22, 2022

rhshadrach reviewed Jul 23, 2022


ahmedibrhm marked this pull request as draft July 26, 2022 15:32

ahmedibrhm added 3 commits July 26, 2022 14:40

unnecessary tests

20c65a7

Merge branch 'issue6' of https://github.com/ahmedibrhm/pandas into is…

78e1e04

…sue6

Merge branch 'main' into issue6

43a89e9

github-actions bot added the Stale label Aug 27, 2022

mroeschke closed this Dec 5, 2022
