Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Accept list as level for groupby in non-MultiIndexed objects #13907

Closed
wants to merge 1 commit into from
Closed

ENH: Accept list as level for groupby in non-MultiIndexed objects #13907

wants to merge 1 commit into from

Conversation

agraboso
Copy link
Contributor

@agraboso agraboso commented Aug 4, 2016

@@ -2368,6 +2368,13 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
# axis of the object
if level is not None:
if not isinstance(group_axis, MultiIndex):
if isinstance(level, list) or isinstance(level, range):
if len(level) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle empty case.

Copy link
Member

@sinhrks sinhrks Aug 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only checking the length doesn't meet MI behavior. Following works in MI.

pd.DataFrame([[1, 1], [2, 2], [3, 3]], index=pd.MultiIndex.from_arrays([[1, 1, 1], [1, 2, 3]])).groupby(level=[0, 0]).sum()

I think it should raise if level is not scalar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only checking the length doesn't meet MI behavior.

Done by checking len(set(level)) instead of len(level).

handle empty case.

Done by handling len(set(level)) == 0.

I think it should raise if level is not scalar.

But that's the whole point of this PR and the issue it stems from!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it: should we raise an error when given an empty iterable for level instead of making it level=0?

# GH 13901
a = Series([1, 2, 3, 10, 4, 5, 20, 6], Index([1, 2, 3, 1,
4, 5, 2, 6], name='foo'))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use s = Series(....)

create an expected result and use tm.assert_series_equal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I took the liberty of modifying the test right above (which is the one I based mine off) to use s = Series(...) and assert_series_equal.

@codecov-io
Copy link

codecov-io commented Aug 4, 2016

Current coverage is 85.25% (diff: 100%)

Merging #13907 into master will increase coverage by <.01%

@@             master     #13907   diff @@
==========================================
  Files           140        140          
  Lines         50568      50575     +7   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43112      43119     +7   
  Misses         7456       7456          
  Partials          0          0          

Powered by Codecov. Last update a7469cf...fcbb724

@agraboso agraboso changed the title ENH: Accept list and range as levels for groupby in non-MultiIndexed objects ENH: Accept list as level for groupby in non-MultiIndexed objects Aug 4, 2016
s = Series([1, 2, 3, 10, 4, 5, 20, 6],
Index([1, 2, 3, 1, 4, 5, 2, 6], name='foo'))

result = s.groupby(level=[0]).sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add:

  • [0, 0], [-1]

@agraboso
Copy link
Contributor Author

agraboso commented Aug 5, 2016

@sinhrks Made all changes and additions.

@@ -2368,11 +2369,21 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
# axis of the object
if level is not None:
if not isinstance(group_axis, MultiIndex):
if is_list_like(level):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here why this is needed.

you are setting level, but below we reset to None (see L2389), so something wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here why this is needed.

Done.

you are setting level, but below we reset to None (see L2389), so something wrong.

L2368-L2369 says that this chunk is indeed just doing validation: if the passed level validates, it is reset to None; if it doesn't, it raises a ValueError. This PR only adds to the validation logic.

@agraboso
Copy link
Contributor Author

agraboso commented Aug 9, 2016

@jreback @sinhrks Check out the latest amendment. How does it look to you?

@@ -437,8 +437,7 @@ API changes
- ``pd.Timedelta(None)`` is now accepted and will return ``NaT``, mirroring ``pd.Timestamp`` (:issue:`13687`)
- ``Timestamp``, ``Period``, ``DatetimeIndex``, ``PeriodIndex`` and ``.dt`` accessor have gained a ``.is_leap_year`` property to check whether the date belongs to a leap year. (:issue:`13727`)
- ``pd.read_hdf`` will now raise a ``ValueError`` instead of ``KeyError``, if a mode other than ``r``, ``r+`` and ``a`` is supplied. (:issue:`13623`)


- ``groupby()`` now accepts ``level=[0]`` (in addition to ``level=0``), ``level=-1`` and ``level=[-1]`` for non-``MultiIndex``ed objects. (:issue:`13907`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too complicated. a user is reading this. just say .groupby() will accept a scalar and a single element list for specifyig level on an Index grouper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback jreback mentioned this pull request Aug 11, 2016
4 tasks
@agraboso
Copy link
Contributor Author

@jreback There is a test that fails, but it has nothing to do with groupby and I cannot reproduce it on my machine. Thoughts?

@agraboso
Copy link
Contributor Author

@jreback

@jreback
Copy link
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@agraboso
Copy link
Contributor Author

Rebased and all green.

@@ -2039,6 +2039,23 @@ def test_mulitindex_passthru(self):
result = df.groupby(axis=1, level=[0, 1]).first()
assert_frame_equal(result, df)

def test_multiindex_negative_level(self):
result = self.mframe.groupby(level=-1).sum()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue reference as a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback jreback added this to the 0.19.0 milestone Sep 20, 2016
@jreback
Copy link
Contributor

jreback commented Sep 20, 2016

ok lgtm. @jorisvandenbossche

@jreback jreback closed this in ebc4ac1 Sep 22, 2016
@jreback
Copy link
Contributor

jreback commented Sep 22, 2016

thanks!

@agraboso agraboso deleted the fix-13901 branch September 23, 2016 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Groupby doesn't accept level=[0] for Index.
4 participants