Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in xs raising KeyError for MultiIndex columns with droplevel False and list indexe #41789

Merged
merged 8 commits into from Jun 4, 2021

Conversation

phofl
Copy link
Member

@phofl phofl commented Jun 2, 2021

cc @simonjayhawkins Should we include in 1.2.5 or push to 1.3?

@phofl phofl added Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version labels Jun 2, 2021
@simonjayhawkins
Copy link
Member

cc @simonjayhawkins Should we include in 1.2.5 or push to 1.3?

this seems backportable. will run a bisect on the issue to get some more context.

@simonjayhawkins simonjayhawkins added this to the 1.2.5 milestone Jun 3, 2021
@simonjayhawkins
Copy link
Member

@phofl is this a bug or is the user expecting output that may have been incorrect in 1.1.5.

IIUC key is an index label or tuple of index label.

if a tuple, represents more than one level of the index to match.

a list should not be an allowable input. As a sequence should it be treated as a tuple and raise KeyError as it does now in 1.2.x/master.

or should we validate the key and raise TypeError?

@phofl
Copy link
Member Author

phofl commented Jun 3, 2021

Good point, thanks for bringing this up. You are correct this should not work at all.

In general I think a list is counterintuitive as an input, because in loc/iloc we treat list as single level indexers, so we should disallow them? We have one test using this, but there are two levels given, so it is implicitly clear, that the user tries to use the list elements on different levels df.xs([2008, "sat"], level=["year", "day"], drop_level=False).

So for the reason of backwarts compatibility we should probably raise, if a list is given and the number of levels does not match the length of the list (like in this case)?

I am not really happy with the list case above, but we probably have to deprecate instead of removing outright.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2021

Good point, thanks for bringing this up. You are correct this should not work at all.

In general I think a list is counterintuitive as an input, because in loc/iloc we treat list as single level indexers, so we should disallow them? We have one test using this, but there are two levels given, so it is implicitly clear, that the user tries to use the list elements on different levels df.xs([2008, "sat"], level=["year", "day"], drop_level=False).

So for the reason of backwarts compatibility we should probably raise, if a list is given and the number of levels does not match the length of the list (like in this case)?

I am not really happy with the list case above, but we probably have to deprecate instead of removing outright.

I agree that we should remove this case. Can you add a deprecation warning in this cse? if so then ok to backport. (.xs is itself kind of janky anyhow), though we don't allow drop_level via .loc.....

@jreback
Copy link
Contributor

jreback commented Jun 3, 2021

cc @jbrockmendel if comments here

@jbrockmendel
Copy link
Member

We have one test using this, but there are two levels given, so it is implicitly clear, that the user tries to use the list elements on different levels df.xs([2008, "sat"], level=["year", "day"], drop_level=False).

So the idea is that we would deprecate allowing this so the user would need to pass either (2008, "sat") or [(2008, "sat")]?

though we don't allow drop_level via .loc.....

xref #35418 but i dont think there's any appetite for making loc more complicated

@phofl
Copy link
Member Author

phofl commented Jun 3, 2021

Yes, disallowing lists completly, e.g. user has to pass (2008, "sat").

@jbrockmendel
Copy link
Member

Yes, disallowing lists completly, e.g. user has to pass (2008, "sat").

im on board

@phofl
Copy link
Member Author

phofl commented Jun 3, 2021

Will repurpose this pr with the changes

@phofl
Copy link
Member Author

phofl commented Jun 3, 2021

Deprecated list as key with same length as level and raised when key has different length than level.

Added the whatsnew to 1.3. Would move if we want to backport this?

@jreback
Copy link
Contributor

jreback commented Jun 3, 2021

Deprecated list as key with same length as level and raised when key has different length than level.

Added the whatsnew to 1.3. Would move if we want to backport this?

this is ok to backport i think


if isinstance(key, list):
len_levels = 1 if level is None or is_scalar(level) else len(level)
if len(key) != len_levels:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this break any existing tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test_xs_partial

Could deprecate this too, if you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's just deprecate entirely (prob easier to remember too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@phofl
Copy link
Member Author

phofl commented Jun 3, 2021

Moved whatsnew

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jreback jreback merged commit 75b25b3 into pandas-dev:master Jun 4, 2021
@jreback
Copy link
Contributor

jreback commented Jun 4, 2021

@meeseeksdev backport 1.2.x

@jreback
Copy link
Contributor

jreback commented Jun 4, 2021

thanks @phofl

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 4, 2021
…ex columns with droplevel False and list indexe
@lumberbot-app
Copy link

lumberbot-app bot commented Jun 4, 2021

Something went wrong ... Please have a look at my logs.

@phofl phofl deleted the 41760 branch June 4, 2021 08:09
~~~~~~~~~~~~

- Deprecated passing lists as ``key`` to :meth:`DataFrame.xs` and :meth:`Series.xs` (:issue:`41760`)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can backport this since it violates our policy.

pandas will introduce deprecations in **minor** releases. These deprecations
will preserve the existing behavior while emitting a warning that provide
guidance on:

* How to achieve similar behavior if an alternative is available
* The pandas version in which the deprecation will be enforced.

We will not introduce new deprecations in patch releases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make a pr later to move back to 1.3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. just need to move the release note on master. I'll close the backport PR.

@simonjayhawkins simonjayhawkins modified the milestones: 1.2.5, 1.3 Jun 4, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: XS for multiple keys in combination with drop_level=False
4 participants