pd.MultiIndex.isin doesn't validate input correctly #26622

Open
telamonian opened this issue Jun 2, 2019 · 0 comments
Labels: Bug, Error Reporting, isin, MultiIndex

Code Sample, a copy-pastable example if possible

import pandas as pd

mix = pd.MultiIndex.from_product([['foo', 'bar'], ['one', 'two'], ['A', 'B']])

mix.isin([('foo', 'one', 'B'), ('bar', 'two', 'A')]) # returns `[False, True, False, False, False, False, True, False]`

mix.isin([('bar',), ('foo', 'one', 'B'), ('bar',)]) # should raise, returns [False, True, False, False, False, False, False, False] instead

Problem description

I got bitten this morning by some improperly structured arguments to MultiIndex.isin. It turns out that as long as one element of the input values has the right length, isin doesn't care if every other element is too short. On the other hand, isin does raise in many other cases where the elements of values have the wrong length. With respect to my example above, these all raise an error:

mix.isin([('foo', 'one'), ('bar', 'two')]) # raises `ValueError: Length of names must match number of levels in MultiIndex.`

mix.isin([('foo', 'one', 'B'), ('bar', 'two', 'A', 'alpha')]) # raises `ValueError: Length of names ...`

mix.isin([('bar',), ('foo', 'one', 'B'), ('bar', 'two', 'A', 'alpha')]) # raises `ValueError: Length of names ...`

but this hot mess executes normally:

mix.isin([('foo', 'one', 'B'), *(('bar',),)*33]) # returns [False, True, False, False, False, False, False, False]
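
One way to make sense of the cases above is to picture the tuples being padded out to the length of the longest one before any width check happens. The sketch below is only a model of that idea, consistent with the results reported here; it is an assumption, not something taken from the pandas source, and `MISSING`, `pad_to_longest` and `width_check_passes` are hypothetical names.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical filler; a stand-in for whatever a real implementation would use.
MISSING = None


def pad_to_longest(values: Sequence[Tuple[Any, ...]]) -> List[Tuple[Any, ...]]:
    """Pad every tuple with MISSING up to the length of the longest tuple."""
    width = max((len(v) for v in values), default=0)
    return [tuple(v) + (MISSING,) * (width - len(v)) for v in values]


def width_check_passes(values: Sequence[Tuple[Any, ...]], nlevels: int) -> bool:
    """Return True when the padded table has exactly `nlevels` columns.

    Only the longest tuple decides the outcome, so shorter tuples
    are never noticed by this kind of check.
    """
    padded = pad_to_longest(values)
    return bool(padded) and len(padded[0]) == nlevels
```

Under this model, a single tuple of length 3 is enough for the check to pass on a three-level index, no matter how many one-element tuples come with it, while inputs whose longest tuple has length 2 or 4 fail.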

I'd like to see two fixes:

  • If elements of values are too short, isin should raise an error, just like it does in cases when elements are too long.

  • The current error message for an invalid value length is confusing, since it refers to Length of names even though names isn't an argument to isin. The reason is that the error is raised by the function _set_names while constructing the new MultiIndex that isin uses to do validation.
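
A minimal sketch of the stricter check the two points above ask for, written as a standalone helper. `validate_isin_values` is a hypothetical name and the message wording is only a suggestion; none of this is existing pandas API.

```python
from typing import Any, Sequence, Tuple


def validate_isin_values(values: Sequence[Tuple[Any, ...]], nlevels: int) -> None:
    """Raise ValueError unless every element has exactly `nlevels` entries.

    Hypothetical helper: it rejects elements that are too short as well as
    ones that are too long, and the message talks about `values`, not names.
    """
    for position, value in enumerate(values):
        if len(value) != nlevels:
            raise ValueError(
                "Length of each element in values must match number of levels "
                f"in MultiIndex ({nlevels}); element {position} has length "
                f"{len(value)}."
            )
```

With a check like this run on the raw input, `[('bar',), ('foo', 'one', 'B'), ('bar',)]` would be rejected at element 0 instead of silently returning a mask.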

I'll submit a PR.

telamonian added a commit to telamonian/pandas that referenced this issue Jun 2, 2019
simonjayhawkins added the Error Reporting and MultiIndex labels Jun 2, 2019
jbrockmendel added the isin label Oct 30, 2020
mroeschke added the Bug label Jul 10, 2021