ENH: specify missing keys in KeyError when passing list-like to .loc #34272

Closed
arredond opened this issue May 20, 2020 · 6 comments · Fixed by #34912
Labels
Error Reporting Incorrect or improved errors from pandas

Comments

@arredond
Contributor

I'm a big fan of the strict behavior when passing list-likes to .loc[]. However, when a key is missing and the KeyError is raised, it'd be helpful to know which key(s) are missing.

I understand this could have performance implications, since each key of the list-like object would have to be looked up, but I also think the UX would be improved.

At the very least, the first missing key could be reported without hindering performance. I'm happy to provide a PR for this if given some guidelines (I suppose this affects both Series and DataFrames, as well as maybe some other parts of the code I'm not aware of).

```python
import pandas as pd

pd.__version__
# '1.0.3'

s = pd.Series({'a': 1, 'b': 2, 'c': 3})
s.loc['d']  # Raises KeyError but specifies 'd' is missing
s.loc[['a', 'b', 'c', 'd']]  # Raises KeyError without specifying which key is missing
```
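As a rough illustration of the two options discussed above (reporting only the first missing key versus all of them), here is a minimal sketch. The helper names `first_missing_label`, `all_missing_labels` and `checked_loc` are hypothetical, not part of pandas, and the error message format is an assumption.

```python
import pandas as pd


def first_missing_label(index, keys):
    # Hypothetical helper: stops scanning at the first key not found
    # in `index`; returns None if every key is present.
    return next((k for k in keys if k not in index), None)


def all_missing_labels(index, keys):
    # Hypothetical helper: every key not found in `index`, in the order given.
    return [k for k in keys if k not in index]


def checked_loc(obj, keys):
    # Hypothetical wrapper around .loc that names the missing labels
    # itself instead of letting .loc raise its generic KeyError.
    missing = all_missing_labels(obj.index, keys)
    if missing:
        raise KeyError(f"labels not in index: {missing}")
    return obj.loc[keys]


s = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(first_missing_label(s.index, ['a', 'x', 'b', 'd']))  # x
print(all_missing_labels(s.index, ['a', 'x', 'b', 'd']))   # ['x', 'd']
print(checked_loc(s, ['c', 'a']).tolist())                 # [3, 1]
```

The first helper is the "at least the first missing key" option: it can stop early, but a user with several typos would only learn about them one at a time.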
arredond added the Enhancement and Needs Triage labels on May 20, 2020
MarcoGorelli added the Error Reporting label and removed the Enhancement and Needs Triage labels on May 20, 2020
@vipulrai91
Contributor

Does this need to be fixed? It's mentioned here: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike

This says it is deprecated.

@arredond
Contributor Author

Well, I'm not proposing that the behavior be reinstated, just that the KeyError that is raised be more verbose about which keys are actually missing.

@vipulrai91
Contributor

> I understand this could have performance implications since each key of the list-like object should be searched, but also think the UX would be improved.

> Well, I'm not proposing that the behavior be reinstated, just that the KeyError that is thrown be more verbose about what keys are actually missing

I agree. What I meant is that this was deliberately deprecated in favor of .reindex(), as per the docs.
And as you already mentioned, it would require iterating through all the elements, so I also think this was done to improve performance.
Just adding my understanding.
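For reference, the deprecation section linked above points to two replacements for the old behavior: `.reindex()`, which returns NaN for labels that are missing, and selecting only the labels that exist via `Index.intersection`. A minimal sketch using the toy Series from the original report:

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})
labels = ['a', 'b', 'c', 'd']

# .reindex() does not raise for missing labels: 'd' comes back as NaN,
# and the int64 values are upcast to float64 so they can hold NaN.
reindexed = s.reindex(labels)
print(reindexed)

# Selecting only the labels that exist keeps the original dtype,
# but silently drops 'd'.
present = s.loc[s.index.intersection(labels)]
print(present)
```

Neither alternative tells the user which labels were missing, which is the gap this issue is about.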

@arredond
Contributor Author

I may not have expressed myself clearly; sorry if that's the case.

When I've run into this error, I thought all the columns I was requesting were present, but that actually wasn't the case. Many times, I had DataFrames with hundreds of columns where maybe just one was missing from the list-like indexer, generally because of a typo in the original CSV I had read.

Since the error gives no information about which columns were actually missing from my DataFrame, I'd then have to do something like:

```python
# my_list: the labels passed to .loc; my_df: the DataFrame being indexed
missing_cols = [c for c in my_list if c not in my_df.columns]
```

What I'd like is for this search for missing columns to happen out of the box when the KeyError is raised (the missing columns would be listed in the KeyError message). If that isn't possible for performance reasons, at least the first missing column could be mentioned without hindering performance.
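Until the error message lists them, the list comprehension workaround can also be written with `Index.isin`, which checks all requested labels in one vectorized call and keeps them in the order requested (`Index.difference` would also work, but sorts its result by default). A minimal sketch; the frame and column names are made up for illustration:

```python
import pandas as pd

# Toy frame standing in for one read from a CSV (illustrative data only).
my_df = pd.DataFrame(columns=['id', 'name', 'price'])
my_list = ['id', 'nmae', 'price', 'qty']  # 'nmae' is a typo, 'qty' is absent

wanted = pd.Index(my_list)
# Boolean mask: True where the requested label is not a column of my_df.
missing_cols = wanted[~wanted.isin(my_df.columns)].tolist()
print(missing_cols)  # ['nmae', 'qty']
```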

@timhunderwood
Contributor

timhunderwood commented Jun 20, 2020

I also think this feature would be helpful and a useful enhancement for debugging missing or misspelled columns.

Since the missing labels would only be looked up after the missing-labels condition has already been hit (pandas/core/indexing.py:1288), I don't think this would impact performance in most cases.

I would be happy to try and make the PR.
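To illustrate the performance argument above: if the lookup already produces a positional indexer, the missing labels can be read off that indexer on the failure path alone, so successful lookups pay nothing extra. The sketch below is a hypothetical helper, `take_or_raise`, built on the public `Index.get_indexer` (which returns -1 for labels it cannot find); it is not the code in pandas/core/indexing.py, and it assumes a unique index, since `get_indexer` raises on duplicate labels.

```python
import pandas as pd


def take_or_raise(obj, keys):
    """Hypothetical sketch: select `keys` from `obj`, naming missing labels.

    Assumes obj.index is unique, so Index.get_indexer can be used.
    """
    keys = pd.Index(keys)
    indexer = obj.index.get_indexer(keys)  # position of each key, -1 if absent
    not_found = indexer == -1
    if not_found.any():
        # Only reached on failure: the missing labels come straight from
        # the indexer that was already computed for the lookup.
        raise KeyError(f"{keys[not_found].tolist()} not in index")
    return obj.iloc[indexer]


s = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(take_or_raise(s, ['c', 'a']).tolist())  # [3, 1]
try:
    take_or_raise(s, ['a', 'd', 'e'])
except KeyError as err:
    print(err.args[0])  # ['d', 'e'] not in index
```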

@jreback
Contributor

jreback commented Jun 20, 2020

yep PRs welcome here

5 participants