ENH: KeyError from missing column should list available columns #50076

janosh · 2022-12-05T21:36:31Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

import pandas as pd

df = pd.util.testing.makeMixedDataFrame()

print(f'{list(df)=}')
>>> list(df)=['A', 'B', 'C', 'D']

df[['foo']]
>>> KeyError: "['foo'] not in index"

Feature Description

The error message would be more helpful if it listed available columns:

df[['foo']]
>>> KeyError: "['foo'] not in columns=['A', 'B', 'C', 'D']"

Alternative Solutions

n/a

Additional Context

No response

rhshadrach · 2022-12-05T22:33:16Z

pandas DataFrames can have a massive number of columns, which would either (a) flood stdout or (b) require us to truncate the output. Even when there are few columns, we'd need to worry about the repr of the individual column labels being long themselves.

janosh · 2022-12-05T22:42:29Z

We could check the length of the joined column names first:

if len(', '.join(df)) < some_threshold:
    raise KeyError(f"['foo'] not in columns={', '.join(df)}")

and make some_threshold configurable via, say, pd.options.key_errors.max_col_list_len.
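A minimal sketch of that length check, written as a standalone helper rather than as pandas internals. The function name `missing_columns_message` and the `max_len` default are hypothetical and only illustrate the idea; nothing here is an existing pandas API. It uses `repr` on each label so that non-string column names don't break the join.

```python
def missing_columns_message(missing, columns, max_len=80):
    """Build a KeyError message, listing the columns only if the list is short.

    `max_len` stands in for the proposed configurable threshold (an assumption).
    """
    listed = ", ".join(repr(c) for c in columns)
    if len(listed) < max_len:
        # Short enough: include the available columns in the message.
        return f"{list(missing)} not in columns=[{listed}]"
    # Too long: fall back to the wording pandas currently uses.
    return f"{list(missing)} not in index"


if __name__ == "__main__":
    print(missing_columns_message(["foo"], ["A", "B", "C", "D"]))
```

With many or very long column labels the helper falls back to the existing message, which is the truncation concern raised above.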

mroeschke · 2022-12-05T23:28:15Z

Yeah, as is I would be -0.5 on including this given @rhshadrach's concerns.

In an interactive environment, one can quickly access df.columns or df.index after the error
In a script/process, I guess it could be useful in the traceback with error logging infrastructure but may be too verbose more often than not

janosh · 2022-12-05T23:36:35Z

> but may be too verbose more often than not

More often than not column count and name lengths should be manageable, no?

> In a script/process, I guess it could be useful in the traceback with error logging infrastructure

Exactly, that's my use case! When a job fails and I only see it several hours later in a workflow with a dozen different dataframes, it can be hard to determine which data access is failing and how to fix it. I usually have to rerun the script interactively and print column names to determine the fix.
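Until something like this exists in pandas, a user-side workaround for the logging use case could look like the sketch below. `select_columns`, its `name` argument, and the `max_listed` cap are all hypothetical names chosen for illustration. The helper only relies on the object exposing `.columns` and supporting `obj[list_of_labels]`, as a DataFrame does.

```python
def select_columns(df, wanted, name="df", max_listed=20):
    """Return df[wanted]; on failure, raise a KeyError naming the frame and its columns."""
    available = list(df.columns)
    missing = [c for c in wanted if c not in available]
    if missing:
        # Cap the listing so very wide frames don't flood the log.
        shown = available[:max_listed]
        extra = len(available) - len(shown)
        suffix = f" ... ({extra} more)" if extra > 0 else ""
        raise KeyError(f"{missing} not in {name}.columns={shown}{suffix}")
    return df[list(wanted)]
```

Calling `select_columns(df, ['foo'], name='prices')` in a batch job would then leave both the frame's name and its columns in the traceback, so the failing access can be identified without rerunning interactively.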

kostyafarber · 2022-12-09T13:59:07Z

Hey I'd like to work on this. Do we want to go ahead with making these changes?

Or are we not fully sold on this idea yet?

phofl · 2022-12-09T14:33:57Z

This needs more discussion first.

I am also leaning more towards no. We don't want to have a million options.

janosh added the Enhancement and Needs Triage labels · Dec 5, 2022

rhshadrach added the Error Reporting label · Dec 5, 2022

rhshadrach added the Needs Discussion label and removed the Needs Triage label · Dec 8, 2022
