API: better error-handling for df.set_index #22484

Closed
h-vetinari opened this issue Aug 23, 2018 · 0 comments

3 participants
@h-vetinari
Contributor

commented Aug 23, 2018

Splitting this out of #22236.

Let's have
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))

The error handling of df.set_index can be improved in at least three cases:

  1. df.set_index(['A', 'A'], drop=False) works, while
    df.set_index(['A', 'A'], drop=True) yields
    KeyError: 'A'
  2. Objects of unknown type yield KeyError instead of TypeError:
    df.set_index(map(str, df.A))
    KeyError: "None of [Index([...], dtype='object')] are in the [columns]"
  3. df.set_index(['foo', 'bar', 'baz']) only shows one missing key
    KeyError: 'foo' (in a huge stacktrace)
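
For anyone who wants to check the three cases on their own install, here is a rough reproduction sketch. The `outcome` helper is just something made up for this snippet (not a pandas function), and the results will depend on the pandas version: the errors quoted above are what was observed when this issue was opened, so later releases may print something different.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))


def outcome(make_keys, **kwargs):
    """Hypothetical helper: call df.set_index and summarize what happened."""
    try:
        df.set_index(make_keys(), **kwargs)
    except Exception as exc:
        # Report the exception type and message instead of a full traceback.
        return f"{type(exc).__name__}: {exc}"
    return "ok"


# Keys are built lazily so that the map iterator is fresh for its call.
results = {
    "duplicates, drop=False": outcome(lambda: ['A', 'A'], drop=False),
    "duplicates, drop=True": outcome(lambda: ['A', 'A'], drop=True),
    "iterator of labels": outcome(lambda: map(str, df.A)),
    "missing keys": outcome(lambda: ['foo', 'bar', 'baz']),
}

for case, result in results.items():
    print(f"{case}: {result}")
```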

It would be better to:

  1. Gracefully handle duplicate column names when drop=True
  2. Raise a better error message, e.g. TypeError: only allowed types are: ...
  3. Show all missing keys: KeyError: "['foo', 'bar', 'baz']"
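
To make the three proposals a bit more concrete, here is a minimal sketch of a pre-check that could run before the index is built. Everything in it is an assumption for illustration: `validate_set_index_keys` and `ARRAY_TYPES` are hypothetical names, the set of accepted array types is a guess (the issue leaves it open), and rejecting iterators outright is only one possible policy. It is not how pandas implements `set_index`.

```python
from collections.abc import Hashable, Iterator

import numpy as np
import pandas as pd

# Assumption: the issue does not say which array-like types should be allowed.
ARRAY_TYPES = (pd.Index, pd.Series, np.ndarray, list)


def validate_set_index_keys(df, keys):
    """Hypothetical pre-check; returns the distinct column labels to drop."""
    if not isinstance(keys, list):
        keys = [keys]

    labels, missing = [], []
    for key in keys:
        if isinstance(key, ARRAY_TYPES):
            continue  # array-likes supply index values, nothing to look up
        if isinstance(key, Iterator) or not isinstance(key, Hashable):
            # Proposal 2: complain about the type instead of raising KeyError.
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        if key in df.columns:
            labels.append(key)
        else:
            missing.append(key)

    if missing:
        # Proposal 3: report every missing key at once.
        raise KeyError(str(missing))

    # Proposal 1: each column is dropped only once, even if listed twice.
    return list(dict.fromkeys(labels))


df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))
print(validate_set_index_keys(df, ['A', 'A']))  # ['A']
```

With this sketch, `validate_set_index_keys(df, ['foo', 'bar', 'baz'])` raises a single `KeyError` naming all three keys, and `validate_set_index_keys(df, map(str, df.A))` raises `TypeError`.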

@h-vetinari h-vetinari referenced this issue Aug 23, 2018

Merged

API: better error-handling for df.set_index #22486

4 of 4 tasks complete

@h-vetinari h-vetinari changed the title from "API: improve warnings for df.set_index" to "API: better error-handling for df.set_index" Aug 23, 2018

@jreback jreback added this to the 0.24.0 milestone Sep 23, 2018
