Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/API: Accessors like .cat raise AttributeError when invalid #9617

Merged
merged 1 commit into from
Mar 10, 2015

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Mar 9, 2015

AttributeError is really the appropriate error to raise for an invalid
attribute. In particular, it is necessary to ensure that tests like
hasattr(s, 'cat') work consistently on Python 2 and 3: on Python 2,
hasattr(s, 'cat') will return False even if a TypeError was raised, but
Python 3 more strictly requires AttributeError.

This is an unfortunate trap that we should avoid. See this discussion in
Seaborn for a full report:
mwaskom/seaborn#361 (comment)

Note that technically, this is an API change, since these accessors (all but
.str, I think) raised TypeError in the last release.

This also suggests another possibility for testing for Series with a
Categorical dtype (#8814): just use hasattr(s, 'cat') (at least for Python
2 or pandas >=0.16).

CC @mwaskom @jorisvandenbossche @JanSchulz

`AttributeError` is really the appropriate error to raise for an invalid
attribute. In particular, it is necessary to ensure that tests like
`hasattr(s, 'cat')` work consistently on Python 2 and 3: on Python 2,
`hasattr(s, 'cat')` will return `False` even if a `TypeError` was raised, but
Python 3 more strictly requires `AttributeError`.

This is an unfortunate trap that we should avoid. See this discussion in
Seaborn for a full report:
mwaskom/seaborn#361 (comment)

Note that technically, this is an API change, since these accessors (all but
`.str`, I think) raised TypeError in the last release.

This also suggests another possibility for testing for Series with a
Categorical dtype (GH8814): just use `hasattr(s, 'cat')` (at least for Python
2 or pandas >=0.16).

CC mwaskom jorisvandenbossche JanSchulz
@shoyer shoyer force-pushed the accessors-raise-AttributeError branch from 3b58cee to 0b02747 Compare March 9, 2015 08:13
@shoyer
Copy link
Member Author

shoyer commented Mar 9, 2015

OK, it looks like some existing tests relied on the ability to access columns 'cat' or 'dt' on a DataFrame by typing df.cat or df.dt.

I suppose can add those to _internal_names_set strictly for Series... but is seems like fair game (to me) to put them on DataFrame, too.

@jreback
Copy link
Contributor

jreback commented Mar 9, 2015

This is the definition of when to raise TypeError., see here. It is very clear this should be a TypeError. I am -1 on this change. This is not the appropriate way to check for types.

@jankatins
Copy link
Contributor

I think there can be two ways to look at it:

  • if a TypeError is raised, then this is interpreted as "this series can not be converted to a cat"
  • if a AttributeError is raised than this is "this Series has no cat attribute"

I think I like the second one better, as

  • it enables hasattr(s, "cat") as a way to check whether this Series is a categorical with both <=0.14 and >=0.16 without messing with maybe not implemented yet is_categorical_dtype() imports
  • I don't like that all Series have a .cat, .dt or .str attribute, it looks messy...

@jankatins
Copy link
Contributor

If the AttributeError way should be followed, then these attributes should also be removed from the tab completion list.

E.g.

s = Series([...])
s.<tab> # should not show cat
c = s.astype("category")
c.<tab> # should show cat

@shoyer
Copy link
Member Author

shoyer commented Mar 10, 2015

This really seems a like a gray area to me as far as standard Python exceptions go:

  • AttributeError: Raised when an attribute reference (see Attribute references) or assignment fails. (When an object does not support attribute references or attribute assignments at all, TypeError is raised.)
  • TypeError: Raised when an operation or function is applied to an object of inappropriate type.

(Note that the only case I could find for the parenthetical remark about TypeError is when attempting to set an attribute of the built-in object. Series pretty clearly does not fall in this category.)

To me, the error from Series([1, 2, 3]).cat is closer to "an attribute reference fails" than "an operation or function is applied to an object of inappropriate type" (is attribute lookup an operation or function?), but really I can see a case being made either way.

What tips me in the direction of AttributeError is that it plays much more nicely with hasattr and getattr, which exist and will be used like getattr(s, 'cat') whether we encourage it or not. Duck typing is a pretty standard way to do things in Python, and we've had at least two reports from categorical users of using these methods. Plus, it enables a nice safe way to check for Categorical dtype (unlike s.dtype == 'category').

@JanSchulz - we jumped through some hoops to ensure that Series.cat works and Series().cat should not (see #9322). I agree that ideally s.<tab> should not show cat for non-categorical arrays. I think we discussed this in #9322 and decided against it only because the value seemed marginal and implementation would be a little messy (involving some hacks to Series.__dir__). But we can revisit that if you think it's important.

@jreback
Copy link
Contributor

jreback commented Mar 10, 2015

@shoyer ok then. go ahead and merge.

@jreback jreback added API Design Error Reporting Incorrect or improved errors from pandas labels Mar 10, 2015
@jreback jreback added this to the 0.16.0 milestone Mar 10, 2015
@jorisvandenbossche
Copy link
Member

No strong opinion on this, there are good reasons for both types, so ok with me

jreback added a commit that referenced this pull request Mar 10, 2015
BUG/API: Accessors like .cat raise AttributeError when invalid
@jreback jreback merged commit 8fadfa9 into pandas-dev:master Mar 10, 2015
@jreback
Copy link
Contributor

jreback commented Mar 10, 2015

@shoyer

thanks for this.

btw, if someone wants to do a followup at some point to remove the .str/.cat/.dt from the __dir__ on types they don't apply would be gr8. (not that hard, but requires a bit of changing), prob have to make _local_dir a bit more flexible (to add and delete)

@shoyer
Copy link
Member Author

shoyer commented Mar 10, 2015

@jreback do we still need to fix those failing tests? I can do a quick followup....

@shoyer
Copy link
Member Author

shoyer commented Mar 10, 2015

I'm working on a followup here -- the build is currently broken! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants