BUG: ser.to_frame().all() inconsistent with ser.all() with CategoricalDtype #36076

Closed
jbrockmendel opened this issue Sep 2, 2020 · 4 comments · Fixed by #37827
Labels: API - Consistency (Internal Consistency of API/Behavior); Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply); Numeric Operations (Arithmetic, Comparison, and Logical operations); Reduction Operations (sum, mean, min, max, etc.)

@jbrockmendel
Member

Based on tests.frame.test_analytics.test_any_all_np_func

>>> import pandas as pd
>>> ser = pd.Series([0, 1], dtype="category", name="A")
>>> df = ser.to_frame()

>>> ser.all()
TypeError: Categorical cannot perform the operation all

>>> df.all().item()
False

What's happening here is that the DataFrame case calls self.values, which casts to int64 (avoidable with 2D EAs...).

If we instead refactor (xref #35881) to operate blockwise, this block is ignored, so we end up with the result we'd get on an empty DataFrame, i.e. True (xref #28900: ignoring failures is a footgun).

Expected Behavior: as long as we have ignore_failures, df.all() should be True.
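The blockwise scenario described above can be sketched roughly as follows. This is only an illustration, not the pandas internals: `all_ignoring_failures` is a hypothetical helper that reduces column by column and silently drops any column whose reduction raises TypeError, which is what "ignoring failures" amounts to here.

```python
import pandas as pd


def all_ignoring_failures(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper (not a pandas API): reduce each column with
    Series.all() and silently skip columns that raise TypeError."""
    results = {}
    for name in df.columns:
        try:
            results[name] = bool(df[name].all())
        except TypeError:
            # The column is treated as a "nuisance column" and dropped.
            continue
    return pd.Series(results, dtype=bool)


ser = pd.Series([0, 1], dtype="category", name="A")
df = ser.to_frame()

# The only column is categorical, so nothing survives the reduction.
per_column = all_ignoring_failures(df)

# Reducing an empty result is vacuously True.
overall = bool(per_column.all())
```

With the categorical column dropped, `per_column` is empty and `overall` is vacuously True, even though the underlying data contains a 0. That silent change of answer is the footgun mentioned above.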

jbrockmendel added the Bug and Needs Triage labels on Sep 2, 2020
@jorisvandenbossche
Member

The same happens for datetime/timedelta: see #34479 (which came out of the same PR experiments).

jorisvandenbossche added the API - Consistency label and removed the Bug and Needs Triage labels on Sep 4, 2020
@jbrockmendel
Member Author

Any preferences on how to handle the inconsistency?

@jbrockmendel
Member Author

From the most recent call, the consensus seems to be that we should deprecate the current DataFrame behavior in favor of matching the Series behavior.

@jorisvandenbossche
Member

Yes, I would be in favor of that as well. The user can always convert to boolean to be explicit.
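A minimal sketch of what that explicit conversion might look like, reusing the example from the original report. It assumes that casting this categorical column with `astype(bool)` is the conversion intended; the variable names are illustrative only.

```python
import pandas as pd

ser = pd.Series([0, 1], dtype="category", name="A")
df = ser.to_frame()

# Cast to bool first so that both reductions operate on an unambiguous dtype
# instead of relying on how each code path treats the categorical data.
ser_result = bool(ser.astype(bool).all())
df_result = bool(df.astype(bool).all().item())
```

After the cast, the Series and DataFrame paths agree: both evaluate to False for this data, because the category 0 converts to False.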

jbrockmendel added the Numeric Operations, Nuisance Columns, and Reduction Operations labels on Sep 18, 2020
jreback added this to the 1.2 milestone on Nov 11, 2020