Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Performance of DataFrame axis=1 reduction ops with EA #51955

Closed
wants to merge 3 commits into from

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Mar 14, 2023

Ref: #51335 (comment)

I think #51923 is the right way to go, but it looks like it will take quite a bit more work. Partial revert of #51335 for axis=1 ops in the meantime.

       before           after         ratio
     [6169cba7]       [df69e36f]
     <axis_1_regr_revert~3>       <axis_1_regr_revert>
-     25.0±0.08ms      18.6±0.09ms     0.74  stat_ops.FrameOps.time_op('median', 'float', 1)
-      23.9±0.2ms       17.1±0.2ms     0.72  stat_ops.FrameOps.time_op('median', 'int', 1)
-     21.5±0.08ms       14.9±0.2ms     0.69  stat_ops.FrameOps.time_op('sem', 'float', 1)
-      21.1±0.8ms       14.6±0.7ms     0.69  stat_ops.FrameOps.time_op('skew', 'float', 1)
-      20.7±0.6ms       14.3±0.8ms     0.69  stat_ops.FrameOps.time_op('kurt', 'float', 1)
-     18.8±0.05ms       12.8±0.4ms     0.68  stat_ops.FrameOps.time_op('sem', 'int', 1)
-     16.4±0.03ms       10.1±0.4ms     0.61  stat_ops.FrameOps.time_op('skew', 'int', 1)
-        15.2±1ms       8.15±0.8ms     0.54  stat_ops.FrameOps.time_op('kurt', 'int', 1)
-     13.9±0.04ms       7.32±0.2ms     0.53  stat_ops.FrameOps.time_op('std', 'float', 1)
-      13.9±0.2ms      7.07±0.02ms     0.51  stat_ops.FrameOps.time_op('var', 'float', 1)
-     11.4±0.04ms       4.85±0.2ms     0.43  stat_ops.FrameOps.time_op('std', 'int', 1)
-      11.3±0.1ms      4.58±0.04ms     0.40  stat_ops.FrameOps.time_op('var', 'int', 1)
-     9.54±0.04ms      2.98±0.08ms     0.31  stat_ops.FrameOps.time_op('prod', 'float', 1)
-     8.93±0.02ms      2.27±0.01ms     0.25  stat_ops.FrameOps.time_op('mean', 'int', 1)
-     8.22±0.03ms      1.52±0.03ms     0.19  stat_ops.FrameOps.time_op('prod', 'int', 1)
-     8.02±0.04ms      1.31±0.01ms     0.16  stat_ops.FrameOps.time_op('sum', 'int', 1)
-     7.70±0.03ms          909±9μs     0.12  stat_ops.FrameOps.time_op('mean', 'float', 1)
-     7.69±0.02ms         879±40μs     0.11  stat_ops.FrameOps.time_op('sum', 'float', 1)
-      10.4±0.01s       97.4±0.4ms     0.01  stat_ops.FrameOps.time_op('var', 'Int64', 1)
-      10.4±0.09s       97.3±0.3ms     0.01  stat_ops.FrameOps.time_op('std', 'Int64', 1)
-      19.1±0.05s        128±0.7ms     0.01  stat_ops.FrameOps.time_op('sem', 'Int64', 1)
-      6.65±0.01s       22.6±0.3ms     0.00  stat_ops.FrameOps.time_op('sum', 'Int64', 1)
-      12.2±0.04s       40.4±0.3ms     0.00  stat_ops.FrameOps.time_op('skew', 'Int64', 1)
-      12.5±0.03s       39.5±0.2ms     0.00  stat_ops.FrameOps.time_op('kurt', 'Int64', 1)
-         8.94±0s       24.3±0.3ms     0.00  stat_ops.FrameOps.time_op('mean', 'Int64', 1)

@rhshadrach rhshadrach added Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version Reduction Operations sum, mean, min, max, etc. labels Mar 14, 2023
@rhshadrach rhshadrach added this to the 2.0 milestone Mar 14, 2023
@jbrockmendel
Copy link
Member

Just realized this is separate from #51923. Should this take priority over that? (I think the solution suggested there yesterday is probably the "right" long-term solution)

@rhshadrach
Copy link
Member Author

@jbrockmendel - I wanted to give this as an option and see what others think. I think the perf regression outweighs the behavior improvement in this case, but I'm also okay sticking with 2.0.x as-is.

@rhshadrach
Copy link
Member Author

I don't think we should do #51923 for 2.0 even if there was time to squeeze it in.

@jbrockmendel
Copy link
Member

I'm generally a believer in "correctness over performance," but this is a cornery enough case im not inclined to argue about it.

@rhshadrach
Copy link
Member Author

Hah; same. With 2.0rc1 shipped, that pushes me toward leaving 2.0.x as is. I plan on working on improvements for 2.1. Closing.

@jbrockmendel
Copy link
Member

@rhshadrach i have tabs open for this and #51923. Am I right in thinking that this is the one you are leaning towards for 2.0 and 51923 should be thought of as an option for 2.1?

@rhshadrach
Copy link
Member Author

@jbrockmendel - yes, that's correct.

@jbrockmendel
Copy link
Member

I'm finding myself going through the diff side-by-side with the diff from #51335 and trying to sort out what is and isnt a reversion. Would it be easier to revert that entirely and then do another PR to un-revert the parts we want to keep?

@rhshadrach
Copy link
Member Author

@jbrockmendel - sure that sounds good.

@rhshadrach
Copy link
Member Author

Unfortunately we can't automatically revert that PR because other changes. I can to manually revert, but won't have too much time until Sunday (currently travelling).

@rhshadrach rhshadrach mentioned this pull request Mar 27, 2023
5 tasks
@phofl
Copy link
Member

phofl commented Apr 2, 2023

the other pr supersedes this one?

@datapythonista datapythonista modified the milestones: 2.0, 2.1 Apr 3, 2023
@datapythonista
Copy link
Member

Release day, moving this to the 2.1 milestone.

@datapythonista datapythonista modified the milestones: 2.1, 2.0.1 Apr 3, 2023
@datapythonista
Copy link
Member

Sorry, my bad, the next release should be 2.0.1, changed it.

@rhshadrach
Copy link
Member Author

the other pr supersedes this one?

The other PR (#52250) precedes this one, as was created by the request above (#51955 (comment))

@rhshadrach
Copy link
Member Author

I don't see any reasonable way to support axis=1 with tzaware dtypes by operating on NumPy arrays - so I think there is no way forward in this direction.

@rhshadrach rhshadrach closed this Apr 8, 2023
@rhshadrach rhshadrach deleted the axis_1_regr_revert branch April 16, 2023 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Reduction Operations sum, mean, min, max, etc. Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants