perf(python): Avoid dispatching Series.head/tail to the expression engine #12946
I wonder why dispatching to DataFrame is so much slower? 🤔
From my research it's due to |
Right, we'll have to put some thought into the dispatching. Especially when it comes to often-used/cheap methods.
Doing the more naive version seems to be very fast:
go("plain_series.head(0)")
go("plain_series.head(100)")
go("plain_series.tail(0)")
go("plain_series.tail(100)")
go("plain_series.to_frame().head(0).to_series()")
go("plain_series.to_frame().head(100).to_series()")
go("plain_series.to_frame().tail(0).to_series()")
go("plain_series.to_frame().tail(100).to_series()")
Using that on a df of one col as reference:
df_one_col = plain_series.to_frame()
go("df_one_col.head(0)")
go("df_one_col.head(100)")
go("df_one_col.tail(0)")
go("df_one_col.tail(100)")
3x slower, while not great, is much better than the current 60+ times.
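The `go` helper used in the snippets above is not shown in the thread. A minimal sketch of what such a helper might look like, assuming it simply times a statement string with the standard-library `timeit` module and reports the best per-call time; the signature, the `namespace` parameter, and the 10,000-element test series are all assumptions made for illustration, not the author's actual benchmark:

```python
import timeit


def go(stmt, namespace=None, number=1000, repeat=5):
    """Time `stmt` and return the best per-call time in microseconds.

    Hypothetical stand-in for the `go` helper referenced in the thread.
    """
    ns = {} if namespace is None else namespace
    timer = timeit.Timer(stmt, globals=ns)
    # Take the minimum over several repeats to reduce scheduling noise.
    best = min(timer.repeat(repeat=repeat, number=number)) / number
    per_call_us = best * 1e6
    print(f"{stmt:<55} {per_call_us:10.2f} us")
    return per_call_us


try:
    import polars as pl
except ImportError:  # keep the sketch importable without polars installed
    pl = None

if pl is not None:
    # Assumed test data; the size used in the thread is not stated.
    plain_series = pl.Series("a", list(range(10_000)))
    ns = {"plain_series": plain_series}
    go("plain_series.head(100)", ns)
    go("plain_series.to_frame().head(100).to_series()", ns)
```

Absolute numbers depend on the machine and the polars version, so only the ratio between the direct call and the `to_frame()` round trip is worth comparing.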
an polars/crates/polars-lazy/src/physical_plan/expressions/slice.rs Lines 82 to 93 in 16c5dbe
on the other hand, a polars/crates/polars-core/src/frame/mod.rs Lines 2263 to 2273 in 16c5dbe
But currently this means that doing (1) is much slower than doing (2) (for literal
*edit: |
I've added direct dispatch to a lot more Series functions, will update momentarily.
That's a bit premature. I'd like to discuss this with the team first.
Hi @stinodego, no problem. I had assumed that any argument for direct dispatch to
I'll push for now so it's easier to gauge the impact; we can always revert anything we need.
@mcrumiller can you revert this PR to just the head and tail changes? Then we can merge this. You can open an issue for the other ones.
b1c8852 to 179b1e5
@ritchie46 done. I'll make a separate issue to discuss specialized series impls for some of the other functions.
Resolves #12928.