Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(python): Avoid dispatching Series.head/tail to the expression engine #12946

Merged
merged 2 commits into from
Dec 11, 2023

Conversation

mcrumiller
Copy link
Contributor

Resolves #12928.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Dec 7, 2023
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why dispatching to DataFrame is so much slower? 🤔

@mcrumiller
Copy link
Contributor Author

I wonder why dispatching to DataFrame is so much slower? 🤔

From my research it's due to select taking up all of the time. If we want to solve this issue for other cases, it might be worth having a behind-the-scenes fast path for single-column selection by name into an ldf.

stinodego
stinodego previously approved these changes Dec 7, 2023
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we'll have to put some thought into the dispatching. Especially when it comes to often-used/cheap methods.

@CaselIT
Copy link
Contributor

CaselIT commented Dec 7, 2023

I wonder why dispatching to DataFrame is so much slower? 🤔

Doing the more naive version seems to be very fast

go("plain_series.head(0)")
go("plain_series.head(100)")
go("plain_series.tail(0)")
go("plain_series.tail(100)")
go("plain_series.to_frame().head(0).to_series()")
go("plain_series.to_frame().head(100).to_series()")
go("plain_series.to_frame().tail(0).to_series()")
go("plain_series.to_frame().tail(100).to_series()")
plain_series.head(0) 4.969168399926275
plain_series.head(100) 5.323634900152683
plain_series.tail(0) 5.622998500009999
plain_series.tail(100) 5.60176460002549
plain_series.to_frame().head(0).to_series() 0.14708300004713237
plain_series.to_frame().head(100).to_series() 0.15118449996225536
plain_series.to_frame().tail(0).to_series() 0.1474038001615554
plain_series.to_frame().tail(100).to_series() 0.14568820013664663

using that on a df of one col as reference

df_one_col = plain_series.to_frame()
go("df_one_col.head(0)")
go("df_one_col.head(100)")
go("df_one_col.tail(0)")
go("df_one_col.tail(100)")
df_one_col.head(0) 0.04367200005799532
df_one_col.head(100) 0.04254429996944964
df_one_col.tail(0) 0.041441099951043725
df_one_col.tail(100) 0.041460099862888455

3x slower while not great is much better than the current 60+ times.

@nameexhaustion
Copy link
Collaborator

nameexhaustion commented Dec 7, 2023

head and tail both use slice underneath, so it may be good to also fix slice as well.

I wonder why dispatching to DataFrame is so much slower? 🤔

From my research it's due to select taking up all of the time.

an Expr.slice runs through SliceExpr, which allocates a Vec, then creates rayon tasks to evaluate input: Expr, offset: Expr and length: Expr - afterwards the results of those are then also allocated into Series:

let results = POOL.install(|| {
[&self.offset, &self.length, &self.input]
.par_iter()
.map(|e| e.evaluate(df, state))
.collect::<PolarsResult<Vec<_>>>()
})?;
let offset = &results[0];
let length = &results[1];
let series = &results[2];
let (offset, length) = extract_args(offset, length, &self.expr)?;
Ok(series.slice(offset, length))

on the other hand, a DataFrame.slice simply slices the columns according to offset: i64, length: usize, hence being much faster:

pub fn slice(&self, offset: i64, length: usize) -> Self {
if offset == 0 && length == self.height() {
return self.clone();
}
let col = self
.columns
.iter()
.map(|s| s.slice(offset, length))
.collect::<Vec<_>>();
DataFrame::new_no_checks(col)
}

But currently this means that doing (1) is much slower than doing (2) (for literal offset and length args) in the current polars API:

  • (1) .select(col(A).slice(..))
  • (2) .select(A).slice(..)

*edit: LazyFrame construction and query plan optimization may also be contributors to the extra overhead

@mcrumiller
Copy link
Contributor Author

I've added direct dispatch to a lot more Series functions, will update momentarily.

@stinodego
Copy link
Member

I've added direct dispatch to a lot more Series functions, will update momentarily.

That's a bit premature. I'd like to discuss this with the team first.

@mcrumiller
Copy link
Contributor Author

Hi @stinodego, no problem. I had assumed that any argument for direct dispatch to head/tail could also be applied to any arbitrary function, but I can see why we wouldn't want to add code bloat for minor performance improvements. For the record, here is what I have in my commit (but have not pushed):

  • head
  • tail
  • any
  • all
  • product
  • nan_max
  • nan_min
  • std
  • var
  • cut
  • qcut
  • value_counts
  • entropy
  • is_between
  • slice

@mcrumiller
Copy link
Contributor Author

I'll push for now so it's easier to gauge the impact, we can always revert anything we need.

@stinodego stinodego changed the title fix(rust,python): expose head/tail directly to Series perf(python): Avoid dispatching Series.head/tail to the expression engine Dec 8, 2023
@stinodego stinodego removed the rust Related to Rust Polars label Dec 8, 2023
@github-actions github-actions bot added the performance Performance issues or improvements label Dec 8, 2023
@stinodego stinodego removed the fix Bug fix label Dec 8, 2023
@ritchie46
Copy link
Member

@mcrumiller can you revert this PR to the head and tail then we can merge this? You can open an issue for the other ones.

@mcrumiller
Copy link
Contributor Author

@ritchie46 done. I'll make a separate issue to discuss specialized series impls for some of the other functions.

@stinodego stinodego merged commit 1d0e077 into pola-rs:main Dec 11, 2023
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series head / tail is about 60+ times slower than DataFrame ones
5 participants