perf(python): Avoid dispatching `Series.head/tail` to the expression engine #12946

mcrumiller · 2023-12-07T17:09:51Z

Resolves #12928.

py-polars/src/series/mod.rs

stinodego

I wonder why dispatching to DataFrame is so much slower? 🤔

mcrumiller · 2023-12-07T17:55:19Z

I wonder why dispatching to DataFrame is so much slower? 🤔

From my research it's due to select taking up all of the time. If we want to solve this issue for other cases, it might be worth having a behind-the-scenes fast path for single-column selection by name into an ldf.

stinodego

Right, we'll have to put some thought into the dispatching. Especially when it comes to often-used/cheap methods.

CaselIT · 2023-12-07T18:14:11Z

I wonder why dispatching to DataFrame is so much slower? 🤔

Doing the more naive version seems to be very fast

go("plain_series.head(0)")
go("plain_series.head(100)")
go("plain_series.tail(0)")
go("plain_series.tail(100)")
go("plain_series.to_frame().head(0).to_series()")
go("plain_series.to_frame().head(100).to_series()")
go("plain_series.to_frame().tail(0).to_series()")
go("plain_series.to_frame().tail(100).to_series()")

plain_series.head(0) 4.969168399926275
plain_series.head(100) 5.323634900152683
plain_series.tail(0) 5.622998500009999
plain_series.tail(100) 5.60176460002549
plain_series.to_frame().head(0).to_series() 0.14708300004713237
plain_series.to_frame().head(100).to_series() 0.15118449996225536
plain_series.to_frame().tail(0).to_series() 0.1474038001615554
plain_series.to_frame().tail(100).to_series() 0.14568820013664663

using that on a df of one col as reference

df_one_col = plain_series.to_frame()
go("df_one_col.head(0)")
go("df_one_col.head(100)")
go("df_one_col.tail(0)")
go("df_one_col.tail(100)")

df_one_col.head(0) 0.04367200005799532
df_one_col.head(100) 0.04254429996944964
df_one_col.tail(0) 0.041441099951043725
df_one_col.tail(100) 0.041460099862888455

3x slower while not great is much better than the current 60+ times.

nameexhaustion · 2023-12-07T22:45:16Z

head and tail both use slice underneath, so it may be good to also fix slice as well.

I wonder why dispatching to DataFrame is so much slower? 🤔

From my research it's due to select taking up all of the time.

an Expr.slice runs through SliceExpr, which allocates a Vec, then creates rayon tasks to evaluate input: Expr, offset: Expr and length: Expr - afterwards the results of those are then also allocated into Series:

polars/crates/polars-lazy/src/physical_plan/expressions/slice.rs

Lines 82 to 93 in 16c5dbe

    
           let results = POOL.install(|| { 
        
               [&self.offset, &self.length, &self.input] 
        
                   .par_iter() 
        
                   .map(|e| e.evaluate(df, state)) 
        
                   .collect::<PolarsResult<Vec<_>>>() 
        
           })?; 
        
           let offset = &results[0]; 
        
           let length = &results[1]; 
        
           let series = &results[2]; 
        
           let (offset, length) = extract_args(offset, length, &self.expr)?; 
        
           Ok(series.slice(offset, length))

on the other hand, a DataFrame.slice simply slices the columns according to offset: i64, length: usize, hence being much faster:

polars/crates/polars-core/src/frame/mod.rs

Lines 2263 to 2273 in 16c5dbe

    
           pub fn slice(&self, offset: i64, length: usize) -> Self { 
        
               if offset == 0 && length == self.height() { 
        
                   return self.clone(); 
        
               } 
        
               let col = self 
        
                   .columns 
        
                   .iter() 
        
                   .map(|s| s.slice(offset, length)) 
        
                   .collect::<Vec<_>>(); 
        
               DataFrame::new_no_checks(col) 
        
           }

But currently this means that doing (1) is much slower than doing (2) (for literal offset and length args) in the current polars API:

(1) .select(col(A).slice(..))
(2) .select(A).slice(..)

*edit: LazyFrame construction and query plan optimization may also be contributors to the extra overhead

mcrumiller · 2023-12-07T22:52:13Z

I've added direct dispatch to a lot more Series functions, will update momentarily.

stinodego · 2023-12-07T22:55:04Z

I've added direct dispatch to a lot more Series functions, will update momentarily.

That's a bit premature. I'd like to discuss this with the team first.

mcrumiller · 2023-12-07T22:57:41Z

Hi @stinodego, no problem. I had assumed that any argument for direct dispatch to head/tail could also be applied to any arbitrary function, but I can see why we wouldn't want to add code bloat for minor performance improvements. For the record, here is what I have in my commit (but have not pushed):

head
tail
any
all
product
nan_max
nan_min
std
var
cut
qcut
value_counts
entropy
is_between
slice

mcrumiller · 2023-12-07T23:24:19Z

I'll push for now so it's easier to gauge the impact, we can always revert anything we need.

changes

ritchie46 · 2023-12-11T08:36:08Z

@mcrumiller can you revert this PR to the head and tail then we can merge this? You can open an issue for the other ones.

mcrumiller · 2023-12-11T13:29:02Z

@ritchie46 done. I'll make a separate issue to discuss specialized series impls for some of the other functions.

Expose head/tail directly

4dccc96

mcrumiller requested review from ritchie46, stinodego, alexander-beedie and MarcoGorelli as code owners December 7, 2023 17:09

github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Dec 7, 2023

stinodego reviewed Dec 7, 2023

View reviewed changes

py-polars/src/series/mod.rs Outdated Show resolved Hide resolved

stinodego requested changes Dec 7, 2023

View reviewed changes

Remove Option

179b1e5

stinodego previously approved these changes Dec 7, 2023

View reviewed changes

stinodego changed the title ~~fix(rust,python): expose head/tail directly to Series~~ perf(python): Avoid dispatching Series.head/tail to the expression engine Dec 8, 2023

stinodego removed the rust Related to Rust Polars label Dec 8, 2023

github-actions bot added the performance Performance issues or improvements label Dec 8, 2023

stinodego removed the fix Bug fix label Dec 8, 2023

mcrumiller force-pushed the faster-head-tail branch from b1c8852 to 179b1e5 Compare December 11, 2023 13:28

stinodego approved these changes Dec 11, 2023

View reviewed changes

stinodego merged commit 1d0e077 into pola-rs:main Dec 11, 2023
36 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(python): Avoid dispatching `Series.head/tail` to the expression engine #12946

perf(python): Avoid dispatching `Series.head/tail` to the expression engine #12946

mcrumiller commented Dec 7, 2023

stinodego left a comment

mcrumiller commented Dec 7, 2023

stinodego left a comment

CaselIT commented Dec 7, 2023 •

edited

Loading

nameexhaustion commented Dec 7, 2023 •

edited

Loading

mcrumiller commented Dec 7, 2023

stinodego commented Dec 7, 2023

mcrumiller commented Dec 7, 2023

mcrumiller commented Dec 7, 2023

ritchie46 commented Dec 11, 2023

mcrumiller commented Dec 11, 2023

perf(python): Avoid dispatching Series.head/tail to the expression engine #12946

perf(python): Avoid dispatching Series.head/tail to the expression engine #12946

Conversation

mcrumiller commented Dec 7, 2023

stinodego left a comment

Choose a reason for hiding this comment

mcrumiller commented Dec 7, 2023

stinodego left a comment

Choose a reason for hiding this comment

CaselIT commented Dec 7, 2023 • edited Loading

nameexhaustion commented Dec 7, 2023 • edited Loading

mcrumiller commented Dec 7, 2023

stinodego commented Dec 7, 2023

mcrumiller commented Dec 7, 2023

mcrumiller commented Dec 7, 2023

ritchie46 commented Dec 11, 2023

mcrumiller commented Dec 11, 2023

perf(python): Avoid dispatching `Series.head/tail` to the expression engine #12946

perf(python): Avoid dispatching `Series.head/tail` to the expression engine #12946

CaselIT commented Dec 7, 2023 •

edited

Loading

nameexhaustion commented Dec 7, 2023 •

edited

Loading