improve agg_list performance of chunked numerical data #3351

Merged (1 commit) on May 10, 2022

ritchie46

Delaying a full rechunk can improve the performance of many queries, though some become slower.

This improves the following query by ~3x.

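// Assumes the `polars` crate with its lazy and CSV features enabled
// (API as of this PR, May 2022).
use polars::prelude::*;

fn main() -> Result<()> {
    // Scan the CSV lazily and skip the rechunk after reading,
    // so the columns stay split into multiple chunks.
    let q = LazyCsvReader::new("/home/ritchie46/code/db-benchmark/data/G1_1e7_1e2_0_0.csv".into())
        .with_rechunk(false)
        .finish()?;

    // Squared Pearson correlation of v1 and v2 per (id2, id4) group.
    let out = q
        .groupby([col("id2"), col("id4")])
        .agg([(pearson_corr(col("v1"), col("v2")).pow(2.0)).alias("r2")])
        .collect()?;
    Ok(())
}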
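
For readers unfamiliar with what `with_rechunk(false)` leaves behind, here is a minimal standard-library sketch of the trade-off. It is not the implementation in this PR: plain `Vec<Vec<f64>>` chunks stand in for a chunked numerical column, and the names `gather_rechunked` and `gather_chunked` are hypothetical. One version copies all chunks into a contiguous buffer before gathering each group's values, the other resolves every global row index to a chunk and an offset and reads in place.

```rust
// Illustrative only: none of these names come from Polars.

/// Gather each group's values after copying every chunk into one
/// contiguous buffer (the "rechunk first" strategy).
fn gather_rechunked(chunks: &[Vec<f64>], groups: &[Vec<usize>]) -> Vec<Vec<f64>> {
    let flat: Vec<f64> = chunks.iter().flatten().copied().collect();
    groups
        .iter()
        .map(|idx| idx.iter().map(|&i| flat[i]).collect())
        .collect()
}

/// Gather each group's values directly from the chunks, mapping a global
/// row index to (chunk, offset) through cumulative chunk end positions.
fn gather_chunked(chunks: &[Vec<f64>], groups: &[Vec<usize>]) -> Vec<Vec<f64>> {
    // ends[k] = total number of rows in chunks 0..=k
    let ends: Vec<usize> = chunks
        .iter()
        .scan(0usize, |acc, c| {
            *acc += c.len();
            Some(*acc)
        })
        .collect();
    groups
        .iter()
        .map(|idx| {
            idx.iter()
                .map(|&i| {
                    // First chunk whose end position is greater than i.
                    let k = ends.partition_point(|&end| end <= i);
                    let start = if k == 0 { 0 } else { ends[k - 1] };
                    chunks[k][i - start]
                })
                .collect()
        })
        .collect()
}

fn main() {
    let chunks = vec![vec![1.0, 2.0], vec![3.0], vec![4.0, 5.0, 6.0]];
    // Global row indices belonging to each group.
    let groups = vec![vec![0, 3, 5], vec![1, 2, 4]];
    let a = gather_rechunked(&chunks, &groups);
    let b = gather_chunked(&chunks, &groups);
    assert_eq!(a, b);
    println!("{:?}", b);
}
```

Both functions return the same lists. The first pays for a full copy up front, the second pays a small lookup per row, which is one way to see why skipping the rechunk can help some queries and hurt others.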

before

          6.685,79 msec task-clock                #    2,124 CPUs utilized          
             4.747      context-switches          #  710,013 /sec                   
               218      cpu-migrations            #   32,606 /sec                   
           173.797      page-faults               #   25,995 K/sec                  
    21.435.123.551      cycles                    #    3,206 GHz                    
    38.153.190.822      instructions              #    1,78  insn per cycle         
     5.010.104.768      branches                  #  749,366 M/sec                  
        24.141.092      branch-misses             #    0,48% of all branches        

       3,148083894 seconds time elapsed

       6,231878000 seconds user
       0,469579000 seconds sys

after

          5.024,99 msec task-clock                #    4,879 CPUs utilized          
             4.305      context-switches          #  856,718 /sec                   
               118      cpu-migrations            #   23,483 /sec                   
           212.864      page-faults               #   42,361 K/sec                  
    14.441.547.675      cycles                    #    2,874 GHz                    
    21.711.917.241      instructions              #    1,50  insn per cycle         
     4.031.158.541      branches                  #  802,222 M/sec                  
        21.774.943      branch-misses             #    0,54% of all branches        

       1,029876114 seconds time elapsed

       4,531616000 seconds user
       0,502194000 seconds sys
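
The "~3x" figure can be checked against the two `perf stat` runs above. The sketch below copies the elapsed time, task-clock and instruction counts from that output (decimal commas converted to points); the `Run` struct and helper names are made up for this illustration.

```rust
// Figures copied from the `perf stat` output; names are illustrative.
struct Run {
    elapsed_s: f64,
    task_clock_ms: f64,
    instructions: f64,
}

const BEFORE: Run = Run {
    elapsed_s: 3.148083894,
    task_clock_ms: 6685.79,
    instructions: 38_153_190_822.0,
};

const AFTER: Run = Run {
    elapsed_s: 1.029876114,
    task_clock_ms: 5024.99,
    instructions: 21_711_917_241.0,
};

/// Wall-clock speedup of `after` relative to `before`.
fn speedup(before: &Run, after: &Run) -> f64 {
    before.elapsed_s / after.elapsed_s
}

/// Average number of busy CPUs: CPU time divided by wall-clock time.
fn cpus_utilized(run: &Run) -> f64 {
    (run.task_clock_ms / 1000.0) / run.elapsed_s
}

/// Fraction of instructions saved by `after` relative to `before`.
fn instruction_reduction(before: &Run, after: &Run) -> f64 {
    1.0 - after.instructions / before.instructions
}

fn main() {
    println!("speedup: {:.2}x", speedup(&BEFORE, &AFTER));
    println!("CPUs utilized before: {:.3}", cpus_utilized(&BEFORE));
    println!("CPUs utilized after: {:.3}", cpus_utilized(&AFTER));
    println!(
        "instructions saved: {:.1}%",
        100.0 * instruction_reduction(&BEFORE, &AFTER)
    );
}
```

By these numbers the wall-clock time drops by a factor of about 3.06, while total CPU time (task-clock) drops by only about a quarter. Most of the gain therefore appears to come from better parallel utilization (roughly 2.1 versus 4.9 CPUs), helped by about 43% fewer instructions.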

@github-actions github-actions bot added the rust Related to Rust Polars label May 10, 2022
codecov-commenter commented May 10, 2022

Codecov Report

Merging #3351 (d4a7cf3) into master (89ebc59) will decrease coverage by 1.44%.
The diff coverage is 88.39%.

@@            Coverage Diff             @@
##           master    #3351      +/-   ##
==========================================
- Coverage   77.56%   76.11%   -1.45%     
==========================================
  Files         390      393       +3     
  Lines       67948    68396     +448     
==========================================
- Hits        52703    52060     -643     
- Misses      15245    16336    +1091     
Impacted Files                                          Coverage Δ
polars/polars-arrow/src/bitmap/mutable.rs               100.00% <ø> (+22.22%) ⬆️
polars/polars-core/src/chunked_array/object/mod.rs      58.90% <ø> (ø)
polars/polars-io/src/csv_core/buffer.rs                 78.21% <0.00%> (ø)
polars/polars-core/src/frame/row.rs                     65.95% <14.28%> (-0.72%) ⬇️
polars/polars-core/src/frame/groupby/mod.rs             83.15% <72.22%> (-0.59%) ⬇️
...lars/polars-core/src/frame/groupby/aggregations.rs   77.38% <98.83%> (+0.39%) ⬆️
polars/polars-time/src/lib.rs                           0.00% <0.00%> (-100.00%) ⬇️
polars/polars-utils/src/lib.rs                          0.00% <0.00%> (-100.00%) ⬇️
polars/polars-time/src/windows/test.rs                  0.00% <0.00%> (-100.00%) ⬇️
polars/polars-time/src/groupby/dynamic.rs               52.38% <0.00%> (-40.37%) ⬇️
... and 32 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4e6fa11...d4a7cf3.

@ritchie46 ritchie46 force-pushed the improve_agg_list branch 2 times, most recently from 1c00ffa to d4a7cf3 Compare May 10, 2022 10:26
@ritchie46 ritchie46 merged commit 96cc3d9 into master May 10, 2022
@ritchie46 ritchie46 deleted the improve_agg_list branch May 10, 2022 12:10
moritzwilksch pushed a commit to moritzwilksch/polars that referenced this pull request May 29, 2022