ASV: add frame ops benchmarks for varying n_rows/n_columns ratios #39848

Merged

jorisvandenbossche
Member

Currently, we have a benchmark for frame/frame ops with a wide dataframe (originating from a previous, long PR discussion about this, xref #32779).

To get a better idea of the trade-offs for future changes to the ops code (e.g. ArrayManager-related), I think it is useful to run this benchmark for varying rows/columns ratios, since it is mostly this ratio that determines how much overhead we have when performing ops column-wise.

So I changed the benchmark to be parametrized over 4 different ratios, from a long to a wide dataframe, and with each case having the same number of total elements in the dataframe.

The shape in the original benchmark was (500, 2000), whose row/column ratio lies between those of two of the new cases, (10_000, 1000) and (1000, 10_000), so this should preserve the original intent of the benchmark. The last case of (1000, 10_000) has an even lower row/column ratio, so it should be an even "worse" case from the column-wise ops perspective.
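
As a rough illustration of the parametrization described above, the following sketch enumerates shapes with a fixed total number of elements and compares their row/column ratios with that of the original (500, 2000) shape. The helper name `shapes_with_fixed_size` and the constant `TOTAL_ELEMENTS` are hypothetical and not taken from the PR; the total of 10,000,000 elements is inferred from the four shapes reported in the benchmark run.

```python
# Hypothetical helper (not part of the PR): list shapes that all hold the
# same number of elements, going from a long to a wide dataframe.
TOTAL_ELEMENTS = 10_000_000  # assumption: inferred from the reported shapes


def shapes_with_fixed_size(total, n_cols_options):
    """Return (n_rows, n_cols) tuples such that n_rows * n_cols == total."""
    shapes = []
    for n_cols in n_cols_options:
        n_rows, remainder = divmod(total, n_cols)
        if remainder:
            raise ValueError(f"{total} is not divisible by {n_cols}")
        shapes.append((n_rows, n_cols))
    return shapes


shapes = shapes_with_fixed_size(TOTAL_ELEMENTS, [10, 100, 1_000, 10_000])

# The row/column ratio is what drives the per-column overhead.
ratios = [n_rows / n_cols for n_rows, n_cols in shapes]

# Ratio of the shape used by the original benchmark.
original_ratio = 500 / 2000

for shape, ratio in zip(shapes, ratios):
    print(shape, ratio)
print("original (500, 2000):", original_ratio)
```

Under these assumptions, the original ratio of 0.25 sits between the ratios of the two widest cases (10.0 and 0.1), which matches the reasoning in the description.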

cc @jbrockmendel

@jorisvandenbossche jorisvandenbossche added the Benchmark Performance (ASV) benchmarks label Feb 16, 2021
@jorisvandenbossche
Member Author

A run of those benchmarks:

$ asv dev -b FrameWithFrameWide
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_joris_miniconda3_envs_dev_bin_python
[ 25.00%] ··· arithmetic.FrameWithFrameWide.time_op_different_blocks          
[ 25.00%] ··· ============================== =============== =============== =============== ===============
              --                                                          shape                             
              ------------------------------ ---------------------------------------------------------------
                            op                (1000000, 10)   (100000, 100)   (10000, 1000)   (1000, 10000) 
              ============================== =============== =============== =============== ===============
                 <built-in function add>         19.1±0ms        14.9±0ms        14.1±0ms        14.3±0ms   
               <built-in function floordiv>      435±0ms         475±0ms         418±0ms         375±0ms    
                  <built-in function gt>         12.1±0ms        11.8±0ms        12.6±0ms        11.8±0ms   
              ============================== =============== =============== =============== ===============

[ 50.00%] ··· arithmetic.FrameWithFrameWide.time_op_same_blocks               
[ 50.00%] ··· ============================== =============== =============== =============== ===============
              --                                                          shape                             
              ------------------------------ ---------------------------------------------------------------
                            op                (1000000, 10)   (100000, 100)   (10000, 1000)   (1000, 10000) 
              ============================== =============== =============== =============== ===============
                 <built-in function add>         10.6±0ms        11.7±0ms        10.5±0ms        12.3±0ms   
               <built-in function floordiv>      233±0ms         229±0ms         258±0ms         230±0ms    
                  <built-in function gt>         5.64±0ms        6.29±0ms        5.51±0ms        7.50±0ms   
              ============================== =============== =============== =============== ===============

So the floordiv one is quite slow. I could maybe reduce the number of rows by a factor of 10 in that case.
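
A minimal sketch of how such a reduction could look, assuming the benchmark setup receives the operator and the shape as parameters. The function name `benchmark_shape` and the `row_divisor` argument are hypothetical and only serve the illustration; they are not the code that was merged.

```python
import operator


def benchmark_shape(op, shape, row_divisor=10):
    """Return the shape to allocate for a given op (hypothetical helper).

    floordiv is much slower than the other operations, so it gets fewer
    rows; every other op keeps the requested shape unchanged.
    """
    n_rows, n_cols = shape
    if op is operator.floordiv:
        n_rows = n_rows // row_divisor
    return n_rows, n_cols


for op in (operator.add, operator.floordiv, operator.gt):
    print(op.__name__, benchmark_shape(op, (1_000_000, 10)))
```

One consequence of this choice is that the floordiv cases no longer share the total element count (or the row/column ratio) of the other ops, so their timings are only comparable among themselves.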

@jreback jreback added this to the 1.3 milestone Feb 16, 2021
@jreback
Contributor

jreback commented Feb 16, 2021

> So the floordiv one is quite slow. I could maybe reduce the number of rows by a factor of 10 in that case.

sure

n_rows, n_cols = shape

if op is operator.floordiv:
    # floordiv is much slower as the other operations -> use less data

"as" -> "than"

@jbrockmendel
Member

nitpick, otherwise LGTM

@jorisvandenbossche jorisvandenbossche merged commit 0176228 into pandas-dev:master Feb 17, 2021