PERF: Discrepancy in groupby methods #19165

Open
mroeschke opened this Issue Jan 10, 2018 · 1 comment

@mroeschke
Member

mroeschke commented Jan 10, 2018

xref #8426 and comment

issue filter for groupby & perf

Some groupby methods (notably describe, mad, and pct_change) are far slower than others: in the benchmark below they take seconds, while most methods take about a millisecond. Many of the slower methods are pre-generated in a _common_apply_whitelist in pandas/core/groupby.py, so it may be worthwhile to revisit this implementation.
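To illustrate why a method that runs Python code once per group might lag behind a built-in aggregation, here is a minimal sketch. It is not the whitelist code itself; the frame, the column names `key` and `values`, and the sizes are assumptions chosen for illustration. It only shows that a per-group `apply` and the built-in `sum` give the same result by different routes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 groups, 5,000 rows (illustrative sizes only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 50, size=5_000),
    "values": rng.normal(size=5_000),
})
grouped = df.groupby("key")["values"]

# Built-in aggregation: handled by a single optimized groupby routine.
fast = grouped.sum()

# Per-group dispatch: the lambda is called once for every group from Python.
slow = grouped.apply(lambda s: s.sum())

# Both routes produce the same values; only the cost differs.
print(np.allclose(fast.to_numpy(), slow.to_numpy()))
```

With many groups, the per-group route pays Python call overhead for every group, which is one plausible source of the gap reported here.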

asv dev -b ^groupby.GroupByMethods
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[100.00%] ··· Running groupby.GroupByMethods.time_method                     ok
[100.00%] ···· 
               ======= ============== ========
                dtype      method             
               ------- -------------- --------
                 int        all        256ms  
                 int        any        255ms  
                 int       count       925μs  
                 int      cumcount     1.15ms 
                 int       cummax      1.13ms 
                 int       cummin      1.15ms 
                 int      cumprod      1.58ms 
                 int       cumsum      1.16ms 
                 int      describe     3.25s  
                 int       first       1.12ms 
                 int        head       1.37ms 
                 int        last       1.12ms 
                 int        mad        1.42s  
                 int        max        1.12ms 
                 int        min        1.16ms 
                 int       median      1.53ms 
                 int        mean       1.43ms 
                 int      nunique      1.40ms 
                 int     pct_change    1.56s  
                 int        prod       1.53ms 
                 int        rank       380ms  
                 int        sem        414ms  
                 int       shift       974μs  
                 int        size       858μs  
                 int        skew       414ms  
                 int        std        1.46ms 
                 int        sum        1.50ms 
                 int        tail       1.45ms 
                 int       unique      289ms  
                 int    value_counts   2.35ms 
                 int        var        1.34ms 
                float       all        402ms  
                float       any        406ms  
                float      count       1.18ms 
                float     cumcount     1.33ms 
                float      cummax      1.40ms 
                float      cummin      1.40ms 
                float     cumprod      1.75ms 
                float      cumsum      1.40ms 
                float     describe     5.02s  
                float      first       1.37ms 
                float       head       1.58ms 
                float       last       1.36ms 
                float       mad        2.01s  
                float       max        1.38ms 
                float       min        1.37ms 
                float      median      1.80ms 
                float       mean       1.79ms 
                float     nunique      1.60ms 
                float    pct_change    2.17s  
                float       prod       1.75ms 
                float       rank       623ms  
                float       sem        416ms  
                float      shift       1.18ms 
                float       size       1.09ms 
                float       skew       646ms  
                float       std        1.51ms 
                float       sum        1.77ms 
                float       tail       1.63ms 
                float      unique      457ms  
                float   value_counts   2.63ms 
                float       var        1.43ms 
               ======= ============== ========
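For a quick check outside asv, a rough timing loop along these lines may help. This is a sketch under stated assumptions: the helper name `time_groupby_methods`, the data shape, and the repeat count are hypothetical and do not mirror the asv benchmark's setup. Methods missing from the installed pandas version (mad, for example, may not exist in newer releases) are skipped.

```python
import timeit

import numpy as np
import pandas as pd


def time_groupby_methods(methods, ngroups=1000, size=100_000, repeat=3):
    """Return the best wall-clock time in seconds for each groupby method.

    Hypothetical helper: the data shape is illustrative only.
    """
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "key": rng.integers(0, ngroups, size=size),
        "values": rng.integers(0, size, size=size),
    })
    grouped = df.groupby("key")["values"]

    timings = {}
    for name in methods:
        func = getattr(grouped, name, None)
        if func is None:
            # Method not available in this pandas version; skip it.
            continue
        # Take the minimum over a few runs to reduce noise.
        timings[name] = min(timeit.repeat(func, number=1, repeat=repeat))
    return timings


timings = time_groupby_methods(
    ["sum", "count", "describe", "mad", "pct_change"],
    ngroups=100,
    size=10_000,
)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {seconds * 1e3:.2f} ms")
```

Absolute numbers will vary by machine and pandas version, so only the relative ordering of methods is meaningful.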
@WillAyd

Member

WillAyd commented Feb 12, 2018

You should be able to check rank off this list now that that change has been closed. I'm going to take a look at fillna next.

@WillAyd WillAyd referenced this issue Feb 26, 2018

Merged

Cythonized GroupBy pct_change #19919

3 of 4 tasks complete

@WillAyd WillAyd referenced this issue Mar 7, 2018

Open

Cythonized GroupBy mad #20024

3 of 4 tasks complete