Speed-up add_index #191

Merged
merged 11 commits into from Jul 14, 2023

@achoum achoum commented Jul 11, 2023

  • Add a cc (C++) implementation of add_index
  • Replace the Python implementation
  • Make tests order-independent
  • Benchmark the speed-up
  • Disable the schema check when debug mode is not enabled

## Benchmark results

### SUMMARY


- 16x faster when adding one index to two indexes
- 17x faster when adding one index to two indexes (without checks)
- 16x faster when adding one index to no indexes (without checks)
- 1.70x faster when adding 5 indexes to two indexes
- 2.23x faster when adding 5 indexes to two indexes (without checks)
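
The ratios above can be recomputed from the wall times in the tables below. This is a minimal sketch, assuming the summary lines refer to the `add_index:s:1_000_000` rows with `num_idx:1` and `num_idx:5` (an inference from the matching ratios, not something stated in the PR). The dictionary and function names are illustrative only.

```python
# Wall times in seconds, copied from the add_index:s:1_000_000 rows.
# Keys are the num_idx values (assumed mapping to the summary lines).
before = {1: 0.35821, 5: 5.99254}
after_with_check = {1: 0.02115, 5: 3.54450}
after_without_check = {1: 0.02053, 5: 2.68351}


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


for num_idx in (1, 5):
    with_check = speedup(before[num_idx], after_with_check[num_idx])
    without_check = speedup(before[num_idx], after_without_check[num_idx])
    print(
        f"num_idx={num_idx}: {with_check:.2f}x with schema check, "
        f"{without_check:.2f}x without"
    )
```

Under that assumption, the one-index case comes out at roughly 16.9x with the schema check and 17.4x without, and the five-index case at roughly 1.69x and 2.23x.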

bazel run -c opt //benchmark:benchmark_time -- --functions add_index from_pandas

#### BEFORE


================================================================
Name                              Wall time (s)    CPU time (s)
================================================================
add_index:s:10_000:num_idx:1         0.01005       0.01005
add_index:s:10_000:num_idx:2         0.03271       0.03271
add_index:s:10_000:num_idx:3         0.07672       0.07672
add_index:s:10_000:num_idx:4         0.07143       0.07143
add_index:s:10_000:num_idx:5         0.05716       0.05714
add_index:s:100_000:num_idx:1        0.05413       0.05412
add_index:s:100_000:num_idx:2        0.07727       0.07726
add_index:s:100_000:num_idx:3        0.26668       0.26664
add_index:s:100_000:num_idx:4        0.57867       0.57826
add_index:s:100_000:num_idx:5        0.49905       0.49904
add_index:s:1_000_000:num_idx:1       0.35821       0.35820
add_index:s:1_000_000:num_idx:2       0.41092       0.41077
add_index:s:1_000_000:num_idx:3       0.60086       0.60072
add_index:s:1_000_000:num_idx:4       2.01881       2.01847
add_index:s:1_000_000:num_idx:5       5.99254       5.99067
----------------------------------------------------------------
from_pandas:s:10_000_numidx:0_numidxval:20_idx:int       0.00059       0.00059
from_pandas:s:10_000_numidx:0_numidxval:20_idx:str       0.00059       0.00059
from_pandas:s:10_000_numidx:1_numidxval:20_idx:int       0.00648       0.00645
from_pandas:s:10_000_numidx:1_numidxval:20_idx:str       0.01033       0.01030
from_pandas:s:10_000_numidx:3_numidxval:20_idx:int       0.05071       0.05071
from_pandas:s:10_000_numidx:3_numidxval:20_idx:str       0.05208       0.05205
from_pandas:s:10_000_numidx:5_numidxval:20_idx:int       0.07724       0.07723
from_pandas:s:10_000_numidx:5_numidxval:20_idx:str       0.08343       0.08340
================================================================

#### AFTER WITH SCHEMA CHECK

================================================================
Name                              Wall time (s)    CPU time (s)
================================================================
add_index:s:10_000:num_idx:1         0.00130       0.00130
add_index:s:10_000:num_idx:2         0.01022       0.01022
add_index:s:10_000:num_idx:3         0.03187       0.03187
add_index:s:10_000:num_idx:4         0.03717       0.03717
add_index:s:10_000:num_idx:5         0.03403       0.03403
add_index:s:100_000:num_idx:1        0.00277       0.00277
add_index:s:100_000:num_idx:2        0.01644       0.01644
add_index:s:100_000:num_idx:3        0.11291       0.11290
add_index:s:100_000:num_idx:4        0.36022       0.36019
add_index:s:100_000:num_idx:5        0.38954       0.38951
add_index:s:1_000_000:num_idx:1       0.02115       0.02115
add_index:s:1_000_000:num_idx:2       0.03923       0.03922
add_index:s:1_000_000:num_idx:3       0.15304       0.15303
add_index:s:1_000_000:num_idx:4       1.26816       1.26774
add_index:s:1_000_000:num_idx:5       3.54450       3.54254
----------------------------------------------------------------
from_pandas:s:10_000_numidx:0_numidxval:20_idx:int       0.00031       0.00031
from_pandas:s:10_000_numidx:0_numidxval:20_idx:str       0.00032       0.00032
from_pandas:s:10_000_numidx:1_numidxval:20_idx:int       0.00124       0.00124
from_pandas:s:10_000_numidx:1_numidxval:20_idx:str       0.00168       0.00168
from_pandas:s:10_000_numidx:3_numidxval:20_idx:int       0.02035       0.02034
from_pandas:s:10_000_numidx:3_numidxval:20_idx:str       0.02413       0.02413
from_pandas:s:10_000_numidx:5_numidxval:20_idx:int       0.03696       0.03696
from_pandas:s:10_000_numidx:5_numidxval:20_idx:str       0.05386       0.05386
================================================================


#### AFTER WITHOUT SCHEMA CHECK

================================================================
Name                              Wall time (s)    CPU time (s)
================================================================
add_index:s:10_000:num_idx:1         0.00086       0.00086
add_index:s:10_000:num_idx:2         0.00633       0.00633
add_index:s:10_000:num_idx:3         0.02065       0.02065
add_index:s:10_000:num_idx:4         0.02608       0.02608
add_index:s:10_000:num_idx:5         0.02557       0.02557
add_index:s:100_000:num_idx:1        0.00224       0.00224
add_index:s:100_000:num_idx:2        0.01146       0.01146
add_index:s:100_000:num_idx:3        0.08176       0.08175
add_index:s:100_000:num_idx:4        0.27585       0.27579
add_index:s:100_000:num_idx:5        0.31205       0.31202
add_index:s:1_000_000:num_idx:1       0.02053       0.02053
add_index:s:1_000_000:num_idx:2       0.03370       0.03370
add_index:s:1_000_000:num_idx:3       0.12094       0.12086
add_index:s:1_000_000:num_idx:4       0.91366       0.91347
add_index:s:1_000_000:num_idx:5       2.68351       2.68309
----------------------------------------------------------------
from_pandas:s:10_000_numidx:0_numidxval:20_idx:int       0.00030       0.00030
from_pandas:s:10_000_numidx:0_numidxval:20_idx:str       0.00031       0.00031
from_pandas:s:10_000_numidx:1_numidxval:20_idx:int       0.00060       0.00060
from_pandas:s:10_000_numidx:1_numidxval:20_idx:str       0.00160       0.00160
from_pandas:s:10_000_numidx:3_numidxval:20_idx:int       0.01288       0.01288
from_pandas:s:10_000_numidx:3_numidxval:20_idx:str       0.01710       0.01710
from_pandas:s:10_000_numidx:5_numidxval:20_idx:int       0.02424       0.02424
from_pandas:s:10_000_numidx:5_numidxval:20_idx:str       0.03734       0.03733
================================================================
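
For readers unfamiliar with the two columns: wall time is elapsed real time, while CPU time counts only the time the process spent executing. The sketch below shows one way to collect both in Python, assuming `time.perf_counter` and `time.process_time` from the standard library. It is not the code behind `//benchmark:benchmark_time`; the `measure` helper is hypothetical.

```python
import time


def measure(fn, repeats: int = 3):
    """Run fn() `repeats` times and return (mean wall seconds, mean CPU seconds).

    Hypothetical helper for illustration; not the benchmark's own code.
    """
    wall_start = time.perf_counter()  # monotonic clock, includes waiting
    cpu_start = time.process_time()  # user + system CPU time of this process
    for _ in range(repeats):
        fn()
    wall = (time.perf_counter() - wall_start) / repeats
    cpu = (time.process_time() - cpu_start) / repeats
    return wall, cpu


wall, cpu = measure(lambda: sorted(range(100_000), reverse=True))
print(f"wall={wall:.5f}s cpu={cpu:.5f}s")
```

Wall and CPU times that nearly coincide, as in the tables above, are what one would expect from single-threaded, compute-bound work with little waiting.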

@achoum achoum changed the title from "Implements add_index in cc" to "Speed-up add_index (up to 17x)" Jul 13, 2023
@achoum achoum changed the title from "Speed-up add_index (up to 17x)" to "Speed-up add_index" Jul 13, 2023
@achoum achoum marked this pull request as ready for review July 13, 2023 11:29
@achoum achoum requested a review from ianspektor July 13, 2023 11:29
@DonBraulio DonBraulio left a comment
Awesome!
Feel free to ignore the comment about the ambiguity of index for this PR; it's just something to think about in the future.

achoum commented Jul 14, 2023

Thanks!

@achoum achoum merged commit 991cc04 into main Jul 14, 2023
12 checks passed
@achoum achoum deleted the gbm_add_index_cc branch July 14, 2023 12:48