~2x state_sim speedup via additional caching in get_crosslink_committee #316

Merged
merged 1 commit into from Jul 10, 2019


@tersec
Contributor

commented Jul 10, 2019

Not especially pretty, but not particularly 'contagious' either in terms of inter-function/module coupling. It is also not especially complicated or risky, and it does not add much tech debt, reduce flexibility, or rely heavily on assumptions.

Some benchmarks -- all numbers were taken under the same overall conditions and are meant to be compared with each other, not read as absolute figures:

To start with, the status quo ante, both with and without BLS validation (which adds a roughly constant ~4 minutes for the state_sim parameters I was using, 130 slots and 576 validators):

Validators: 576, epoch length: 64                                                 
Validators per attestation (mean): 9.0                               
All time are ms                                                                                        
     Average,       StdDev,          Min,          Max,      Samples,         Test                 
     212.734,       52.805,      120.234,      326.187,          128, Process non-epoch slot with block
    3692.997,     2614.389,     1844.345,     5541.649,            2, Process epoch slot with block                        
       2.091,        1.240,        0.028,        4.271,          130, Tree-hash block               
       9.825,        0.569,        9.004,       13.862,          130, Retrieve committee once using get_crosslink_committee
      82.505,       24.850,       38.370,      134.047,         8320, Combine committee attestations
                 
real    12m26.008s
user    12m25.776s

...

Validators: 576, epoch length: 64                 
Validators per attestation (mean): 9.0
All time are ms                         
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
     195.724,       49.826,      105.906,      304.850,          128, Process non-epoch slot with block
    3605.308,     2534.054,     1813.462,     5397.155,            2, Process epoch slot with block
       0.545,        0.311,        0.025,        1.116,          130, Tree-hash block
       9.417,        0.547,        8.457,       12.856,          130, Retrieve committee once using get_crosslink_committee
      52.380,       24.107,       10.940,       97.878,         8320, Combine committee attestations

real    8m7.282s
user    8m7.015s
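
As a rough cross-check of the "~4 minutes" BLS figure, the sketch below multiplies each test's average delta (with BLS minus without) by its sample count, using only the two baseline tables above. It assumes nothing about how the timers nest, so the per-test figures may overlap and should be read as indicative only; the helper names are made up for this example.

```python
# Baseline averages (ms) and sample counts copied from the two tables above.
# Each entry: test name -> (avg with BLS, avg without BLS, samples)
BASELINE = {
    "Process non-epoch slot with block": (212.734, 195.724, 128),
    "Process epoch slot with block": (3692.997, 3605.308, 2),
    "Tree-hash block": (2.091, 0.545, 130),
    "Retrieve committee once using get_crosslink_committee": (9.825, 9.417, 130),
    "Combine committee attestations": (82.505, 52.380, 8320),
}


def to_seconds(minutes, seconds):
    """Convert an 'XmY.ZZZs' wall-clock reading to seconds."""
    return minutes * 60 + seconds


def bls_cost_seconds(with_bls_ms, without_bls_ms, samples):
    """Extra time with BLS enabled for one test: average delta x samples."""
    return (with_bls_ms - without_bls_ms) * samples / 1000.0


# Difference between the two baseline "real" times (12m26.008s vs 8m7.282s).
wall_delta = to_seconds(12, 26.008) - to_seconds(8, 7.282)
per_test = {name: bls_cost_seconds(*row) for name, row in BASELINE.items()}

if __name__ == "__main__":
    print(f"wall-clock delta: {wall_delta:.1f} s")
    for name, cost in sorted(per_test.items(), key=lambda kv: -kv[1]):
        print(f"{cost:8.1f} s  {name}")
```

On these numbers, the "Combine committee attestations" row carries nearly all of the difference, and its total is close to the wall-clock delta of a little over four minutes.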

I added two caches and wanted to make sure that each was incrementally worthwhile and that neither subsumed the other. So, with only start_shard_cache:

Validators: 576, epoch length: 64                                    
Validators per attestation (mean): 9.0                                                                 
All time are ms                                                                                    
     Average,       StdDev,          Min,          Max,      Samples,         Test   
Validation is turned off meaning that no BLS operations are performed                                                      
     187.707,       42.837,      108.250,      279.735,          128, Process non-epoch slot with block
    1862.948,     1124.842,     1067.565,     2658.332,            2, Process epoch slot with block
       0.554,        0.311,        0.024,        1.102,          130, Tree-hash block
       5.986,        0.485,        5.125,        9.360,          130, Retrieve committee once using get_crosslink_committee
      37.313,       15.111,       11.191,       65.102,         8320, Combine committee attestations

real    5m56.844s
user    5m56.675s

...

Validators: 576, epoch length: 64
Validators per attestation (mean): 9.0
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
     199.672,       45.361,      118.130,      299.778,          128, Process non-epoch slot with block
    1895.437,     1131.078,     1095.644,     2695.230,            2, Process epoch slot with block
       2.067,        1.223,        0.026,        4.308,          130, Tree-hash block
       6.089,        0.522,        5.269,        9.621,          130, Retrieve committee once using get_crosslink_committee
      65.275,       14.982,       38.325,       98.222,         8320, Combine committee attestations

real    9m56.982s
user    9m56.807s

Of the two caches taken individually, start_shard_cache is the better one, but it still benefits from the other, committee_count_cache (shown alone here; disclaimer: n=1):

Validators: 576, epoch length: 64
Validators per attestation (mean): 9.0
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
     191.776,       48.009,       99.682,      290.793,          128, Process non-epoch slot with block
    2939.635,     2091.379,     1460.807,     4418.463,            2, Process epoch slot with block
       0.549,        0.311,        0.024,        1.085,          130, Tree-hash block
       7.671,        0.416,        6.728,       10.118,          130, Retrieve committee once using get_crosslink_committee
      44.939,       19.696,       10.377,       82.976,         8320, Combine committee attestations

real    7m3.012s
user    7m2.847s

...

Validators: 576, epoch length: 64
Validators per attestation (mean): 9.0
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
     200.351,       49.339,      114.493,      310.808,          128, Process non-epoch slot with block
    2966.934,     2169.549,     1432.831,     4501.037,            2, Process epoch slot with block
       2.017,        1.202,        0.026,        4.155,          130, Tree-hash block
       7.779,        0.444,        7.217,       10.954,          130, Retrieve committee once using get_crosslink_committee
      71.622,       19.366,       36.827,      116.399,         8320, Combine committee attestations

real    10m51.754s
user    10m51.320s
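
For readers unfamiliar with this style of caching, here is a minimal Python sketch of the general idea, not the code in this PR. Only the names start_shard_cache and committee_count_cache come from the description above. The per-epoch keying, the `StateCache` container, and the `slow_*` stand-in functions are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class StateCache:
    """Hypothetical cache container; field names borrowed from the PR text."""
    committee_count_cache: Dict[int, int] = field(default_factory=dict)
    start_shard_cache: Dict[int, int] = field(default_factory=dict)


def cached(table: Dict[int, int], key: int, compute: Callable[[], int]) -> int:
    """Return table[key], computing and storing it on first use."""
    if key not in table:
        table[key] = compute()
    return table[key]


# Call counters so the effect of the caches is observable.
calls = {"committee_count": 0, "start_shard": 0}


def slow_committee_count(epoch: int) -> int:
    """Toy stand-in for an expensive per-epoch computation."""
    calls["committee_count"] += 1
    return 8


def slow_start_shard(epoch: int) -> int:
    """Toy stand-in for another expensive per-epoch computation."""
    calls["start_shard"] += 1
    return (epoch * 8) % 1024


def committee_inputs(epoch: int, cache: StateCache):
    """Look up both per-epoch values, recomputing each at most once per epoch."""
    count = cached(cache.committee_count_cache, epoch,
                   lambda: slow_committee_count(epoch))
    start = cached(cache.start_shard_cache, epoch,
                   lambda: slow_start_shard(epoch))
    return count, start


cache = StateCache()
for _ in range(64):  # e.g. one lookup per slot of a 64-slot epoch
    committee_inputs(3, cache)
```

After the loop, each stand-in has run once instead of 64 times. Passing the cache in explicitly keeps it out of other call sites, which is one way to limit coupling. A cache like this is only valid while the inputs for a given epoch stay unchanged.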

If one had to choose between committee_count_cache and start_shard_cache, the latter would be preferable, but the two are worth combining:

Validators: 576, epoch length: 64
Validators per attestation (mean): 9.0
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
     182.741,       41.105,      110.444,      272.661,          128, Process non-epoch slot with block
    1137.930,      671.281,      663.263,     1612.597,            2, Process epoch slot with block
       1.955,        1.160,        0.025,        3.953,          130, Tree-hash block
       4.057,        0.431,        3.591,        7.056,          130, Retrieve committee once using get_crosslink_committee
      54.137,        9.776,       35.968,       75.955,         8320, Combine committee attestations

real    8m19.245s
user    8m18.907s

...

Validators: 576, epoch length: 64
Validators per attestation (mean): 9.0
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
     170.573,       40.397,      100.050,      255.251,          128, Process non-epoch slot with block
    1163.040,      678.859,      683.014,     1643.066,            2, Process epoch slot with block
       0.520,        0.295,        0.026,        1.141,          130, Tree-hash block
       3.905,        0.440,        3.455,        6.908,          130, Retrieve committee once using get_crosslink_committee
      27.464,        9.701,       10.441,       50.249,         8320, Combine committee attestations

real    4m30.645s
user    4m30.164s

So, with both new caches, the run for 576 validators goes from about 8 minutes to 4:30 without BLS validation (and from 12:26 to 8:19 with it). The maximum slot time, even for epoch slots, is about 1.6 seconds on a typical 15W/25W TDP laptop.
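
The wall-clock figures can be compared directly. The sketch below uses only the `real` times reported in this description and computes the speedup of each configuration over the baseline, plus the BLS overhead per configuration; the function and dictionary names are made up for this example.

```python
def seconds(minutes, secs):
    """Convert an 'XmY.ZZZs' wall-clock reading to seconds."""
    return minutes * 60 + secs


# (with BLS, without BLS) "real" times in seconds, from the runs above.
REAL = {
    "baseline": (seconds(12, 26.008), seconds(8, 7.282)),
    "start_shard_cache only": (seconds(9, 56.982), seconds(5, 56.844)),
    "committee_count_cache only": (seconds(10, 51.754), seconds(7, 3.012)),
    "both caches": (seconds(8, 19.245), seconds(4, 30.645)),
}


def speedup(config, bls):
    """Baseline wall-clock time divided by the given configuration's time."""
    idx = 0 if bls else 1
    return REAL["baseline"][idx] / REAL[config][idx]


def bls_overhead(config):
    """Seconds added by enabling BLS validation in one configuration."""
    with_bls, without_bls = REAL[config]
    return with_bls - without_bls


if __name__ == "__main__":
    for config in REAL:
        print(f"{config:28s} "
              f"speedup no-BLS {speedup(config, bls=False):.2f}x, "
              f"with BLS {speedup(config, bls=True):.2f}x, "
              f"BLS overhead {bls_overhead(config) / 60:.1f} min")
```

With both caches, this works out to roughly 1.8x without BLS and about 1.5x with it. The BLS overhead stays near four minutes in every configuration, which is why the relative gain is smaller when validation is on.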

@tersec tersec requested review from arnetheduck and mratsim Jul 10, 2019

@tersec tersec merged commit 7bc2b81 into master Jul 10, 2019

2 of 4 checks passed

continuous-integration/appveyor/branch Waiting for AppVeyor build to complete
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@delete-merged-branch delete-merged-branch bot deleted the lla branch Jul 10, 2019
