Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up process_crosslinks(...) and get_crosslink_deltas(...) by 10x - 15x in state_sim #314

Merged
merged 5 commits into from Jul 8, 2019

Conversation

@tersec
Copy link
Contributor

commented Jul 8, 2019

Successor PR to #313 (issues with rebasing conflicts).

This switches inner/outer loop nesting order to get 10-15x function speedup for 128 and 512 validator cases by avoiding accidentally quadratic behavior, while keeping function signature unchanged and allowing easy ongoing verification of correctness of optimization.

This takes process_crosslinks(...) processing times down in my testing from 15-16s or so to 1s and from 22-24s to 1.5s. get_crosslink_deltas(...) sees similar speedups, since it was also bottlenecked on get_winning_crosslink_and_attesting_indices(...).

The other big epoch time-consumer is get_attestation_deltas(...), which 4590b69 decreases by 90x or so for me for 512 validators, from 178s to 2s.

Timings I get with 512 validators:

All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
     197.915,       46.600,      118.005,      309.630,          128, Process non-epoch slot with block
    3224.563,     2258.988,     1627.217,     4821.908,            2, Process epoch slot with block
       1.997,        1.181,        0.027,        4.247,          130, Tree-hash block
       8.371,        0.291,        7.516,        9.636,          130, Retrieve committee once using get_crosslink_committee
      64.856,       18.607,       31.645,      113.677,         8320, Combine committee attestations

tersec added some commits Jul 5, 2019

remove some low-hanging perf silliness from get_winning_crosslink_and…
…_attesting_indices(...) and switch inner/outer loop nesting order to get 10-15x function speedup for 128 and 512 validator cases by avoiding accidentally quadratic behavior, while keeping function signature unchanged and allowing easy ongoing verification of correctness of optimization
@tersec

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

move non-spec ValidatorSetDeltaFlags from datatypes.nim to sync_proto…
…col; fix mainnet MIN_ATTESTATION_INCLUSION_DELAY preset; update get_attestation_deltas(...) to 0.8.0; for 512 validators, get 90x speedup for get_attestation_deltas(...) from 179s to 2s
@mratsim

mratsim approved these changes Jul 8, 2019

Copy link
Member

left a comment

So I suppose the main gain was due to lots of seq allocations in inner loops?

There are a couple of GC profiling information available according to https://github.com/nim-lang/Nim/blob/devel/lib/system/gc.nim:

  • -d:memProfiler
  • GC_getStatistics()

It would be interesting to understand how we were stressing th GC before and after this change.

filterIt(attestations, it.data.crosslink == candidate_crosslink),
stateCache)
when false:
let crosslink_balance_uncached =

This comment has been minimized.

Copy link
@mratsim

mratsim Jul 8, 2019

Member

We should have a comment on the purpose of this when false, so we known that we can remove it if it's accomplished.

This comment has been minimized.

Copy link
@tersec

tersec Jul 8, 2019

Author Contributor

Done in b8618f1

@tersec

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

So I suppose the main gain was due to lots of seq allocations in inner loops?

There are a couple of GC profiling information available according to https://github.com/nim-lang/Nim/blob/devel/lib/system/gc.nim:

* `-d:memProfiler`

* `GC_getStatistics()`

It would be interesting to understand how we were stressing th GC before and after this change.

There are some distinct cases:

d8f63d2 I don't think is mostly working by memory allocations per se (though, it does help). Rather, there were lots of filterIt scans with very sparse results in an inner loop.

61f388f is similar:

        get_attesting_balance(
          state,
          filterIt(attestations, it.data.crosslink == candidate_crosslink),
          stateCache)

wasn't matching most attestations most times, but it was scanning them, and it was O(n^2) overall, because number of attestations is proportional to number of crosslinks. So this turns that into O(n) by avoiding repetitious scans.

The seq to openarray change in that commit is nice, but perf-wise negligible.

776833e actually adds memory allocation/churn. I put it there because the spec does and if it turns out to be problematic or unnecessary, it's an obvious target for removal. In particular, I don't see how it could ever be doing much useful -- it's basically deduplicating a list, but that list is internally generated by us, in a way that if there are duplicates, is already a bug. Because it's a set, order doesn't matter for the spec, so that shouldn't be a problem.

I think 4590b69 is, yes, to your point, substantially operating by reducing memory churn. It's not especially accidentally-quadratic, just, slow, because of the memory issues.

So it's nuanced, and what you point out is part of it, and would indeed be interesting to check, but isn't the whole explanation.

@tersec tersec merged commit 0a2d09e into master Jul 8, 2019

0 of 4 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/travis-ci/push The Travis CI build failed
Details
continuous-integration/appveyor/branch Waiting for AppVeyor build to complete
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details

@delete-merged-branch delete-merged-branch bot deleted the pas branch Jul 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.