Speed up process_crosslinks(...) and get_crosslink_deltas(...) by 10x - 15x in state_sim #314
Successor PR to #313 (issues with rebasing conflicts).
This switches the inner/outer loop nesting order to get a 10-15x function speedup in the 128- and 512-validator cases by avoiding accidentally quadratic behavior, while keeping the function signatures unchanged and allowing easy ongoing verification that the optimization is correct.
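An illustrative sketch of the restructuring described (Python for illustration; the PR itself is Nim, and these function names are hypothetical, not the actual state_sim code). The accidentally quadratic shape is "for each crosslink shard, scan every attestation"; swapping the nesting visits each attestation once, buckets by shard, and keeps the outer function's inputs and outputs unchanged:

```python
def winning_root_quadratic(shards, attestations):
    # O(len(shards) * len(attestations)): rescans all attestations per shard.
    result = {}
    for shard in shards:
        matching = [a for a in attestations if a["shard"] == shard]
        result[shard] = max((a["root"] for a in matching), default=None)
    return result

def winning_root_linear(shards, attestations):
    # One pass to bucket attestations by shard, then one pass over shards:
    # O(len(shards) + len(attestations)), same result, same signature.
    by_shard = {}
    for a in attestations:
        by_shard.setdefault(a["shard"], []).append(a)
    result = {}
    for shard in shards:
        matching = by_shard.get(shard, [])
        result[shard] = max((a["root"] for a in matching), default=None)
    return result
```

Because the two versions are drop-in equivalent, correctness of the optimization can be checked continuously by comparing their outputs.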
The other big epoch time-consumer is
Timings I get with 512 validators:
Carrying over a still-relevant comment, https://github.com/status-im/nim-confutils/blame/2f9598611598c2351458635865e92bc408170037/confutils.nim#L272-L276 explains the CI failures.
mratsim left a comment
So I suppose the main gain was due to lots of seq allocations in inner loops?
There is some GC profiling information available according to https://github.com/nim-lang/Nim/blob/devel/lib/system/gc.nim:
It would be interesting to understand how we were stressing the GC before and after this change.
There are some distinct cases:
I don't think d8f63d2 works mostly by reducing memory allocations per se (though that does help). Rather, there were lots of
61f388f is similar:
the code wasn't matching most attestations most of the time, but it was scanning all of them, and it was O(n^2) overall because the number of attestations is proportional to the number of crosslinks. So this turns that into O(n) by avoiding repeated scans.
776833e actually adds memory allocation/churn. I put it there because the spec does, and if it turns out to be problematic or unnecessary, it's an obvious target for removal. In particular, I don't see how it could ever be doing much useful work: it's basically deduplicating a list, but that list is internally generated by us, in a way where any duplicates would already be a bug. Because it's a set, order doesn't matter for the spec, so that shouldn't be a problem.
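A sketch of why this dedup looks removable (Python for illustration; `dedup_indices` is a hypothetical name, not the actual code): if the producing code already guarantees uniqueness, building a set is pure allocation churn, and an assertion documenting the invariant could replace it.

```python
def dedup_indices(indices):
    # Spec-style dedup: allocates a whole new collection.
    # If the list is generated internally with no duplicates possible,
    # this allocation buys nothing; a duplicate would itself be a bug,
    # so the assert records the invariant the dedup silently relies on.
    unique = set(indices)
    assert len(unique) == len(indices), "duplicate index: upstream bug"
    # Order doesn't matter to the spec (it's a set); sorting here is
    # only to make the illustrative output deterministic.
    return sorted(unique)
```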
I think 4590b69 is, yes, to your point, substantially operating by reducing memory churn. It's not especially accidentally quadratic; it's just slow because of the memory issues.
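A sketch of that churn pattern (Python for illustration; hypothetical names, not the actual `get_crosslink_deltas` code): allocating fresh per-iteration collections inside the loop versus filling one preallocated accumulator in place.

```python
def deltas_churny(n_validators, passes):
    total = [0] * n_validators
    for _ in range(passes):
        step = [1] * n_validators                     # fresh list every pass
        total = [t + s for t, s in zip(total, step)]  # and another one here
    return total

def deltas_in_place(n_validators, passes):
    total = [0] * n_validators                        # allocated once
    for _ in range(passes):
        for i in range(n_validators):
            total[i] += 1                             # mutated in place
    return total
```

The two produce identical results; the second simply gives the GC far less to do, which is the kind of gain that shows up as "just slow" rather than as an asymptotic blowup.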
So it's nuanced, and what you point out is part of it, and would indeed be interesting to check, but isn't the whole explanation.