Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sha256: use freelist to avoid allocations #14

Closed
wants to merge 1 commit into from

Conversation

jsign
Copy link

@jsign jsign commented Jun 14, 2020

Running the benchmarks and inspecting profiles, it seems that most allocations were related to sha256 calculations to building the Merkle Tree.

The indirect slice created by appending the left and right child to calculate the hash value has a high cost since the slice isn't returned, so the cost and size of the allocation aren't amortized.

A usual alternative is re-using these kinds of slices to avoid allocs in further operations. I leveraged the sync.Pool type which tends to be a reasonable choice, because it comes from the stdlib and has some internal optimizations regarding thread locality and, of course, it's thread-safe.

Whenever the pool needs a new slice, it creates a slice with 2*NodeSize size.
This implementation works correctly if the total size of lChild and rChild is at most 2*NodeSize considering the nature of copy. If that isn't the case, this could be switched to using append most probably having the same kind of perf improvement. Using pools with "growable" elements may be a more delicate discussion.

Here's a bench compare between develop and this change:

name                   old time/op    new time/op    delta
NewTree-8                  114s ± 3%      112s ± 1%     ~     (p=0.200 n=4+4)
NewTreeSmall-8            3.60s ± 3%     3.43s ± 4%     ~     (p=0.200 n=4+4)
NewTreeNoHashing-8         112s ± 1%      110s ± 6%     ~     (p=0.343 n=4+4)
NewCachingTreeSmall-8     3.57s ± 1%     3.37s ± 1%   -5.66%  (p=0.029 n=4+4)
GenerateProof-8           15.1s ± 2%     15.0s ± 3%     ~     (p=0.343 n=4+4)
ValidatePartialTree-8    59.9µs ± 1%    57.6µs ± 0%   -3.91%  (p=0.029 n=4+4)

name                   old alloc/op   new alloc/op   delta
NewTree-8                34.4GB ± 0%    17.2GB ± 0%  -49.99%  (p=0.029 n=4+4)
NewTreeSmall-8           1.07GB ± 0%    0.54GB ± 0%  -49.99%  (p=0.029 n=4+4)
NewTreeNoHashing-8       34.4GB ± 0%    17.2GB ± 0%  -49.99%  (p=0.029 n=4+4)
NewCachingTreeSmall-8    1.09GB ± 0%    0.55GB ± 0%  -49.20%  (p=0.029 n=4+4)
GenerateProof-8          5.75GB ± 0%    5.21GB ± 0%   -9.34%  (p=0.029 n=4+4)
ValidatePartialTree-8    15.1kB ± 0%     5.9kB ± 0%  -60.62%  (p=0.029 n=4+4)

name                   old allocs/op  new allocs/op  delta
NewTree-8                  805M ± 0%      537M ± 0%  -33.33%  (p=0.029 n=4+4)
NewTreeSmall-8            25.2M ± 0%     16.8M ± 0%  -33.33%  (p=0.029 n=4+4)
NewTreeNoHashing-8         805M ± 0%      537M ± 0%  -33.33%  (p=0.029 n=4+4)
NewCachingTreeSmall-8     25.2M ± 0%     16.8M ± 0%  -33.33%  (p=0.029 n=4+4)
GenerateProof-8           58.8M ± 0%     50.4M ± 0%  -14.28%  (p=0.029 n=4+4)
ValidatePartialTree-8       302 ± 0%       159 ± 0%  -47.35%  (p=0.029 n=4+4)

So, both total bytes allocated and the number of allocations had a dramatic improvement. Total time/op didn't get affected or got slightly better in some cases.

I think that by the way the benchmarks have created the results (even before this change) are affected by a lot of core operations about tree building. Since this change is at the core of most operations, that's the explanation it improved mostly all benchs.

Also, since less garbage is created this change would reduce GC pressure in the big picture application lifecycle.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign marked this pull request as ready for review June 14, 2020 17:55
@dshulyak dshulyak closed this Jul 27, 2022
@countvonzero
Copy link

@dshulyak can you comment on the reason for closing this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants