sha256: use freelist to avoid allocations #14
Closed
Running the benchmarks and inspecting profiles, it seems that most allocations were related to the sha256 calculations used to build the Merkle tree.
The intermediate slice created by appending the left and right child to calculate the hash value has a high cost: since the slice isn't returned, the cost and size of the allocation aren't amortized.
A usual alternative is re-using these kinds of slices to avoid allocations in further operations. I leveraged the `sync.Pool` type, which tends to be a reasonable choice: it comes from the stdlib, it has some internal optimizations regarding thread locality, and, of course, it's thread-safe. Whenever the pool needs a new slice, it creates one of size `2*NodeSize`.

This implementation works correctly as long as the total size of `lChild` and `rChild` is at most `2*NodeSize`, considering the nature of `copy`. If that isn't the case, this could be switched to using `append`, most probably with the same kind of perf improvement. Using pools with "growable" elements may be a more delicate discussion.

Here's a bench compare between `develop` and this change:

So, both total bytes allocated and the number of allocations improved dramatically. Total time/op wasn't affected, or got slightly better in some cases.
By the way, I think the benchmark results (even before this change) are dominated by the core tree-building operations. Since this change sits at the core of most operations, that explains why it improved nearly all benchmarks.
Also, since less garbage is created, this change should reduce GC pressure over the application's whole lifecycle.