crypto/merkle: hashing ~10% faster and ~60% fewer allocs #351

jsign · 2021-05-22T22:50:58Z

This PR makes some optimizations that produce the following results compared to the master implementation:

name                          old time/op    new time/op    delta
HashAlternatives/recursive-8    72.2µs ± 6%    65.0µs ± 1%   -9.94%  (p=0.000 n=10+10)
HashAlternatives/iterative-8    71.5µs ± 1%    65.6µs ± 1%   -8.26%  (p=0.000 n=8+8)

name                          old alloc/op   new alloc/op   delta
HashAlternatives/recursive-8    25.4kB ± 0%     6.5kB ± 0%  -74.45%  (p=0.000 n=10+10)
HashAlternatives/iterative-8    28.1kB ± 0%     9.2kB ± 0%  -67.33%  (p=0.000 n=10+10)

name                          old allocs/op  new allocs/op  delta
HashAlternatives/recursive-8       497 ± 0%       200 ± 0%  -59.76%  (p=0.000 n=10+10)
HashAlternatives/iterative-8       498 ± 0%       201 ± 0%  -59.64%  (p=0.000 n=10+10)

The gist of the change is basically avoiding allocs, and some nit gain by using a SIMD sha256 implementation. The latter could also be considered for the crypto/tmhash package.

There's a further optimization possible that might reduce allocs even more with pooling, but the main bottleneck is sha256 calc as expected so maybe isn't worth it.

I kept the original leftHash and innerHash methods since they're used in other places, but looks to me can be removed and used the ones I introduced which will avoid the same kind in allocations in other places producing some performance gains in other packages.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

jsign · 2021-05-22T22:54:37Z

crypto/merkle/hash.go

 // returns tmhash(0x01 || left || right)
 func innerHash(left []byte, right []byte) []byte {
 	return tmhash.Sum(append(innerPrefix, append(left, right...)...))
 }
+
+func innerHashOpt(s hash.Hash, left []byte, right []byte) []byte {


The main idea is to avoid appending and re-use a provided hash.Hash.

jsign · 2021-05-22T22:56:08Z

crypto/merkle/proof_key_path_test.go

@@ -35,6 +35,7 @@ func TestKeyPath(t *testing.T) {

 		res, err := KeyPathToKeys(path.String())
 		require.Nil(t, err)
+		require.Equal(t, len(keys), len(res))


The loop below could possibly PASS even if res was longer.

codecov-commenter · 2021-05-22T23:27:44Z

Codecov Report

Merging #351 (adc5269) into master (880ede1) will decrease coverage by 0.16%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #351      +/-   ##
==========================================
- Coverage   61.98%   61.81%   -0.17%     
==========================================
  Files         263      262       -1     
  Lines       22931    22927       -4     
==========================================
- Hits        14213    14172      -41     
- Misses       7207     7253      +46     
+ Partials     1511     1502       -9

Impacted Files	Coverage Δ
crypto/merkle/hash.go	`100.00% <100.00%> (ø)`
crypto/merkle/proof.go	`87.03% <100.00%> (ø)`
crypto/merkle/tree.go	`89.74% <100.00%> (+0.55%)`	⬆️
light/provider/http/http.go	`40.86% <0.00%> (-15.03%)`	⬇️
light/client.go	`64.48% <0.00%> (-4.56%)`	⬇️
rpc/core/blocks.go	`30.76% <0.00%> (-4.02%)`	⬇️
types/tx.go	`81.48% <0.00%> (-3.71%)`	⬇️
light/verifier.go	`69.23% <0.00%> (-2.06%)`	⬇️
blockchain/v0/pool.go	`79.84% <0.00%> (-1.91%)`	⬇️
statesync/snapshots.go	`91.59% <0.00%> (-1.69%)`	⬇️
... and 22 more

jsign · 2021-05-22T23:40:33Z

Looking now at crytpo/merkle/proof.go, definitely looks like a good idea to use the *Opt versions introduced here that would result in a similar gain regarding avoiding allocs for ProofsFromByteSlices.

I didn't want to cascade the change to more places until having some extra feedback if this feels correct. I can extend this PR to remove the previous functions if that sounds ok.

liamsi · 2021-05-23T00:33:11Z

Wow, thanks for doing this (ref #272 / tendermint/tendermint#2186) and for even providing proper benchmarks! I'll do a proper review shortly.

ValarDragon

This change LGTM, glad the reset API is getting used =)

tac0turtle

This is awesome!!

we should see if it can be upstreamed

liamsi

First of all, thanks again for your contribution 💙 This is cool! And it even reminded me that we recently changes something in another repo in a non-optimal way (reverted here: celestiaorg/nmt#40).

I didn't want to cascade the change to more places until having some extra feedback if this feels correct. I can extend this PR to remove the previous functions if that sounds ok.

Yeah, this feels correct. The change almost looks good as is. The only thing that needs addressing before merging: the simd version is currently added in go mod but not used.
Related to this, it is not clear if the benchmarks improved with the current state or did you actually use the simd version when you ran those?

It might be worth noting that we use this merkle tree only for tiny inputs. The only lazyledger-specific place will be computing the data hash from the rows/columns. Hence the actual impact of these changes will be marginal but we should still merge them! The merkle tree we use for actual data is in this repo: https://github.com/lazyledger/nmt
There is plenty of room for improvements in that one too.

BTW: Alternatively, you could also open this PR on tendermint directly.
@marbar3778 signaled that the team there might merge your changes as well.

To make the turnaround faster on the vanilla tendermint repo, I would suggest splitting the changes into two PRs: 1) the totally uncontroversial use of hash.Hash and reset and 2) replacing standard sha2 with the simd version.

crypto/merkle/tree.go

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

jsign · 2021-05-24T22:36:47Z

@liamsi, I prefer going the route of leaving SIMD out of this PR as to prove the impact of reducing allocs.

As to be totally sure about the benchstat result without SIMD, I've run it again:

benchstat master.bench opt.bench                                       
name                          old time/op    new time/op    delta
HashAlternatives/recursive-8    75.4µs ± 7%    66.5µs ± 2%  -11.85%  (p=0.000 n=10+9)
HashAlternatives/iterative-8    76.3µs ± 4%    64.7µs ± 1%  -15.18%  (p=0.000 n=10+9)

name                          old alloc/op   new alloc/op   delta
HashAlternatives/recursive-8    25.4kB ± 0%     6.5kB ± 0%  -74.45%  (p=0.000 n=10+10)
HashAlternatives/iterative-8    28.1kB ± 0%     9.2kB ± 0%  -67.33%  (p=0.000 n=10+10)

name                          old allocs/op  new allocs/op  delta
HashAlternatives/recursive-8       497 ± 0%       200 ± 0%  -59.76%  (p=0.000 n=10+10)
HashAlternatives/iterative-8       498 ± 0%       201 ± 0%  -59.64%  (p=0.000 n=10+10)

My laptop is a bit noisy, but this proves that the gains are mostly from reducing allocs.

The test data is also quite tiny as to have much impact from SIMD instructions, so makes sense.

I agree on leaving the dependency change to another PR.

liamsi · 2021-05-24T23:19:56Z

crypto/merkle/tree.go

 	"math/bits"
 )

 // HashFromByteSlices computes a Merkle tree where the leaves are the byte slice,
 // in the provided order. It follows RFC-6962.
 func HashFromByteSlices(items [][]byte) []byte {
+	return hashFromByteSlices(sha256.New(), items)


Note to myself: at some point, we need to search for all occurrences like this and make sure they use a central variable / constant instead (just in case we decide to switch to another hash function instead)

liamsi

Thanks again 👍🏼

@odeke-em

## Description Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ### Benchmarking: ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm

@odeke-em

## Description Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ### Benchmarking: ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm

@odeke-em

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm

@odeke-em

* crypto/merkle: optimize merkle tree hashing (#6513) Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm * update pending log Co-authored-by: Marko <marbar3778@yahoo.com>

adaptation/backport of celestiaorg#351

#351) * Rename package back to tendermint/tendermint and use replace to point to the current repo * Rename package back to tendermint/tendermint in the proto files and generated files * Rename package back to tendermint/tendermint in the mocks * Remove useless replace * Apply suggestions from code review * Apply suggestions from code review

@odeke-em

* crypto/merkle: optimize merkle tree hashing (#6513) Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm * update pending log Co-authored-by: Marko <marbar3778@yahoo.com>

@odeke-em

* crypto/merkle: optimize merkle tree hashing (#6513) Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm * update pending log Co-authored-by: Marko <marbar3778@yahoo.com> (cherry picked from commit 8e787f0)

@odeke-em

…25) (#26) * crypto/merkle: optimize merkle tree hashing (#6513) (#9446) * crypto/merkle: optimize merkle tree hashing (#6513) Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing ``` benchmark old ns/op new ns/op delta BenchmarkHashAlternatives/recursive-8 22914 21949 -4.21% BenchmarkHashAlternatives/iterative-8 21634 21939 +1.41% benchmark old allocs new allocs delta BenchmarkHashAlternatives/recursive-8 398 200 -49.75% BenchmarkHashAlternatives/iterative-8 399 301 -24.56% benchmark old bytes new bytes delta BenchmarkHashAlternatives/recursive-8 19088 6496 -65.97% BenchmarkHashAlternatives/iterative-8 21776 13984 -35.78% ``` cc @odeke-em @cuonglm * update pending log Co-authored-by: Marko <marbar3778@yahoo.com> (cherry picked from commit 8e787f0) * add changelog (cherry picked from commit 9e015e0) * add back rootHash nil check (cherry picked from commit 4b602f8) * update changelog (cherry picked from commit 6b1d05f) --------- Co-authored-by: JayT106 <JayT106@users.noreply.github.com> Co-authored-by: Adam Tucker <adamleetucker@outlook.com>

crypto/merkle: optimize cpu and allocations

dc58692

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

jsign commented May 22, 2021

View reviewed changes

jsign marked this pull request as ready for review May 22, 2021 22:56

jsign requested review from evan-forbes, liamsi and tac0turtle as code owners May 22, 2021 22:56

jsign changed the title ~~crypto/merkle: hashing ~10% faster and ~60% less allocs~~ crypto/merkle: hashing ~10% faster and ~60% fewer allocs May 22, 2021

liamsi mentioned this pull request May 23, 2021

Use sha256-simd instead of native Golang implementation #272

Closed

ValarDragon approved these changes May 23, 2021

View reviewed changes

liamsi mentioned this pull request May 23, 2021

Reuse hash celestiaorg/nmt#40

Merged

tac0turtle approved these changes May 23, 2021

View reviewed changes

liamsi requested changes May 24, 2021

View reviewed changes

crypto/merkle/tree.go Show resolved Hide resolved

mod tidy

adc5269

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>

liamsi reviewed May 24, 2021

View reviewed changes

liamsi approved these changes May 24, 2021

View reviewed changes

liamsi merged commit 7fca45b into celestiaorg:master May 25, 2021

tac0turtle mentioned this pull request May 31, 2021

crypto/merkle: optimize merkle tree hashing tendermint/tendermint#6513

Merged

JayT106 mentioned this pull request Sep 16, 2022

crypto/merkle: optimize merkle tree hashing (#6513) tendermint/tendermint#9446

Merged

3 tasks

shyba mentioned this pull request Jan 30, 2023

feat: use sha256-simd #955

Closed

shyba added a commit to contribforks/celestia-core that referenced this pull request Jan 31, 2023

crypto/merkle: reuse hasher, reduce allocations

71f9333

adaptation/backport of celestiaorg#351

czarcas7ic mentioned this pull request Apr 23, 2024

crypto/merkle: optimize merkle tree hashing (#6513) (#9446) osmosis-labs/cometbft#25

Merged

3 tasks

mergify bot mentioned this pull request Apr 24, 2024

crypto/merkle: optimize merkle tree hashing (#6513) (#9446) (backport #25) osmosis-labs/cometbft#26

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crypto/merkle: hashing ~10% faster and ~60% fewer allocs #351

crypto/merkle: hashing ~10% faster and ~60% fewer allocs #351

jsign commented May 22, 2021

jsign May 22, 2021

jsign May 22, 2021

codecov-commenter commented May 22, 2021 •

edited

Loading

jsign commented May 22, 2021 •

edited

Loading

liamsi commented May 23, 2021 •

edited

Loading

ValarDragon left a comment

tac0turtle left a comment

liamsi left a comment •

edited

Loading

jsign commented May 24, 2021

liamsi May 24, 2021

liamsi left a comment

crypto/merkle: hashing ~10% faster and ~60% fewer allocs #351

crypto/merkle: hashing ~10% faster and ~60% fewer allocs #351

Conversation

jsign commented May 22, 2021

jsign May 22, 2021

Choose a reason for hiding this comment

jsign May 22, 2021

Choose a reason for hiding this comment

codecov-commenter commented May 22, 2021 • edited Loading

Codecov Report

jsign commented May 22, 2021 • edited Loading

liamsi commented May 23, 2021 • edited Loading

ValarDragon left a comment

Choose a reason for hiding this comment

tac0turtle left a comment

Choose a reason for hiding this comment

liamsi left a comment • edited Loading

Choose a reason for hiding this comment

jsign commented May 24, 2021

liamsi May 24, 2021

Choose a reason for hiding this comment

liamsi left a comment

Choose a reason for hiding this comment

codecov-commenter commented May 22, 2021 •

edited

Loading

jsign commented May 22, 2021 •

edited

Loading

liamsi commented May 23, 2021 •

edited

Loading

liamsi left a comment •

edited

Loading