Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/merkle: hashing ~10% faster and ~60% fewer allocs #351

Merged
merged 2 commits into from
May 25, 2021

Conversation

jsign
Copy link
Contributor

@jsign jsign commented May 22, 2021

This PR makes some optimizations that produce the following results compared to the master implementation:

name                          old time/op    new time/op    delta
HashAlternatives/recursive-8    72.2µs ± 6%    65.0µs ± 1%   -9.94%  (p=0.000 n=10+10)
HashAlternatives/iterative-8    71.5µs ± 1%    65.6µs ± 1%   -8.26%  (p=0.000 n=8+8)

name                          old alloc/op   new alloc/op   delta
HashAlternatives/recursive-8    25.4kB ± 0%     6.5kB ± 0%  -74.45%  (p=0.000 n=10+10)
HashAlternatives/iterative-8    28.1kB ± 0%     9.2kB ± 0%  -67.33%  (p=0.000 n=10+10)

name                          old allocs/op  new allocs/op  delta
HashAlternatives/recursive-8       497 ± 0%       200 ± 0%  -59.76%  (p=0.000 n=10+10)
HashAlternatives/iterative-8       498 ± 0%       201 ± 0%  -59.64%  (p=0.000 n=10+10)

The gist of the change is basically avoiding allocs, and some nit gain by using a SIMD sha256 implementation. The latter could also be considered for the crypto/tmhash package.

There's a further optimization possible that might reduce allocs even more with pooling, but the main bottleneck is sha256 calc as expected so maybe isn't worth it.

I kept the original leftHash and innerHash methods since they're used in other places, but looks to me can be removed and used the ones I introduced which will avoid the same kind in allocations in other places producing some performance gains in other packages.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
// returns tmhash(0x01 || left || right)
func innerHash(left []byte, right []byte) []byte {
return tmhash.Sum(append(innerPrefix, append(left, right...)...))
}

func innerHashOpt(s hash.Hash, left []byte, right []byte) []byte {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main idea is to avoid appending and re-use a provided hash.Hash.

@@ -35,6 +35,7 @@ func TestKeyPath(t *testing.T) {

res, err := KeyPathToKeys(path.String())
require.Nil(t, err)
require.Equal(t, len(keys), len(res))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop below could possibly PASS even if res was longer.

@jsign jsign marked this pull request as ready for review May 22, 2021 22:56
@codecov-commenter
Copy link

codecov-commenter commented May 22, 2021

Codecov Report

Merging #351 (adc5269) into master (880ede1) will decrease coverage by 0.16%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #351      +/-   ##
==========================================
- Coverage   61.98%   61.81%   -0.17%     
==========================================
  Files         263      262       -1     
  Lines       22931    22927       -4     
==========================================
- Hits        14213    14172      -41     
- Misses       7207     7253      +46     
+ Partials     1511     1502       -9     
Impacted Files Coverage Δ
crypto/merkle/hash.go 100.00% <100.00%> (ø)
crypto/merkle/proof.go 87.03% <100.00%> (ø)
crypto/merkle/tree.go 89.74% <100.00%> (+0.55%) ⬆️
light/provider/http/http.go 40.86% <0.00%> (-15.03%) ⬇️
light/client.go 64.48% <0.00%> (-4.56%) ⬇️
rpc/core/blocks.go 30.76% <0.00%> (-4.02%) ⬇️
types/tx.go 81.48% <0.00%> (-3.71%) ⬇️
light/verifier.go 69.23% <0.00%> (-2.06%) ⬇️
blockchain/v0/pool.go 79.84% <0.00%> (-1.91%) ⬇️
statesync/snapshots.go 91.59% <0.00%> (-1.69%) ⬇️
... and 22 more

@jsign
Copy link
Contributor Author

jsign commented May 22, 2021

Looking now at crytpo/merkle/proof.go, definitely looks like a good idea to use the *Opt versions introduced here that would result in a similar gain regarding avoiding allocs for ProofsFromByteSlices.

I didn't want to cascade the change to more places until having some extra feedback if this feels correct. I can extend this PR to remove the previous functions if that sounds ok.

@jsign jsign changed the title crypto/merkle: hashing ~10% faster and ~60% less allocs crypto/merkle: hashing ~10% faster and ~60% fewer allocs May 22, 2021
@liamsi
Copy link
Member

liamsi commented May 23, 2021

Wow, thanks for doing this (ref #272 / tendermint/tendermint#2186) and for even providing proper benchmarks! I'll do a proper review shortly.

Copy link
Contributor

@ValarDragon ValarDragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change LGTM, glad the reset API is getting used =)

@liamsi liamsi mentioned this pull request May 23, 2021
Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!!

we should see if it can be upstreamed

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thanks again for your contribution 💙 This is cool! And it even reminded me that we recently changes something in another repo in a non-optimal way (reverted here: celestiaorg/nmt#40).

I didn't want to cascade the change to more places until having some extra feedback if this feels correct. I can extend this PR to remove the previous functions if that sounds ok.

Yeah, this feels correct. The change almost looks good as is. The only thing that needs addressing before merging: the simd version is currently added in go mod but not used.
Related to this, it is not clear if the benchmarks improved with the current state or did you actually use the simd version when you ran those?

It might be worth noting that we use this merkle tree only for tiny inputs. The only lazyledger-specific place will be computing the data hash from the rows/columns. Hence the actual impact of these changes will be marginal but we should still merge them! The merkle tree we use for actual data is in this repo: https://github.com/lazyledger/nmt
There is plenty of room for improvements in that one too.

BTW: Alternatively, you could also open this PR on tendermint directly.
@marbar3778 signaled that the team there might merge your changes as well.

To make the turnaround faster on the vanilla tendermint repo, I would suggest splitting the changes into two PRs: 1) the totally uncontroversial use of hash.Hash and reset and 2) replacing standard sha2 with the simd version.

crypto/merkle/tree.go Show resolved Hide resolved
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign
Copy link
Contributor Author

jsign commented May 24, 2021

@liamsi, I prefer going the route of leaving SIMD out of this PR as to prove the impact of reducing allocs.

As to be totally sure about the benchstat result without SIMD, I've run it again:

benchstat master.bench opt.bench                                       
name                          old time/op    new time/op    delta
HashAlternatives/recursive-8    75.4µs ± 7%    66.5µs ± 2%  -11.85%  (p=0.000 n=10+9)
HashAlternatives/iterative-8    76.3µs ± 4%    64.7µs ± 1%  -15.18%  (p=0.000 n=10+9)

name                          old alloc/op   new alloc/op   delta
HashAlternatives/recursive-8    25.4kB ± 0%     6.5kB ± 0%  -74.45%  (p=0.000 n=10+10)
HashAlternatives/iterative-8    28.1kB ± 0%     9.2kB ± 0%  -67.33%  (p=0.000 n=10+10)

name                          old allocs/op  new allocs/op  delta
HashAlternatives/recursive-8       497 ± 0%       200 ± 0%  -59.76%  (p=0.000 n=10+10)
HashAlternatives/iterative-8       498 ± 0%       201 ± 0%  -59.64%  (p=0.000 n=10+10)

My laptop is a bit noisy, but this proves that the gains are mostly from reducing allocs.

The test data is also quite tiny as to have much impact from SIMD instructions, so makes sense.

I agree on leaving the dependency change to another PR.

"math/bits"
)

// HashFromByteSlices computes a Merkle tree where the leaves are the byte slice,
// in the provided order. It follows RFC-6962.
func HashFromByteSlices(items [][]byte) []byte {
return hashFromByteSlices(sha256.New(), items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: at some point, we need to search for all occurrences like this and make sure they use a central variable / constant instead (just in case we decide to switch to another hash function instead)

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again 👍🏼

@liamsi liamsi merged commit 7fca45b into celestiaorg:master May 25, 2021
mergify bot pushed a commit to tendermint/tendermint that referenced this pull request Jun 1, 2021
## Description 

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing 

### Benchmarking:

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm
NexZhu pushed a commit to crpt/go-merkle that referenced this pull request Dec 1, 2021
## Description 

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing 

### Benchmarking:

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm
JayT106 pushed a commit to JayT106/tendermint that referenced this pull request Sep 16, 2022
Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm
cmwaters pushed a commit to tendermint/tendermint that referenced this pull request Sep 21, 2022
* crypto/merkle: optimize merkle tree hashing (#6513)

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm

* update pending log

Co-authored-by: Marko <marbar3778@yahoo.com>
@shyba shyba mentioned this pull request Jan 30, 2023
shyba added a commit to contribforks/celestia-core that referenced this pull request Jan 31, 2023
cmwaters pushed a commit that referenced this pull request Mar 13, 2023
#351)

* Rename package back to tendermint/tendermint and use replace to point to the current repo

* Rename package back to tendermint/tendermint in the proto files and generated files

* Rename package back to tendermint/tendermint in the mocks

* Remove useless replace

* Apply suggestions from code review

* Apply suggestions from code review
czarcas7ic pushed a commit to osmosis-labs/cometbft that referenced this pull request Apr 23, 2024
* crypto/merkle: optimize merkle tree hashing (#6513)

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm

* update pending log

Co-authored-by: Marko <marbar3778@yahoo.com>
mergify bot pushed a commit to osmosis-labs/cometbft that referenced this pull request Apr 24, 2024
* crypto/merkle: optimize merkle tree hashing (#6513)

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm

* update pending log

Co-authored-by: Marko <marbar3778@yahoo.com>
(cherry picked from commit 8e787f0)
czarcas7ic added a commit to osmosis-labs/cometbft that referenced this pull request Apr 24, 2024
…25) (#26)

* crypto/merkle: optimize merkle tree hashing (#6513) (#9446)

* crypto/merkle: optimize merkle tree hashing (#6513)

Upstream celestiaorg/celestia-core#351 to optimize merkle tree hashing

```
benchmark                                 old ns/op     new ns/op     delta
BenchmarkHashAlternatives/recursive-8     22914         21949         -4.21%
BenchmarkHashAlternatives/iterative-8     21634         21939         +1.41%

benchmark                                 old allocs     new allocs     delta
BenchmarkHashAlternatives/recursive-8     398            200            -49.75%
BenchmarkHashAlternatives/iterative-8     399            301            -24.56%

benchmark                                 old bytes     new bytes     delta
BenchmarkHashAlternatives/recursive-8     19088         6496          -65.97%
BenchmarkHashAlternatives/iterative-8     21776         13984         -35.78%
```

cc @odeke-em @cuonglm

* update pending log

Co-authored-by: Marko <marbar3778@yahoo.com>
(cherry picked from commit 8e787f0)

* add changelog

(cherry picked from commit 9e015e0)

* add back rootHash nil check

(cherry picked from commit 4b602f8)

* update changelog

(cherry picked from commit 6b1d05f)

---------

Co-authored-by: JayT106 <JayT106@users.noreply.github.com>
Co-authored-by: Adam Tucker <adamleetucker@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants