Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary Merkle Tree Hash #14334

Merged
merged 5 commits into from Sep 5, 2017
Merged

Binary Merkle Tree Hash #14334

merged 5 commits into from Sep 5, 2017

Conversation

zelig
Copy link
Contributor

@zelig zelig commented Apr 15, 2017

bmt is a new package that provides hashers for binary merkle tree hashes on size-limited chunks
the main motivation is that using BMT hash as the chunk hash of the swarm hash offers logsize inclusion proofs for arbitrary files on a 32-byte resolution completely viable to use in challenges on the blockchain.

@zelig zelig self-assigned this Apr 15, 2017
@zelig zelig mentioned this pull request Jun 26, 2017
13 tasks
@jmozah jmozah force-pushed the bmt branch 2 times, most recently from 0200fce to 6170e9e Compare June 28, 2017 20:26
Copy link
Contributor

@lmars lmars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general comments on the Go code, I haven't really reviewed the algorithm thoroughly yet though.

bmt/bmt.go Outdated
select {
case <-self.c:
self.count--
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default clause is redundant, it will just lead to the next loop iteration trying to select on the channel again, so it may as well just be:

for len(self.c) > n {
	<-self.c
	self.count--
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true as the lock makes sure it is called nonconcurrently.

bmt/bmt.go Outdated

// blocks until it returns an available BMTree
// it reuses free BMTrees or creates a new one if size is not reached
func (self *BMTreePool) Reserve() *BMTree {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this benefit from using sync.Pool instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that but it is ok

bmt/bmt.go Outdated
return NewEOC(nil)
}
rightmost := i == int(max-1)
last := atomic.AddInt32(&self.max, 1) == max
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This look like a race: two goroutines atomically read self.max, both find int(max) > i, both potentially calculate rightmost to be true and race to set self.segment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is unused, will remove for now

bmt/bmt_test.go Outdated
}

func TestBMTHasherReuseWithRelease(t *testing.T) {
testBMTHasherReuse(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is identical to TestBMTHasherReuseWithoutRelease above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

orenyodfat and others added 3 commits August 10, 2017 14:42
  * remove fmt prints in test
  * remove unused segmentWriter related code
  * added missing comments and package premable doc
  * fix pool reuse benchmarks
  * rename functions
l := len(d)
left := d
var right []byte
if l > rh.section {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not be handling the case where hashsize < l <= 2*hashsize here and calculating the hash of both left and right?

For example, consider the BMT of 64 zero bytes. I would expect it to be the sha3 of the concatenation of two sha3 hashes of 32 zero bytes:

sha3 := func(data ...[]byte) []byte {
	h := sha3.NewKeccak256()
	for _, v := range data {
		h.Write(v)
	}
	return h.Sum(nil)
}
zeroes := make([]byte, 32)
fmt.Printf("BMT of 64 bytes: %x\n", sha3(sha3(zeroes), sha3(zeroes)))

Running that I get:

$ go run bmt.go 
BMT of 64 bytes: 633dc4d7da7256660a892f8f1604a44b5432649cc8ec5cb3ced4c4e6ac94dd1d

However this reference implementation gives a different hash (which is just sha3(zeroes + zeroes)):

h := bmt.NewRefHasher(sha3.NewKeccak256, 128)
fmt.Printf("Ref BMT of 64 bytes: %x\n", h.Hash(make([]byte, 64)))
$ go run ref-bmt.go 
Ref BMT of 64 bytes: ad3228b676f7d3cd4284a5443f17f1962b36e491b30a40b2405849e597ba5fb5

Am I missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is defined as per the reference implementation.
Did you find inconsistency between the ref result and the concurrent one?
or do you disagree with the definition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am disagreeing with the definition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarise, the ref implementation generates the BMT of 64 bytes as just sha3(64 bytes) whereas I think it should be sha3(sha3(first 32 bytes) + sha3(second 32 bytes))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I side with @zelig here: the change proposed by @lmars would slightly decrease the complexity of the control logic of verification, but at the cost of having to calculate more hashes. There is no way that it will result in less gas use in the verification contract, which is one of the most important design considerations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that's fine, so I think we need to explicitly document that this is the intended implementation to avoid any confusion, and perhaps add an explicit test demonstrating the behaviour.

Copy link

@nagydani nagydani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear from the test source that important corner cases are explcitly tested such as

  • Empty chunk (eight zero bytes)
  • Chunks shorter than one hash length
  • Chunks of lengths that are not integer multiples of one hash length

@lmars
Copy link
Contributor

lmars commented Aug 14, 2017

I agree with @nagydani, we need some exhaustive tests for data of various lengths.

@ethereum ethereum deleted a comment from GitCop Aug 16, 2017
@ethereum ethereum deleted a comment from GitCop Aug 16, 2017
@ethereum ethereum deleted a comment from GitCop Aug 16, 2017
@ethereum ethereum deleted a comment from GitCop Aug 16, 2017
@ethereum ethereum deleted a comment from GitCop Aug 16, 2017
@ethereum ethereum deleted a comment from GitCop Aug 16, 2017
@zelig
Copy link
Contributor Author

zelig commented Aug 16, 2017

@lmars
Copy link
Contributor

lmars commented Aug 16, 2017

@zelig so two comments on the tests:

  • there seems to be a lot of indirection in the tests with the numerous testXXX functions, and also not a single comment of what is being tested, which I believe is leading us to ask these questions
  • I think the reference implementation itself should be thoroughly tested, it looks like it is currently untested

@lmars
Copy link
Contributor

lmars commented Aug 16, 2017

@zelig I decided to just write some tests and I'm still not sure if I know what the reference implementation is supposed to be doing.

For example, I tried to hash 65 bytes of data and expected it to be:

sha3(sha3(data[0:64]), sha3(data[64:65]))

but I get a different result? Is my expectation again incorrect?

Here is the test: https://gist.github.com/lmars/67f232dfbdbf8635364ec1901343e51b

I really think it would be beneficial to be explicit and document the expected hashes and how they are constructed just using sha3 for say 0 <= length <= 256 (ideally in a test).

@zelig
Copy link
Contributor Author

zelig commented Aug 16, 2017

you assumption is again incorrect indeed.
But your suggestion is completely valid.
You can rewrite the tests if you like and we should indeed document the spec :)
singleton branches (no longer than a segment) are not hashed, just appended to the left branch hash
as per line 76, so

BMTHash(d_65) := sha3(sha3(d[0:64]), d[65:65]) 

Signed-off-by: Lewis Marshall <lewis@lmars.net>
@lmars
Copy link
Contributor

lmars commented Aug 18, 2017

@zelig I've added a test of RefHasher in c179801, it is pretty verbose and exhaustive but I find it avoids any confusion on what the hashes should be.

@zelig
Copy link
Contributor Author

zelig commented Aug 18, 2017

@lmars great stuff like.

Copy link

@nagydani nagydani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After an interactive review session with @zelig , approved.

@fjl fjl merged commit 2bacf36 into ethereum:master Sep 5, 2017
@karalabe karalabe added this to the 1.7.0 milestone Sep 5, 2017
@gbalint gbalint deleted the bmt branch May 25, 2018 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants