This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

bmt, param: Introduce SectionHasher interface, implement in bmt #2021

Merged
merged 16 commits into from
Feb 8, 2020

Conversation

nolash
Contributor

@nolash nolash commented Dec 10, 2019

This PR is part of a series of PRs that introduces an interface that allows chaining of components that receive a data stream and generate hashes and intermediate Merkle-Tree chunks. The individual PR steps will be partitioned from #2022 (branch https://github.com/nolash/swarm/tree/filehasher-avenged ) as follows:

  1. Introduce SectionWriter, implement this interface in bmt, make AsyncHasher standalone (this PR)
  2. Move AsyncHasher to file/hasher
  3. Add reference implementation of the Filehasher algorithm
  4. Add implementation of SectionWriter sub-component for hashing intermediate Merkle Tree levels.
  5. Add implementation of SectionWriter component executing the FileHasher algorithm
  6. Add a "splitter" that bridges io.Reader and SectionWriter, and an implementation of SectionWriter component that provides Chunk output.
  7. Add implementation of SectionWriter that provides encryption, along with a test utility SectionWriter implementation of a data cache.
  8. Evaluate and prune bmt.Hasher exports w.r.t. AsyncHasher

Introduce SectionWriter interface and implement this interface in bmt

The objectives of this PR are:

  • Introduce the interface
  • Implement interface in bmt.Hasher
  • Enable use of bmt.Hasher by using only the hash.Hash interface
  • Prepare for moving AsyncHasher to separate package
  • Avoid any dependencies on storage.SwarmHash outside the storage package

SectionWriter interface

The interface is defined in the /file package:

type SectionWriter interface {
        hash.Hash
        SetWriter(hashFunc SectionWriterFunc) SectionWriter
        SetLength(length int)
        SetSpan(length int)
        SectionSize() int
        Branches() int
}

hash.Hash

Essentially, the FileHasher is a hashing operation. Thus it makes sense that the components can be used through the same interface as the other hashing components provided in Go's standard library.

SetWriter

Chains the SectionWriter to a subsequent SectionWriter. Providing chaining should be optional for a SectionWriter implementation. The method is itself chainable.

SetSpan

Sets the "span," meaning the amount of data represented by the data written to the SectionWriter. E.g., the references constituting the data of an intermediate chunk "represent" more data than the actual data bytes. For bmt.Hasher this was previously provided by the ResetWithLength call, and the lack of a separate way of setting the span made it impossible to use bmt.Hasher through a pure hash.Hash interface.
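As a concrete illustration, using typical Swarm parameters (4096-byte chunks and a branch factor of 128, assumed here, not taken from this PR): a full intermediate chunk holds 128 references of 32 bytes each, so only 4096 bytes are written to the hasher, yet the chunk spans 128 × 4096 bytes of underlying content:

```go
package main

import "fmt"

func main() {
	const (
		chunkSize = 4096 // data bytes per leaf chunk (assumed Swarm default)
		branches  = 128  // references per intermediate chunk (assumed)
	)
	// The bytes written to the hasher for a full intermediate chunk
	// (128 references) represent far more underlying data than the
	// 4096 reference bytes themselves; that larger figure is the span.
	span := branches * chunkSize
	fmt.Println(span) // 524288
}
```

This is why the span must be settable independently of Write: the byte count alone only reflects the span for leaf-level data.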

SectionSize

Informs the caller about the underlying SectionSize of the SectionWriter. In some cases this will be the same as for the chained SectionWriter; in other cases the SectionWriter may buffer and/or pad data and translate the SectionSize accordingly.

Branches

Informs the caller about the underlying Branches, a.k.a. the branch factor, with the same rationale as for SectionSize above.

bmt implementations

Neither bmt implementation currently provides chaining; both will raise errors on calls to SetWriter.

bmt.Hasher

Can now be used as a hash.Hash, where the span is simply calculated from the number of bytes written to it. If a different span is needed, the SetSpan method can be used.

Since the SetLength call currently has no practical utility for bmt.Hasher, it is ignored.

Exports are added to make it possible to move AsyncHasher to a separate package. Excess exports will be pruned later.

bmt.AsyncHasher

bmt.AsyncHasher is now ready to be moved to a separate package. It's left in place in this PR to make it easy to see the changes that were made.

WriteIndexed and SumIndexed replace the original Write and Sum calls. It can still be used as a bmt.Hasher (and thus hash.Hash) transparently by using the usual Write and Sum calls.

storage.SwarmHash

ResetWithLength in the storage.SwarmHash interface has been changed to SetSpanBytes. bmt.Hasher provides this method, which performs the same function as SetSpan, albeit taking the span as an 8-byte serialized uint instead.


By the way, a bug was unearthed while reworking the bmt: the hash result for zero-length data differed between RefHasher and bmt.Hasher (but not bmt.AsyncHasher). This has been fixed.

Member

@janos janos left a comment

I've reviewed only the technical aspects. I will review functional changes in another round.

Member

@zelig zelig left a comment

Absolutely brilliant.

//t.GetSection() = make([]byte, sw.secsize)
//copy(t.GetSection(), section)
// TODO: Consider whether the section here needs to be copied, maybe we can enforce not change the original slice
copySection := make([]byte, sw.secsize)
Member

why the copying not part of SetSection then?

Contributor Author

There's no member in tree that remembers the section size, so either we must add a member or we must pass it with the function. The latter seems clumsy.

In general I think it's a good idea to introduce as few side-effects in the lower level components as possible; the tree component could be used without any copying, after all.

@@ -346,11 +362,16 @@ func testHasherCorrectness(bmt *Hasher, hasher BaseHasherFunc, d []byte, n, coun
if len(d) < n {
n = len(d)
}
binary.BigEndian.PutUint64(span, uint64(n))
binary.LittleEndian.PutUint64(span, uint64(n))
Member

why LittleEndian suddenly?

Contributor Author

I can't remember off the top of my head, but at least it's the same as in storage/types.go?

@@ -93,7 +93,8 @@ func GenerateRandomChunk(dataSize int64) Chunk {
sdata := make([]byte, dataSize+8)
rand.Read(sdata[8:])
binary.LittleEndian.PutUint64(sdata[:8], uint64(dataSize))
hasher.ResetWithLength(sdata[:8])
hasher.Reset()
Member

now this is called twice

Contributor Author

@nolash nolash Feb 3, 2020

I'm sorry, I don't understand what you mean. Do you mean that since we actually construct the hasher, the Reset is redundant?

Member

yes

@zelig
Member

zelig commented Jan 10, 2020

The PR is great.
Fix the spurious return as per https://travis-ci.org/ethersphere/swarm/jobs/626578539#L495

The plan also looks great.

Make sure there is a way for the erasure-coding section writer to have access to the child chunk data in order to generate parity data chunks, or is there a better way?

@nolash
Contributor Author

nolash commented Feb 3, 2020

@zelig One of the async tests suddenly failed locally before the last commit adc45db. I will have to investigate :/

@nolash
Contributor Author

nolash commented Feb 7, 2020

Benchmarks are fine after closer inspection. Thanks to @janos for the hint on stabilizing benchmark results.

Contributor

@pradovic pradovic left a comment

LGTM implementation-wise, nice work! Business-logic-wise I am not 100% sure, as I don't have much experience with this part yet. Left just one minor question.

@@ -151,7 +151,8 @@ func TestSha3ForCorrectness(t *testing.T) {
rawSha3Output := rawSha3.Sum(nil)

sha3FromMakeFunc := MakeHashFunc(SHA3Hash)()
Contributor

🌷 I know it's not part of this PR, but why not use a constructor for Hasher instead of a func? If the func is needed, maybe the builder can be extracted instead?

Contributor Author

@nolash nolash Feb 7, 2020

@pradovic The SwarmHash needs the length of the data it represents prepended as a 64-bit value. The BMT hash has this built in, and we extend the other types with HashWithLength to allow setting the length (SetSpanBytes, see storage/swarmhasher.go).

@nolash nolash merged commit ac0845d into ethersphere:master Feb 8, 2020