make multihash pluggable #88

samli88 · 2018-10-10T18:36:18Z

This adds the ability to register hash funcs independently, without requiring them in this library.

Note: this is a suggestion/WIP which can be refactored as needed.

This adds a map var which maps codecs to hash functions. The hash funcs must
have the signature ~~which accepts one byte slice, and returns one byte slice~~ func(data []byte, length int) ([]byte, error).

It adds a RegisterHashFunc function, which allows registration for any valid
codec. It prevents double registration, so hash funcs that are already
registered cannot be overwritten.

It also creates two funcs, called by init(), which register the funcs,
preserving the current behaviour of the code:

* registerStdlibHashFuncs
* registerNonStdlibHashFuncs

If desired, the non-stdlib funcs can then easily be removed. Alternatively, it
can be merged as-is, with future hash funcs required to use the registration
func, or refactored in some other way if necessary.

I have tested this and it works with the IPLD Dash resolver. I can also add an
example to the README if this is approved / agreed.

Note: this is based on the 'cleanup' PR, #87, and needs to wait for that to be merged first. I can rebase once it is merged.

Fixes #78.


samli88 · 2018-10-12T17:02:29Z

Rebased onto latest master since #87 was merged.

Stebalien · 2018-10-22T23:10:20Z

Sorry for the delay, we've all been on a synchronous work sprint.

This looks great!

> The hash funcs must have signature which accepts one byte slice, and returns one byte slice.

We should probably allow them to return errors, even though they don't in practice. Some hash functions may have restrictions on inputs. We should probably also pass in the length (as we do for identity) as there may exist other hash functions that will simply refuse to work when the length is too short. That is: func(data []byte, length int) ([]byte, error).

To avoid writing a ton of wrapper functions, I'd just modify the signatures of all the sum* functions.


samli88 · 2018-11-07T18:40:54Z

@Stebalien Thanks, I have also been busy.

I agree that allowing for errors is a good idea. I have changed the required signature as you suggested, refactored the blake func wrappers so they work with this change, and removed the panics from one of them (it now returns errors instead). Now all funcs can be registered in the init method.

Please see individual commits for more info.

Stebalien · 2018-11-07T19:29:11Z

sum.go

 	if err != nil {
-		panic(err)
+		return []byte{}, err


This should probably return nil, not []byte{}.

Agree, thank you!

Stebalien · 2018-11-07T19:29:20Z

sum.go

 	}

 	if _, err := hasher.Write(data); err != nil {
-		panic(err)
+		return []byte{}, err


Stebalien · 2018-11-07T19:31:32Z

sum.go

+}
+
+// RegisterHashFunc adds an entry to the package-level code -> hash func map.
+func RegisterHashFunc(code uint64, hashFunc func([]byte, int) ([]byte, error)) error {


We should document the truncation behavior:

The hash function must return at least the requested number of bytes. If it returns more, the hash will be truncated.
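One way to write that contract down is in the doc comment of the function type, with the caller enforcing it. A sketch, assuming a `HashFunc` type name and a convention where a negative length means "use the hash's default length"; `sumWith` is a hypothetical helper.

```go
package main

import "fmt"

// HashFunc computes the digest of data.
//
// length is the number of digest bytes the caller will keep, or a negative
// value for the hash's default length. The function must return at least
// length bytes; if it returns more, the result is truncated to length. It
// should return an error when a hash of that length would be nonsensical.
type HashFunc func(data []byte, length int) ([]byte, error)

// sumWith calls f and then applies the truncation rule documented above.
func sumWith(f HashFunc, data []byte, length int) ([]byte, error) {
	digest, err := f(data, length)
	if err != nil {
		return nil, err
	}
	if length < 0 {
		return digest, nil
	}
	if len(digest) < length {
		return nil, fmt.Errorf("hash func returned %d bytes, %d requested", len(digest), length)
	}
	return digest[:length], nil
}
```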

Stebalien · 2018-11-07T19:36:29Z

sum.go

+		return m, ErrSumNotSupported
+	}
+
+	var size int


So, we actually need to use length here. That is, we need to tell the hash function "we're going to truncate to size X" so it can return an error if that would be nonsensical. Unfortunately, I think we'll either still need to special case blake2, create a new function per size (probably the easiest), or pass in the size as an argument.

Basically, we need to handle the case where we request a 512bit blake2b hash and then truncate it to 256 bits.
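Truncating is not equivalent to asking for the shorter variant. For BLAKE2 the digest length is part of the parameter block that initializes the state, so BLAKE2b-256 is not the first 32 bytes of BLAKE2b-512. The same effect can be shown with the standard library alone: SHA-512/256 starts from different initial values than SHA-512, so a truncated SHA-512 digest does not reproduce it. The function names below are illustrative.

```go
package main

import (
	"bytes"
	"crypto/sha512"
)

// truncatedSHA512 keeps the first 32 bytes of a full SHA-512 digest.
func truncatedSHA512(data []byte) []byte {
	full := sha512.Sum512(data)
	return full[:32]
}

// sha512Slash256 computes SHA-512/256, the dedicated 256-bit member of the
// SHA-512 family, which uses its own initial hash values.
func sha512Slash256(data []byte) []byte {
	d := sha512.Sum512_256(data)
	return d[:]
}

// sameDigest reports whether truncating SHA-512 reproduces SHA-512/256.
func sameDigest(data []byte) bool {
	return bytes.Equal(truncatedSHA512(data), sha512Slash256(data))
}
```

Both results are 32 bytes long, but they differ, which is why the hash function needs to know which variant it is computing rather than having the caller trim a longer digest.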

I do not understand -- logic is the same here, right? Or is this some new logic you are suggesting for blake?

Currently, nothing is truncated from the output of blake2s.Sum256(data), or of any blake2b variant as far as I can see. Why is a special case needed? I don't see your case yet. (Or do you just prefer the name length instead of size for the blake funcs?)

You're right, the issue is actually with the identity function, not blake. There are two sizes here:

* The blake "size". Really, this isn't a parameter; we just have one code per size.
* The hash length. This is the length we need to pass to the hash function.

With the current code, sumID will never fail because we pass in size (len(data)), not length. Basically, we need to pass length so functions know how many bytes of the hash we're going to use.
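Once the requested length is passed through, the identity function can reject requests it cannot honour. A sketch of that check, assuming a negative length means "default" (the whole input):

```go
package main

import "fmt"

// sumID returns data unchanged. Any explicit length must equal len(data):
// a shorter one would silently drop input bytes, and a longer one cannot
// be produced at all.
func sumID(data []byte, length int) ([]byte, error) {
	if length >= 0 && length != len(data) {
		return nil, fmt.Errorf("identity hash length (%d) must equal data length (%d)", length, len(data))
	}
	return data, nil
}
```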

So, my proposal is to get rid of all special casing in Sum. Instead, just call funcTable[code](data, length).

However, we do somehow have to plumb the size through to the blake hash functions. Above, I proposed just creating a separate function per variant (using closures). We could also change the hash function signature to func(data []byte, code, length int) ([]byte, error) and let the blake2 function extract the size from the code.

Ah, I think I understand now.

> We could also change the hash function signature to func(data []byte, code, length int) ([]byte, error)

This seems ugly to me because this code is only used for blake.

Also, the more I look at it, the more I think this is just a matter of names except for the ID case, where I got the logic wrong.

If I change the blake funcs to use closures with olen, it is still the same value being passed in, just under a different name. These funcs (even before I touched them) never use "length" and "size" together, only olen, which is equivalent to "size" for the other funcs, with the exception of ID.

If I push a special case for ID, this should correct that logic, right? Any other case where both length and size are needed in the func would be introducing new logic which does not yet exist.

> If I push a special case for ID, this should correct that logic, right?

It corrects the logic but the parameter still ends up meaning different things in different cases. As this is going to be an external API, it really needs to be as consistent as possible. We need to be able to unambiguously define the API.

crackcomm · 2018-11-21T22:08:43Z

In hopes it can be useful here is some package I implemented lately: https://github.com/ipfn/go-digest/blob/master/digest/digest.go#L47

It does not deal with hashes other than 32 bytes, but I think you could try copying:

// SumBytes - Sums hash digest using provided hasher.
func SumBytes(h hash.Hash, data ...[]byte) (digest []byte) {
	h.Reset()
	for _, body := range data {
		h.Write(body)
	}
	return h.Sum(nil)
}

It could remove a lot of boilerplate code.

Example usage:

// SumSha256 - Sums Sha256 secure hash.
func SumSha256(data ...[]byte) []byte {
	return SumBytes(sha256.New(), data...)
}

Stebalien · 2018-12-03T21:34:38Z

Status: Blocked on #88 (comment)

Stebalien · 2018-12-05T23:39:34Z

@samli88 the changes I was looking for were rather trivial so I went ahead and made them. Can I get a review from you this time?

samli88 · 2018-12-09T23:52:02Z

Yes, thank you. Now I see what you mean by closures for the blake funcs. Thanks for helping with this.

I cannot give an approval through the GitHub UI because I opened the Pull Request, but yes, I reviewed and approve these changes. 👍

samli88 force-pushed the multihash-pluggable branch from df1e137 to 4300f39 October 12, 2018 16:51

Stebalien mentioned this pull request Oct 22, 2018

add x11 hash #85

Merged

samli88 added 5 commits November 7, 2018 15:04

change signature to allow for errors and pass len

f8003f9

Also move inline double sha256 func to own proper func since it is now more complicated with multiple return values.

register sumID and remove internal switch

04d1701

no free codes left

c553320

swap args for sumBlake2b and match sig

39b1343

add sumBlake2s, register blake funcs

19fa961

Also simplify outer switch statement logic

Stebalien requested changes Nov 7, 2018


samli88 added 2 commits November 7, 2018 23:58

address some PR comments

d250d42

use length for ID sum func

6243c5d

Stebalien added 4 commits December 5, 2018 15:27

register a new hash function per blake size

d8a63d5

cleanup: follow go coding conventions

46bd4b1

remove dead code

5ae4591

define and document the hash function type

e8a1b76

ghost assigned Stebalien Dec 5, 2018

ghost added the status/in-progress In progress label Dec 5, 2018

ci: update go to 1.11

678d1f6

Stebalien approved these changes Dec 10, 2018


Stebalien merged commit 1dbee63 into multiformats:master Dec 10, 2018

ghost removed the status/in-progress In progress label Dec 10, 2018

samli88 deleted the multihash-pluggable branch December 11, 2018 19:43
