Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make multihash pluggable #88

Merged
merged 13 commits into from Dec 10, 2018
Merged

Conversation

samli88
Copy link
Contributor

@samli88 samli88 commented Oct 10, 2018

This adds the ability to register hash funcs independently and not required them in this library.

Note: this is a suggestion/WIP which can be refactored as needed.

This adds a map var which maps codecs to hash functions. The hash funcs must
have signature which accepts one byte slice, and returns one byte slice : func(data []byte, length int) ([]byte, error).

It adds a RegisterHashFunc method which allows registration for any valid
codec. It prevents double registration or overwriting of hash funcs already
registered.

It also creates two funcs which are called by init() to register the funcs
which is already the current behaviour of the code:

  • registerStdlibHashFuncs
  • registerNonStdlibHashFuncs

If desired, the non-stdlib funcs can then easily be removed. Or it can be
implemented as-is, and required that future hash funcs use the Registration
func. Or refactored in some other way if necessary.

I have tested this and it is working with IPLD Dash resolver. Can also add
example in the README if this is approved / agreed.

Note: this is based upon 'cleanup' PR, #87 and needs to wait on that to be merged before this one. I can rebase after merged.

Fixes #78.

Note: this is a suggestion/WIP which can be re-worked as needed.

This adds a map var which maps codecs to hash functions. The hash funcs must
have signature which accepts one byte slice, and returns one byte slice.

It adds a RegisterHashFunc method which allows registration for any valid
codec. It prevents double registration or overwriting of hash funcs already
registered.

It also creates two funcs which are called by init() to register the funcs
which is already the current behaviour of the code:

  * registerStdlibHashFuncs
  * registerNonStdlibHashFuncs

If desired, the non-stdlib funcs can then easily be removed. Or it can be
implemented as-is, and required that future hash funcs use the Registration
func. Or refactored in some other way if necessary.

I have tested this and it is working with IPLD Dash resolver. Can also add
example in the README if this is approved / agreed.
@samli88
Copy link
Contributor Author

samli88 commented Oct 12, 2018

Rebased onto latest master since #87 was merged.

@Stebalien
Copy link
Member

Sorry for the delay, we've all been on a synchronous work sprint.

This looks great!

The hash funcs must have signature which accepts one byte slice, and returns one byte slice.

We should probably allow them to return errors, even though they don't in practice. Some hash functions may have restrictions on inputs. We should probably also pass in the length (as we do for identity) as there may exist other hash functions that will simply refuse to work when the length is too short. That is: func(data []byte, length int) ([]byte, error).

To avoid writing a ton of wrapper functions, I'd just modify the signatures of all the sum* functions.

@Stebalien Stebalien mentioned this pull request Oct 22, 2018
Also move inline double sha256 func to own proper func since it is now more
complicated with multiple return values.
Also simplify outer switch statement logic
@samli88
Copy link
Contributor Author

samli88 commented Nov 7, 2018

@Stebalien Thanks, I also have been busy.

I agree that allowing for error is a good idea. I have changed the required signature as you suggested, and also refactored blake func wrappers so they work with this change, and removed panics from one (they now return errors). Now all funcs are able to be registered in init method.

Please see individual commits for more info.

sum.go Outdated
if err != nil {
panic(err)
return []byte{}, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably return nil, not []byte{}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, thank you!

sum.go Outdated
}

if _, err := hasher.Write(data); err != nil {
panic(err)
return []byte{}, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

sum.go Outdated
}

// RegisterHashFunc adds an entry to the package-level code -> hash func map.
func RegisterHashFunc(code uint64, hashFunc func([]byte, int) ([]byte, error)) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document the truncation behavior:

The hash function must return at least the requested number of bytes. If it returns more, the hash will be truncated.

sum.go Outdated
return m, ErrSumNotSupported
}

var size int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we actually need to use length here. That is, we need to tell the hash function "we're going to truncate to size X" so it can return an error if that would be nonsensical. Unfortunately, I think we'll either still need to special case blake32, create a new function per size (probably the easiest), or pass in the size as an argument.

Basically, we need to handle the case where we request a 512bit blake2b hash and then truncate it to 256 bits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand -- logic is the same here, right? Or is this some new logic you are suggesting for blake?

Currently, nothing is truncated from output of blake2s.Sum256(data), or any blake2b from what I see. How is a special case needed? I think I cannot see your case yet. (Or you just prefer the name length instead of size for blake funcs?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the issue is actually with the identity function, not blake. There are two sizes here:

  1. The blake "size". Really, this isn't a parameter, we just have one code per size.
  2. The hash length. This is the length we need to pass to the hash function.

With the current code, sumID will never fail because we pass in size (len(data)), not length. Basically, we need to pass length so functions know how many bytes of the hash we're going to use.

So, my proposal is to get rid of all special casing in Sum. Instead, just call funcTable[code](data, length).

However, we do somehow have to plumb the size through to the blake hash functions. Above, I proposed just creating a separate function per variant (using closures). We could also change the hash function signature to func(data []byte, code, length int) ([]byte, error) and let the blake2 function extract the size from the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I understand now.

We could also change the hash function signature to func(data []byte, code, length int) ([]byte, error)

This seems ugly to me because this code is only used for blake.

Also, the more I look at it, the more I think this is just a matter of names except for the ID case, where I got the logic wrong.

If I change blake funcs to use closures w/olen, it is still the same thing getting passed in w/just a different name, but these funcs (since before I touched them) never use both "length" and "size" together, only olen, which is equivalent to "size" for other funcs, with the exception of ID.

If I push a special case for ID, this should correct that logic, right? Any other case where both length and size are needed in the func would be introducing new logic which does not yet exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I push a special case for ID, this should correct that logic, right?

It corrects the logic but the parameter still ends up meaning different things in different cases. As this is going to be an external API, it really needs to be as consistent as possible. We need to be able to unambiguously define the API.

@crackcomm
Copy link

crackcomm commented Nov 21, 2018

In hopes it can be useful here is some package I implemented lately: https://github.com/ipfn/go-digest/blob/master/digest/digest.go#L47

It does not deal with hashes other than 32 bytes but I think you could try copying:

// SumBytes - Sums hash digest using provided hasher.
func SumBytes(h hash.Hash, data ...[]byte) (digest []byte) {
	h.Reset()
	for _, body := range data {
		h.Write(body)
	}
	return h.Sum(nil)
}

It could remove a lot of boilerplate code.

Example usage:

// SumSha256 - Sums Sha256 secure hash.
func SumSha256(data ...[]byte) []byte {
	return SumBytes(sha256.New(), data...)
}

@Stebalien
Copy link
Member

Status: Blocked on #88 (comment)

@ghost ghost assigned Stebalien Dec 5, 2018
@ghost ghost added the status/in-progress In progress label Dec 5, 2018
@Stebalien
Copy link
Member

@samli88 the changes I was looking for were rather trivial so I went ahead and made them. Can I get a review from you this time?

@samli88
Copy link
Contributor Author

samli88 commented Dec 9, 2018

Yes, thank you. Now I see what you mean by closures for the blake funcs. Thanks for helping w/this.

I cannot give an approval through the GitHub UI because I opened the Pull Request, but yes, I reviewed and approve these changes. 👍

@Stebalien Stebalien merged commit 1dbee63 into multiformats:master Dec 10, 2018
@ghost ghost removed the status/in-progress In progress label Dec 10, 2018
@samli88 samli88 deleted the multihash-pluggable branch December 11, 2018 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider removing Sum or moving it to a subpackage
3 participants