make multihash pluggable #88
Conversation
Note: this is a suggestion/WIP which can be reworked as needed.

This adds a map var which maps codecs to hash functions. The hash funcs must have a signature that accepts one byte slice and returns one byte slice. It adds a `RegisterHashFunc` method which allows registration for any valid codec, and prevents double registration or overwriting of hash funcs that are already registered.

It also creates two funcs which are called by `init()` to register the funcs the library already ships with, preserving the current behaviour of the code:

* registerStdlibHashFuncs
* registerNonStdlibHashFuncs

If desired, the non-stdlib funcs can then easily be removed. Or it can be implemented as-is, requiring that future hash funcs use the registration func. Or it can be refactored in some other way if necessary.

I have tested this and it is working with the IPLD Dash resolver. I can also add an example to the README if this is approved/agreed.
Force-pushed from df1e137 to 4300f39.
Rebased onto latest master since #87 was merged.
Sorry for the delay, we've all been on a synchronous work sprint. This looks great!
We should probably allow them to return errors, even though they don't in practice; some hash functions may have restrictions on their inputs. We should probably also pass in the length (as we do for identity), as there may exist other hash functions that will simply refuse to work when the length is too short. That is: to avoid writing a ton of wrapper functions, I'd just modify the signatures of all the hash funcs.
Also move the inline double-sha256 func to its own proper func, since it is now more complicated with multiple return values.
Also simplify the outer switch statement logic.
@Stebalien Thanks, I have also been busy. I agree that allowing for errors is a good idea. I have changed the required signature as you suggested, and also refactored the blake func wrappers so they work with this change, and removed the panics from one (they now return errors). Now all funcs can be registered in the init method. Please see individual commits for more info.
sum.go
Outdated
```go
if err != nil {
	return []byte{}, err
}
```
This should probably return `nil`, not `[]byte{}`.
Agree, thank you!
sum.go
Outdated
```go
if _, err := hasher.Write(data); err != nil {
	return []byte{}, err
}
```
ditto
sum.go
Outdated
```go
// RegisterHashFunc adds an entry to the package-level code -> hash func map.
func RegisterHashFunc(code uint64, hashFunc func([]byte, int) ([]byte, error)) error
```
We should document the truncation behavior:
The hash function must return at least the requested number of bytes. If it returns more, the hash will be truncated.
sum.go
Outdated
```go
	return m, ErrSumNotSupported
}

var size int
```
So, we actually need to use `length` here. That is, we need to tell the hash function "we're going to truncate to size X" so it can return an error if that would be nonsensical. Unfortunately, I think we'll either still need to special-case blake32, create a new function per size (probably the easiest), or pass in the size as an argument.
Basically, we need to handle the case where we request a 512-bit blake2b hash and then truncate it to 256 bits.
I do not understand -- the logic is the same here, right? Or is this some new logic you are suggesting for blake? Currently, nothing is truncated from the output of `blake2s.Sum256(data)`, or any blake2b from what I see. How is a special case needed? I think I cannot see your case yet. (Or do you just prefer the name `length` instead of `size` for the blake funcs?)
You're right, the issue is actually with the identity function, not blake. There are two sizes here:
- The blake "size". Really, this isn't a parameter; we just have one code per size.
- The hash length. This is the length we need to pass to the hash function.

With the current code, `sumID` will never fail because we pass in `size` (`len(data)`), not `length`. Basically, we need to pass `length` so functions know how many bytes of the hash we're going to use.

So, my proposal is to get rid of all special casing in `Sum`. Instead, just call `funcTable[code](data, length)`.

However, we do somehow have to plumb the size through to the blake hash functions. Above, I proposed just creating a separate function per variant (using closures). We could also change the hash function signature to `func(data []byte, code, length int) ([]byte, error)` and let the blake2 function extract the size from the code.
Ah, I think I understand now.

> We could also change the hash function signature to `func(data []byte, code, length int) ([]byte, error)`

This seems ugly to me because the code is only used for blake.

Also, the more I look at it, the more I think this is just a matter of names, except for the ID case, where I got the logic wrong. If I change the blake funcs to use closures with `olen`, it is still the same thing getting passed in with just a different name. These funcs (since before I touched them) never use both "length" and "size" together, only `olen`, which is equivalent to "size" for other funcs, with the exception of ID.

If I push a special case for `ID`, this should correct that logic, right? Any other case where both length and size are needed in the func would be introducing new logic which does not yet exist.
> If I push a special case for ID, this should correct that logic, right?
It corrects the logic but the parameter still ends up meaning different things in different cases. As this is going to be an external API, it really needs to be as consistent as possible. We need to be able to unambiguously define the API.
In hopes it can be useful, here is a package I implemented lately: https://github.com/ipfn/go-digest/blob/master/digest/digest.go#L47. It does not deal with hashes other than 32 bytes, but I think you could try copying:

```go
// SumBytes - Sums hash digest using provided hasher.
func SumBytes(h hash.Hash, data ...[]byte) (digest []byte) {
	h.Reset()
	for _, body := range data {
		h.Write(body)
	}
	return h.Sum(nil)
}
```

It could remove a lot of boilerplate code. Example usage:

```go
// SumSha256 - Sums Sha256 secure hash.
func SumSha256(data ...[]byte) []byte {
	return SumBytes(sha256.New(), data...)
}
```
Status: Blocked on #88 (comment)
@samli88 The changes I was looking for were rather trivial, so I went ahead and made them. Can I get a review from you this time?
Yes, thank you. Now I see what you meant by closures for the blake funcs. Thanks for helping with this. I cannot give an approval through the GitHub UI because I opened the pull request, but yes, I reviewed and approve these changes. 👍
This adds the ability to register hash funcs independently, without requiring them in this library.
Note: this is based upon the 'cleanup' PR, #87, and needs to wait for that to be merged before this one. I can rebase after it is merged.
Fixes #78.