New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/sha256: provide a way to efficiently hash multiple byte slices #21948

Closed
chmike opened this Issue Sep 20, 2017 · 12 comments

Comments

Projects
None yet
8 participants
@chmike

chmike commented Sep 20, 2017

Using go1.9.

I'm trying to implement an optimized version of HMAC-SHA-256 digest computation that avoids allocation.
Current code is available in [this github repository]](https://github.com/chmike/hmacsha256)

The function signature I provide is the following:

func Digest(buf []byte, key []byte, data ...[]byte) []byte

The MAC is computed over the list of data slices given as argument.

Unfortunately, I'm not able to use the hardware implementation of SHA, so I'm stuck.
"internal/cpu" is not accessible.

I could use the sha256.Sum256(data []byte) [Size]byte function, but it accept only one data slice.

With a small backward compatible change to this function signature, it would be possible to compute an SHA over multiple slices without needing an allocation with sha256.New.

hmac requires to compute a hash over at least two data blocks.

If there is another solution to use the hardware or assembly optimized sha256 block function, I'm a buyer.

PS: as explained in the end of the readme of the github repository, my goal is to use this for secure cookie which has to be verified for nearly every request. Avoiding these basically useless allocations, could reduce the pressure on the GC.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Sep 20, 2017

Unfortunately we can not make that change, as it would violate the Go 1 compatibility guarantee (while existing calls would still compile, other uses of the function might not).

The sha256.Sum256 function is quite small and simple. I recommend just making a copy of it that is modified to serve your needs.

Closing.

@chmike

This comment has been minimized.

chmike commented Sep 20, 2017

Unfortunately, I can't adapt the function Sum256 because it uses private functions and struct. I would have done it if it was so simple.

Could you explain me what it would break by changing the data argument to a vararg ? I honnestly don't know.

Edit: oh I see. By changing the function signature, the type of a function variable would change.
This is really unfortunate and disapointing.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Sep 20, 2017

Oh, sorry, I failed to notice that digest is private.

I will reopen. We can't do what you suggest, but perhaps we can do something.

@ianlancetaylor ianlancetaylor changed the title from Request: change Sum256(data []byte) to Sum256(data ...[]byte) in crypto/sha256 package to crypto/sha256: provide a way to efficiently hash multiple byte slices Sep 20, 2017

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Sep 20, 2017

@dsnet

This comment has been minimized.

Member

dsnet commented Sep 20, 2017

Given this snippet:

var b1, b2, b3 []byte
var h [sha256.Size]byte

d := sha256.New()
d.Write(b1)
d.Write(b2)
d.Write(b3)
d.Sum(h[:0])

Even though New returns an hash.Hash interface with a *digest inside, it seems to me that the compiler should be able to figure out the following:

  • The concrete type of d is really just *digest instead of hash.Hash
  • The call to Sum should not cause h to escape. The input to digest.Sum does escape to the output, but in this case, it is entirely ignored.

@dsnet dsnet added the Performance label Sep 20, 2017

@chmike

This comment has been minimized.

chmike commented Sep 20, 2017

Thank you for reopening the issue.

Another and simpler solution would be to add a new function accepting a varargs.

Sums256(data ...[]byte) [Size]byte

This would be perfectly backward compatible, and it would be easy to understand what it does.

@dsnet

This comment has been minimized.

Member

dsnet commented Sep 20, 2017

As @ianlancetaylor already pointed out, this is not backwards compatible as it changes the signature of the function.

@cespare

This comment has been minimized.

Contributor

cespare commented Sep 20, 2017

@dsnet @chmike's last suggestion was a new function (Sums256, not Sum256).

I agree that normal hash.Hash usage should be optimized by the compiler here rather than adding new API.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Sep 20, 2017

Dup of #19361 and/or #21070 ?

@chmike

This comment has been minimized.

chmike commented Sep 21, 2017

@bsadfitz #21070 is about Sum but it address a different issue which I agree with. AppendSum would make clear that it uses the append pattern. It took me some time to figure it out. For #19361, I don't see the link with this issue.

A compiler optimization would allow to fix the problem for all hashes in one hit. But I have no idea of how much effort it would require. After thinking about it, I agree that adding functions would a false good idea.

PS: For the secure cookie, I think I can get around the problem by concatenating slices to hash. This should be possible by arranging the data differently with a minimum of data copy. So for my point of view, the possibility to hash multiple slices by avoiding a heap allocation is now "good to have" instead of "must have". Beside, hashing one block is faster than hashing multiple block fragment. So I hope the concatenation cost will be compensated by that benefit.

@aead

This comment has been minimized.

Contributor

aead commented Sep 23, 2017

@chmike
As far as I can see you should be able to use a sha256 instance within your HMAC struct:

type hmac struct {
    ipad, opad [sha256.Size]byte
    digest hash.Hash
}

and use Hash.Reset. This should avoid the allocations as far as I can see...

Nevertheless Go can do the compiler optimization but for this specific use case I don't think that it's required.

@chmike

This comment has been minimized.

chmike commented Sep 24, 2017

Thanks for the suggestion. This is what I did. Code is available at github.com/chmike/securecookie.
I also rewrote the ctr ciphering computation to avoid the ctr allocation for the same reason. The aes block cipher is allocated once.

My secure cookie get and set need only 3 allocations and they are all outside of the value encoding and decoding. As a comparison, the gorilla secure cookie get/set require respectively 37 and 39 allocations. While there is a factor 10 difference in number of allocations, the speed difference is only around 3.

Of course the gorilla secure cookie code is much more readable because it use the New functions and don't use arrays to avoid heap allocations.

@rsc

This comment has been minimized.

Contributor

rsc commented Oct 23, 2017

The problem here is that sha256.New returns a hash.Hash. If it returned a *digest then the compiler would have an easier time inlining the whole thing and then seeing that nothing escapes and removing all the allocations. There was a recent CL that does the first step of devirtualizing this kind of thing. I think we should probably let the compiler optimizations continue to the point where the obvious code just doesn't allocate. That's better than adding new API that won't be (hopefully) necessary in a few releases. It's fine of course if you want to fork sha256 for your use.

@rsc rsc closed this Oct 23, 2017

i10r-mirror-bot pushed a commit to chain/txvm that referenced this issue Apr 25, 2018

crypto/ed25519/chainkd: optimize allocations
Counterintuitively, it is much more memory efficient to build up a byte
slice of data to hash and call sha512.Sum512 on it than to incrementally
Write the data to the hasher.

This is a limitation of the Go compiler's escape analysis. Since
sha512.New returns a hash.Hash interface, the Go compiler is unable to
reason about whether byte slices written to hash.Hash.Write escape, so
it forces each of these values to be heap-allocated.

This change uses sha512.Sum512, which uses the unexported, concrete
*sha512.digest type. The Go compiler is able to infer that the *digest
does not escape Sum512. The data slice provided to Sum512 requires a
single allocation if we compute its size ahead of time.

See golang/go#21948.

This change was motivated by seeing that chainkd.XPub.Child was in the
top 10 functions for number of allocations in a benchmark.

```
name                    old time/op    new time/op    delta
XPrvChildNonHardened-4    73.7µs ± 5%    71.4µs ± 4%     ~     (p=0.310
n=5+5)
XPrvChildHardened-4        688ns ±10%     558ns ±10%  -18.93%  (p=0.002
n=6+6)
XPubChild-4               85.5µs ± 4%    90.6µs ±14%     ~     (p=0.247
n=5+6)
XPrvSign-4                 143µs ± 7%     147µs ± 7%     ~     (p=0.485
n=6+6)
XPubVerify-4               186µs ± 3%     198µs ± 8%     ~     (p=0.065
n=6+6)

name                    old alloc/op   new alloc/op   delta
XPrvChildNonHardened-4      400B ± 0%       80B ± 0%  -80.00%  (p=0.002
n=6+6)
XPrvChildHardened-4         368B ± 0%       80B ± 0%  -78.26%  (p=0.002
n=6+6)
XPubChild-4                 848B ± 0%      560B ± 0%  -33.96%  (p=0.002
n=6+6)
XPrvSign-4                  833B ± 0%      624B ± 0%  -25.09%  (p=0.002
n=6+6)
XPubVerify-4                352B ± 0%      352B ± 0%     ~     (all
equal)

name                    old allocs/op  new allocs/op  delta
XPrvChildNonHardened-4      6.00 ± 0%      1.00 ± 0%  -83.33%  (p=0.002
n=6+6)
XPrvChildHardened-4         5.00 ± 0%      1.00 ± 0%  -80.00%  (p=0.002
n=6+6)
XPubChild-4                 8.00 ± 0%      4.00 ± 0%  -50.00%  (p=0.002
n=6+6)
XPrvSign-4                  10.0 ± 0%       8.0 ± 0%  -20.00%  (p=0.002
n=6+6)
XPubVerify-4                3.00 ± 0%      3.00 ± 0%     ~     (all
equal)
```

Closes #3472

Author: Jackson Owens <jackson_owens@alumni.brown.edu>
Date: Wed Apr 25 10:40:51 2018 -0700
upstream:6f31447593b5a975e92f8c1981e6ecb404479b57

@golang golang locked and limited conversation to collaborators Oct 23, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.