-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: bytes.Xor #30553
Comments
My use-case is for a webrtc library. I'm currently profiling the application and a sizable amount of time is spent in This line is one of my current bottlenecks, as it generates a slice of size 512 on the heap each time. My plan was to write my own CTR class, potentially abusing the fact that AES has a fixed block size to allocate on the stack. However, I'm not able to go down this route without lifting the fast For now, I'm evaluating a few alternative approaches but all of them involve Go changes. Later in the project, I will have to implement FEC. Forward error correction is performed by XORing multiple packets together (in a specific way) to create parity packets. This is similar to RAID5, which is another example of when a fast |
Also worth mentioning, this is similar to the purpose of the math/bits package. Go developers should not have to resort to assembly implementations for efficiency if the function is common and can be well defined. |
This might be a better fit for golang.org/x/crypto(?). cc @FiloSottile Btw, Filippo, I see that the amd64 version uses SSE2. Any interest in me adding an AVX2 version? |
@josharian Possibly, but I was thinking |
I think the chance of adding this to bytes is vanishingly small. To ask some obvious questions: what about And? Or? Andnot? All three of those (but not Xor) are used in sometimes hot code in the compiler (bv.go). And package bytes is not really a catchall home for things related to byte slices; it’s more a sibling package to strings, which makes it more focused on textual tasks. |
Yeah, that's fair. Adding it somewhere like |
What about adding it to |
For the AES-CTR use case I think changing the Go standard library implementation of the CTR cipher mode would be preferable to adding any new APIs. It shouldn't be too complicated to make the buffer lazily allocated, i.e. allocated only if extra state is needed across |
@mundaym that's what I thought of doing too, but |
@kixelated I think it would be reasonable to put it in an internal package, probably under The implementation of |
Like Josh said, there's basically no chance of this being bytes.Xor. One option would be for the compiler to recognize
but even that seems too specialized. Probably better would be to start with a package /cc @FiloSottile to see if we want to retitle as an x/crypto proposal or else close. |
If it looks good to the compiler folks, I'd rather have it via pattern-matching than make a name-conflicting bits package for a single function. |
@FiloSottile does it need to be constant time (per #30553 (comment))? If so it seems like it belongs in subtle and definitely not in the compiler. (There’s nothing about the Go code that is necessarily constant time—if a compiler could prove that b is all zero, then it a no-op.) Assuming that that is ok, I don’t personally object to the pattern matching. It’s easy enough to implement and maintain, and though it does contribute a little bit to binary bloat, it is nice to have obvious, clear Go code be fast. P.S. Did you want an AVX2 version? |
Hmm, that's a good point. I don't actually see how the compiler would know anything about cipher streams, but this is looking for trouble. Let's start it in golang.org/x/crypto/internal/subtle, and we can move it to crypto/subtle if it works out.
I don't know, does it move the needle on benchmarks of higher level primitives, or lets me remove some assembly elsewhere? I don't really ever need XOR by itself, so if it's not taking significant time already, I'm ok with it as is. |
But also, there’s no reason we can’t remove that CTR bottleneck for the common case, so let’s do that. |
Yeah, I've been messing around with some |
I'll run some quick benchmarks with a specialized AES-CTR class. I also want to compare it against a hypothetical |
Okay, I wrote some benchmarks for more information:
The old code uses
+3-4% speedup is nothing to scoff at. I'll mess around with a few more things, like seeing if we really need to encrypt the counter 512 bytes at a time. |
Oh and here's comparing
|
It wouldn't let us remove assembly, because AVX2 is not available on all amd64 chips that Go supports. It would speed things up, but perhaps not enough to care. I ran this:
and then pprof and got:
Assuming an ~2x speedup from SSE2 to AVX2, it looks like this would be a ~1.4% speed-up. So I'm guessing the answer is no. But now I can stop wondering, and the rest of this conversation can resume. :) |
How about adding a new package "math/bits/slices" that has the functions Xor, And, Or, Andnot, and maybe a few more that operate over byte slices, or why not, also over all sorts of uintX slices, and that may have platform dependent assembly language implementation for them? As @josharian indicated, the go compiler uses such functionality itself, and I can see how these functions would be useful for cryptography but also for image processing. So I'd say that these functions would be of general use. |
This seems like a useful third-party package, but I don't yet see a clear reason to put it into the standard library. Perhaps it could go into the x repo at some point. That package can start with a copy of the existing code in crypto/cipher, which isn't all that much. |
I would totally be fine with copying the fast xor code to I've written a CTR package for my application with a My package requires the fast |
Go’s license is very permissive. I’m surprised to hear that that is an issue. And fwiw, I’d consider copy/pasting such code a pretty reasonable interim fix under the circumstances—huge win, not all that much code, low refactoring risk (just copy/paste), pretty stable source with lots of miles on it. Just my 2c. |
I'm contributing to a project using the MIT license, so I assumed it wouldn't be compatible but you're right. I'll make a pull request and see what they think of a package containing the Go license plus files with the Go copyright. I'll still need |
Much of the time spent is performing `cipher.NewCTR` for each packet. This function allocates ~600 bytes on the heap each time which is a little excessive to encrypt 1000 bytes. I added a `Reset` function that allows cipher reuse. In addition, I sped up the CTR cipher taking advantage of the fixed aes.BlockSize. Unfortunately, forking the Go CTR implementation requires `xorBytes`, which is not exported. There's a Go issue open that will hopefully export the function because it's pretty useful. Until then, there needs to be a lot of copy-pasted assembly code related to fast XOR bytes. golang/go#30553 ``` name old time/op new time/op delta EncryptRTP-8 3.66µs ± 6% 3.36µs ± 6% -8.04% (p=0.001 n=10+10) EncryptRTPInPlace-8 3.38µs ± 8% 3.13µs ± 5% -7.33% (p=0.000 n=10+10) DecryptRTP-8 3.69µs ± 7% 3.37µs ± 8% -8.80% (p=0.000 n=10+10) Write-8 3.80µs ± 9% 3.45µs ± 5% -9.33% (p=0.000 n=10+10) WriteRTP-8 3.72µs ± 7% 3.46µs ± 8% -6.96% (p=0.005 n=10+10) name old speed new speed delta EncryptRTP-8 277MB/s ± 6% 301MB/s ± 6% +8.76% (p=0.001 n=10+10) EncryptRTPInPlace-8 300MB/s ± 7% 323MB/s ± 5% +7.85% (p=0.000 n=10+10) DecryptRTP-8 277MB/s ± 7% 304MB/s ± 7% +9.68% (p=0.000 n=10+10) Write-8 266MB/s ± 8% 294MB/s ± 5% +10.24% (p=0.000 n=10+10) WriteRTP-8 272MB/s ± 7% 293MB/s ± 8% +7.61% (p=0.005 n=10+10) ```
Seems like this is happening outside the |
Where was this merged into after all? |
@Kargakis I'm not sure I understand your question. This proposal was declined. |
Merging in the standard library was declined but folks seemed fine with merging in golang.org/x/crypto.
…-------- Original Message --------
On 5 Nov 2019, 04:06, Ian Lance Taylor wrote:
***@***.***(https://github.com/kargakis) I'm not sure I understand your question. This proposal was declined.
—
You are receiving this because you were mentioned.
Reply to this email directly, [view it on GitHub](#30553?email_source=notifications&email_token=AA5YK75B5KCEFNQKI4E6BPTQSDWOHA5CNFSM4G3NSFI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDBQ3FY#issuecomment-549653911), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/AA5YK76NAJ7TRTVCSS7TAFTQSDWOHANCNFSM4G3NSFIQ).
|
OK, thanks, I don't know what the status of that is. |
Variation of #28113 (rejected proposal)
I would like to export the optimized
xorBytes
implementation incrypto/cipher
. Unlike the above ticket, I think it should be a member of thebytes
package asbytes.Xor
.Applying a XOR over multiple bytes is commonly used in crypto as well as parity generation. It is was relatively trivial to implement yourself but thanks to a recent contribution, the Go version is now very optimized.
Currently,
xorBytes
will panic whendst
is not large enough. It could followcopy
semantics and write up tomin(len(dst), len(a), len(b))
bytes. But it seems relatively common to panic in thecrypto
package when the dst buffer is not large enough, such asXORKeyStream
.The text was updated successfully, but these errors were encountered: