Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/cipher: optimize safeXORBytes #35381

Open
josharian opened this issue Nov 5, 2019 · 10 comments
Open

crypto/cipher: optimize safeXORBytes #35381

josharian opened this issue Nov 5, 2019 · 10 comments

Comments

@josharian
Copy link
Contributor

@josharian josharian commented Nov 5, 2019

The discussion at #31586 includes several techniques for optimizing xor of byte slices, including using encoding/binary (with native endianness) and loop unrolling. It'd be nice to apply them to safeXORBytes, which is used on many non-amd64 platforms. An optimized safeXORBytes might even be fast enough that we could delete fastXORBytes.

Somewhat related: #30553

cc @nhooyr

@josharian

This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Nov 5, 2019

(Aside: I've wanted encoding/binary.NativeEndian to exist on multiple occasions. Here's another one.)

@nhooyr

This comment has been minimized.

Copy link
Contributor

@nhooyr nhooyr commented Nov 6, 2019

Will give this a shot on the weekend.

@nhooyr

This comment has been minimized.

Copy link
Contributor

@nhooyr nhooyr commented Nov 6, 2019

(Aside: I've wanted encoding/binary.NativeEndian to exist on multiple occasions. Here's another one.)

I agree this would be very useful. Some discussion in the past: https://groups.google.com/forum/#!topic/golang-nuts/3GEzwKfRRQw

@josharian

This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Nov 6, 2019

Will give this a shot on the weekend.

Awesome. No rush, in any case—the freeze just started, so nothing will go in for three months.

Thanks for the pointer to previous discussion of native endianness. I’ll file a proposal to discuss.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Nov 6, 2019

If you implement a generic, fast xor function, it could probably stand to be used in x/crypto/sha3 — the current implementation there is somewhat dubious due to alignment assumptions.

(See CL 203837.)

CC @FiloSottile

@josharian

This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Nov 6, 2019

@bcmills good point, thanks. If we put it in a place that both x/crypto and crypto/cipher can reach it, that'd presumably also resolve #30553.

@josharian

This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Nov 6, 2019

Proposal at #35398.

nhooyr added a commit to nhooyr/go that referenced this issue Feb 9, 2020
Closes golang#35381
@nhooyr

This comment has been minimized.

Copy link
Contributor

@nhooyr nhooyr commented Feb 9, 2020

Added in nhooyr@24ce00e

Results against xorBytesSSE2 on amd64:

$ /Users/nhooyr/src/golang/go/bin/go test -run=_ -bench=XORBytes
goos: darwin
goarch: amd64
pkg: crypto/cipher
BenchmarkXORBytes/8Bytes-8    	176461338	         6.70 ns/op	1194.11 MB/s
BenchmarkXORBytes/128Bytes-8  	100000000	        11.6 ns/op	11062.06 MB/s
BenchmarkXORBytes/2048Bytes-8 	12895827	        95.2 ns/op	21517.42 MB/s
BenchmarkXORBytes/32768Bytes-8         	  627291	      1596 ns/op	20527.27 MB/s
BenchmarkFastSafeXORBytes/8Bytes-8     	147817434	         7.92 ns/op	1009.53 MB/s
BenchmarkFastSafeXORBytes/128Bytes-8   	66896612	        17.2 ns/op	7438.88 MB/s
BenchmarkFastSafeXORBytes/2048Bytes-8  	 6217075	       185 ns/op	11069.10 MB/s
BenchmarkFastSafeXORBytes/32768Bytes-8 	  397446	      3007 ns/op	10897.15 MB/s
PASS
ok  	crypto/cipher	12.014s

So safe fast is still about 2x as slow. Still 10x faster in my testing than basic simple xor though.

@nhooyr

This comment has been minimized.

Copy link
Contributor

@nhooyr nhooyr commented Feb 9, 2020

nhooyr@68261d7

Added safe xor to the benchmark:

$ /Users/nhooyr/src/golang/go/bin/go test -run=_ -bench=XORBytes
goos: darwin
goarch: amd64
pkg: crypto/cipher
BenchmarkXORBytes/default/8B-8      	180542707	         6.79 ns/op	1178.83 MB/s
BenchmarkXORBytes/safe/8B-8         	135818607	         8.73 ns/op	 916.11 MB/s
BenchmarkXORBytes/fastSafe/8B-8     	139375618	         8.62 ns/op	 927.86 MB/s
BenchmarkXORBytes/default#01/128B-8 	100000000	        10.9 ns/op	11787.06 MB/s
BenchmarkXORBytes/safe#01/128B-8    	13410099	        83.8 ns/op	1527.65 MB/s
BenchmarkXORBytes/fastSafe#01/128B-8         	60092373	        19.7 ns/op	6490.28 MB/s
BenchmarkXORBytes/default#02/2048B-8         	13841852	        83.5 ns/op	24532.14 MB/s
BenchmarkXORBytes/safe#02/2048B-8            	  974487	      1180 ns/op	1735.13 MB/s
BenchmarkXORBytes/fastSafe#02/2048B-8        	 5222248	       220 ns/op	9314.25 MB/s
BenchmarkXORBytes/default#03/32768B-8        	  855936	      1448 ns/op	22634.24 MB/s
BenchmarkXORBytes/safe#03/32768B-8           	   61302	     19043 ns/op	1720.75 MB/s
BenchmarkXORBytes/fastSafe#03/32768B-8       	  332857	      3440 ns/op	9524.70 MB/s
PASS
ok  	crypto/cipher	17.433s

Sometimes nearly 3x as slow as the current default.

@josharian

This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Feb 12, 2020

cc @FiloSottile for removing assembly in favor of pure Go

Filippo, is 2x-3x in this case within range of switching to pure Go?

There may also be further optimizations available (either in the pure Go or in the compiler); I haven't checked yet, and may not for a little while.

cc @CAFxX who I believe likes optimizing stuff like this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.