Basic chore cleanups of tests and benchmarks #66

Merged (4 commits) on Jan 30, 2023


ribasushi
Contributor

No functional changes whatsoever, only renaming of some internal functions and shoring up the benchmark executor.

Interesting results on Go 1.19 /cc @klauspost

AWS Graviton Neoverse-N1
BenchmarkHash/Generic/8Bytes-48         	 2081868	       576.1 ns/op	  13.89 MB/s
BenchmarkHash/Generic/64Bytes-48        	 1000000	      1123 ns/op	  56.98 MB/s
BenchmarkHash/Generic/1K-48             	  132132	      9094 ns/op	 112.60 MB/s
BenchmarkHash/Generic/8K-48             	   17422	     68553 ns/op	 119.50 MB/s
BenchmarkHash/Generic/1M-48             	     136	   8727612 ns/op	 120.14 MB/s
BenchmarkHash/Generic/5M-48             	      26	  43630193 ns/op	 120.17 MB/s
BenchmarkHash/Generic/10M-48            	      13	  87234265 ns/op	 120.20 MB/s
BenchmarkHash/ArmSha2/8Bytes-48         	12504379	        95.34 ns/op	  83.91 MB/s
BenchmarkHash/ArmSha2/64Bytes-48        	 7602079	       157.2 ns/op	 407.24 MB/s
BenchmarkHash/ArmSha2/1K-48             	 1556676	       770.0 ns/op	1329.81 MB/s
BenchmarkHash/ArmSha2/8K-48             	  224259	      5350 ns/op	1531.33 MB/s
BenchmarkHash/ArmSha2/1M-48             	    1788	    670851 ns/op	1563.05 MB/s
BenchmarkHash/ArmSha2/5M-48             	     350	   3358073 ns/op	1561.28 MB/s
BenchmarkHash/ArmSha2/10M-48            	     177	   6707201 ns/op	1563.36 MB/s
BenchmarkHash/GoStdlib/8Bytes-48        	11228222	       106.3 ns/op	  75.29 MB/s
BenchmarkHash/GoStdlib/64Bytes-48       	 7964581	       150.9 ns/op	 424.12 MB/s
BenchmarkHash/GoStdlib/1K-48            	 1566817	       765.9 ns/op	1336.90 MB/s
BenchmarkHash/GoStdlib/8K-48            	  224491	      5345 ns/op	1532.70 MB/s
BenchmarkHash/GoStdlib/1M-48            	    1788	    670828 ns/op	1563.11 MB/s
BenchmarkHash/GoStdlib/5M-48            	     356	   3354074 ns/op	1563.14 MB/s
BenchmarkHash/GoStdlib/10M-48           	     177	   6710705 ns/op	1562.54 MB/s
Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
BenchmarkHash/Generic/8Bytes-48          	 1000000	      1383 ns/op	   5.78 MB/s
BenchmarkHash/Generic/64Bytes-48         	  363115	      2934 ns/op	  21.81 MB/s
BenchmarkHash/Generic/1K-48              	   51576	     23917 ns/op	  42.81 MB/s
BenchmarkHash/Generic/8K-48              	    5802	    184658 ns/op	  44.36 MB/s
BenchmarkHash/Generic/1M-48              	      44	  24440770 ns/op	  42.90 MB/s
BenchmarkHash/Generic/5M-48              	      12	 102287809 ns/op	  51.26 MB/s
BenchmarkHash/Generic/10M-48             	       7	 151643552 ns/op	  69.15 MB/s
BenchmarkHash/GoStdlib/8Bytes-48         	 1775823	       684.9 ns/op	  11.68 MB/s
BenchmarkHash/GoStdlib/64Bytes-48        	  906128	      1203 ns/op	  53.21 MB/s
BenchmarkHash/GoStdlib/1K-48             	  133762	      8618 ns/op	 118.82 MB/s
BenchmarkHash/GoStdlib/8K-48             	   17229	     63527 ns/op	 128.95 MB/s
BenchmarkHash/GoStdlib/1M-48             	     136	   9105973 ns/op	 115.15 MB/s
BenchmarkHash/GoStdlib/5M-48             	      48	  25859536 ns/op	 202.74 MB/s
BenchmarkHash/GoStdlib/10M-48            	      26	  48503560 ns/op	 216.19 MB/s
BenchmarkAvx512_05M-48                   	      13	 111209048 ns/op	  75.43 MB/s
BenchmarkAvx512_1M-48                    	       6	 220232186 ns/op	  76.18 MB/s
BenchmarkAvx512_5M-48                    	       2	 922569076 ns/op	  90.93 MB/s
BenchmarkAvx512_10M-48                   	       1	1323279994 ns/op	 126.79 MB/s
BenchmarkAvx512_5M_2Cores-48             	       2	 549461240 ns/op	 305.34 MB/s
BenchmarkAvx512_5M_4Cores-48             	       2	 645792065 ns/op	 519.59 MB/s
BenchmarkAvx512_5M_6Cores-48             	       1	1132375978 ns/op	 444.48 MB/s
AMD Ryzen 7 3700X 8-Core Processor
BenchmarkHash/Generic/8Bytes-16          	 3536659	       341.3 ns/op	  23.44 MB/s
BenchmarkHash/Generic/64Bytes-16         	 1775308	       660.1 ns/op	  96.95 MB/s
BenchmarkHash/Generic/1K-16              	  219138	      5403 ns/op	 189.52 MB/s
BenchmarkHash/Generic/8K-16              	   28696	     40512 ns/op	 202.21 MB/s
BenchmarkHash/Generic/1M-16              	     216	   5214497 ns/op	 201.09 MB/s
BenchmarkHash/Generic/5M-16              	      44	  25382559 ns/op	 206.55 MB/s
BenchmarkHash/Generic/10M-16             	      22	  51851523 ns/op	 202.23 MB/s
BenchmarkHash/IntelSHA/8Bytes-16         	19744753	        59.74 ns/op	 133.91 MB/s
BenchmarkHash/IntelSHA/64Bytes-16        	12684049	        93.82 ns/op	 682.17 MB/s
BenchmarkHash/IntelSHA/1K-16             	 2163363	       531.0 ns/op	1928.27 MB/s
BenchmarkHash/IntelSHA/8K-16             	  300894	      3932 ns/op	2083.20 MB/s
BenchmarkHash/IntelSHA/1M-16             	    2388	    495169 ns/op	2117.61 MB/s
BenchmarkHash/IntelSHA/5M-16             	     436	   2486044 ns/op	2108.92 MB/s
BenchmarkHash/IntelSHA/10M-16            	     235	   4984128 ns/op	2103.83 MB/s
BenchmarkHash/GoStdlib/8Bytes-16         	 7553770	       166.8 ns/op	  47.96 MB/s
BenchmarkHash/GoStdlib/64Bytes-16        	 3785841	       292.5 ns/op	 218.83 MB/s
BenchmarkHash/GoStdlib/1K-16             	  545599	      2128 ns/op	 481.19 MB/s
BenchmarkHash/GoStdlib/8K-16             	   77860	     15812 ns/op	 518.09 MB/s
BenchmarkHash/GoStdlib/1M-16             	     534	   2026093 ns/op	 517.54 MB/s
BenchmarkHash/GoStdlib/5M-16             	     122	  10069437 ns/op	 520.67 MB/s
BenchmarkHash/GoStdlib/10M-16            	      57	  20029738 ns/op	 523.51 MB/s
MacBook 2015, Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
BenchmarkHash/Generic/8Bytes-8         	 2214006	       541.3 ns/op	  14.78 MB/s
BenchmarkHash/Generic/64Bytes-8        	 1000000	      1058 ns/op	  60.47 MB/s
BenchmarkHash/Generic/1K-8             	  137259	      8761 ns/op	 116.88 MB/s
BenchmarkHash/Generic/8K-8             	   17540	     65134 ns/op	 125.77 MB/s
BenchmarkHash/Generic/1M-8             	     144	   8301478 ns/op	 126.31 MB/s
BenchmarkHash/Generic/5M-8             	      27	  41688594 ns/op	 125.76 MB/s
BenchmarkHash/Generic/10M-8            	      13	  83150422 ns/op	 126.11 MB/s
BenchmarkHash/GoStdlib/8Bytes-8        	 4718106	       254.9 ns/op	  31.39 MB/s
BenchmarkHash/GoStdlib/64Bytes-8       	 2546954	       466.6 ns/op	 137.16 MB/s
BenchmarkHash/GoStdlib/1K-8            	  374762	      3170 ns/op	 322.98 MB/s
BenchmarkHash/GoStdlib/8K-8            	   50767	     23429 ns/op	 349.66 MB/s
BenchmarkHash/GoStdlib/1M-8            	     402	   2969511 ns/op	 353.11 MB/s
BenchmarkHash/GoStdlib/5M-8            	      79	  15015291 ns/op	 349.17 MB/s
BenchmarkHash/GoStdlib/10M-8           	      39	  30022449 ns/op	 349.26 MB/s
MacBook M1 Pro 2022
BenchmarkHash/Generic/8Bytes-10         	 3660108	       310.0 ns/op	  25.80 MB/s
BenchmarkHash/Generic/64Bytes-10        	 1974002	       607.5 ns/op	 105.35 MB/s
BenchmarkHash/Generic/1K-10             	  238525	      5026 ns/op	 203.75 MB/s
BenchmarkHash/Generic/8K-10             	   31683	     38009 ns/op	 215.53 MB/s
BenchmarkHash/Generic/1M-10             	     247	   4840271 ns/op	 216.64 MB/s
BenchmarkHash/Generic/5M-10             	      48	  24228139 ns/op	 216.40 MB/s
BenchmarkHash/Generic/10M-10            	      24	  48486415 ns/op	 216.26 MB/s
BenchmarkHash/ArmSha2/8Bytes-10         	23491410	        51.09 ns/op	 156.59 MB/s
BenchmarkHash/ArmSha2/64Bytes-10        	15159856	        78.47 ns/op	 815.56 MB/s
BenchmarkHash/ArmSha2/1K-10             	 2506291	       478.2 ns/op	2141.24 MB/s
BenchmarkHash/ArmSha2/8K-10             	  342472	      3463 ns/op	2365.80 MB/s
BenchmarkHash/ArmSha2/1M-10             	    2692	    436449 ns/op	2402.52 MB/s
BenchmarkHash/ArmSha2/5M-10             	     546	   2182072 ns/op	2402.71 MB/s
BenchmarkHash/ArmSha2/10M-10            	     272	   4379574 ns/op	2394.24 MB/s
BenchmarkHash/GoStdlib/8Bytes-10        	22089243	        54.34 ns/op	 147.22 MB/s
BenchmarkHash/GoStdlib/64Bytes-10       	16055944	        74.21 ns/op	 862.44 MB/s
BenchmarkHash/GoStdlib/1K-10            	 2557021	       472.0 ns/op	2169.35 MB/s
BenchmarkHash/GoStdlib/8K-10            	  348588	      3432 ns/op	2386.66 MB/s
BenchmarkHash/GoStdlib/1M-10            	    2752	    435490 ns/op	2407.81 MB/s
BenchmarkHash/GoStdlib/5M-10            	     553	   2162255 ns/op	2424.73 MB/s
BenchmarkHash/GoStdlib/10M-10           	     276	   4331759 ns/op	2420.67 MB/s

@ribasushi
Contributor Author

After the latest renames I also noticed that the arm64 Go preamble does a lot of assignment work:
https://github.com/minio/sha256-simd/blob/9235fbaea/sha256block_arm64.go#L26-L37

compared to the Intel one, which pushes everything into asm-land:
https://github.com/minio/sha256-simd/blob/9235fbaea/sha256block_amd64.go#L26-L31

Sadly I do not know enough assembly yet to properly adjust the preamble of https://github.com/minio/sha256-simd/blob/9235fbaea/sha256block_arm64.s#L28-L36, but I am pretty sure this will speed up things even more.

@klauspost
Contributor

klauspost commented Jan 30, 2023

@ribasushi Thanks. The preamble work should be pretty harmless since it doesn't escape. I had to double check the logic for the Xeon case, but it seems that the library correctly falls back to using the stdlib.

I guess now we could remove the arm64 version as well.

klauspost left a comment

LGTM

@ribasushi
Contributor Author

> I guess now we could remove the arm64 version as well.

Yeah. Although if you look closely, the ARM version is noticeably faster than the stdlib one on sub-block messages. I didn't manage to find the source of the difference.

Graviton

BenchmarkHash/ArmSha2/8Bytes-48         	12504379	        95.34 ns/op	  83.91 MB/s
BenchmarkHash/GoStdlib/8Bytes-48        	11228222	       106.3 ns/op	  75.29 MB/s

M1

BenchmarkHash/ArmSha2/8Bytes-10         	23491410	        51.09 ns/op	 156.59 MB/s
BenchmarkHash/GoStdlib/8Bytes-10        	22089243	        54.34 ns/op	 147.22 MB/s

klauspost merged commit d9c3aea into minio:master on Jan 30, 2023
@fwessels
Contributor

> Sadly I do not know enough assembly yet to properly adjust the preamble of https://github.com/minio/sha256-simd/blob/9235fbaea/sha256block_arm64.s#L28-L36, but I am pretty sure this will speed up things even more.

Virtually all the time is spent in the (main) hashing loop, so optimizing the call into the assembly will have a negligible performance effect (even for very short messages, which are super fast regardless).
