Remove obsolete SIMD code (#57)
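Probably due to compiler optimizations, the pure Go code is now faster than the SSSE3/AVX/AVX2 assembly. In the benchmarks below (`GEN_` being the generic Go path), it reaches at least 20% higher throughput for inputs of 1K and larger: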

```
BenchmarkHash/GEN_/8Bytes-32    	 6486468	       184 ns/op	  43.46 MB/s
BenchmarkHash/GEN_/1K-32        	  545470	      2172 ns/op	 471.36 MB/s
BenchmarkHash/GEN_/8K-32        	   74073	     16106 ns/op	 508.64 MB/s
BenchmarkHash/GEN_/1M-32        	     584	   2034247 ns/op	 515.46 MB/s
BenchmarkHash/GEN_/5M-32        	     100	  10190003 ns/op	 514.51 MB/s
BenchmarkHash/GEN_/10M-32       	      56	  20357139 ns/op	 515.09 MB/s

BenchmarkHash/AVX2/8Bytes-32    	 5263258	       226 ns/op	  35.44 MB/s
BenchmarkHash/AVX2/1K-32        	  444441	      2633 ns/op	 388.98 MB/s
BenchmarkHash/AVX2/8K-32        	   61855	     19513 ns/op	 419.81 MB/s
BenchmarkHash/AVX2/1M-32        	     487	   2462013 ns/op	 425.90 MB/s
BenchmarkHash/AVX2/5M-32        	      91	  12384626 ns/op	 423.34 MB/s
BenchmarkHash/AVX2/10M-32       	      44	  26636364 ns/op	 393.66 MB/s

BenchmarkHash/AVX_/8Bytes-32    	 6349206	       188 ns/op	  42.54 MB/s
BenchmarkHash/AVX_/1K-32        	  461538	      2620 ns/op	 390.91 MB/s
BenchmarkHash/AVX_/8K-32        	   61224	     19567 ns/op	 418.65 MB/s
BenchmarkHash/AVX_/1M-32        	     484	   2473140 ns/op	 423.99 MB/s
BenchmarkHash/AVX_/5M-32        	      99	  12505052 ns/op	 419.26 MB/s
BenchmarkHash/AVX_/10M-32       	      46	  24869557 ns/op	 421.63 MB/s

BenchmarkHash/SSSE/8Bytes-32    	 6282679	       192 ns/op	  41.71 MB/s
BenchmarkHash/SSSE/1K-32        	  461614	      2628 ns/op	 389.69 MB/s
BenchmarkHash/SSSE/8K-32        	   60913	     19651 ns/op	 416.88 MB/s
BenchmarkHash/SSSE/1M-32        	     481	   2488563 ns/op	 421.36 MB/s
BenchmarkHash/SSSE/5M-32        	      91	  12516477 ns/op	 418.88 MB/s
BenchmarkHash/SSSE/10M-32       	      46	  24869561 ns/op	 421.63 MB/s
```
klauspost committed Feb 22, 2021
1 parent f675151 commit 6a57409
Showing 26 changed files with 90 additions and 2,829 deletions.
7 changes: 5 additions & 2 deletions .github/workflows/go.yml
@@ -15,8 +15,8 @@ jobs:
strategy:
max-parallel: 4
matrix:
-go-version: [1.13.x, 1.12.x]
-os: [ubuntu-latest, windows-latest]
+go-version: [1.16.x, 1.15.x, 1.14.x]
+os: [ubuntu-latest, windows-latest, macos-latest]
steps:
- name: Set up Go ${{ matrix.go-version }}
uses: actions/setup-go@v1
@@ -30,6 +30,9 @@ jobs:
- name: Build on ${{ matrix.os }}
if: matrix.os == 'windows-latest'
run: go test -race -v ./...
+- name: Build on ${{ matrix.os }}
+if: matrix.os == 'macos-latest'
+run: go test -race -v ./...
- name: Build on ${{ matrix.os }}
if: matrix.os == 'ubuntu-latest'
run: |
22 changes: 13 additions & 9 deletions README.md
@@ -1,14 +1,18 @@
# sha256-simd

-Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions and AVX2 for Intel and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core) in comparison to AVX2. SHA Extensions give a performance boost of close to 4x over AVX2.
+Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM.
+On AVX512 it provides an up to 8x improvement (over 3 GB/s per core).
+SHA Extensions give a performance boost of close to 4x over native.

## Introduction

-This package is designed as a replacement for `crypto/sha256`. For Intel CPUs it has two flavors for AVX512 and AVX2 (AVX/SSE are also supported). For ARM CPUs with the Cryptography Extensions, advantage is taken of the SHA2 instructions resulting in a massive performance improvement.
+This package is designed as a replacement for `crypto/sha256`.
+For ARM CPUs with the Cryptography Extensions, advantage is taken of the SHA2 instructions resulting in a massive performance improvement.

-This package uses Golang assembly. The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.
+This package uses Golang assembly.
+The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.

-## New: Support for Intel SHA Extensions
+## Support for Intel SHA Extensions

Support for the Intel SHA Extensions has been added by Kristofer Peterson (@svenski123), originally developed for spacemeshos [here](https://github.com/spacemeshos/POET/issues/23). On CPUs that support it (known thus far Intel Celeron J3455 and AMD Ryzen) it gives a significant boost in performance (with thanks to @AudriusButkevicius for reporting the results; full results [here](https://github.com/minio/sha256-simd/pull/37#issuecomment-451607827)).

@@ -18,7 +22,9 @@ benchmark AVX2 MB/s SHA Ext MB/s speedup
BenchmarkHash5M 514.40 1975.17 3.84x
```

-Thanks to Kristofer Peterson, we also added additional performance changes such as optimized padding, endian conversions which sped up all implementations i.e. Intel SHA alone while doubled performance for small sizes, the other changes increased everything roughly 50%.
+Thanks to Kristofer Peterson, we also added additional performance changes such as optimized padding,
+endian conversions which sped up all implementations i.e. Intel SHA alone while doubled performance for small sizes,
+the other changes increased everything roughly 50%.

## Support for AVX512

@@ -58,7 +64,8 @@ More detailed information can be found in this [blog](https://blog.minio.io/acce

## Drop-In Replacement

-The following code snippet shows how you can use `github.com/minio/sha256-simd`. This will automatically select the fastest method for the architecture on which it will be executed.
+The following code snippet shows how you can use `github.com/minio/sha256-simd`.
+This will automatically select the fastest method for the architecture on which it will be executed.

```go
import "github.com/minio/sha256-simd"
@@ -80,9 +87,6 @@ Below is the speed in MB/s for a single core (ranked fast to slow) for blocks la
| 3.0 GHz Intel Xeon Platinum 8124M | AVX512 | 3498 |
| 3.7 GHz AMD Ryzen 7 2700X | SHA Ext | 1979 |
| 1.2 GHz ARM Cortex-A53 | ARM64 | 638 |
-| 3.0 GHz Intel Xeon Platinum 8124M | AVX2 | 449 |
-| 3.1 GHz Intel Core i7 | AVX | 362 |
-| 3.1 GHz Intel Core i7 | SSE | 299 |

## asm2plan9s

119 changes: 0 additions & 119 deletions cpuid.go

This file was deleted.

24 changes: 0 additions & 24 deletions cpuid_386.go

This file was deleted.

53 changes: 0 additions & 53 deletions cpuid_386.s

This file was deleted.

24 changes: 0 additions & 24 deletions cpuid_amd64.go

This file was deleted.

53 changes: 0 additions & 53 deletions cpuid_amd64.s

This file was deleted.

32 changes: 0 additions & 32 deletions cpuid_arm.go

This file was deleted.
