Remove obsolete SIMD code (#57)

Probably due to compiler optimizations the Go code is now faster than the SSSE3/AVX/AVX2 code:

```
BenchmarkHash/GEN_/8Bytes-32    6486468       184 ns/op    43.46 MB/s
BenchmarkHash/GEN_/1K-32         545470      2172 ns/op   471.36 MB/s
BenchmarkHash/GEN_/8K-32          74073     16106 ns/op   508.64 MB/s
BenchmarkHash/GEN_/1M-32            584   2034247 ns/op   515.46 MB/s
BenchmarkHash/GEN_/5M-32            100  10190003 ns/op   514.51 MB/s
BenchmarkHash/GEN_/10M-32            56  20357139 ns/op   515.09 MB/s
BenchmarkHash/AVX2/8Bytes-32    5263258       226 ns/op    35.44 MB/s
BenchmarkHash/AVX2/1K-32         444441      2633 ns/op   388.98 MB/s
BenchmarkHash/AVX2/8K-32          61855     19513 ns/op   419.81 MB/s
BenchmarkHash/AVX2/1M-32            487   2462013 ns/op   425.90 MB/s
BenchmarkHash/AVX2/5M-32             91  12384626 ns/op   423.34 MB/s
BenchmarkHash/AVX2/10M-32            44  26636364 ns/op   393.66 MB/s
BenchmarkHash/AVX_/8Bytes-32    6349206       188 ns/op    42.54 MB/s
BenchmarkHash/AVX_/1K-32         461538      2620 ns/op   390.91 MB/s
BenchmarkHash/AVX_/8K-32          61224     19567 ns/op   418.65 MB/s
BenchmarkHash/AVX_/1M-32            484   2473140 ns/op   423.99 MB/s
BenchmarkHash/AVX_/5M-32             99  12505052 ns/op   419.26 MB/s
BenchmarkHash/AVX_/10M-32            46  24869557 ns/op   421.63 MB/s
BenchmarkHash/SSSE/8Bytes-32    6282679       192 ns/op    41.71 MB/s
BenchmarkHash/SSSE/1K-32         461614      2628 ns/op   389.69 MB/s
BenchmarkHash/SSSE/8K-32          60913     19651 ns/op   416.88 MB/s
BenchmarkHash/SSSE/1M-32            481   2488563 ns/op   421.36 MB/s
BenchmarkHash/SSSE/5M-32             91  12516477 ns/op   418.88 MB/s
BenchmarkHash/SSSE/10M-32            46  24869561 ns/op   421.63 MB/s
```
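
Figures of this shape (iterations, ns/op, MB/s per input size) can be reproduced with Go's `testing` package. The following is a minimal sketch, not the repository's own benchmark: it uses the standard library's `crypto/sha256` as a stand-in, and `benchHash` and `mbPerSec` are hypothetical helper names introduced here for illustration. Because the package is described as a replacement for `crypto/sha256`, swapping the import path for `github.com/minio/sha256-simd` should measure that package instead.

```go
package main

import (
	"crypto/sha256" // stand-in (assumption); swap for "github.com/minio/sha256-simd" to measure that package
	"fmt"
	"testing"
)

// benchHash (hypothetical helper) times one-shot SHA-256 hashing of a
// zero-filled buffer of the given size using testing.Benchmark, which
// can be called outside of "go test".
func benchHash(size int) testing.BenchmarkResult {
	data := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // record bytes per iteration for throughput
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			sha256.Sum256(data)
		}
	})
}

// mbPerSec (hypothetical helper) converts a result into MB/s,
// with 1 MB taken as 1e6 bytes.
func mbPerSec(r testing.BenchmarkResult) float64 {
	if r.T <= 0 {
		return 0
	}
	return float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
}

func main() {
	// A subset of the input sizes listed in the commit message.
	for _, size := range []int{8, 1 << 10, 8 << 10, 1 << 20} {
		r := benchHash(size)
		fmt.Printf("%8d bytes: %10d ns/op %10.2f MB/s\n", size, r.NsPerOp(), mbPerSec(r))
	}
}
```

Absolute numbers depend on the CPU and Go version, so only the relative ordering between implementations measured on the same machine is meaningful.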
minio · Feb 22, 2021 · 6a57409
1 parent f675151
Showing 26 changed files with 90 additions and 2,829 deletions.
diff --git a/.github/workflows/go.yml b/.github/workflows/go.yml
@@ -15,8 +15,8 @@ jobs:
     strategy:
       max-parallel: 4
       matrix:
-        go-version: [1.13.x, 1.12.x]
-        os: [ubuntu-latest, windows-latest]
+        go-version: [1.16.x, 1.15.x, 1.14.x]
+        os: [ubuntu-latest, windows-latest, macos-latest]
     steps:
     - name: Set up Go ${{ matrix.go-version }}
       uses: actions/setup-go@v1
@@ -30,6 +30,9 @@ jobs:
     - name: Build on ${{ matrix.os }}
       if: matrix.os == 'windows-latest'
       run: go test -race -v ./...
+    - name: Build on ${{ matrix.os }}
+      if: matrix.os == 'macos-latest'
+      run: go test -race -v ./...
     - name: Build on ${{ matrix.os }}      
       if: matrix.os == 'ubuntu-latest'
       run: |

diff --git a/README.md b/README.md
@@ -1,14 +1,18 @@
 # sha256-simd
 
-Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions and AVX2 for Intel and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core) in comparison to AVX2. SHA Extensions give a performance boost of close to 4x over AVX2.
+Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. 
+On AVX512 it provides an up to 8x improvement (over 3 GB/s per core).
+SHA Extensions give a performance boost of close to 4x over native.
 
 ## Introduction
 
-This package is designed as a replacement for `crypto/sha256`. For Intel CPUs it has two flavors for AVX512 and AVX2 (AVX/SSE are also supported). For ARM CPUs with the Cryptography Extensions, advantage is taken of the SHA2 instructions resulting in a massive performance improvement.
+This package is designed as a replacement for `crypto/sha256`. 
+For ARM CPUs with the Cryptography Extensions, advantage is taken of the SHA2 instructions resulting in a massive performance improvement.
 
-This package uses Golang assembly. The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.
+This package uses Golang assembly. 
+The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.
 
-## New: Support for Intel SHA Extensions
+## Support for Intel SHA Extensions
 
 Support for the Intel SHA Extensions has been added by Kristofer Peterson (@svenski123), originally developed for spacemeshos [here](https://github.com/spacemeshos/POET/issues/23). On CPUs that support it (known thus far Intel Celeron J3455 and AMD Ryzen) it gives a significant boost in performance (with thanks to @AudriusButkevicius for reporting the results; full results [here](https://github.com/minio/sha256-simd/pull/37#issuecomment-451607827)).
 
@@ -18,7 +22,9 @@ benchmark           AVX2 MB/s    SHA Ext MB/s  speedup
 BenchmarkHash5M     514.40       1975.17       3.84x
 ```
 
-Thanks to Kristofer Peterson, we also added additional performance changes such as optimized padding, endian conversions which sped up all implementations i.e. Intel SHA alone while doubled performance for small sizes, the other changes increased everything roughly 50%.
+Thanks to Kristofer Peterson, we also added additional performance changes such as optimized padding,
+endian conversions which sped up all implementations i.e. Intel SHA alone while doubled performance for small sizes,
+the other changes increased everything roughly 50%.
 
 ## Support for AVX512
 
@@ -58,7 +64,8 @@ More detailed information can be found in this [blog](https://blog.minio.io/acce
 
 ## Drop-In Replacement
 
-The following code snippet shows how you can use `github.com/minio/sha256-simd`. This will automatically select the fastest method for the architecture on which it will be executed.
+The following code snippet shows how you can use `github.com/minio/sha256-simd`. 
+This will automatically select the fastest method for the architecture on which it will be executed.
 
 ```go
 import "github.com/minio/sha256-simd"
@@ -80,9 +87,6 @@ Below is the speed in MB/s for a single core (ranked fast to slow) for blocks la
 | 3.0 GHz Intel Xeon Platinum 8124M | AVX512  |         3498 |
 | 3.7 GHz AMD Ryzen 7 2700X         | SHA Ext |         1979 |
 | 1.2 GHz ARM Cortex-A53            | ARM64   |          638 |
-| 3.0 GHz Intel Xeon Platinum 8124M | AVX2    |          449 |
-| 3.1 GHz Intel Core i7             | AVX     |          362 |
-| 3.1 GHz Intel Core i7             | SSE     |          299 |
 
 ## asm2plan9s
 

diff --git a/cpuid.go b/cpuid.go
diff --git a/cpuid_386.go b/cpuid_386.go
diff --git a/cpuid_386.s b/cpuid_386.s
diff --git a/cpuid_amd64.go b/cpuid_amd64.go
diff --git a/cpuid_amd64.s b/cpuid_amd64.s
diff --git a/cpuid_arm.go b/cpuid_arm.go