Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto: crashes on certain architecture due to no detection for BMI instruction set extensions #18512

Closed
LionNatsu opened this issue Jan 4, 2017 · 10 comments
Milestone

Comments

@LionNatsu
Copy link
Contributor

@LionNatsu LionNatsu commented Jan 4, 2017

I am using Linux. CPU architecture is AMD64, it's i3-4000M exactly. I wrote a long string literal. The compiler crashed magically: SIGILL: illegal instruction

I found /src/cmd/compile/internal/gc/obj.go#L289 uses sha256 to process huge strings (if len(s) > 100), and gdb showed me an rorx instruction in src/crypto/sha256/sha256block_amd64.s. It's easy to reproduce on my computer. (http://paste.ubuntu.com/23736706/)

The existing implementation on amd64 only detects AVX2 instructions, and uses them to improve performance when the target architecture supports these instruction set extensions.

I read the manuals by AMD[1] and Intel[2] carefully, and confirmed that RORX is a part of another feature called BMI (bit-manipulation instructions) including BMI1 and BMI2. RORX is a BMI2 instruction.

I checked my /proc/cpuinfo, and there was notbmi1 or bmi2 in that list. On the contrary, my server, Xeon E5-2600, has bmi1 and bmi2in cpuinfo and it can run that code correctly.

Therefore, unless we detect both AVX2 and BMI2, these instructions crash the running program as 'unknown instructions' on the architecture, e.g. my i3-4000M, which supports AVX2 but not support BMI2.

Attachment a part of cpuinfo

lion@Lion-Laptop [ go@master ] $ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel(R) Core(TM) i3-4000M CPU @ 2.40GHz
stepping        : 3
microcode       : 0x12
cpu MHz         : 2400.000
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust avx2 smep erms invpcid xsaveopt dtherm arat pln pts
bugs            :
bogomips        : 4790.10
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
...

[1] AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions
[2] Intel® Advanced Vector Extensions Programming Reference

@LionNatsu LionNatsu changed the title crypto/sha256: crashes on certain architecture due to no detection for BMI instruction set extensions crypto: crashes on certain architecture due to no detection for BMI instruction set extensions Jan 4, 2017
@LionNatsu
Copy link
Contributor Author

@LionNatsu LionNatsu commented Jan 4, 2017

crypto/sha1 has the issue too. src/crypto/sha1/sha1block_amd64.s contains andn(BMI1) and rorx(BMI2) instructions.

Attachment error message when use sha1 or sha256

lion@Lion-Laptop [ ~ ] ! go run go/src/main/main.go 
SIGILL: illegal instruction
PC=0x44fd77 m=0

goroutine 1 [running]:
crypto/sha1.blockAVX2(0xc42003de68, 0xc42007a200, 0x180, 0x200)
        /usr/lib/go/src/crypto/sha1/sha1block_amd64.s:1458 +0x4d7 fp=0xc42003dd98 sp=0xc42003d808
crypto/sha1.block(0xc42003de68, 0xc42007a200, 0x200, 0x200)
        /usr/lib/go/src/crypto/sha1/sha1block_amd64.go:29 +0x9d fp=0xc42003ddd0 sp=0xc42003dd98
crypto/sha1.(*digest).Write(0xc42003de68, 0xc42007a200, 0x200, 0x200, 0x40, 0x200, 0xc42003de88)
        /usr/lib/go/src/crypto/sha1/sha1.go:75 +0x120 fp=0xc42003de18 sp=0xc42003ddd0
crypto/sha1.Sum(0xc42007a200, 0x200, 0x200, 0x0, 0x0, 0x0)
        /usr/lib/go/src/crypto/sha1/sha1.go:128 +0xee fp=0xc42003dee0 sp=0xc42003de18
main.main()
        /home/lion/go/src/main/main.go:12 +0x1e0 fp=0xc42003df48 sp=0xc42003dee0
runtime.main()
        /usr/lib/go/src/runtime/proc.go:183 +0x1f4 fp=0xc42003dfa0 sp=0xc42003df48
runtime.goexit()
        /usr/lib/go/src/runtime/asm_amd64.s:2099 +0x1 fp=0xc42003dfa8 sp=0xc42003dfa0

rax    0x10325476
rbx    0xefcdab89
rcx    0x67452301
rdx    0xc3d2e1f0
rdi    0x98badcfe
rsi    0xefcdab89
rbp    0xc42003dd88
rsp    0xc42003d808
r8     0x475400
r9     0xc42003de68
r10    0xc42007a200
r11    0xc42007a3c0
r12    0x200
r13    0xc42007a240
r14    0xc42003daa8
r15    0xc42003d808
rip    0x44fd77
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
exit status 2
lion@Lion-Laptop [ ~ ] ! 

@bradfitz bradfitz added this to the Go1.8 milestone Jan 4, 2017
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jan 4, 2017

@LionNatsu
Copy link
Contributor Author

@LionNatsu LionNatsu commented Jan 4, 2017

I searched it appears in src/that BMI1 (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT), BMI2 (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX), and LZCNT.

Affected package:
crypto/sha1 from 2210d88 (29 Apr 2016), BMI1 and BMI2
crypto/sha256 from fafadc5 (29 Apr 2016), BMI2
x/crypto/chacha20poly1305 from 77b6a08 (13 Oct 2016), BMI2(MULX)

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jan 4, 2017

/cc @TocarIP

@LionNatsu
Copy link
Contributor Author

@LionNatsu LionNatsu commented Jan 4, 2017

I wrote a quick patch for it... I hope this will be helpful for you.

From fbc8d16e7101ed57bd51f940745049e5d3fea487 Mon Sep 17 00:00:00 2001
From: Lion Yang <lion@aosc.xyz>
Date: Wed, 4 Jan 2017 22:42:18 +0800
Subject: [PATCH] crypto: detect BMI usability for sha1, sha256 &
 chacha20poly1305

The existing implementation on amd64 detects AVX2 instructions,
and uses them to improve performance when the target architecture
supports these instruction set extensions. It also contains some other
instructions which are parts of the BMI (bit-manipulation instructions).

These instructions crash the running program as 'unknown instructions'
on the architecture, e.g. i3-4000M, which supports AVX2 but not
support BMI.

This change added the detections for BMI1 and BMI2 to amd64 runtime with
two flags as the result, `support_bmi1` and `support_bmi2`,
in runtime/runtime2.go. It also completed the condition to run AVX2 version
in packages crypto/sha1, crypto/sha256 and x/crypto/chacha20poly1305.
---
 src/crypto/sha1/sha1block_amd64.s                        | 16 ++++++++++------
 src/crypto/sha256/sha256block_amd64.s                    |  6 +++++-
 src/runtime/asm_amd64.s                                  | 15 ++++++++++++++-
 src/runtime/runtime2.go                                  |  2 ++
 .../x/crypto/chacha20poly1305/chacha20poly1305_amd64.s   |  7 +++++--
 5 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/src/crypto/sha1/sha1block_amd64.s b/src/crypto/sha1/sha1block_amd64.s
index 0cdb43b..3047951 100644
--- a/src/crypto/sha1/sha1block_amd64.s
+++ b/src/crypto/sha1/sha1block_amd64.s
@@ -225,7 +225,7 @@ end:
 	RET
 
 
-// This is the implementation using AVX2. It is based on:
+// This is the implementation using AVX2, BMI1 and BMI2. It is based on:
 // "SHA-1 implementation with Intel(R) AVX2 instruction set extensions"
 // From http://software.intel.com/en-us/articles
 // (look for improving-the-performance-of-the-secure-hash-algorithm-1)
@@ -1459,15 +1459,19 @@ TEXT ·blockAVX2(SB),$1408-32
 
 
 // func checkAVX2() bool
-// returns whether AVX2 is supported
+// returns whether AVX2, BMI1 and BMI2 are supported
 TEXT ·checkAVX2(SB),NOSPLIT,$0
 	CMPB runtime·support_avx2(SB), $1
-	JE   has
-        MOVB    $0, ret+0(FP)
-	RET
-has:
+	JNE  noavx2
+	CMPB runtime·support_bmi1(SB), $1  // check for ANDNL instruction
+	JNE  noavx2
+	CMPB runtime·support_bmi2(SB), $1  // check for RORXL instruction
+	JNE  noavx2
         MOVB    $1, ret+0(FP)
 	RET
+noavx2:
+        MOVB    $0, ret+0(FP)
+	RET
 
 
 DATA K_XMM_AR<>+0x00(SB)/4,$0x5a827999
diff --git a/src/crypto/sha256/sha256block_amd64.s b/src/crypto/sha256/sha256block_amd64.s
index edf7ad1..9c6a210 100644
--- a/src/crypto/sha256/sha256block_amd64.s
+++ b/src/crypto/sha256/sha256block_amd64.s
@@ -560,7 +560,11 @@
 
 TEXT ·block(SB), 0, $536-32
 	CMPB runtime·support_avx2(SB), $1
-	JE   avx2
+	JNE  noavx2bmi2
+	CMPB runtime·support_bmi2(SB), $1  // check for RORXL instruction
+	JNE  noavx2bmi2
+	JMP  avx2
+noavx2bmi2:
 
 	MOVQ p_base+8(FP), SI
 	MOVQ p_len+16(FP), DX
diff --git a/src/runtime/asm_amd64.s b/src/runtime/asm_amd64.s
index 0070e9d..cb428d6 100644
--- a/src/runtime/asm_amd64.s
+++ b/src/runtime/asm_amd64.s
@@ -75,11 +75,24 @@ no7:
 	TESTL   $(1<<5), runtime·cpuid_ebx7(SB) // check for AVX2 bit
 	JEQ     noavx2
 	MOVB    $1, runtime·support_avx2(SB)
-	JMP     nocpuinfo
+	JMP     testbmi1
 noavx:
 	MOVB    $0, runtime·support_avx(SB)
 noavx2:
 	MOVB    $0, runtime·support_avx2(SB)
+testbmi1:
+	// Detect BMI1 and BMI2 extensions as per
+	// 5.1.16.1 Detection of VEX-encoded GPR Instructions,
+	//   LZCNT and TZCNT, PREFETCHW chapter of [1]
+	MOVB    $0, runtime·support_bmi1(SB)
+	TESTL   $(1<<3), runtime·cpuid_ebx7(SB) // check for BMI1 bit
+	JEQ     testbmi2
+	MOVB    $1, runtime·support_bmi1(SB)
+testbmi2:
+	MOVB    $0, runtime·support_bmi2(SB)
+	TESTL   $(1<<8), runtime·cpuid_ebx7(SB) // check for BMI2 bit
+	JEQ     nocpuinfo
+	MOVB    $1, runtime·support_bmi2(SB)
 nocpuinfo:	
 	
 	// if there is an _cgo_init, call it.
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index acc9426..1ceab0a 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -745,6 +745,8 @@ var (
 	lfenceBeforeRdtsc bool
 	support_avx       bool
 	support_avx2      bool
+	support_bmi1      bool
+	support_bmi2      bool
 
 	goarm                uint8 // set by cmd/link on arm systems
 	framepointer_enabled bool  // set by cmd/link
diff --git a/src/vendor/golang_org/x/crypto/chacha20poly1305/chacha20poly1305_amd64.s b/src/vendor/golang_org/x/crypto/chacha20poly1305/chacha20poly1305_amd64.s
index 2fa9b55..30afa38 100644
--- a/src/vendor/golang_org/x/crypto/chacha20poly1305/chacha20poly1305_amd64.s
+++ b/src/vendor/golang_org/x/crypto/chacha20poly1305/chacha20poly1305_amd64.s
@@ -279,8 +279,11 @@ TEXT ·chacha20Poly1305Open(SB), 0, $288-97
 
 	// Check for AVX2 support
 	CMPB runtime·support_avx2(SB), $1
-	JE   chacha20Poly1305Open_AVX2
-
+	JNE  noavx2
+	CMPB runtime·support_bmi2(SB), $1  // for MULXQ
+	JNE  noavx2
+	JMP   chacha20Poly1305Open_AVX2
+noavx2:
 	// Special optimization, for very short buffers
 	CMPQ inl, $128
 	JBE  openSSE128 // About 16% faster
-- 
2.11.0


@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Jan 4, 2017

@LionNatsu, we can't accept patches to the issue tracker. We need a CLA submitted, and we need the code in Gerrit for code review and because Gerrit is where our issue tracker is.

To contribute, see https://golang.org/doc/contribute.html

If you at least submit a CLA, then we could look at your code & use it.

@LionNatsu
Copy link
Contributor Author

@LionNatsu LionNatsu commented Jan 4, 2017

@bradfitz Thanks for your guidance. I'll register for a Gerrit account right now.

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 4, 2017

CL https://golang.org/cl/34850 mentions this issue.

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 4, 2017

CL https://golang.org/cl/34851 mentions this issue.

@gopherbot gopherbot closed this in a2b615d Jan 5, 2017
gopherbot pushed a commit that referenced this issue Jan 5, 2017
This change uses runtime.support_bmi2 as an additional condition
to examine the usability of AVX2 version algorithm, fixes
the crash on the platfrom which supports AVX2 but not support BMI2.

Fixes #18512

Change-Id: I408c0844ae2eb242dacf70cb9e8cec1b8f3bd941
Reviewed-on: https://go-review.googlesource.com/34851
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented Jan 5, 2017

CL https://golang.org/cl/34870 mentions this issue.

@golang golang locked and limited conversation to collaborators Jan 5, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.