Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: golang.org/cl/46038 causes illegal instruction on FreeBSD/ARM #22248

Closed
paulzhol opened this issue Oct 13, 2017 · 16 comments

Comments

Projects
None yet
4 participants
@paulzhol
Copy link
Member

commented Oct 13, 2017

It is triggered by the os test similarly to this builder log:

signal: illegal instruction (core dumped)
FAIL	os	17.079s

Attached a manually built os.test binary and it's core:
os.test_sigill.zip

The corefile itself is not usable under gdb 8.0.1:

[paulzhol@a20 /tmp]$ gdb801 ./os.test os.test-2276.core                                                                                                                          [7/75]
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]                                                                                                                                           
Copyright (C) 2017 Free Software Foundation, Inc.                                                                                                                                      
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>                                                                                                          
This is free software: you are free to change and redistribute it.                                                                                                                     
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"                                                                                                             
and "show warranty" for details.                                                                                                                                                       
This GDB was configured as "armv6-portbld-freebsd11.0".                                                                                                                                
Type "show configuration" for configuration details.                                       
For bug reporting instructions, please see:  
<http://www.gnu.org/software/gdb/bugs/>.     
Find the GDB manual and other documentation resources online at:                           
<http://www.gnu.org/software/gdb/documentation/>.                                          
For help, type "help".                       
Type "apropos word" to search for commands related to "word"...                            
Reading symbols from ./os.test...done.       
[New process 100201]                         
[New process 100297]                         
[New process 100076]                         
[New process 100162]                         
[New process 100191]                         
[New process 100197]                         
[New process 100199]                         
[New process 100202]                         
[New process 100206]                         
[New process 100209]                         
[New process 100210]                         

warning: Unexpected size of section `.reg/100201' in core file.                            

warning: wrong size gregset struct in core file                                            

warning: Unexpected size of section `.reg2/100201' in core file.                           
Core was generated by `/tmp/os.test'.        
Program terminated with signal SIGILL, Illegal instruction.                                

warning: Unexpected size of section `.reg/100201' in core file.                            

warning: wrong size gregset struct in core file 
warning: Unexpected size of section `.reg2/100201' in core file.
#0  <unavailable> in ?? ()
[Current thread is 1 (process 100201)]
Loading Go Runtime support.

(gdb) info threads
PC not available

While running the binary under gdb seems to work:

[paulzhol@a20 /tmp/go/src/os]$ gdb801 /tmp/os.test                                                                                                                                     
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]                                                                                                                                           
Copyright (C) 2017 Free Software Foundation, Inc.                                                                                                                                      
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>                                                                                                          
This is free software: you are free to change and redistribute it.                                                                                                                     
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"                                                                                                             
and "show warranty" for details.                                                                                                                                                       
This GDB was configured as "armv6-portbld-freebsd11.0".                                                                                                                                
Type "show configuration" for configuration details.                                                                                                                                   
For bug reporting instructions, please see:                                                                                                                                            
<http://www.gnu.org/software/gdb/bugs/>.                                                                                                                                               
Find the GDB manual and other documentation resources online at:                                                                                                                       
<http://www.gnu.org/software/gdb/documentation/>.                                                                                                                                      
For help, type "help".                                                                                                                                                                 
Type "apropos word" to search for commands related to "word"...                                                                                                                        
Reading symbols from /tmp/os.test...done.                                                                                                                                              
Loading Go Runtime support.                                                                                                                                                            
(gdb) run                                                                                                                                                                              
Starting program: /tmp/os.test                                                                                                                                                         
[New LWP 100091 of process 2323]                                                                                                                                                       
[New LWP 100234 of process 2323]                                                                                                                                                       
[New LWP 100235 of process 2323]                                                                                                                                                       
[New LWP 100198 of process 2323]                                                                                                                                                       
                                                                                                                                                                                       
Thread 3 received signal SIGILL, Illegal instruction.                                                                                                                                  
[Switching to LWP 100234 of process 2323]                                                                                                                                              
runtime.thr_start () at /tmp/go/src/runtime/sys_freebsd_arm.s:65
65      TEXT runtime_thr_start(SB),NOSPLIT,$0                                                                                                                                          

(gdb) list                                                                                                                                                                             
60              MOVW size+4(FP), R1                                                                                                                                                    
61              MOVW $SYS_thr_new, R7                                                                                                                                                  
62              SWI $0                                                                                                                                                                 
63              RET
64
65      TEXT runtime_thr_start(SB),NOSPLIT,$0
66              // set up g
67              MOVW m_g0(R0), g
68              MOVW R0, g_m(g)
69              BL runtime_emptyfunc(SB) // fault if stack check is wrong

(gdb) disassemble
Dump of assembler code for function runtime.thr_start:
=> 0x0006782c <+0>:     push    {lr}            ; (str lr, [sp, #-4]!)
   0x00067830 <+4>:     ldr     r10, [r0]
   0x00067834 <+8>:     str     r0, [r10, #24]
   0x00067838 <+12>:    bl      0x66cc4 <runtime.emptyfunc>
   0x0006783c <+16>:    bl      0x3e8cc <runtime.mstart>
   0x00067840 <+20>:    mov     r8, #2, 0
   0x00067844 <+24>:    str     r8, [r8]
   0x00067848 <+28>:    pop     {pc}            ; (ldr pc, [sp], #4)
End of assembler dump.

(gdb) info threads
  Id   Target Id         Frame
  1    LWP 100233 of process 2323 runtime.sys_umtx_op () at /tmp/go/src/runtime/sys_freebsd_arm.s:53
  2    LWP 100091 of process 2323 runtime.usleep () at /tmp/go/src/runtime/sys_freebsd_arm.s:319
* 3    LWP 100234 of process 2323 runtime.thr_start () at /tmp/go/src/runtime/sys_freebsd_arm.s:65
  4    LWP 100235 of process 2323 runtime.sys_umtx_op () at /tmp/go/src/runtime/sys_freebsd_arm.s:53
  5    LWP 100198 of process 2323 runtime.thr_start () at /tmp/go/src/runtime/sys_freebsd_arm.s:65

(gdb) info goroutines
  1 waiting  runtime.gopark
  2 waiting  runtime.gopark
  17 waiting  runtime.gopark
  3 waiting  runtime.gopark
* 4 syscall  runtime.systemstack_switch
  60 runnable runtime.gopark
  62 runnable os_test.TestProgWideChdir.func1
  63 runnable os_test.TestProgWideChdir.func1
  64 runnable runtime.gopark
  67 runnable os_test.TestProgWideChdir.func1
  68 runnable os_test.TestProgWideChdir.func1
  69 runnable os_test.TestProgWideChdir.func1
  70 runnable os_test.TestProgWideChdir.func1

(gdb) goroutine 4 bt
#0  runtime.systemstack_switch () at /tmp/go/src/runtime/asm_arm.s:210
#1  0x00038648 in runtime.futexsleep (addr=0x212048 <runtime.sig>, val=0, ns=-1) at /tmp/go/src/runtime/os_freebsd.go:148
#2  0x0001e78c in runtime.notetsleep_internal (n=0x212048 <runtime.sig>, ns=-1, ~r2=false) at /tmp/go/src/runtime/lock_futex.go:174
#3  0x0001ea34 in runtime.notetsleepg (n=0x212048 <runtime.sig>, ns=-1, ~r2=true) at /tmp/go/src/runtime/lock_futex.go:228
#4  0x0005112c in os/signal.signal_recv (~r0=0) at /tmp/go/src/runtime/sigqueue.go:131
#5  0x000fefa4 in os/signal.loop () at /tmp/go/src/os/signal/signal_unix.go:22
#6  0x00066f00 in runtime.goexit () at /tmp/go/src/runtime/asm_arm.s:929
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Oct 13, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 13, 2017

CC @aclements

What does gdb show in a backtrace from the point where the SIGILL occurs?

@paulzhol

This comment has been minimized.

Copy link
Member Author

commented Oct 13, 2017

Starting program: /tmp/os.test                                                                             
[New LWP 100285 of process 2537]                                                                           
[New LWP 100288 of process 2537]                                                                           
[New LWP 100290 of process 2537]                                                                           
[New LWP 100287 of process 2537]                                                                           
[New LWP 100291 of process 2537]                                                                           
[New LWP 100298 of process 2537]                                                                           
[New LWP 100300 of process 2537]                                                                           
[New LWP 101001 of process 2537]                                                                           
                                                                                                           
Thread 6 received signal SIGILL, Illegal instruction.                                                      
[Switching to LWP 100291 of process 2537]                                                                  
runtime.thr_start () at /tmp/go/src/runtime/sys_freebsd_arm.s:65
65      TEXT runtime_thr_start(SB),NOSPLIT,$0
(gdb) bt
#0  runtime.thr_start () at /tmp/go/src/runtime/sys_freebsd_arm.s:65
#1  0x0003ebbc in runtime.mexit (osStack=false) at /tmp/go/src/runtime/proc.go:1322
#2  0x0003e940 in runtime.mstart () at /tmp/go/src/runtime/proc.go:1179
#3  0x00067840 in runtime.thr_start () at /tmp/go/src/runtime/sys_freebsd_arm.s:70
#4  0x00038934 in runtime.newosproc (mp=0x0, stk=0x0) at /tmp/go/src/runtime/os_freebsd.go:206
#5  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 13, 2017

Thanks.

@aclements If I'm reading the current code correctly, it is now possible for mstart to return to its caller, which is typically the function that creates a new thread. Those functions tend to have comments like "It shouldn't return. If it does, exit that thread." (from clone in sys_linux_amd64.s) but then go ahead and exit the thread anyhow if the function does return. Am I correct in thinking that those comments are now out of date?

If that is the case, then the problem seems to be that in sys_freebsd_GOARCH.s, the thr_start function does not exit the thread if mstart returns; it crashes.

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 13, 2017

If I'm reading the current code correctly, it is now possible for mstart to return to its caller, which is typically the function that creates a new thread.

Yes, though it will only do this if osStack is true. In that case, we didn't create the stack, so we assume that whatever started the thread left something on the stack for us to return to that will destroy the thread and reap the stack.

In non-cgo FreeBSD, allocm definitely allocates a stack and puts it in g.stack, so osStack will be false, so mexit will call exitThread. In cgo FreeBSD, osStack will be true because pthread_create allocates the stack, but it should also be set up to exit from the thread.

Oh! On cgo-using FreeBSD/arm, pthread_create starts the thread at gcc_freebsd_arm.c:threadentry, which calls crosscall_arm1, which is missing a ret at the end. But I didn't think the os package used cgo? And why isn't this affecting all of the */arm ports, which all call crosscall_arm1?

@gopherbot

This comment has been minimized.

Copy link

commented Oct 13, 2017

Change https://golang.org/cl/70830 mentions this issue: runtime/cgo: return from crosscall_arm1

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 13, 2017

crosscall_arm1 has one of those annoying ARM return instructions: it loads pc from the location on the stack where it stored lr.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 13, 2017

OK, my next probably dumb question is that I don't understand the exitThread code in sys_freebsd_arm.s.

	MOVW	wait+0(FP), R0
	// We're done using the stack.
	MOVW	$0, R1
storeloop:
	LDREX	(R0), R4          // loads R4
	STREX	R1, (R0), R1      // stores R2
	CMP	$0, R1
	BNE	storeloop

The comment on the STREX instruction says that it stores R2, but it clearly stores R1. Which is fine. But then it puts the result of the STREX into R1, and loops if the value is not zero. So the first time through the loop it stores 0, and subsequent times through the loop it stores 1. That seems wrong. Am I misreading it? Of course this will almost never matter.

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 13, 2017

crosscall_arm1 has one of those annoying ARM return instructions: it loads pc from the location on the stack where it stored lr.

Ah, indeed. Yet another reason why this isn't actually the problem. :)

That seems wrong. Am I misreading it? Of course this will almost never matter.

You're right that this code is clearly wrong. In fact, it looks like it's wrong for several GOOSes. But I don't think this could be causing the SIGILL...

@paulzhol

This comment has been minimized.

Copy link
Member Author

commented Oct 14, 2017

@ianlancetaylor I think you've found it! if I move MOVW $0, R1 into the loop:

diff --git a/src/runtime/sys_freebsd_arm.s b/src/runtime/sys_freebsd_arm.s
index 0121e62309..a1b97db980 100644
--- a/src/runtime/sys_freebsd_arm.s
+++ b/src/runtime/sys_freebsd_arm.s
@@ -86,8 +86,8 @@ TEXT runtime·exit(SB),NOSPLIT,$-8
 TEXT runtime<C2><B7>exitThread(SB),NOSPLIT,$0-4
        MOVW    wait+0(FP), R0
        // We're done using the stack.
-       MOVW    $0, R1
 storeloop:
+       MOVW    $0, R1
        LDREX   (R0), R4          // loads R4
        STREX   R1, (R0), R1      // stores R2
        CMP     $0, R1

I'm able to single-step and hit the illegal instruction on LDREX (R0), R4 :

Thread 4 hit Breakpoint 1, runtime.exitThread () at /tmp/go/src/runtime/sys_freebsd_arm.s:87
87              MOVW    wait+0(FP), R0
(gdb) stepi
90              MOVW    $0, R1
(gdb) 
91              LDREX   (R0), R4          // loads R4
(gdb) 

Thread 4 received signal SIGILL, Illegal instruction.
runtime.exitThread () at /tmp/go/src/runtime/sys_freebsd_arm.s:92
92              STREX   R1, (R0), R1      // stores R2

Which seems very weird, however if I just use R2 like this:

diff --git a/src/runtime/sys_freebsd_arm.s b/src/runtime/sys_freebsd_arm.s
index 0121e62309..f277215ddc 100644
--- a/src/runtime/sys_freebsd_arm.s
+++ b/src/runtime/sys_freebsd_arm.s
@@ -89,8 +89,8 @@ TEXT runtime·exitThread(SB),NOSPLIT,$0-4
        MOVW    $0, R1
 storeloop:
        LDREX   (R0), R4          // loads R4
-       STREX   R1, (R0), R1      // stores R2
-       CMP     $0, R1
+       STREX   R1, (R0), R2      // stores R2
+       CMP     $0, R2
        BNE     storeloop
        MOVW    $0, R0          // arg 1 long *state
        MOVW    $SYS_thr_exit, R7

The test passes.

Also should memory barriers like in https://golang.org/src/runtime/internal/atomic/asm_arm.s atomic·armcas be added here as well?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 14, 2017

Ah, thanks. I looked up STREX and it explicitly forbids using the same register for both Rd and Rt. Perhaps some ARM processors detect this and trigger a SIGILL and some do not.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 14, 2017

@paulzhol Want to see if https://golang.org/cl/70911 fixes the problem?

@gopherbot

This comment has been minimized.

Copy link

commented Oct 14, 2017

Change https://golang.org/cl/70911 mentions this issue: runtime: fix use of STREX in various exitThread implementations

@paulzhol

This comment has been minimized.

Copy link
Member Author

commented Oct 14, 2017

@ianlancetaylor https://golang.org/cl/70911 works as well, what about the memory barriers?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 14, 2017

@aclements would know better but I don't think any memory barriers are required here because the only case that needs to read this field is holding sched.lock, and that code only needs to see a linked list that was created while also using holding sched.lock, and as such all memory barriers will have been executed anyhow by the lock code. That is, this is not a case of so-called lock-free programming, this is just a case where we need to definitely set a single field to 0. Really an ordinary non-atomic store ought to work fine too, since there is nothing setting the field to non-0, it just might take longer for allocm to see the value.

@gopherbot gopherbot closed this in 5ddd3d5 Oct 16, 2017

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 16, 2017

Thanks for figuring that out!

@ianlancetaylor is right that this doesn't exactly need a barrier. The ordered write is just there to be observed in a timely manner by another thread doing reaping and so that it doesn't get reordered with the last use of the stack.

@paulzhol

This comment has been minimized.

Copy link
Member Author

commented Oct 17, 2017

Thanks for explaining!

@golang golang locked and limited conversation to collaborators Oct 17, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.