runtime: 1.11rc2 crash in runtime.sigInitIgnored #27183

Closed
jtsylve opened this issue Aug 23, 2018 · 27 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin

@jtsylve
Contributor

jtsylve commented Aug 23, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.11rc2 darwin/386

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOBIN=""
GOCACHE="/Users/joe/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/joe/go:/Users/joe/src/repo004/GOPATH"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.10.3/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.10.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mg/q2kh762j0bb2fbtpj1fh7_0h0000gn/T/go-build454742149=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Our application links against a shared library that's written in Go. When compiling with either Go 1.11 RC1 or 1.11 RC2, we're seeing a crash at startup in one of our 32-bit applications. The strange thing is that the crash depends on how the application is launched. If it's launched from the command line, we see no crash. If it's launched by a parent application, it crashes. We have no issues with Go 1.10. The crash happens without any calls into the Go library, and seems to occur during runtime initialization. We don't see this issue with the 64-bit application.

I suspect that this crash is related to the TLS changes made in b3a854c.

Here's the crash in lldb

libbbtgo.dylib`runtime.sigInitIgnored:
    0x24df180 <+0>:  mov    ecx, dword ptr gs:[0x18]
->  0x24df187 <+7>:  cmp    esp, dword ptr [ecx + 0x8]
    0x24df18a <+10>: jbe    0x24df1ce                 ; <+78>
    0x24df18c <+12>: sub    esp, 0x8
    0x24df18f <+15>: mov    eax, dword ptr [0x2d91eac]
    0x24df195 <+21>: mov    ecx, dword ptr [esp + 0xc]
    0x24df199 <+25>: mov    edx, ecx
    0x24df19b <+27>: shr    ecx, 0x5
    0x24df19e <+30>: cmp    ecx, 0x1
    0x24df1a1 <+33>: jae    0x24df1c7                 ; <+71>
    0x24df1a3 <+35>: lea    ebx, [0x2d91eac]
    0x24df1a9 <+41>: lea    ebx, [ebx + 4*ecx]
    0x24df1ac <+44>: mov    dword ptr [esp], ebx
    0x24df1af <+47>: mov    ecx, edx
    0x24df1b1 <+49>: mov    ebx, 0x1
    0x24df1b6 <+54>: shl    ebx, cl
    0x24df1b8 <+56>: or     eax, ebx
    0x24df1ba <+58>: mov    dword ptr [esp + 0x4], eax
    0x24df1be <+62>: call   0x24a7560                 ; runtime/internal/atomic.Store
    0x24df1c3 <+67>: add    esp, 0x8
    0x24df1c6 <+70>: ret    
    0x24df1c7 <+71>: call   0x24cab80                 ; runtime.panicindex
    0x24df1cc <+76>: ud2    
    0x24df1ce <+78>: call   0x24f1280                 ; runtime.morestack_noctxt
    0x24df1d3 <+83>: jmp    0x24df180                 ; <+0>
    0x24df1d5 <+85>: int3   
    0x24df1d6 <+86>: int3   
    0x24df1d7 <+87>: int3   
    0x24df1d8 <+88>: int3   
    0x24df1d9 <+89>: int3   
    0x24df1da <+90>: int3   
    0x24df1db <+91>: int3   
    0x24df1dc <+92>: int3   
    0x24df1dd <+93>: int3   
    0x24df1de <+94>: int3   
    0x24df1df <+95>: int3   
(lldb) register read
General Purpose Registers:
       eax = 0x00000019
       ebx = 0xbfffd710
       ecx = 0x00000000
       edx = 0x00000001
       edi = 0xbfffd728
       esi = 0x00000000
       ebp = 0x00000000
       esp = 0xbfffd728
        ss = 0x00000023
    eflags = 0x00210246  diskprocessor`DiskProcessorShellBase.DeleteProcessTempFiles%%o<DiskProcessorShellBase> + 600
       eip = 0x024df187  libbbtgo.dylib`runtime.sigInitIgnored + 7
        cs = 0x0000001b
        ds = 0x00000023
        es = 0x00000023
        fs = 0x00000000
        gs = 0x0000000f

As you can see, gs:[0x18] contains a null pointer (ecx is 0 after the load), so the next instruction, which reads from [ecx + 0x8], segfaults.

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Aug 23, 2018
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Darwin labels Aug 23, 2018
@ianlancetaylor
Contributor

How exactly are you building your shared library? How are you using it?

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

We build it the standard way:

GOOS=darwin GOARCH=386 CGO_ENABLED=1 \
go build  \
    -o "libbbtgo.dylib" \
    -buildmode=c-shared \
    libbbtgo

Our main application is written in a language called Xojo and dynamically links against the libbbtgo library, which exposes the functionality. We've been doing this for the past few years without issue.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

Here's the stack trace

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x024df187 libbbtgo.dylib`runtime.sigInitIgnored + 7
    frame #1: 0x024ddea3 libbbtgo.dylib`runtime.initsig + 179
    frame #2: 0x024ca75c libbbtgo.dylib`runtime.libpreinit + 12

@ianlancetaylor
Contributor

CC @randall77

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

There's also only a single thread

(lldb) thread list
Process 64946 stopped
* thread #1: tid = 0x1d4fe3, 0x024df187 libbbtgo.dylib`runtime.sigInitIgnored + 7, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)

I wonder if sigInitIgnored is being called before threadentry initializes the memory

@ianlancetaylor
Contributor

If it's an ordering issue, it's not clear why it would fail on 32-bit but work on 64-bit.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

It's also unclear why it doesn't crash if I launch it from the command line, but crashes if the process is spawned by a parent process. With that in mind, it may only be working on 64-bit by luck.

@ianlancetaylor
Contributor

In the directory misc/cgo/testcshared, go test cshared_test.go works on Darwin with both GOARCH=amd64 and GOARCH=386.

@ianlancetaylor
Contributor

How is the process spawned by the parent process? Is the parent process written in Go or C?

@ianlancetaylor
Contributor

And for the record, go test cshared_test.go does spawn a process, using the os/exec package, that uses dlopen to open a Go shared library. So I can't recreate the problem.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

The parent process is written in a language called Xojo and is spawned via a call to Shell.Execute

@ianlancetaylor
Contributor

Thanks. I'm not going to be able to do anything here without some ability to recreate the problem. The code seems OK, and it works for me.

I would suggest investigating any differences between Shell.Execute and simply running the program from the shell, perhaps by using the dtruss program.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

Looking at the pstree output, it looks like Shell.Execute simply spawns a bash -c <child process>.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

Calling the utility via the command line using bash -c does indeed crash

@ianlancetaylor
Contributor

What shell are you running when you run the program from the command line?

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

bash, which is the weird thing.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

I apologize, calling it using bash -c via the command line does not crash.

@jtsylve
Contributor Author

jtsylve commented Aug 23, 2018

Let me know if there's any more debug info I can provide, since it's not easily reproducible on your end. This is definitely a regression for us, since it wasn't an issue with Go 1.10 and prior. Things are further complicated since we need to upgrade to Go 1.11 to resolve an unrelated issue for 64-bit.

@jtsylve
Contributor Author

jtsylve commented Aug 24, 2018

I'm confused as to why sigInitIgnored is even being called. The only call to it I can find is in initsig, but that should return when being called from a shared library. What am I missing?

@ianlancetaylor
Contributor

> Let me know if there's any more debug info I can provide,

Unfortunately I'm basically turning this around for you to look into what differs when using Shell.Execute. As I understand things at this point, that is the only case that fails. So there must be something different about that case. What is it?

sigInitIgnored is called by initsig, as you say. In a shared library initsig is called by libpreinit with the preinit argument set to true. libpreinit is called by _rt0_386_lib in runtime/asm_386.s, which is called by _rt0_386_darwin_lib in runtime/rt0_darwin_386.s. The shared library is built such that _rt0_386_darwin_lib is invoked when the shared library is loaded.

@jtsylve
Contributor Author

jtsylve commented Aug 24, 2018

I've found a difference that seems to matter. When launching from the GUI app, signal 25 (SIGXFSZ on Darwin) is ignored, which triggers the buggy call to sigInitIgnored. When launching from the command line, sigInitIgnored is never called. So this issue only seems to present itself if there are ignored signals that early in the loading process.

@jtsylve
Contributor Author

jtsylve commented Aug 24, 2018

Another relevant data point. By setting breakpoints in lldb I can safely say that initsig is called before threadentry, so gs:[0x18] is not yet initialized, hence the crash. That seems to be the crux of this issue. It isn't triggered in the tests because none of the signals are ignored there, so sigInitIgnored is never called, just as when launching from the command line.

@jtsylve
Contributor Author

jtsylve commented Aug 24, 2018

Launching from GUI

Process 92446 resuming
Process 92446 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x0254d4b0 libbbtgo.dylib`runtime.initsig
libbbtgo.dylib`runtime.initsig:
->  0x254d4b0 <+0>:  subl   $0x14, %esp
    0x254d4b3 <+3>:  movl   $0x0, %eax
    0x254d4b8 <+8>:  movl   %eax, 0xc(%esp)
    0x254d4bc <+12>: movl   %eax, 0x10(%esp)
Target 0: (diskprocessor) stopped.
(lldb) c
Process 92446 resuming
Process 92446 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x0254e847 libbbtgo.dylib`runtime.sigInitIgnored + 7
libbbtgo.dylib`runtime.sigInitIgnored:
->  0x254e847 <+7>:  cmpl   0x8(%ecx), %esp
    0x254e84a <+10>: jbe    0x254e88e                 ; <+78>
    0x254e84c <+12>: subl   $0x8, %esp
    0x254e84f <+15>: movl   0x2dd804c, %eax

Re-running in lldb (same as running from command line)

(lldb) run
There is a running process, detach from it and restart?: [Y/n] y
Process 92446 detached
Process 92450 launched: '/Users/joe/src/repo004/BlackLight/Resources/Mac/diskprocessor/diskprocessor' (i386)
Process 92450 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x024ab4b0 libbbtgo.dylib`runtime.initsig
libbbtgo.dylib`runtime.initsig:
->  0x24ab4b0 <+0>:  subl   $0x14, %esp
    0x24ab4b3 <+3>:  movl   $0x0, %eax
    0x24ab4b8 <+8>:  movl   %eax, 0xc(%esp)
    0x24ab4bc <+12>: movl   %eax, 0x10(%esp)
Target 0: (diskprocessor) stopped.
(lldb) c
Process 92450 resuming
Process 92450 stopped
* thread #2, stop reason = breakpoint 2.1
    frame #0: 0x024ab4b0 libbbtgo.dylib`runtime.initsig
libbbtgo.dylib`runtime.initsig:
->  0x24ab4b0 <+0>:  subl   $0x14, %esp
    0x24ab4b3 <+3>:  movl   $0x0, %eax
    0x24ab4b8 <+8>:  movl   %eax, 0xc(%esp)
    0x24ab4bc <+12>: movl   %eax, 0x10(%esp)
Target 0: (diskprocessor) stopped.
(lldb) c
Process 92450 resuming
Process 92450 stopped
* thread #3, stop reason = breakpoint 1.1
    frame #0: 0x02718fd0 libbbtgo.dylib`threadentry
libbbtgo.dylib`threadentry:
->  0x2718fd0 <+0>: pushl  %ebp
    0x2718fd1 <+1>: movl   %esp, %ebp
    0x2718fd3 <+3>: pushl  %edi
    0x2718fd4 <+4>: pushl  %esi
Target 0: (diskprocessor) stopped.
(lldb) 

@ianlancetaylor
Contributor

OK, got it. Thanks very much for debugging.

@gopherbot

Change https://golang.org/cl/131277 mentions this issue: runtime: mark sigInitIgnored nosplit

@jtsylve
Contributor Author

jtsylve commented Aug 24, 2018

I can confirm that the patch does fix the crash we were seeing.

@gopherbot

Change https://golang.org/cl/131278 mentions this issue: [release-branch.go1.11] runtime: mark sigInitIgnored nosplit

gopherbot pushed a commit that referenced this issue Aug 24, 2018
The sigInitIgnored function can be called by initsig before a shared
library is initialized, before the runtime is initialized.

Fixes #27183

Change-Id: I7073767938fc011879d47ea951d63a14d1cce878
Reviewed-on: https://go-review.googlesource.com/131277
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
(cherry picked from commit d20ecd6e5dab55376ea4f169eed63608f9bb3b2b)
Reviewed-on: https://go-review.googlesource.com/131278
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
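
For readers unfamiliar with the directive named in the commit title, the following is a minimal sketch, not the runtime's actual source. The //go:nosplit directive tells the compiler to omit the stack-overflow check at function entry. That check is the prologue seen in the disassembly earlier in the thread, which loads the current goroutine pointer from thread-local storage and therefore faults when that storage is not yet set up. The names ignoredMask and markIgnored are hypothetical stand-ins, and the bitmask layout is inferred from the disassembly.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ignoredMask is a hypothetical stand-in for a bitmask of ignored
// signals: bit (s % 32) of word (s / 32) is set when signal s is ignored.
var ignoredMask [1]uint32

// markIgnored records that signal s is ignored. Because of the
// nosplit directive below, the compiler emits no stack-bound check
// on entry, so the function does not need a valid goroutine pointer
// just to begin executing.
//
//go:nosplit
func markIgnored(s uint32) {
	w := atomic.LoadUint32(&ignoredMask[s/32])
	w |= 1 << (s & 31)
	atomic.StoreUint32(&ignoredMask[s/32], w)
}

func main() {
	markIgnored(25)
	fmt.Printf("%#x\n", ignoredMask[0]) // prints 0x2000000 (bit 25 set)
}
```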
@golang golang locked and limited conversation to collaborators Aug 24, 2019