runtime: uninterruptible hang on os.Exit on darwin/arm64 #43294

Closed
aep opened this issue Dec 20, 2020 · 9 comments

@aep aep commented Dec 20, 2020

  • go version go1.16beta1 darwin/arm64
  • macOS Big Sur 11.1 arm64

What did you do?

Unfortunately I can't find a small reproducible example yet, or a root cause.
Maybe someone has an idea where to dig further.

The program calls C code. When I comment out the C calls, the issue does not appear.
It also does not appear on any other platform, including Intel macOS.

What did you expect to see?

When main() exits, the program should exit immediately.

What did you see instead?

The last line in main() (a printf) is executed, but the program won't exit for several seconds.

Attempting to interrupt the program AFTER main has exited using lldb makes lldb hang as well, until the program finally exits and lldb just prints the exit code. pkill doesn't work either.

Interrupting the program BEFORE main exits works just as expected.

I'm not used to debugging on macOS. On Linux, you can attach a syscall tracer to see which syscall is blocking, but dtrace doesn't work for me (no output). Maybe someone has an idea what runs after main and how to see it in lldb.

Calling os.Exit anywhere leads to the same behaviour, so this is not a defer issue. Even panic() hangs.

lldb is unable to read Go's debug symbols, so the best I came up with for a breakpoint was using runtime.Breakpoint, but I can't step over it to the next instruction. It'll just forever repeat the breakpoint.

@aep aep changed the title from "uninteruptible hang on exit on bigsur arm64" to "uninteruptible hang on os.Exit on bigsur arm64" on Dec 20, 2020
Contributor

@cherrymui cherrymui commented Dec 20, 2020

Could you check whether patching in CL https://go-review.googlesource.com/c/go/+/269378 helps? Thanks.

@ianlancetaylor ianlancetaylor changed the title from "uninteruptible hang on os.Exit on bigsur arm64" to "runtime: uninterruptible hang on os.Exit on darwin/arm64" on Dec 21, 2020
@ianlancetaylor ianlancetaylor added this to the Go1.16 milestone Dec 21, 2020

@gopherbot gopherbot commented Dec 21, 2020

Change https://golang.org/cl/269378 mentions this issue: runtime: use _exit on darwin

@gopherbot gopherbot closed this in 8438a57 Dec 21, 2020
Author

@aep aep commented Dec 21, 2020

Sorry for not being able to try earlier. I tested a git checkout from just now that includes this change, and it doesn't solve it.

I don't think the fix is relevant anyway, since there aren't any atexit functions registered in my case.
Also, atexit functions run in userland and don't block lldb. This is more likely a hanging syscall,
maybe something to do with memory cleanup?

Contributor

@cherrymui cherrymui commented Dec 21, 2020

Is it possible to share the source code of a reproducer?

@cherrymui cherrymui reopened this Dec 21, 2020
Author

@aep aep commented Dec 21, 2020

It's open source, but I couldn't find a good small reproducible example yet.

git clone https://github.com/devguardio/carrier.git
cd carrier/cli/
go build -a
./cli identity
./cli net trace

The last command will hang for a few seconds after succeeding, but only on macOS Big Sur,
and only if cgo was called. It's of course possible that this is a bug in the C code, leading to memory corruption somewhere in Go, although there is no macOS-specific code; it's just POSIX, and the same code works fine on Linux.

The same code also works fine on macOS when executed outside of cgo, i.e. as a standalone C binary.

Unfortunately I don't know how to debug this, because lldb won't tell me where it hangs :/

There are no signal handlers or atexit functions registered from the C code. It's just a bunch of protocol de/serialization around a UDP socket: no threads, just socket/pipe/select/read/write. Nothing in this should interfere with Go other than being blocking, so the Go runtime has to dedicate a thread to the goroutine calling it.

Contributor

@cherrymui cherrymui commented Dec 21, 2020

Hmmm, I cannot reproduce.

% time ./cli identity
cDEXIP6HRNN7VV4R5TGZXNZ2XRZYXECD2LQHYBBS5F5RP3N3C6ZAZSCI
./cli identity  0.02s user 0.01s system 91% cpu 0.041 total
% time ./cli net trace
conduit.go:112: bootstrap
[WRN] carrier::endpoint timeout waiting for broker response
conduit.go:184: started broker con
[WRN] carrier::endpoint timeout waiting for broker response
conduit.go:184: started broker con
[WRN] carrier::endpoint timeout waiting for broker response
conduit.go:184: started broker con
conduit.go:53: started conduit
{
  "publishers": 0,
  "bytes_sent_per_epoch": 0,
  "bytes_recv_per_epoch": 0,
  "bytes_sent_per_second": 0,
  "bytes_recv_per_second": 0
}
conduit.go:60: stopping conduit
conduit.go:77: stopped conduit
./cli net trace  0.06s user 0.03s system 11% cpu 0.746 total

It exits quite quickly.

Author

@aep aep commented Dec 22, 2020

Yeah, the network wasn't available, so it didn't do anything.
Let me try to reduce it to a test without network.

Author

@aep aep commented Dec 22, 2020

I believe this is a bug in macOS Big Sur.
Simply sending packets more slowly makes the issue go away. It might be that something is waiting for network responses on behalf of the program, even though it has already closed the socket.

I don't understand why this only happens when using C code inside Go, but I suspect it's some sort of security thing that classifies Go programs differently. Or maybe it's just because Go uses much more memory and triggers a bug in some memory mapping.

Either way, this seems unlikely to be caused by Go, so I'm closing this.

@aep aep closed this Dec 22, 2020
Contributor

@cherrymui cherrymui commented Dec 22, 2020

Thanks for the investigation.

If you think Go uses more memory (or other resources) than it should, feel free to reopen or file a new bug. Thanks.
