Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: deadlock detection broken with external linking #46260

Closed
4a6f656c opened this issue May 19, 2021 · 10 comments
Closed

runtime: deadlock detection broken with external linking #46260

4a6f656c opened this issue May 19, 2021 · 10 comments
Milestone

Comments

@4a6f656c
Copy link
Contributor

@4a6f656c 4a6f656c commented May 19, 2021

The following code (based on test/fixedbugs/issue21576.go) works as expected when compiled with internal linking, however deadlocks when compiled with external linking:

$ cat b.go
package main

func main() {
  c := make(chan int)
  c <- 1
}
$ go build -o b b.go
$ time ./b
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
main.main()
        /tmp/b.go:5 +0x50
    0m00.06s real     0m00.00s user     0m00.01s system
$ go build -ldflags=-linkmode=external -o b b.go
$ time ./b
^C    0m10.47s real     0m00.00s user     0m00.02s system


This does not appear to be new (Go 1.15, Go 1.16 and -tip result in the same on initial testing) and impacts multiple platforms (tested on openbsd/mips64, openbsd/amd64, linux/amd64).

@jsonbruce
Copy link

@jsonbruce jsonbruce commented May 19, 2021

It does not reproduce in darwin/amd64.

➜  go version                                      
go version go1.16.3 darwin/amd64
➜  cat test.go  
package main

func main() {
  c := make(chan int)
  c <- 1
}
➜  go build test.go        
➜  ./test 
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
main.main()
        /tmp/test.go:5 +0x50
➜  go build -ldflags=-linkmode=external test.go
➜  ./test
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
main.main()
        /tmp/test.go:5 +0x50
➜  go build -ldflags=all=-linkmode=external test.go
➜  ./test 
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan send]:
main.main()
        /tmp/test.go:5 +0x50

@AlexRouSg

Loading

@thanm thanm added this to the Go1.17 milestone May 19, 2021
@thanm
Copy link
Contributor

@thanm thanm commented May 19, 2021

I can reproduce this on tip for linux/amd64.

Loading

@thanm
Copy link
Contributor

@thanm thanm commented May 19, 2021

Loading

@AlexRouSg
Copy link
Contributor

@AlexRouSg AlexRouSg commented May 19, 2021

On go1.16.4 windows/amd64 this does not reproduce with go build -ldflags=-linkmode=external .
But does reproduce with go build -ldflags=all=-linkmode=external . and go build -ldflags=-linkmode=external b.go

There is one notable difference, when running go build -ldflags=-linkmode=external .
It prints loadinternal: cannot find runtime/cgo

This could explain why @jsonbruce couldn't reproduce, can you confirm the exact command you ran?

Loading

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented May 19, 2021

Yes, this is known and expected. External linking in general brings in cgo. For a cgo binary even all goroutines are asleep C code can make progress, and we don't know. So we don't signal an error for all goroutines being asleep. That's why runtime deadlock tests require internal linking, e.g. https://tip.golang.org/src/runtime/crash_test.go#L184 .

Maybe we could do better detecting just external linking vs. real cgo. Other than that, this is expected.

Loading

@cherrymui cherrymui removed this from the Go1.17 milestone May 19, 2021
@cherrymui cherrymui added this to the Unplanned milestone May 19, 2021
@bcmills
Copy link
Member

@bcmills bcmills commented May 19, 2021

At one point there was an accepted proposal for partial deadlock detection (#13759), but it was apparently not implemented. The proposed partial-deadlock detector would be able to diagnose Go deadlocks even in cgo programs.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 19, 2021

See also #29322.

Closing this issue because there is nothing new to do here. It's unfortunate but known. In general the deadlock detector is best-effort. I believe that it will also fail in a program that uses the network, even if there are no currently open network connections.

Loading

@4a6f656c
Copy link
Contributor Author

@4a6f656c 4a6f656c commented May 19, 2021

Yes, this is known and expected. External linking in general brings in cgo. For a cgo binary even all goroutines are asleep C code can make progress, and we don't know. So we don't signal an error for all goroutines being asleep.

Thanks for confirming - it makes sense with the external linking/cgo connection.

That's why runtime deadlock tests require internal linking, e.g. https://tip.golang.org/src/runtime/crash_test.go#L184 .

Heh, the tests in src/runtime do, however at least one in test does not (hence running into this). I'll send a CL to make this more obvious for future travelers.

Maybe we could do better detecting just external linking vs. real cgo. Other than that, this is expected.

Right, in this case there is no actual cgo, with all of the threads being Go controlled. Looking at the backtrace I was expecting it to work (and even reading the code and seeing iscgo in the runtime, I failed to realise the connection).

Loading

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented May 19, 2021

Heh, the tests in src/runtime do, however at least one in test does not (hence running into this). I'll send a CL to make this more obvious for future travelers.

Yeah, unfortunately GOROOT/test tests do not have access to internal/testenv, so it has to be done explicitly.

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented May 20, 2021

Change https://golang.org/cl/321450 mentions this issue: test: use internal linking with deadlock detector test

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants