runtime: Docker daemon on Windows using 1.7 beta2 can deadlock all goroutines #16286

Closed
jhowardmsft opened this issue Jul 7, 2016 · 13 comments

@jhowardmsft
commented Jul 7, 2016

It looks like 276b177 introduces a case where an application can become completely deadlocked. This was found through moby/moby#23235 while attempting to verify that Docker can be upgraded to golang 1.7 successfully.

@aclements @runcom @alexbrainman @jstarks

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?

Go 1.7 beta2; git bisect traced the problem back to commit 276b177.

  1. What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOCHAR=6
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=e:\go\src\github.com\docker\docker\vendor;e:\go
set GORACE=
set GOROOT=C:\go
set GOTOOLDIR=C:\go\pkg\tool\windows_amd64
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.

I wish this were easier than it is, but the repro is to run the Docker CI against binaries built with the above versions of golang. This also requires Windows Server 2016 builds more recent than the public TP5; I was specifically running on build 14375. Newer builds are needed because TP5 does not support the newer APIs needed by Docker, so on TP5 we use older Windows APIs. Post TP5, we make extensive use of callback APIs from C code in Windows into golang, and use golang channels for the callbacks. This appears to line up with the changes in 276b177.

It was found by running the CLI test TestRestartContainerwithRestartPolicy, although I've seen it fail on other tests too. The most reliable repro was to start the test, kill the daemon 5 or 6 seconds after the containers have been started, and then cycle a few times through starting the daemon, checking whether it deadlocks, and if not, killing it and restarting it again.

  1. What did you expect to see?

No deadlock

  1. What did you see instead?

Docker daemon completely locks up. Even an added goroutine which prints to the console every 100ms no longer makes forward progress.

@quentinmit changed the title from "Docker daemon on Windows using 1.7 beta2 can deadlock all goroutines" to "runtime: Docker daemon on Windows using 1.7 beta2 can deadlock all goroutines" Jul 7, 2016

@quentinmit added this to the Go1.7Maybe milestone Jul 7, 2016

@quentinmit

Contributor

commented Jul 7, 2016

@aclements

Member

commented Jul 7, 2016

Hi @jhowardmsft (or @alexbrainman, if you can repro), is it possible for you to get a traceback or some other form of debug dump from the deadlocked process? With that, this may be easy to track down; without it, it's going to be extremely hard.

(BTW, I'm out of office this week, so I may be slow to respond.)

@jhowardmsft

Author

commented Jul 7, 2016

Happy to help, but how can I get a traceback/debug dump? I can dump the process from Task Manager in Windows if that is sufficient. If there are golang utilities for this and you can provide a pointer, I can run them.
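
As general background to the question above, here is a minimal sketch of an in-process goroutine dump using `runtime.Stack` from the standard library. The helper name `goroutineDump` is hypothetical. It assumes the runtime can still schedule the goroutine that calls it, which may well not hold in the fully deadlocked state this issue describes, so it may not help for this particular bug.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// goroutineDump (hypothetical helper) returns stack traces for all
// goroutines in the current process. The output is truncated if it does
// not fit in the 1 MiB buffer.
func goroutineDump() string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true: include every goroutine
	return string(buf[:n])
}

func main() {
	fmt.Fprint(os.Stderr, goroutineDump())
}
```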

@ianlancetaylor

Contributor

commented Jul 7, 2016

How exactly are you calling into Windows C code, and how are you calling back from C code? Do you ever use RawSyscall?

@jhowardmsft

Author

commented Jul 7, 2016

@ianlancetaylor I don't believe we use RawSyscall directly, although possibly indirectly? @DarrenStahl - can you confirm?

The golang interface between docker and Windows is all through https://github.com/Microsoft/hcsshim.

@bradfitz

Member

commented Jul 7, 2016

I see no use of RawSyscall in hcsshim or its dependency github.com/Microsoft/go-winio.

@jstarks

commented Jul 7, 2016

I don't believe RawSyscall exists on Windows, does it?

@darstahl

commented Jul 7, 2016

RawSyscall is not used. All calls into C code are with syscall.Syscall, syscall.Syscall6 etc. A callback is created with syscall.NewCallback, and passed to C code as a parameter via syscall.Syscall. The C code calls it as a function pointer.

@ianlancetaylor

Contributor

commented Jul 7, 2016

Thanks for all the info.

Windows is different than Unix in that a call to syscall.Syscall can call a Windows callback which will call runtime.cgocallback. As far as I know there is nothing that prevents passing the address of a stack variable, converted to uintptr, to syscall.Syscall. If a callback then causes the stack to grow, the variable will move, possibly leading to odd results. I don't have any reason to think that is the problem here, but I'm curious whether anybody knows of anything that will prevent that from happening.
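
To illustrate the stack-variable concern in the paragraph above, here is a rough sketch. The names `growStack` and `addrsAroundGrowth` are hypothetical. It assumes the compiler keeps `x` on the goroutine stack, which is not guaranteed, so the addresses printed may or may not differ on a given toolchain. It uses an ordinary recursive call rather than a Windows callback to force stack growth.

```go
package main

import (
	"fmt"
	"unsafe"
)

// growStack recurses with a sizable frame so that the goroutine stack has
// to grow well beyond its initial size.
func growStack(depth int) byte {
	var pad [1024]byte
	pad[depth%len(pad)] = byte(depth)
	if depth == 0 {
		return pad[0]
	}
	return growStack(depth-1) + pad[depth%len(pad)]
}

// addrsAroundGrowth records the address of a local variable as a uintptr
// before and after forcing stack growth. A uintptr is just an integer, so
// the runtime does not update it if the stack is copied elsewhere.
func addrsAroundGrowth() (before, after uintptr) {
	var x int
	before = uintptr(unsafe.Pointer(&x))
	growStack(2000)
	after = uintptr(unsafe.Pointer(&x))
	return before, after
}

func main() {
	before, after := addrsAroundGrowth()
	fmt.Printf("before=%#x after=%#x moved=%v\n", before, after, before != after)
}
```

If `moved` prints true, a stale copy of `before` held by C code would point at memory that no longer holds `x`.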

Anyhow, back to this issue. Can you find out whether it fixes the problem if you set the environment variable GODEBUG to gcshrinkstackoff=1?

@jhowardmsft

Author

commented Jul 7, 2016

@ianlancetaylor Yes, the deadlock does not seem to happen with GODEBUG set to gcshrinkstackoff=1. I have given it a dozen or more attempts and can't get it to happen.

PS E:\go\src\github.com\docker\docker> docker version
Client:
 Version:      1.12.0-dev
 API version:  1.25
 Go version:   devel +276b177 Wed Mar 16 20:13:20 2016 +0000
 Git commit:   4b1883c-Administrator-WIN-QAKDUHMNV0O-Dynamic
 Built:        Thu Jul  7 23:07:53 UTC 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.0-dev
 API version:  1.25
 Go version:   devel +276b177 Wed Mar 16 20:13:20 2016 +0000
 Git commit:   4b1883c-Administrator-WIN-QAKDUHMNV0O-Dynamic
 Built:        Thu Jul  7 23:07:53 UTC 2016
 OS/Arch:      windows/amd64
PS E:\go\src\github.com\docker\docker> $env:GODEBUG
gcshrinkstackoff=1
PS E:\go\src\github.com\docker\docker>
@ianlancetaylor

Contributor

commented Jul 8, 2016

I think I figured it out. At least, I can recreate the same symptoms. I don't think it's Windows-specific and I don't think it has anything to do with callbacks. I think it's simply that shrinkstack doesn't correctly handle the case of a select statement with the same channel in multiple cases. Will send CL shortly.
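
For readers unfamiliar with the pattern, here is a minimal sketch of a select statement that names the same channel in more than one case. The function name `recvEither` is hypothetical. It only shows the shape of code described above. It is not claimed to reproduce the deadlock, which also depends on a stack shrink happening while the goroutine is blocked in the select.

```go
package main

import "fmt"

// recvEither (hypothetical helper) blocks in a select whose two receive
// cases both name the same channel c. When a value arrives, the runtime
// picks one of the ready cases.
func recvEither(c chan int) (int, string) {
	select {
	case v := <-c:
		return v, "first"
	case v := <-c:
		return v, "second"
	}
}

func main() {
	c := make(chan int)
	done := make(chan struct{})
	go func() {
		v, which := recvEither(c)
		fmt.Println("received", v, "via", which, "case")
		close(done)
	}()
	c <- 42
	<-done
}
```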

@ianlancetaylor

Contributor

commented Jul 8, 2016

@jhowardmsft can you try https://golang.org/cl/24815 to see if it fixes the problem? Thanks.

@gopherbot

commented Jul 8, 2016

CL https://golang.org/cl/24815 mentions this issue.

@gopherbot closed this in 84bb9e6 Jul 8, 2016

@golang locked and limited conversation to collaborators Jul 8, 2017
