runtime: Docker daemon on Windows using 1.7 beta2 can deadlock all goroutines #16286
It looks like 276b177 introduces a case where an application can become completely deadlocked. This was found through moby/moby#23235, in an attempt to verify that docker can be upgraded to golang 1.7 successfully.
Please answer these questions before submitting your issue. Thanks!
go 1.7 beta2, and through git bisect working back to commit 276b177.
I wish this were easier than it is, but the reproduction is to run docker CI against binaries built with the above versions of golang. This also requires Windows Server 2016 builds more recent than the public TP5; I was specifically running on build 14375. The reason for newer builds is that TP5 does not support the newer APIs needed by docker, so on TP5 we use older Windows APIs. Post TP5, we make extensive use of callback APIs from C code in Windows into golang, and we use golang channels for those callbacks. This appears to line up with the changes in 276b177.
It was found by running the CLI test
Docker daemon completely locks up. Even an added goroutine which prints to the console every 100ms no longer makes forward progress.
Jul 7, 2016
Hi @jhowardmsft (or @alexbrainman, if you can repro), is it possible for you to get a traceback or some other form of debug dump from the deadlocked process? With that, this may be easy to track down; without it, it's going to be extremely hard.
(BTW, I'm out of office this week, so I may be slow to respond.)
Thanks for all the info.
Windows is different than Unix in that a call to
Anyhow, back to this issue. Can you find out whether it fixes the problem if you set the environment variable
@ianlancetaylor Yes, the deadlock doesn't seem to happen with
I think I figured it out. At least, I can recreate the same symptoms. I don't think it's Windows-specific and I don't think it has anything to do with callbacks. I think it's simply that shrinkstack doesn't correctly handle the case of a select statement with the same channel in multiple cases. Will send CL shortly.
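To make the pattern concrete, here is a minimal sketch of a goroutine blocked in a select statement that names the same channel in more than one case, while the main goroutine forces garbage collections. It is not the reproducer from the CL; `grow`, `waitDuplicate`, and `run` are hypothetical names, and whether a collection actually lands while the goroutine is parked in the select depends on timing.

```go
package main

import (
	"fmt"
	"runtime"
)

// grow recurses to encourage the goroutine's stack to grow, so that the
// runtime may later have a reason to shrink it. (Hypothetical helper.)
func grow(depth int) int {
	var pad [256]byte
	if depth == 0 {
		return int(pad[0])
	}
	return grow(depth-1) + int(pad[1])
}

// waitDuplicate blocks in a select whose two cases both receive from the
// same channel, which is the shape described in the comment above.
func waitDuplicate(ch <-chan int) int {
	grow(1000)
	select {
	case v := <-ch:
		return v
	case v := <-ch:
		return v
	}
}

// run starts the waiting goroutine, forces a few collections while it is
// (most likely) parked in the select, then sends a value to unblock it.
func run() int {
	ch := make(chan int)
	result := make(chan int)
	go func() { result <- waitDuplicate(ch) }()

	for i := 0; i < 5; i++ {
		runtime.GC()
	}

	ch <- 42
	return <-result
}

func main() {
	fmt.Println("received:", run())
}
```

On a runtime without the problem this should finish and print the received value; the point is only to show what "the same channel in multiple cases" looks like in code.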