Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: "fatal error: PowerRegisterSuspendResumeNotification failure" when running in Windows docker containers #35447

Closed
KnicKnic opened this issue Nov 8, 2019 · 29 comments

Comments

@KnicKnic
Copy link

@KnicKnic KnicKnic commented Nov 8, 2019

What version of Go are you using (go version)?

go version go1.13.3 windows/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\nmaliwa\AppData\Local\go-build
set GOENV=C:\Users\nmaliwa\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\nmaliwa\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\nmaliwa\AppData\Local\Temp\go-build924229414=/tmp/go-build -gno-record-gcc-switches

What did you do?

package main
import "fmt"
func main() {
    fmt.Println("hello world")
}

Ran it inside a windows docker container
docker run -it -v C:\test:c:\data mcr.microsoft.com/windows/nanoserver:1809

What did you expect to see?

hello world

What did you see instead?

c:\data\test.exe
fatal error: PowerRegisterSuspendResumeNotification failure

runtime stack:
runtime.throw(0x4d916b, 0x2e)
        c:/go/src/runtime/panic.go:774 +0x79 fp=0x82fde0 sp=0x82fdb0 pc=0x42cea9
runtime.monitorSuspendResume()
        c:/go/src/runtime/os_windows.go:294 +0x1a9 fp=0x82fe80 sp=0x82fde0 pc=0x4298c9
runtime.goenvs()
        c:/go/src/runtime/os_windows.go:531 +0x1ba fp=0x82fed8 sp=0x82fe80 pc=0x42a24a
runtime.schedinit()
        c:/go/src/runtime/proc.go:554 +0xa9 fp=0x82ff30 sp=0x82fed8 pc=0x42f899
runtime.rt0_go(0x82ff60, 0x7ff9d77ca27f, 0x82ff60, 0x0, 0x7ff9d77ca27f, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        c:/go/src/runtime/asm_amd64.s:214 +0x13d fp=0x82ff38 sp=0x82ff30 pc=0x452dfd

This is related to change https://go-review.googlesource.com/c/go/+/191957/ from issue #31528

The problem is that the PowerRegisterSuspendResumeNotification returns error 2 - ERROR_FILE_NOT_FOUND

This also happens with the latest ltsc container images mcr.microsoft.com/windows/servercore:ltsc2019

I found someone else experiencing the same issue https://social.msdn.microsoft.com/Forums/en-US/f13a5e6c-e57d-4790-88db-ea9757ca3846/running-golang-program-in-azure-app-service-console-give-fatal-error?forum=windowsazurewebsitespreview

It would be nice if when calling PowerRegisterSuspendResumeNotification we don't throw on error 2.

@zx2c4

@KnicKnic KnicKnic changed the title Cannot run golang programs in windows docker containers Cannot run go programs in windows docker containers Nov 8, 2019
@bcmills
Copy link
Member

@bcmills bcmills commented Nov 8, 2019

@bcmills bcmills changed the title Cannot run go programs in windows docker containers runtime: "fatal error: PowerRegisterSuspendResumeNotification failure" when running in Windows docker containers Nov 8, 2019
@bcmills bcmills added this to the Unplanned milestone Nov 8, 2019
@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 9, 2019

@KnicKnic

Thank you for report.

We should at least add Windows error number to the panic message. Lucky we have you, who worked out what the error number is.

The problem is that the PowerRegisterSuspendResumeNotification returns error 2 - ERROR_FILE_NOT_FOUND

This error looks strange. Maybe PowerRegisterSuspendResumeNotification is broken when running in docker container. Unfortunetly I don't have docker to verify the error.

@jstarks should PowerRegisterSuspendResumeNotification work in container? Thank you.

Alex

@jstarks
Copy link

@jstarks jstarks commented Nov 9, 2019

We can look into whether this could be supported in a future version of Windows containers. I think it would be reasonable to ignore failure in this case.

The bigger question to me is whether the change that introduced this regression was the right one at all. The Windows kernel team changed timer behavior in Windows 8 to stop advancing relative timeouts on wake. Otherwise when you open your laptop lid, every timer in the system goes off all at once and you get a bunch of unpredictable errors. Software is generally written to assume that local processes will make forward progress over reasonable time periods, and if they don't then something is wrong. When the machine is asleep, this assumption is violated. By making relative timers behave like threads, so that they both run together or they both don't, the illusion is maintained. You can claim these programs are buggy, but they obviously exist. Watchdog timers are well-known constructs.

This was a conscious design decision in Windows, and so it's disappointing to see the Go runtime second guess this several years later in a bug fix.

As far as behavior on Linux, there is clearly no consensus in issue #24595, which discusses this same problem. And indeed you can see that the CLOCK_MONOTONIC/CLOCK_BOOTTIME convergence was reverted in the kernel exactly because of the reason we stopped advancing time in Windows: random code has random failures due to timeouts. See https://lkml.org/lkml/2018/4/26/929 for a brief justification.

But despite the lack of consensus on Linux, for some reason the change in Windows behavior was merged already.

I think the change that introduced this should be reverted and the original proposal to use QueryUnbiasedInterruptTime (to match the behavior of WaitForSingleObject) should be adopted.

@mpx @networkimprov @ianlancetaylor seemed interested in this topic from the other issue.

@networkimprov
Copy link

@networkimprov networkimprov commented Nov 9, 2019

Filed #35482 re reverting the Windows change.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Nov 10, 2019

We can look into whether this could be supported in a future version of Windows containers.

Sounds good.

I think it would be reasonable to ignore failure in this case.

Do you suggest we always ignore PowerRegisterSuspendResumeNotification failure? Or you suggest to ignore it on particular Windows version? How do I decide when to ignore PowerRegisterSuspendResumeNotification failure?

The bigger question to me is whether the change that introduced this regression was the right one at all. The Windows kernel team changed timer behavior in Windows 8 to stop advancing relative timeouts on wake. Otherwise when you open your laptop lid, every timer in the system goes off all at once and you get a bunch of unpredictable errors.

Sounds reasonable to me. But you could also see problems with your design decision too - see, for example #31528. It could be confusing when your timer takes longer after you have laptop lid closed.

I will let Ian and @zx2c4 fight that fight with you.

Thank you for your comments.

Alex

@northtyphoon
Copy link

@northtyphoon northtyphoon commented Nov 11, 2019

What golang version can I use to workaround the issue now?

@networkimprov
Copy link

@networkimprov networkimprov commented Nov 11, 2019

Go 1.12.* and 1.13.2 don't have the code which causes the error.

Also, patching your local c:/go/src/runtime/os_windows.go to ignore the error would be straightforward.

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Nov 21, 2019

Do you suggest we always ignore PowerRegisterSuspendResumeNotification failure? Or you suggest to ignore it on particular Windows version? How do I decide when to ignore PowerRegisterSuspendResumeNotification failure?

Let's just always ignore the error. We support the feature on operating system configurations that support the feature. I'll send a CL.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 21, 2019

Change https://golang.org/cl/208317 mentions this issue: runtime: treat failures of PowerRegisterSuspendResumeNotification as non-fatal

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Nov 21, 2019

@gopherbot please backport my fix at https://golang.org/cl/208317 for this to 1.13

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 21, 2019

Backport issue(s) opened: #35746 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Nov 21, 2019

@jstarks I haven't yet tried to reproduce the bug or look dynamically at what's happening, but reversing around the binaries I'm seeing

powrprof.dll!PowerRegisterSuspendResumeNotification -> umpdc.dll!PdcPortOpen -> ntdll.dll!ZwAlpcConnectPort("\\PdcPort") -> syscall -> ntdll.exe!AlpcpConnectPort and then it jumps into the object manager to look it up. Is this not symlinked into the view containers have or something?

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 8, 2019

Change https://golang.org/cl/210437 mentions this issue: runtime: revert "monitor for suspend/resume to kick timeouts"

@goldsam
Copy link

@goldsam goldsam commented Dec 9, 2019

Any word on this?

@networkimprov
Copy link

@networkimprov networkimprov commented Dec 9, 2019

It will be fixed in the next minor release, 1.13.6. You can patch the current Go release in the interim by applying the diff for the CL posted immediately above.

@ianlancetaylor ianlancetaylor modified the milestones: Unplanned, Go1.13.6 Dec 13, 2019
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 13, 2019

I'm open to suggestions, with the proviso that changes on release branches must be as minimal as possible.

I don't think the CL (https://golang.org/cl/191957) actually changed timers on Windows from "program time" to "real time". I think they were already "real time." It was only the earliest timer in a bucket that was delayed after a suspend. After that timer fired, all the other timers that were ready according to "real time" would fire immediately. CL 191957 removed the delay after a suspend for the first timer.

@networkimprov
Copy link

@networkimprov networkimprov commented Dec 13, 2019

Right, thanks for clarifying. I think a fix entails a shift to program timers on Win8/10 for machines that may be suspended. Two ideas:

a) an opt-in fix where runtime.nanotime is from QueryUnbiasedInterruptTime()
b) use PowerRegisterSuspendResumeNotification() to adjust pending timers by the suspended period

The latter may fail in the circumstance described above, but maybe Docker deployments rarely suspend? We can ask users not to combine Win8/10 + Docker + Go<=1.13 + timer/sleep + suspend.

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 13, 2019

Change https://golang.org/cl/211279 mentions this issue: runtime: fallback to WM_POWERBROADCAST if powrprof is broken

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 13, 2019

Change https://golang.org/cl/211280 mentions this issue: [release-branch.go1.13] runtime: do not use PowerRegisterSuspendResumeNotification on systems with "program time" timer

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 3, 2020

Change https://golang.org/cl/213198 mentions this issue: [release-branch.go1.12] runtime: do not use PowerRegisterSuspendResumeNotification on systems with "program time" timer

gopherbot pushed a commit that referenced this issue Jan 3, 2020
…eNotification on systems with "program time" timer

Systems where PowerRegisterSuspendResumeNotification returns ERROR_
FILE_NOT_FOUND are also systems where nanotime() is on "program time"
rather than "real time".  The chain for this is:

powrprof.dll!PowerRegisterSuspendResumeNotification ->
  umpdc.dll!PdcPortOpen ->
    ntdll.dll!ZwAlpcConnectPort("\\PdcPort") ->
      syscall -> ntoskrnl.exe!AlpcpConnectPort

Opening \\.\PdcPort fails with STATUS_OBJECT_NAME_NOT_FOUND when pdc.sys
hasn't been initialized. Pdc.sys also provides the various hooks for
sleep resumption events, which means if it's not loaded, then our "real
time" timer is actually on "program time". Finally STATUS_OBJECT_NAME_
NOT_FOUND is passed through RtlNtStatusToDosError, which returns ERROR_
FILE_NOT_FOUND. Therefore, in the case where the function returns ERROR_
FILE_NOT_FOUND, we don't mind, since the timer we're using will
correspond fine with the lack of sleep resumption notifications. This
applies, for example, to Docker users.

Updates #35447
Updates #35482
Fixes #35746

Change-Id: I9e1ce5bbc54b9da55ff7a3918b5da28112647eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/211280
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
gopherbot pushed a commit that referenced this issue Jan 3, 2020
…eNotification on systems with "program time" timer

Systems where PowerRegisterSuspendResumeNotification returns ERROR_
FILE_NOT_FOUND are also systems where nanotime() is on "program time"
rather than "real time".  The chain for this is:

powrprof.dll!PowerRegisterSuspendResumeNotification ->
  umpdc.dll!PdcPortOpen ->
    ntdll.dll!ZwAlpcConnectPort("\\PdcPort") ->
      syscall -> ntoskrnl.exe!AlpcpConnectPort

Opening \\.\PdcPort fails with STATUS_OBJECT_NAME_NOT_FOUND when pdc.sys
hasn't been initialized. Pdc.sys also provides the various hooks for
sleep resumption events, which means if it's not loaded, then our "real
time" timer is actually on "program time". Finally STATUS_OBJECT_NAME_
NOT_FOUND is passed through RtlNtStatusToDosError, which returns ERROR_
FILE_NOT_FOUND. Therefore, in the case where the function returns ERROR_
FILE_NOT_FOUND, we don't mind, since the timer we're using will
correspond fine with the lack of sleep resumption notifications. This
applies, for example, to Docker users.

Updates #35447
Updates #35482
Fixes #36377

Change-Id: I9e1ce5bbc54b9da55ff7a3918b5da28112647eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/208317
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-on: https://go-review.googlesource.com/c/go/+/213198
@cagedmantis cagedmantis modified the milestones: Go1.13.6, Go1.14 Jan 9, 2020
@cagedmantis
Copy link
Contributor

@cagedmantis cagedmantis commented Jan 9, 2020

Moved this issue to the go1.14 milestone. This has been resolved in master via 056a3d1. There are two backport issues which were created and resolved for 1.13.6 ( #35746) and 1.12.15 (#36377) releases.

@kpfaulkner
Copy link

@kpfaulkner kpfaulkner commented Feb 9, 2020

I know this has already closed, but wanted to mention that I'm also getting this issue when running in Azure App Services (using go 1.14rc1). Previously I had this working fine with 1.12 (I believe). Normally use 1.13 now but hadn't rebuilt/redeployed this binary in question. So far 1.13.* and 1.14rc1 get this PowerRegisterSuspendResumeNotification error stil.

@networkimprov
Copy link

@networkimprov networkimprov commented Feb 9, 2020

@kpfaulkner could you file a new issue, and give the exact error string you're seeing, and details for how to reproduce it on azure?

@kpfaulkner
Copy link

@kpfaulkner kpfaulkner commented Feb 10, 2020

Done. Cheers.

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 16, 2020

Change https://golang.org/cl/219657 mentions this issue: runtime: ignore error returned by PowerRegisterSuspendResumeNotification

gopherbot pushed a commit that referenced this issue Mar 21, 2020
It appears that PowerRegisterSuspendResumeNotification is not supported
when running inside Docker - see issues #35447, #36557 and #37149.

Our current code relies on error number to determine Docker environment.
But we already saw PowerRegisterSuspendResumeNotification return
ERROR_FILE_NOT_FOUND, ERROR_INVALID_PARAMETERS and ERROR_ACCESS_DENIED
(see issues above). So this approach is not sustainable.

Just ignore PowerRegisterSuspendResumeNotification returned error.

Fixes #37149

Change-Id: I2beba9d45cdb8c1efac5e974e747827a6261915a
Reviewed-on: https://go-review.googlesource.com/c/go/+/219657
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Mar 21, 2020

Change https://golang.org/cl/224585 mentions this issue: [release-branch.go1.13] runtime: ignore error returned by PowerRegisterSuspendResumeNotification

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 21, 2020

Change https://golang.org/cl/224586 mentions this issue: [release-branch.go1.14] runtime: ignore error returned by PowerRegisterSuspendResumeNotification

gopherbot pushed a commit that referenced this issue Mar 25, 2020
…erSuspendResumeNotification

It appears that PowerRegisterSuspendResumeNotification is not supported
when running inside Docker - see issues #35447, #36557 and #37149.

Our current code relies on error number to determine Docker environment.
But we already saw PowerRegisterSuspendResumeNotification return
ERROR_FILE_NOT_FOUND, ERROR_INVALID_PARAMETERS and ERROR_ACCESS_DENIED
(see issues above). So this approach is not sustainable.

Just ignore PowerRegisterSuspendResumeNotification returned error.

For #37149
Fixes #37230

Change-Id: I2beba9d45cdb8c1efac5e974e747827a6261915a
Reviewed-on: https://go-review.googlesource.com/c/go/+/219657
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
(cherry picked from commit d467f3b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/224585
Run-TryBot: Ian Lance Taylor <iant@golang.org>
gopherbot pushed a commit that referenced this issue Mar 25, 2020
…erSuspendResumeNotification

It appears that PowerRegisterSuspendResumeNotification is not supported
when running inside Docker - see issues #35447, #36557 and #37149.

Our current code relies on error number to determine Docker environment.
But we already saw PowerRegisterSuspendResumeNotification return
ERROR_FILE_NOT_FOUND, ERROR_INVALID_PARAMETERS and ERROR_ACCESS_DENIED
(see issues above). So this approach is not sustainable.

Just ignore PowerRegisterSuspendResumeNotification returned error.

For #37149
Fixes #37699

Change-Id: I2beba9d45cdb8c1efac5e974e747827a6261915a
Reviewed-on: https://go-review.googlesource.com/c/go/+/219657
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
(cherry picked from commit d467f3b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/224586
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet