Unable to react to graceful shutdown of (Windows) container #25982

Open
godefroi opened this issue Aug 24, 2016 · 92 comments

Comments

@godefroi
Contributor

@godefroi godefroi commented Aug 24, 2016

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:15:28 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:15:28 2016
 OS/Arch:      windows/amd64

Output of docker info:

Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 86
Server Version: 1.12.0
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: nat null overlay
Swarm: inactive
Security Options:
Kernel Version: 10.0 14300 (14300.1045.amd64fre.rs1_release_svc.160705-1059)
Operating System: Windows Server 2016 Standard Technical Preview 5
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 8 GiB
Name: slc-dev-s16p-2
ID: H3YG:MO32:XSEU:NGD4:Z7FC:4SMS:EYPL:RGHO:DGWY:W767:O3L2:HNXZ
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

I am unable to react to a graceful shutdown of my application running inside a (Windows) container. I have tried SetConsoleCtrlHandler(), but my handler is never called. I have tried signal(), but no SIGTERM is received. I have tried running a message loop, but WM_CLOSE is never received.

The work of shutting down the container is (apparently) done by the ShutdownComputeSystem routine from vmcompute.dll (this is from zhcsshim.go), but I cannot find any documentation or other information on what ShutdownComputeSystem does. It has been suggested that @jhowardmsft would know what's going on.

Please help!

@godefroi
Contributor Author

@godefroi godefroi commented Aug 24, 2016

There is some discussion related to this here and here, for what it's worth.

@thaJeztah
Member

@thaJeztah thaJeztah commented Sep 27, 2016

ping @jhowardmsft ptal

@lowenna
Member

@lowenna lowenna commented Sep 27, 2016

The call into HCSShim differs under the covers depending on whether this is a Windows Server container or a Hyper-V container, and on whether the shutdown is forced or graceful (e.g. docker stop -f container vs docker stop container). In the forced case, for either container type, no notification is expected. It is possible that there is a kernel issue in the Windows Server container case, but I'm told by the kernel folks that the way it works is that cexecsvc calls InitiateSystemShutdown, and the processes in the job object/silo should be notified as per a regular system shutdown.

Can you confirm which container type this is, and whether you see the same behaviour for both types? If you run the app outside of a container, directly on the host, and run "shutdown /t 0 /r" (effectively an InitiateSystemShutdown call), does your app get notified?

@lowenna
Member

@lowenna lowenna commented Sep 27, 2016

(Note also this was on TP5 - I would strongly recommend seeing if the behaviour is different on 14393/RTM)

@godefroi
Contributor Author

@godefroi godefroi commented Sep 27, 2016

@jhowardmsft This is a Windows Server container, and a graceful shutdown (docker stop). Today I installed the evaluation version that was released yesterday (which I assume is 14393, but I'm not at my machine to verify), but I have not re-tested yet. I will in the morning, and will update then.

@godefroi
Contributor Author

@godefroi godefroi commented Sep 28, 2016

@jhowardmsft I gave it a shot this morning, and this is what I found. First, the versions I'm now working with:

docker info says this about my Windows version:

Kernel Version: 10.0 14393 (14393.206.amd64fre.rs1_release.160915-0644)
Operating System: Windows Server 2016 Standard Evaluation

docker version says this:

Client:
 Version:      1.12.2-cs2-ws-beta-rc1
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   62d9ff9
 Built:        Fri Sep 23 20:50:29 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.2-cs2-ws-beta-rc1
 API version:  1.25
 Go version:   go1.7.1
 Git commit:   62d9ff9
 Built:        Fri Sep 23 20:50:29 2016
 OS/Arch:      windows/amd64

My test application is a C# application, and I'm using the following methods to detect system shutdown:

  • I'm hooking the Console.CancelKeyPress event.
  • I'm hooking the AppDomain.CurrentDomain.ProcessExit and AppDomain.CurrentDomain.DomainUnload events.
  • I'm hooking the SessionEnding, SessionEnded, SessionSwitch, EventsThreadShutdown, and PowerModeChanged events from the Microsoft.Win32.SystemEvents class.
  • I'm setting up a handler for the SIGTERM signal.
  • I'm setting a handler using SetConsoleCtrlHandler.
  • I'm running a form that overrides WndProc and watches for the WM_CLOSE message.

Of these methods, when I run the application and shut it down using shutdown /t 0 /r as you suggested, only the Microsoft.Win32.SystemEvents.SessionEnding and Microsoft.Win32.SystemEvents.SessionEnded events are raised (which surprised me; maybe I should trap WM_QUERYENDSESSION too). When I shut down a Windows Container using docker stop, none of the events are raised.

I am happy to provide the complete source to the app I'm using to test, if you'd like.

@lowenna
Member

@lowenna lowenna commented Sep 28, 2016

Getting the app would be useful, although I'd probably just be a go-between for the kernel team who would need to dig into what is going on.

@godefroi
Contributor Author

@godefroi godefroi commented Sep 29, 2016

Here's my test app: show_stop.txt

Hopefully it provides some helpful insight.

@lowenna
Member

@lowenna lowenna commented Oct 17, 2016

@darrenstahlmsft FYI

@sandersaares

@sandersaares sandersaares commented Nov 11, 2016

Do I understand correctly that there is currently no way to get a notification that a Windows container is shutting down? How are containerized apps intended to avoid incomplete writes in such a situation? Or am I missing something here?

@godefroi
Contributor Author

@godefroi godefroi commented Nov 14, 2016

@sandersaares That is my understanding with current versions, yes.

@PatrickLang

@PatrickLang PatrickLang commented May 15, 2017

This is still in progress. Changes are needed to the Windows platform. MS#8633377. Will share more info as things change.

@marcosnils

@marcosnils marcosnils commented Jun 9, 2017

@PatrickLang is that MS#8633377 case publicly accessible?

@PatrickLang

@PatrickLang PatrickLang commented Jun 10, 2017

It's internal. I referenced it so my team could find it and you would know it's being worked on. It's still on backlog for a future release.

@rgl
Contributor

@rgl rgl commented Sep 3, 2017

@jhowardmsft @PatrickLang any news about this?

I've tested the current behavior from plain C applications and posted the results at rgl/docker-windows-2016-vagrant, essentially:

Windows containers cannot be gracefully shut down: either there is no shutdown notification, or they are forcefully terminated after a while.

The next table describes whether a docker stop --time 30 <container> will gracefully shut down a container that is running a console, gui, or service app.

| base image | app | behaviour |
| --- | --- | --- |
| nanoserver | console | does not receive the shutdown notification |
| windowsservercore | console | receives the shutdown notification but is killed after about 5 seconds |
| nanoserver | gui | fails to run RegisterClass (there's no GUI support in nano) |
| windowsservercore | gui | receives the shutdown notification but is killed after about 5 seconds |
| nanoserver | service | does not receive the shutdown notification |
| windowsservercore | service | does not receive the shutdown notification |

@sandersaares

@sandersaares sandersaares commented Sep 4, 2017

@PatrickLang will this feature be improved in RS3?

@riverar

@riverar riverar commented Sep 11, 2017

@rgl I suspect the reason your services don't get notifications is because

  • WaitToKillServiceTimeout defaults to ~20 seconds
  • Docker kills the container way before then

Can you re-test with SERVICE_ACCEPT_PRESHUTDOWN? Should work and align it with the others (but still die after ~5 seconds).

I presume this is still an open issue due to docker not supporting container shutdown deferral (for good reasons). Maybe an orchestrator-side flag to opt into this potentially unwanted behavior is in order (e.g. --allow-shutdown-deferral)?

@godefroi
Contributor Author

@godefroi godefroi commented Sep 11, 2017

@riverar It's an open issue because Docker on Windows doesn't ask nicely to shut down processes, it simply kills them. This behavior does not match Docker on other platforms. This is (per @PatrickLang) due to issues in Windows.

@riverar

@riverar riverar commented Sep 11, 2017

That seems to jibe with my experience, but runs counter to @jhowardmsft's statement about changes in Windows to call InitiateSystemShutdown. I'm running the latest insider bits, so if it's not in here, those changes probably never made it into RS3.

Maybe John or @PatrickLang can update us.

Update: I'm guessing this is a larger issue of non-hyperv containers not having a winlogon.exe to do all the maid work during shutdown.

@godefroi
Contributor Author

@godefroi godefroi commented Sep 11, 2017

@riverar All my containers are non-hyperv; @rgl were you testing in HyperV containers? Your results do not match my results (and that could absolutely be my fault).

@darstahl
Contributor

@darstahl darstahl commented Sep 11, 2017

Sorry for the delay updating this thread (Thanks @riverar for pinging me via email). Here's the current status on this.

In RS3 (currently available in insider preview builds) there is a partial fix available. The partial fix appears to be what @rgl is testing against. Processes started by Docker are able to register for console notifications via Kernel32.SetConsoleCtrlHandler. Prior to shutting down the container (when initiated by Docker), all applications tracked by Docker (applications started via docker run, docker exec, etc.) are sent a CTRL_CLOSE_EVENT and given 5 seconds to shut down. After this, InitiateSystemShutdown is called, and it is up to the kernel to handle it from there (which currently terminates applications). Note that this notification does not currently propagate to child processes, so docker run image cmd.exe /c MyApp.exe will notify cmd.exe, but not MyApp.exe.

This means that most console and GUI applications will receive the notification, as most application runtimes register for these notifications and send them via the runtime specific shutdown mechanisms.

Services will not currently get the exit notification without a kernel fix which did not make RS3. It is possible to work around this by using a shim application which acts as the container entrypoint and manages starting the service at container start, and stopping the service when the shim receives the CTRL_CLOSE_EVENT. I can write a proof of concept for this if someone would like a starting point to work from.

I'm open to feedback on the current approach for future releases (though I hope to get kernel support for this so it works like regular Windows). Let me know what shutdown features are necessary for your application that are not possible to do with the console notification and 5 second timeout prior to kill.
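
The shim workaround described above could look roughly like the following Go sketch. It is not the proof of concept offered in this comment, only an illustration under stated assumptions: the function name runShim, the use of net start / net stop to control the service, and the service name argument are all hypothetical, and the program assumes a Go toolchain that surfaces the container stop notification as SIGTERM.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runShim starts the workload, blocks until shutdown is closed, and then
// stops the workload. start and stop are injected so the flow can be
// exercised without a real service.
func runShim(start, stop func() error, shutdown <-chan struct{}) error {
	if err := start(); err != nil {
		return fmt.Errorf("start failed: %w", err)
	}
	<-shutdown
	if err := stop(); err != nil {
		return fmt.Errorf("stop failed: %w", err)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: shim <service-name>")
		return
	}
	service := os.Args[1] // hypothetical: the service this shim supervises

	// Close the shutdown channel when the stop notification (assumed to be
	// surfaced by the Go runtime as SIGTERM) or Ctrl+C arrives.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
	shutdown := make(chan struct{})
	go func() {
		<-sigs
		close(shutdown)
	}()

	// Assumption: "net start" / "net stop" control the service, and
	// "net stop" waits for the service to finish stopping.
	start := func() error { return exec.Command("net", "start", service).Run() }
	stop := func() error { return exec.Command("net", "stop", service).Run() }

	if err := runShim(start, stop, shutdown); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because the shim is the process Docker tracks, it is the one that receives the notification; the stop step still has to complete inside the roughly 5 second window mentioned above.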

@darstahl
Contributor

@darstahl darstahl commented Sep 11, 2017

@godefroi This fix is available starting in RS3, so the testing is probably on an insider preview build. This works in both HyperV containers and Windows Server containers running both nanoserver and windowsservercore.

@rgl The above chart looks right for the current RS3 state, except that in my testing nanoserver console apps correctly receive the notification exactly the same as windowsservercore. I'll try to take a look at your examples and see if I can see what is happening.

@matt-richardson
Contributor

@matt-richardson matt-richardson commented Sep 11, 2017

@darrenstahlmsft - What's an RS3 when it's around?

@darstahl
Contributor

@darstahl darstahl commented Sep 11, 2017

@matt-richardson Sorry, I was using Microsoft internal language 👼

RS3 is the internal name for the Fall 2017 Windows Semi-Annual Channel release for Windows Server 2016, or the Windows 10 Fall Creators Update depending if you are on Server or Client SKU respectively. It is currently available for preview in the Windows Insider program for the public to download and test, and will be going public some time this fall (I don't think I can share a date yet). For more info on the Windows Server side, see this link.

We generally refer to it as RS3 for brevity 😄

@rgl
Contributor

@rgl rgl commented Sep 12, 2017

oh d'oh... I didn't actually implement the (PRE)SHUTDOWN handling when I originally tested this! I've updated the source, and the service now receives the PRESHUTDOWN (but not SHUTDOWN) notification. Here's the updated table (only the service rows have changed).

The next table describes whether a docker stop --time 600 <container> will gracefully shut down a container that is running a console, gui, or service app.

| base image | app | behaviour |
| --- | --- | --- |
| nanoserver | console | does not receive the shutdown notification |
| windowsservercore | console | receives the shutdown notification but is killed after about 5 seconds |
| nanoserver | gui | fails to run RegisterClass (there's no GUI support in nano) |
| windowsservercore | gui | receives the shutdown notification but is killed after about 5 seconds |
| nanoserver | service | only receives the pre shutdown notification but is killed after about 195 seconds |
| windowsservercore | service | only receives the pre shutdown notification but is killed after about 195 seconds |

NB: setting WaitToKillServiceTimeout (e.g. Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Control -Name WaitToKillServiceTimeout -Value '450000') does not have any effect on extending the kill service timeout.

Do you know where that (approximate) 195-second timeout comes from? In any case, I would only expect the container to be killed after the timeout specified in the docker --time argument expires.

I'm testing this on a Windows Server 2016 (10.0.14393.1532) VM and with the following docker base images.

microsoft/windowsservercore:
==> default: be84290c2315 5 weeks ago Install update 10.0.14393.1593 2.63GB
==> default: 9 months ago Apply image 10.0.14393.0 7.68GB

microsoft/nanoserver:
==> default: 28dad12ef0bc 5 weeks ago Install update 10.0.14393.1593 368MB
==> default: 9 months ago Apply image 10.0.14393.0 701MB

@rgl
Contributor

@rgl rgl commented Sep 12, 2017

@godefroi I'm not using Hyper-V. Please see my previous comment to know what I'm using.

@swernli
Contributor

@swernli swernli commented Apr 30, 2019

Ah, that explains the difference then. There have been several changes to the whole stack, kernel all the way up to services and utilities, that run inside of the container image. These are not planned to be backported that I'm aware of; given that 2019 is the current ltsc, the expectation is that folks will build their containers and infrastructure off of that instead. However, that also means you need to use Windows 2019 as your host OS, since Windows container images cannot be run on older hosts (see [Windows Container Version Compatibility](https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility) for more info).

I can understand that the shutdown problem exhibited in the server 2016 image causing database corruption is a blocker for you. The best tactic would be to update your host to 2019 and begin using the ltsc2019 image for your containers. The only other hack/workaround I can imagine is having a custom tool or script that will allow you to signal to your workload that it should clean up and exit, and then triggering that instead of docker stop. Once the workload process exits, docker will automatically clean up the container. I'm not sure if Mongo offers something like an on-demand shutdown of the database, so using the latest ltsc release would be safer.

@drnybble

@drnybble drnybble commented May 1, 2019

OK thanks for the detailed response; we'll have to investigate using docker exec to force a clean Mongo shutdown until such time as we can deprecate Windows 2016 (we ship a product on Windows containers...)

As a follow-up: the script we use to invoke docker-compose now intercepts any calls that might shut down Mongo and first gracefully shuts it down. We also add a PowerShell script to the Windows Shutdown scripts to gracefully shut down Mongo if the system is being shut down.

Still disappointed that Windows Containers on 2016 (which has mainstream support until 2020) is basically broken.
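
The "ask the workload to exit first, then stop the container" approach discussed in the last two comments could be sketched in Go as below. This is an illustrative sketch, not the script mentioned above: the helper name stopPlan, the 30 second timeout, and the mongo shell invocation used as the graceful-shutdown command are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// stopPlan returns the commands to run, in order, to stop a container
// cleanly: first ask the workload to shut itself down via docker exec,
// then let Docker stop the container.
func stopPlan(container string, graceful []string, timeoutSeconds int) [][]string {
	execCmd := append([]string{"docker", "exec", container}, graceful...)
	stopCmd := []string{"docker", "stop", "--time", strconv.Itoa(timeoutSeconds), container}
	return [][]string{execCmd, stopCmd}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: graceful-stop <container>")
		return
	}
	container := os.Args[1]

	// Assumption: the workload is MongoDB and the mongo shell is available
	// inside the container; substitute whatever your workload provides.
	graceful := []string{"mongo", "admin", "--eval", "db.shutdownServer()"}

	for _, argv := range stopPlan(container, graceful, 30) {
		fmt.Println("running:", argv)
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			// Keep going: the shutdown command may report an error when the
			// server closes the connection, and docker stop should still run.
			fmt.Fprintln(os.Stderr, "command failed:", err)
		}
	}
}
```

Keeping the command list in a pure function makes the ordering easy to check without a Docker daemon; only main actually shells out.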

@brettjacobson

@brettjacobson brettjacobson commented May 1, 2019

How does this stop signal get propagated? My Docker ENTRYPOINT is powershell.exe, and I use the CMD to run myInitScript.ps1; my-exe-that-Iwant-graceful-stop.exe.
But it does not seem that my exe is getting the graceful shutdown signal (using latest 2019 container).

@dslack

@dslack dslack commented May 3, 2019

@brettjacobson I'm in a similar scenario to yours; however, our exe is actually invoked within our PowerShell script (and that's being directly executed via the ENTRYPOINT). I had to modify my C# app to use @OnurGumus's source above to properly capture the close events, but our apps are getting the stop signal.

tianon added a commit to tianon/go that referenced this issue Jul 26, 2019
…NT as SIGTERM on Windows

This matches the existing behavior of treating CTRL_C_EVENT, CTRL_BREAK_EVENT as a synthesized SIGINT event.

See https://docs.microsoft.com/en-us/windows/console/handlerroutine for a good documentation source upstream to confirm these values.

As for the usage of these events, the "Timeouts" section of that upstream documentation is important to note, especially the limited window in which to do any cleanup before the program will be forcibly killed (defaults typically 5s, but as low as 500ms, and in many cases configurable system-wide).

These events are especially relevant for Windows containers, where these events (particularly `CTRL_SHUTDOWN_EVENT`) are one of the only ways containers can "gracefully" shut down (moby/moby#25982 (comment)).

This also updates the vendoring of "golang.org/x/sys" under "cmd" to include CL 187578 (adding these same symbols).

Fixes golang#7479
tianon added a commit to tianon/go that referenced this issue Jul 29, 2019
…NT as SIGTERM on Windows

This matches the existing behavior of treating CTRL_C_EVENT, CTRL_BREAK_EVENT as a synthesized SIGINT event.

See https://docs.microsoft.com/en-us/windows/console/handlerroutine for a good documentation source upstream to confirm these values.

As for the usage of these events, the "Timeouts" section of that upstream documentation is important to note, especially the limited window in which to do any cleanup before the program will be forcibly killed (defaults typically 5s, but as low as 500ms, and in many cases configurable system-wide).

These events are especially relevant for Windows containers, where these events (particularly `CTRL_SHUTDOWN_EVENT`) are one of the only ways containers can "gracefully" shut down (moby/moby#25982 (comment)).

This was verified by making a simple `main()` which implements the same code as in `ExampleNotify_allSignals` but in a `for` loop, building a `main.exe`, running that in a container, then doing `docker kill -sTERM` on said container.  The program prints `Got signal: SIGTERM`, then exits after the aforementioned timeout, as expected.  Behavior before this patch is that the program gets no notification (and thus no output) but still exits after the timeout.

Fixes golang#7479
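
For readers who want to try the behaviour this commit message describes, here is a minimal Go sketch. It assumes a Go toolchain that includes the change, so that the console close/shutdown events arrive as SIGTERM; the helper names processJobs and cleanupWithin, the job list, and the 4 second budget are illustrative assumptions, with the budget chosen to stay under the typical 5 second window noted above.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// cleanupWithin runs cleanup in its own goroutine and reports whether it
// finished before the budget elapsed. The budget models the short window
// the platform allows before the process is forcibly killed.
func cleanupWithin(budget time.Duration, cleanup func()) bool {
	done := make(chan struct{})
	go func() {
		cleanup()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(budget):
		return false
	}
}

// processJobs works through jobs one at a time and stops early when a
// signal arrives on sigs. It returns the number of jobs completed.
func processJobs(jobs []string, sigs <-chan os.Signal) int {
	completed := 0
	for _, job := range jobs {
		select {
		case s := <-sigs:
			fmt.Println("Got signal:", s)
			return completed
		default: // no signal pending, keep working
		}
		fmt.Println("processing", job)
		time.Sleep(10 * time.Millisecond) // stand-in for real work
		completed++
	}
	return completed
}

func main() {
	sigs := make(chan os.Signal, 1)
	// os.Interrupt covers Ctrl+C; SIGTERM is what a container stop is
	// expected to surface once the runtime maps the console events to it.
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
	defer signal.Stop(sigs)

	n := processJobs([]string{"job-1", "job-2", "job-3"}, sigs)
	fmt.Println("completed jobs:", n)

	ok := cleanupWithin(4*time.Second, func() {
		fmt.Println("flushing state") // hypothetical cleanup step
	})
	fmt.Println("cleanup finished in time:", ok)
}
```

Checking for the signal between units of work, and bounding cleanup with an explicit budget, keeps the program from relying on the platform granting more time than it actually does.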
tianon added a commit to tianon/go that referenced this issue Jul 29, 2019
tianon added a commit to tianon/go that referenced this issue Aug 5, 2019
tianon added a commit to tianon/go that referenced this issue Aug 14, 2019
gopherbot pushed a commit to golang/go that referenced this issue Aug 29, 2019
…NT as SIGTERM on Windows

Fixes #7479

Change-Id: I2af79421cd484a0fbb9467bb7ddb5f0e8bc3610e
GitHub-Last-Rev: 9e05d63
GitHub-Pull-Request: #33311
Reviewed-on: https://go-review.googlesource.com/c/go/+/187739
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
tomocy added a commit to tomocy/go that referenced this issue Sep 1, 2019
t4n6a1ka added a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
@olandese

@olandese olandese commented Oct 16, 2019

Hello,

I am running the following powershell script in a Windows Container:

try
{
  Write-Host "3. Configuring Azure Pipelines agent..." -ForegroundColor Cyan

  .\config.cmd --unattended `
    --agent "$(if (Test-Path Env:AZP_AGENT_NAME) { ${Env:AZP_AGENT_NAME} } else { ${Env:computername} })" `
    --url "$(${Env:AZP_URL})" `
    --auth PAT `
    --token "$(Get-Content ${Env:AZP_TOKEN_FILE})" `
    --pool "$(if (Test-Path Env:AZP_POOL) { ${Env:AZP_POOL} } else { 'Default' })" `
    --work "$(if (Test-Path Env:AZP_WORK) { ${Env:AZP_WORK} } else { '_work' })" `
    --replace

  Write-Host "4. Running Azure Pipelines agent..." -ForegroundColor Cyan

  .\run.cmd
}
finally
{
  Write-Host "Cleanup. Removing Azure Pipelines agent..." -ForegroundColor Cyan

  .\config.cmd remove --unattended `
    --auth PAT `
    --token "$(Get-Content ${Env:AZP_TOKEN_FILE})"
}

My container is using the windowsservercore-1903 image.

Unfortunately, when I stop the container using docker stop, with or without -t 60, the finally block is never triggered.
Can someone help me with this?

@eugen-nw

@eugen-nw eugen-nw commented Dec 9, 2019

I cannot seem to get delayed container termination to work. I wonder if anyone more experienced can spot my mistake. I won't need 2 hours for clean-up; more like one hour to finish the computation at hand.

My container runs a Windows Console application in Azure Kubernetes. I'm doing the SetConsoleCtrlHandler subscription, I catch the CTRL_SHUTDOWN_EVENT (6) and Thread.Sleep(TimeSpan.FromSeconds(7200)); so the SIGKILL won't get sent to the container. The container receives the CTRL_SHUTDOWN_EVENT and logs on a separate thread one message/second to show how long it kept waiting.

If I do "kubectl delete pod ...", I get a 9-13 second grace period, which is way shorter than what I need.

FROM mcr.microsoft.com/windows/servercore:1809
USER ContainerAdministrator
RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 7200 && \
    reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 7200000 /f
ADD publish/ /
ENTRYPOINT ExternalSolverRunner.exe

This part works OK. If I run the container on my computer and call "docker stop -t <600>", the container waits for that duration.

Section of the .yaml file:

spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-aci-boldiq-external-solver-runner
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: aks-aci-boldiq-external-solver-runner
    spec:
      terminationGracePeriodSeconds: 7200
      containers:
      - image: ...
        imagePullPolicy: IfNotPresent
        name: boldiq-external-solver-runner
        resources:
          requests:
            memory: 8G
            cpu: 1
      imagePullSecrets:
        - name: docker-registry-secret-official
      nodeName: virtual-kubelet-aci-connector-windows-windows-westus

@cphillips83 cphillips83 commented Jan 21, 2020

@OnurGumus Glad to have that confirmed! I do remember running into that difference when testing this, but couldn't root cause why terminal emulation behaved differently in nanoserver vs servercore. But since removing -t worked across both, it seemed like the right general solution. Hopefully this unblocks you from being able to rely on the correct shutdown behavior in your usage.

@OnurGumus @swernli I just wanted to comment in case this helps others too. If you don't add an event handler to CancelKeyPress, RunConsoleAsync just hangs after calling the hosted service's StopAsync.

System.Console.CancelKeyPress += (sender, e) => { };

Here is how I FINALLY achieved graceful shutdown on Windows containers after 2 days of beating my head against the wall.

Using the following...

USER ContainerAdministrator
RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 7200  
RUN reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 7200000 /f

Starting the container with

docker run -id

Stopping the container with

docker stop -t 7200

This was all in conjunction with using SetConsoleCtrlHandler.
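
For anyone adapting these values: the two registry settings above use different units. ProcessShutdownTimeoutSeconds is written in seconds and WaitToKillServiceTimeout in milliseconds (7200 vs. 7200000 above). Below is a small Python sketch that derives both `reg add` lines from a single grace period so the two stay in sync. The function name is made up for illustration, and `/f` (overwrite without prompting) is added to both lines as an assumption about what you want in a Dockerfile build.

```python
from typing import List


def shutdown_timeout_reg_commands(grace_seconds: int) -> List[str]:
    """Build the two `reg add` lines from this thread for one grace period.

    Hypothetical helper: it only formats strings, it does not touch the registry.
    """
    if grace_seconds <= 0:
        raise ValueError("grace_seconds must be positive")

    # WaitToKillServiceTimeout is a REG_SZ holding milliseconds.
    grace_ms = grace_seconds * 1000

    return [
        r"reg add hklm\system\currentcontrolset\services\cexecsvc"
        f" /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d {grace_seconds} /f",
        r"reg add hklm\system\currentcontrolset\control"
        f" /v WaitToKillServiceTimeout /t REG_SZ /d {grace_ms} /f",
    ]


if __name__ == "__main__":
    # Print Dockerfile RUN lines for a two-hour grace period.
    for line in shutdown_timeout_reg_commands(7200):
        print("RUN " + line)
```

The printed lines can be pasted after `USER ContainerAdministrator` in the Dockerfile.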

sunghwan2789 added a commit to sunghwan2789/minecraft-server that referenced this issue Mar 9, 2020

@weijuans-msft weijuans-msft commented Mar 25, 2020

@cphillips83 @OnurGumus can you confirm whether this is still an issue, or whether you now have a workaround? Also, do people need it for both Nano Server containers and Server Core containers?

Sorry @cphillips83 to see you had to beat your head against the wall for 2 days. Ouch...

For Microsoft folks, I created a new bug, 25695040. I couldn't find the one Patrick gave (8633377).

@fedorbirjukov fedorbirjukov commented Mar 25, 2020

@weijuans-msft We need it in all Windows images. So yes, in both Nano Server and Server Core.

@timpclip timpclip commented Sep 28, 2020

I am trying something very similar to others here but am having an issue getting docker stop to wait longer than 5 minutes.
I created a scaled-down test program that just does Thread.Sleep for 20 minutes. I deployed it in a docker container with base image: windows servercore:1909

I'm using the lines:
USER ContainerAdministrator
RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 2700 /f
RUN reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 2700000 /f

(This is set to 45 minutes which is what I actually want to use when I get this working)

I start it using: docker run -id
I stop it using: docker stop -t 400

I expect it to wait 400 seconds (over 6 minutes) before killing the container.

When I run it on my desktop which is Windows 10 Professional version 2004, it works as expected.
But when I try it on hosts running Windows Server Datacenter version 1909, it stops after exactly 5 minutes.

I upgraded both Docker engines to version 19.03.12 but observe the same behavior as before.

Should this work as I am expecting? If so, any ideas what could be wrong?
Any help appreciated.

@zleight1 zleight1 commented Oct 15, 2020

Hello,

I am running the following powershell script in a Windows Container:

try
  {
    Write-Host "3. Configuring Azure Pipelines agent..." -ForegroundColor Cyan
  
    .\config.cmd --unattended `
      --agent "$(if (Test-Path Env:AZP_AGENT_NAME) { ${Env:AZP_AGENT_NAME} } else { ${Env:computername} })" `
      --url "$(${Env:AZP_URL})" `
      --auth PAT `
      --token "$(Get-Content ${Env:AZP_TOKEN_FILE})" `
      --pool "$(if (Test-Path Env:AZP_POOL) { ${Env:AZP_POOL} } else { 'Default' })" `
      --work "$(if (Test-Path Env:AZP_WORK) { ${Env:AZP_WORK} } else { '_work' })" `
      --replace
  
    Write-Host "4. Running Azure Pipelines agent..." -ForegroundColor Cyan
  
    .\run.cmd
  } 
  finally
  {
    Write-Host "Cleanup. Removing Azure Pipelines agent..." -ForegroundColor Cyan
  
    .\config.cmd remove --unattended `
      --auth PAT `
      --token "$(Get-Content ${Env:AZP_TOKEN_FILE})"
  }

My container is using the windowsservercore-1903 image.

Unfortunately, when I stop the container using docker stop, with or without -t 60, the finally block is never triggered.
Can someone help me with this?

@olandese Did you ever manage to get this to work?

@artlogic artlogic commented Mar 24, 2021

Just a note here for anyone who finds this issue and was as confused as I was about all this:

  1. For Windows containers, the -t parameter of docker stop seems to do nothing. There seems to be no way to specify that value from outside the container. The command issues a CTRL_SHUTDOWN_EVENT immediately regardless of any --time set.
  2. The first registry value change, RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 7200, causes Windows to delay sending the shutdown event for the specified number of seconds. I'm not exactly sure how useful this is - unless you've used some other channel to tell your process to shut down.
  3. The second registry value change is more useful: RUN reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 7200000 /f. This causes Windows to wait the specified number of milliseconds before killing a process after issuing a CTRL_SHUTDOWN_EVENT - essentially the rough equivalent of docker stop -t.
  4. If you don't delay while handling the CTRL_SHUTDOWN_EVENT your process will stop immediately, regardless of any delay set (much like how handling SIGTERM would work).

I hope this helps someone!
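
Points 3 and 4 above can be sketched in Python with ctypes: the handler blocks on CTRL_SHUTDOWN_EVENT until the work signals that it is done, because returning early ends the process. Treat this as a rough sketch rather than a verified recipe. The names `on_console_ctrl`, `register_handler`, `stop_requested` and `work_finished` are hypothetical, the 7200-second default simply mirrors the registry values used in this thread, and the Windows registration path has not been verified inside a container.

```python
import sys
import threading

CTRL_SHUTDOWN_EVENT = 6  # value quoted earlier in this thread

stop_requested = threading.Event()  # set when the shutdown event arrives
work_finished = threading.Event()   # the worker sets this once cleanup is done


def on_console_ctrl(ctrl_type, max_wait_seconds=7200.0):
    """Hold the handler open until the work finishes or the wait expires."""
    if ctrl_type != CTRL_SHUTDOWN_EVENT:
        return False  # not handled here; pass it to the next handler
    stop_requested.set()
    # Returning right away would let the process be stopped immediately,
    # so block here while the worker wraps up.
    work_finished.wait(timeout=max_wait_seconds)
    return True


_keepalive = []  # keeps the ctypes callback alive for the process lifetime


def register_handler():
    """Install on_console_ctrl via SetConsoleCtrlHandler (Windows only)."""
    if sys.platform != "win32":
        raise OSError("SetConsoleCtrlHandler is only available on Windows")
    import ctypes
    from ctypes import wintypes

    routine_type = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.DWORD)
    routine = routine_type(on_console_ctrl)
    _keepalive.append(routine)

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    if not kernel32.SetConsoleCtrlHandler(routine, True):
        raise ctypes.WinError(ctypes.get_last_error())
```

A worker loop would poll `stop_requested.is_set()`, finish its current unit of work, and then call `work_finished.set()` so the handler can return.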
