runtime: handling of CTRL_CLOSE_EVENT seems broken #41884

Open
ncruces opened this issue Oct 9, 2020 · 6 comments · May be fixed by #41886

@ncruces ncruces commented Oct 9, 2020

What version of Go are you using (go version)?

go version go1.15.2 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

set GO111MODULE=
set GOARCH=amd64
set GOBIN=D:\Apps\Go\work\bin
set GOCACHE=C:\Users\ncruc\AppData\Local\go-build
set GOENV=C:\Users\ncruc\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=D:\Apps\Go\work\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=D:\Apps\Go\work
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=D:\Apps\Go\root
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=D:\Apps\Go\root\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=D:\MeltingSource\RethinkRAW\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\ncruc\AppData\Local\Temp\go-build526790066=/tmp/go-build -gno-record-gcc-switches

What did you do?

I have an MCVE at https://gist.github.com/ncruces/20dbdc73d0da6e211ee56b68c2240bae

Open a new console window, build and run the example.
Then close the console window and check the log file it creates next to the executable.

go build -o sig.exe gist.github.com/ncruces/20dbdc73d0da6e211ee56b68c2240bae.git
sig.exe

What did you expect to see?

Given that cl/187739 has been merged, I expected the channel registered with signal.Notify to receive a SIGTERM, and to have 5 to 20 s to handle cleanup.

Specifically I want to see these log lines:

2020/10/09 13:36:31 Cleaning up...
2020/10/09 13:36:32 Exited

What did you see instead?

Either SIGTERM isn't received at all, or we're not given any time to clean up.


The MCVE also includes code to call SetConsoleCtrlHandler (which shouldn't be needed given cl/187739).

You can activate it by running either of:

sig.exe -handle
sig.exe -handle -block

The MCVE works consistently if the handler blocks, giving the rest of the program time to terminate gracefully.

This is consistent with the documentation:

So I assume that's what needs to change: this return needs to become a select {} or similar:

go/src/runtime/os_windows.go

Lines 1010 to 1012 in c0dded0

if sigsend(s) {
	return 1
}

I could easily do a PR, but I'm not well versed in building Go from source and testing the change.

Maybe commenters from #7479 can help: @alexbrainman, @tianon?

ncruces added a commit to ncruces/go that referenced this issue Oct 9, 2020
Fixes golang#41884.
@gopherbot gopherbot commented Oct 9, 2020

Change https://golang.org/cl/261057 mentions this issue: runtime: block console ctrlhandler when the signal is handled

@ALTree ALTree changed the title Handling of CTRL_CLOSE_EVENT seems broken runtime: handling of CTRL_CLOSE_EVENT seems broken Oct 9, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Oct 9, 2020
@networkimprov networkimprov commented Oct 10, 2020

@mattn mattn commented Oct 11, 2020

I don't yet understand what goes wrong. I compiled this code and ran start sig.exe from cmd.exe, then clicked the [x] button. The program quit immediately; it does not cancel closing.

#include <windows.h>
#include <stdio.h>
#include <signal.h>

BOOL WINAPI ctrl_handler(DWORD dwCtrlType) {
  switch (dwCtrlType) {
    case CTRL_C_EVENT:
      puts("ctrl-c");
      break;
    case CTRL_BREAK_EVENT:
      puts("ctrl-break");
      break;
    case CTRL_LOGOFF_EVENT:
      puts("ctrl-logoff");
      break;
    case CTRL_SHUTDOWN_EVENT:
      puts("ctrl-shutdown");
      break;
    case CTRL_CLOSE_EVENT:
      puts("ctrl-close");
      break;
  }
  return TRUE;
}

int
main(int argc, char* argv[]) {
  SetConsoleCtrlHandler(ctrl_handler, TRUE);
  puts("waiting...");
  while (TRUE) {
    Sleep(5000);
  }
  return 0;
}

FYI, CTRL_CLOSE_EVENT has a timeout constraint. https://docs.microsoft.com/en-us/windows/console/handlerroutine?redirectedfrom=MSDN#timeouts

@ncruces ncruces commented Oct 11, 2020

The point isn't quitting immediately. Not handling the signal does that for you (Windows just kills your process).
It is also not to cancel closing. You can't do that. Eventually, you need to die (Windows kills your process).

The point is exiting gracefully: closing DB connections, cleaning temporary files, notifying child processes, etc. For that you do need to handle the signal. The Windows API is designed in such a way that cleanup code should run synchronously in the handler.

Go decided in cl/187739 to take these (CTRL_C_EVENT, CTRL_CLOSE_EVENT, etc) and turn them into signals (SIGINT, SIGTERM) to be handled elsewhere (signal.Notify(channel)).

This works fine for CTRL_C_EVENT (turned into SIGINT) because when you say you handled those (return 1 from the ctrlhandler) Windows considers it handled and doesn't kill the process (it's an interrupt, you handled it, it's fine). This means your asynchronous code (the channel you notified) gets a chance to handle the event.

This does not work at all for CTRL_CLOSE_EVENT (turned into SIGTERM) because when you say you handled those (return 1 from the ctrlhandler) Windows considers it handled and immediately kills the process (you were asked to exit, you handled it, means you're ready to exit). This means your asynchronous code (the channel you notified) does not get a chance to handle the event (unless scheduling means your handling code gets a chance to run, though not necessarily complete).

What happens in https://gist.github.com/ncruces/20dbdc73d0da6e211ee56b68c2240bae is that the handling code does not get a chance to run. With a CTRL_C_EVENT, the log looks like this:

2020/10/09 13:36:30 Waiting...
2020/10/09 13:36:31 Received: interrupt
2020/10/09 13:36:31 Cleaning up...
2020/10/09 13:36:32 Exited

If you instead close the console (CTRL_CLOSE_EVENT), you typically only get the Waiting... line. You may also get the Received: terminated and Cleaning up... lines, but you will never get the Exited line. This means the "slow" cleanup code (time.Sleep(time.Second)) was not given a chance to finish.

I hope this covers the need for this fix. The current checked-in code is, quite frankly, pointless. Cleanup code may run, only to be forcefully interrupted, or it may not run at all, depending on scheduling.


Now for the rationale behind the fix in #41886 (basically SLEEP(INFINITE)). You need to give your handling code a chance to clean up. But you have no idea how much time it takes, and no way to be notified when it finishes. So your only option here is to wait "forever."

Why is this not a problem?

As you mentioned, Windows sets an upper bound on how long it will let you wait (5 s for console close, 20 s for shutdown). So you'll never wait longer than that.

The other question is: does this SLEEP(INFINITE) prevent you from exiting earlier? The answer is no. If func main() returns, the process is immediately terminated, even with the handler blocked. The same is true if you call os.Exit.

Also, SLEEP(INFINITE) blocks the native thread (which is what we intended), but doesn't prevent other goroutines from being scheduled onto all the available GOMAXPROCS.
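Both points can be loosely illustrated in plain Go: a goroutine locked to its OS thread and parked forever does not stop other goroutines from running, and returning from main still ends the process. This is only a rough stand-in for the handler thread stuck in SLEEP(INFINITE), since the real handler thread is created by Windows rather than by the Go scheduler; `blockedHandlerDemo` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// blockedHandlerDemo parks one goroutine forever on a dedicated OS thread,
// then shows that other goroutines still run to completion.
func blockedHandlerDemo(workers int) int {
	go func() {
		runtime.LockOSThread() // this OS thread now runs only this goroutine
		select {}              // park forever, loosely like Sleep(INFINITE)
	}()

	var wg sync.WaitGroup
	results := make(chan int, workers)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n * n // stand-in for cleanup work
		}(i)
	}
	wg.Wait()
	close(results)

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println("work done while a thread is blocked:", blockedHandlerDemo(8))
	// Returning from main ends the process even though a goroutine is still parked.
}
```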


Finally, and if it helps, as I mentioned in cl/187739, SLEEP(INFINITE) is precisely how libuv (used by Node.js) handles this, and there's a nice comment explaining why:
https://github.com/libuv/libuv/blob/dec0723864c2a1d41cfbab8164c5683f5cffff14/src/win/signal.c#L130

@networkimprov networkimprov commented Oct 11, 2020

See also #40167 & #40074

cc @zx2c4

@ncruces ncruces commented Oct 11, 2020

I didn't investigate service issues.

I added a test case for this (though it briefly flashes a console window and syncs over UDP, not sure that's acceptable).

There were no test cases for services; if there are any pointers to things I can test for regressions, please link them here.
