Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: handling of CTRL_CLOSE_EVENT seems broken #41884

Closed
ncruces opened this issue Oct 9, 2020 · 17 comments
Closed

runtime: handling of CTRL_CLOSE_EVENT seems broken #41884

ncruces opened this issue Oct 9, 2020 · 17 comments

Comments

@ncruces
Copy link
Contributor

@ncruces ncruces commented Oct 9, 2020

What version of Go are you using (go version)?

go version go1.15.2 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

set GO111MODULE=
set GOARCH=amd64
set GOBIN=D:\Apps\Go\work\bin
set GOCACHE=C:\Users\ncruc\AppData\Local\go-build
set GOENV=C:\Users\ncruc\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=D:\Apps\Go\work\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=D:\Apps\Go\work
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=D:\Apps\Go\root
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=D:\Apps\Go\root\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\ncruc\AppData\Local\Temp\go-build526790066=/tmp/go-build -gno-record-gcc-switches

What did you do?

I have a MCVE on https://gist.github.com/ncruces/20dbdc73d0da6e211ee56b68c2240bae

Open a new console window, build and run the example.
Then close the console window and check the log file it creates next to the executable.

go build -o sig.exe gist.github.com/ncruces/20dbdc73d0da6e211ee56b68c2240bae.git
sig.exe

What did you expect to see?

Given cl/187739 has been merged, I expected to see a SIGTERM being received by signal.Notify, and having 5 to 20s to handle cleanup.

Specifically I want to see these log lines:

2020/10/09 13:36:31 Cleaning up...
2020/10/09 13:36:32 Exited

What did you see instead?

Either SIGTERM isn't being received at all, or we're not given any time to cleanup.


The MCVE also includes code to call SetConsoleCtrlHandler (which shouldn't be needed given cl/187739).

You can activate it by running either of:

sig.exe -handle
sig.exe -handle -block

The MCVE works consistently if the handler blocks giving the rest of the program time to gracefully terminate.

This is consistent with the documentation:

So, I'm assuming that's what needs to be changed, this return needs to become a select {} or similar:

go/src/runtime/os_windows.go

Lines 1010 to 1012 in c0dded0

if sigsend(s) {
return 1
}

I could easily do a PR, but I'm not well versed with building go from source and testing the change.

Maybe commenters from #7479 can help: @alexbrainman, @tianon?

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 9, 2020

Change https://golang.org/cl/261057 mentions this issue: runtime: block console ctrlhandler when the signal is handled

@ALTree ALTree changed the title Handling of CTRL_CLOSE_EVENT seems broken runtime: handling of CTRL_CLOSE_EVENT seems broken Oct 9, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Oct 9, 2020
@networkimprov
Copy link

@networkimprov networkimprov commented Oct 10, 2020

@mattn
Copy link
Member

@mattn mattn commented Oct 11, 2020

I don't understand yet what you goes wrong. I compiled this code and run start sig.exe on cmd.exe. Then clicked [x] button. The program quit immediately. It does not cancel closing.

#include <windows.h>
#include <stdio.h>
#include <signal.h>

BOOL WINAPI ctrl_handler(DWORD dwCtrlType) {
  switch (dwCtrlType) {
    case CTRL_C_EVENT:
      puts("ctrl-c");
      break;
    case CTRL_BREAK_EVENT:
      puts("ctrl-break");
      break;
    case CTRL_LOGOFF_EVENT:
      puts("ctrl-logoff");
      break;
    case CTRL_SHUTDOWN_EVENT:
      puts("ctrl-shutdown");
      break;
    case CTRL_CLOSE_EVENT:
      puts("ctrl-close");
      break;
  }
  return TRUE;
}

int
main(int argc, char* argv[]) {
  SetConsoleCtrlHandler(ctrl_handler, TRUE);
  puts("waiting...");
  while (TRUE) {
    Sleep(5000);
  }
  return 0;
}

FYI, CTRL_CLOSE_EVENT has a timeout constraint. https://docs.microsoft.com/en-us/windows/console/handlerroutine?redirectedfrom=MSDN#timeouts

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Oct 11, 2020

The point isn't quitting immediately. Not handling the signal does that for you (Windows just kills your process).
It is also not to cancel closing. You can't do that. Eventually, you need to die (Windows kills your process).

The point is exiting gracefully: closing DB connections, cleaning temporary files, notifying child processes, etc. For that you do need to handle the signal. The Windows API is designed in such a way that cleanup code should run synchronously in the handler.

Go decided in cl/187739 to take these (CTRL_C_EVENT, CTRL_CLOSE_EVENT, etc) and turn them into signals (SIGINT, SIGTERM) to be handled elsewhere (signal.Notify(channel)).

This works fine for CTRL_C_EVENT (turned into SIGINT) because when you say you handled those (return 1 from the ctrlhandler) Windows considers it handled and doesn't kill the process (it's an interrupt, you handled it, it's fine). This means your asynchronous code (the channel you notified) gets a chance to handle the event.

This does not work at all for CTRL_CLOSE_EVENT (turned into SIGTERM) because when you say you handled those (return 1 from the ctrlhandler) Windows considers it handled and immediately kills the process (you were asked to exit, you handled it, means you're ready to exit). This means your asynchronous code (the channel you notified) does not get a chance to handle the event (unless scheduling means your handling code gets a chance to run, though not necessarily complete).

What happens in https://gist.github.com/ncruces/20dbdc73d0da6e211ee56b68c2240bae is that the handling code does not get a chance to run. If you look at the log you get with a CTRL_C_EVENT you get this:

2020/10/09 13:36:30 Waiting...
2020/10/09 13:36:31 Received: interrupt
2020/10/09 13:36:31 Cleaning up...
2020/10/09 13:36:32 Exited

If you instead close the console (CTRL_CLOSE_EVENT), you typically only get the Waiting... line. You may have the Received: termination and Cleaning up..., but you will never get the Exited line. This means the "slow" cleaning up code (time.Sleep(time.Second)) was not given a chance to finish.

I hope this covers the need for this fix. The current checked in code is quite frankly pointless. Cleanup code may run, only to be forcefully interrupted, or it may not run at all, depending on scheduling.


Now for the rational behind the fix in #41886 (basically SLEEP(INFINITE)). You need to give your handling code a chance to cleanup. But you have no idea how much time it takes, and no way to be notified of it terminating. So your only option here is to wait "forever."

Why is this not a problem?

As you mentioned, Windows sets an upper bound into how much it will let you wait (5s for console close, 20s for shutdown). So, you'll never wait more than this.

The other question is, does this SLEEP(INFINITE) prevent you from exiting earlier? The answer is no, it does not. If func main() exits, the process is immediately terminated, even with the handler blocked. Also if you call os.Exit.

Also SLEEP(INFINITE) blocks the native thread (which is what we intended), but doesn't prevent other goroutines from being scheduled into all the available GOMAXPROCS.


Finally, and if it helps, as I mentioned in cl/187739, SLEEP(INFINITE) is precisely how libuv (used by Node.js) handles this, and there's a nice comment explaining why:
https://github.com/libuv/libuv/blob/dec0723864c2a1d41cfbab8164c5683f5cffff14/src/win/signal.c#L130

@networkimprov
Copy link

@networkimprov networkimprov commented Oct 11, 2020

See also #40167 & #40074

cc @zx2c4

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Oct 11, 2020

I didn't investigate service issues.

I added a test case for this (though it briefly flashes a console window and syncs over UDP, not sure that's acceptable).

There were no test cases for services, if there are any pointers to stuff I can test for regressions, please link them here.

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Jan 2, 2021

This seems to have stalled on lack of reviews. There's now a merge conflict. Should I pursue this? Fix the conflict, anything else I can do?

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Jan 3, 2021

@ncruces see my comment in https://go-review.googlesource.com/c/go/+/187739/1..9//COMMIT_MSG#b19 starting with

Cool. I built this program into test.exe, and then ...

My test results contradicts your findings. Are you are saying I got lucky (or unlucky whichever way you look at it)? That there is a race in my test?

Change https://golang.org/cl/261057 mentions this issue: runtime: block console ctrlhandler when the signal is handled

Are you saying that Go does not provide function to handle CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT events?

According to your CL 261057, you propose that Go runtime change to hang in event handler. I can see two problems with your change:

  1. Some existing programs expect Go runtime exits in this situation. Now these programs would hang for some time before Windows decide to close them.
  2. I am not sure what will happen, if Windows calls Go event handler second or third or .. time while runtime is sleeping forever.

/cc @ianlancetaylor and @zx2c4 for opinion.

Alex

You propose that we change handling of CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT events

Thank you.

Alex

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Jan 3, 2021

@alexbrainman thanks for reviewing. My comments inline.

@ncruces see my comment in https://go-review.googlesource.com/c/go/+/187739/1..9//COMMIT_MSG#b19 starting with

Cool. I built this program into test.exe, and then ...

My test results contradicts your findings. Are you are saying I got lucky (or unlucky whichever way you look at it)? That there is a race in my test?

Yes, that's what I'm saying. If you modify said example code by inserting a time.Sleep(time.Second) between notification and printing, you'll never see the Got signal: terminated (with Go 1.15.6, my patch fixes this).

func main() {
	for {
		c := make(chan os.Signal, 1)
		signal.Notify(c)

		s := <-c
		time.Sleep(time.Second) // ** ADDED THIS **
		fmt.Println("Got signal:", s)
	}
}

Change https://golang.org/cl/261057 mentions this issue: runtime: block console ctrlhandler when the signal is handled

Are you saying that Go does not provide function to handle CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT events?

I'm saying at a minimum there's a race, and that you get killed before getting to do any significant clean up (I'm trying to notify child processes).

According to your CL 261057, you propose that Go runtime change to hang in event handler. I can see two problems with your change:

  1. Some existing programs expect Go runtime exits in this situation. Now these programs would hang for some time before Windows decide to close them.
  2. I am not sure what will happen, if Windows calls Go event handler second or third or .. time while runtime is sleeping forever.

Regarding 1, only programs that handle syscall.SIGTERM are affected by the change.

The conditions below ensure this (sleeping only happens if the signal is handled and if it is SIGTERM):

go/src/runtime/os_windows.go

Lines 1012 to 1019 in 03abc2d

if sigsend(s) {
if s == _SIGTERM {
// Windows terminates the process after this handler returns.
// Block indefinitely to give signal handlers a chance to clean up.
stdcall1(_Sleep, uintptr(_INFINITE))
}
return 1
}

You can test this by modifying the sample code to only listen to syscall.SIGINT and closing the console window. Result is that it exits immediately, without printing Got signal: terminated.

func main() {
	for {
		c := make(chan os.Signal, 1)
		signal.Notify(c, syscall.SIGINT) // ** MODIFIED THIS **

		s := <-c
		time.Sleep(time.Second)
		fmt.Println("Got signal:", s)
	}
}

Additionally, if in handling syscall.SIGTERM you exit the program (by returning from main, or calling os.Exit) you will exit immediately. So all you have to do to avoid the delay is to exit the program after cleaning up. If you don't exit, yes, you see a 5 second delay. The alternative is that you get killed before you've had any change to clean up.

You can test this by modifying sample code as follows.

func main() {
	for {
		c := make(chan os.Signal, 1)
		signal.Notify(c)

		s := <-c
		time.Sleep(time.Second)
		fmt.Println("Got signal:", s)
		if s == syscall.SIGTERM {
			// comment the break to be forcefully terminated after 5 seconds
			// uncomment to exit immediately after the 1 second sleep
			break
		}
	}
}

In the original issue, I commented that syscall.SIGTERM did not seem the best "mapping" for these signals, because you can't really block, handle or ignore them (as you can in POSIX). Eventually, you do need to exit, or get killed.

Regarding number 2, signal handling through SetConsoleCtrlHandler becomes blocked. The CTRL_CLOSE_EVENT (mapped to syscall.SIGTERM) is the last signal you handle. Again, the alternative is to unblock the handler, which in this case means telling the system you've handled CTRL_CLOSE_EVENT, and that the system is free to kill you immediately. You're supposed to handle CTRL_CLOSE_EVENT (et al.) synchronously.

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Jan 3, 2021

Fixed the merge conflict (#41886, cl/261057), and updated the test to also ensure that the child exits before the 5 second deadline.

The test starts a child process, then closes its console window (taskkill does this, it sends WM_CLOSE to the console window), the child waits for a second, then notifies the parent through UDP and exits.

To be successful the test requires the parent to be notified through UDP, that the signal received is SIGTERM, that the child exits with a 0 status, and that the whole process completes in under 5 seconds. This ensures that the child has the time to properly handle the signal, and then exit, before Windows forcefully kills it.

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Jan 23, 2021

Sorry to bother you, and thanks for your time, @alexbrainman.
A final push to get this through?
Thanks!

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Jan 23, 2021

@alexbrainman @dmitshur @bradfitz @ianlancetaylor - This has a working/reviewed fix now -- https://go-review.googlesource.com/c/go/+/261057 -- and it fixes a bug. Is this something we can consider submitting for 1.16?

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Jan 27, 2021

Is this something we can consider submitting for 1.16?

I am fine to include https://go-review.googlesource.com/c/go/+/261057 in Go 1.16. But it is up to release team to decide.

Alex

@dmitshur
Copy link
Contributor

@dmitshur dmitshur commented Jan 27, 2021

@zx2c4 @alexbrainman Discussed with @toothrot, this looks okay to try for Go 1.16 if you can land it before RC 1. It's contained, fixing a bug, and has a test. It'll go through testing in RC 1 and we can roll it back and try again in Go 1.17 if there are problems. Thanks.

@dmitshur dmitshur modified the milestones: Backlog, Go1.16 Jan 27, 2021
@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Jan 27, 2021

@ncruces - Could you address the last-minute feedback on Gerrit, and then I'll submit this?

@ncruces
Copy link
Contributor Author

@ncruces ncruces commented Jan 27, 2021

Done. Sorry for stepping on your toes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

9 participants