syscall: add support for Windows job objects #17608
I would like to reopen #6720.
Look, this is not a solution!
Full story below.
I would appreciate your input on this...
What version of Go are you using (
This is not a good way to start a bug report. You've now established yourself as an adversary rather than a collaborator.
Something to keep in mind for future bug reports. Tone matters.
I'm sorry, but we need more information to understand what you are asking for on the Go side. I did read nodist/issues#179, but there is not a line of Go code in sight.
You also mentioned reopening #6720, which as I read it is about p.Signal(os.Interrupt), where p is an os.Process, not being implemented and therefore always returning an error on Windows. That issue was closed by 05cc78d, which did:
As discussed in #6720, we don't know of an obvious way to implement p.Signal(os.Interrupt) on Windows, or we would have. But none of us are Windows experts. A few long time users, maybe, but not experts. We do the best we can by reading the Microsoft documentation and a liberal amount of Stack Overflow and trial and error.
One possibility is that nodejs ctx.child.kill('SIGTERM') is sending a Ctrl-Break to the entire process group, but if that were the case I don't understand why that wouldn't hit both the go wrapper and the nodejs server it started.
Another possibility is that nodejs ctx.child.kill('SIGTERM') knows some kind of magic to send a Ctrl-Break to just that one process. If so and you can tell us what that magic is, we can probably implement it in Go.
Another possibility is that SIGTERM doesn't mean Ctrl-Break at all here. But then what does it mean?
I looked in github.com/nodejs/node and there is no mention of what SIGTERM means on Windows. They must be using the Microsoft C runtime library, but what does that do? I looked in the mingw sources and the closest I found was mingw/include/signal.h, which says:
"SIGTERM comes from what kind of termination request exactly?". Exactly.
It sounds like maybe you know what Go should be doing instead, or maybe what SIGTERM means on Windows. If so, can you tell us? Thanks.
It looks like Node uses libuv, which treats SIGKILL, SIGTERM, and SIGINT the same on Windows: https://github.com/nodejs/node/blob/db1087c9757c31a82c50a1eba368d8cba95b57d0/deps/uv/src/win/process.c#L1166
@pbnjay, thanks for finding that.
To recap, there is a Node parent that ran a Go process that ran a Node child.
On Unix systems, if the Node parent sends SIGTERM ("please stop") to the Go process, then the Go process's signal handler runs and can do something in response to the signal, like send SIGTERM to the Node child, wait for the Node child to exit gracefully, and then exit itself.
On Windows, my reading of what @pbnjay found is that the Node parent calls TerminateProcess on the Go process. That doesn't send a nice "please stop" to the Go process. It just terminates it, like Unix SIGKILL. There is no signal sent, no time to react; the operating system just destroys the process. In this case the Node child is left behind. It would have to be: the Go process had no chance to do anything.
On Linux, there is still a way to cope with SIGKILL. When the Go program starts the subprocess, it can pass a SysProcAttr with Pdeathsig!=0, which makes the forked child call prctl(PR_SET_PDEATHSIG, Pdeathsig) before exec'ing the actual new program. That setting means "if my parent dies, send me this signal", so that even if the Go program dies with kill -9 or some other path that forgets to do cleanup, the child can be notified that the parent is gone and clean up after itself.
It looks like maybe the Windows equivalent of PR_SET_PDEATHSIG is "job objects". It is unclear to me whether this still works in current versions of Windows, but some way to support that would be the obvious next thing to try. I'm going to retitle this bug to be about that.
I'm away from keyboard until Sunday. However, I wanted to say I appreciate the seriousness and professionalism this thread is getting, despite the aforementioned... tone.
I wish I had more lower level info - I would have shared it at start.
I can elaborate a bit more, and I will next week.
In case it affects your priorities, I owe you an update.
Specifically for my case a workaround has been found:
This results in removing the third-to-last and second-to-last steps, jumping straight to the last.
In this case, the SIGTERM terminates the server and the system works as expected without leaving hung processes.
However, we still get hanging processes whenever I
FYI, AssignProcessToJobObject fails on Windows 7. AFAIK, on Windows 7 or older you have to terminate the process by walking its child processes using CreateToolhelp32Snapshot. One more issue: as alex said in #6720, GenerateConsoleCtrlEvent has another problem. The API requires a console, so if the process doesn't have one, it doesn't work. If the process calls AllocConsole, for example, it works fine.
I do not know about PR_SET_PDEATHSIG, but you can use "job objects" to control process groups on Windows. I even have github.com/alexbrainman/ps package with some APIs. We have used "job objects" to collect benchmark run statistics in golang.org/x/benchmarks/driver. From what I remember "job objects" provide facilities for child processes to start their own group too, so you would need some cooperation from your clients.
I think it works on all Go supported Windows versions.