-
Notifications
You must be signed in to change notification settings - Fork 1k
Break context chain and move subprocesses into separate pgroups #1270
Conversation
This puts us more squarely in control of subprocess signaling.
| c.cancel = cancel | ||
| // Create a one-off cancellable context for use by the CommandContext, in | ||
| // the event that we have to force a Process.Kill(). | ||
| ctx2, cancel := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment I left in #1269; we should just not use a context here and just c.Cmd.Process.Kill directly instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that's racy, unfortunately. IIRC, the code that your original PR got rid of in favor of exec.CommandContext() had all these nasty workarounds, including passing around integer address pointers for atomic ops, in order to be able to safely call c.Cmd.Process.Kill() directly without any possibility of that call being made on a nil pointer.
i'll go back and double-check, but i know that the thing i really liked about using exec.CommandContext in the first place was that it gave us a safe way of invoking Kill() by taking advantage of timing information that is otherwise only accessible from within the os/exec package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do double check - as far as I know, there's no possibility of Cmd.Process being nil after Start returns successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so, it was a bit more complicated than that - while i think you're correct that there's no possibility of Command.Process being nil, it was Command.ProcessState that gave problems:
Lines 29 to 53 in e74b613
| // If the process doesn't exit immediately, check every 50ms, up to 3s, | |
| // after which send a hard kill. | |
| // | |
| // Cannot rely on cmd.ProcessState.Exited() here, as that is not set | |
| // correctly when the process exits due to a signal. See | |
| // https://github.com/golang/go/issues/19798 . Also cannot rely on it | |
| // because cmd.ProcessState will be nil before the process exits, and | |
| // checking if nil create a data race. | |
| if !atomic.CompareAndSwapInt32(isDone, 1, 1) { | |
| to := time.NewTimer(3 * time.Second) | |
| tick := time.NewTicker(50 * time.Millisecond) | |
| defer to.Stop() | |
| defer tick.Stop() | |
| // Loop until the ProcessState shows up, indicating the proc has exited, | |
| // or the timer expires and | |
| for !atomic.CompareAndSwapInt32(isDone, 1, 1) { | |
| select { | |
| case <-to.C: | |
| return cmd.Process.Kill() | |
| case <-tick.C: | |
| } | |
| } | |
| } |
the goal there was avoiding sending a signal to a process that had already terminated, and that's the thing that's difficult to ascertain from anywhere other than the thread that's Wait()ing. all i can remember about why i went to those lengths is because i wanted to avoid returning an error to the broader system when that error was simply a case of our mismanagement of the process (signal-after-termination), rather than an actual error from the process. yes, we could sniff the error, but...well, it's an unexported, potentially os-specific error there. not ideal.
(and yes, that implies that this system should also probably be returning the cancelation error, rather than one from the underlying process, when cancelation is instructed)
so yeah, i'd prefer we stick with the reuse of context for the kill semantics here, as it both insulates us from having to think about any of the above, and may also provide a more useful point for understanding the causal ordering of events here (signal send start, signal send complete, process terminated, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you're saying here. The code in os/exec is no more resilient to any of these problems; it just ignores the error returned from Process.Kill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
¯\_(ツ)_/¯ - maybe it's all equivalent. i don't have more time i can afford to invest in this right now, and figuring out if there was something else that motivated the original code. so - if you and @jmank88 can agree that just calling Process.Kill() is equivalent, then i'll relent, and we can just use that.
if not, this route feels less dragon-y to me, and without adequate time to dispel the fog, i've gotta follow my gut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm with @tamird here. We could do this in another PR though, right? I can put something up after this is merged for us to stare at some more in isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, good point, can do it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that's what #1269 is.
internal/gps/cmd_unix.go
Outdated
| c.Cmd.Stdout = &b | ||
| c.Cmd.Stderr = &b | ||
|
|
||
| // Force subprocesses into their own process group, rather than being in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not do this in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair nuff
internal/gps/cmd_unix.go
Outdated
|
|
||
| // Force subprocesses into their own process group, rather than being in the | ||
| // same process group as the dep process. Ctrl-C sent from a terminal will | ||
| // send the signal to the entire running process group, so This allows us to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/This/this/
|
we could treat it that way, and may as well - but they actually started as separate overlapping PRs when @jmank88 and I noticed this problem at the same time
…On October 14, 2017 3:46:40 AM EDT, Tamir Duberstein ***@***.***> wrote:
tamird commented on this pull request.
> @@ -25,11 +25,15 @@ type cmd struct {
}
func commandContext(ctx context.Context, name string, arg ...string)
cmd {
- // Grab the caller's context and pass a derived one to
CommandContext.
- c := cmd{ctx: ctx}
- ctx, cancel := context.WithCancel(ctx)
- c.Cmd = exec.CommandContext(ctx, name, arg...)
- c.cancel = cancel
+ // Create a one-off cancellable context for use by the
CommandContext, in
+ // the event that we have to force a Process.Kill().
+ ctx2, cancel := context.WithCancel(context.Background())
I thought that's what #1269 is.
--
You are receiving this because you modified the open/close state.
Reply to this email directly or view it on GitHub:
#1270 (comment)
|
What does this do / why do we need it?
This has two parts - first, we break the context connection from the incoming ctx to a
cmd, such that its cancellation does not immediately propagate to theexec.ContextCommand, resulting in an immediateos.Process.Kill()(which we want to avoid).In testing this, though, i discovered that the signals sent via ctrl-C were still being processed immediately by the e.g. git subprocesses. So, after tearing my hair out for a while, I discovered that it's because ctrl-C from a terminal, by default, sends the signal to everything in the process group, and (with default settings) our spawned subprocesses are placed in that same group.
This is a bit of just rearranging deck chairs as it still has the same effect, but I prefer to be more explicit about our subprocess controls, if we can.
What should your reviewer look out for in this PR?
The biggest question in my mind with this if this is indeed a portable use of
syscallacross everything that's!windows.Which issue(s) does this PR fix?
None unfortunately (I'd hoped this would fix our corruption problems), as this achieves the same outcomes, just with greater control.