Skip to content
This repository was archived by the owner on Sep 9, 2020. It is now read-only.

Conversation

@sdboyer
Copy link
Member

@sdboyer sdboyer commented Oct 13, 2017

What does this do / why do we need it?

This has two parts - first, we break the context connection from the incoming ctx to a cmd, such that its cancellation does not immediately propagate to the exec.ContextCommand, resulting in an immediate os.Process.Kill() (which we want to avoid).

In testing this, though, i discovered that the signals sent via ctrl-C were still being processed immediately by the e.g. git subprocesses. So, after tearing my hair out for a while, I discovered that it's because ctrl-C from a terminal, by default, sends the signal to everything in the process group, and (with default settings) our spawned subprocesses are placed in that same group.

This is a bit of just rearranging deck chairs as it still has the same effect, but I prefer to be more explicit about our subprocess controls, if we can.

What should your reviewer look out for in this PR?

The biggest question in my mind with this if this is indeed a portable use of syscall across everything that's !windows.

Which issue(s) does this PR fix?

None unfortunately (I'd hoped this would fix our corruption problems), as this achieves the same outcomes, just with greater control.

c.cancel = cancel
// Create a one-off cancellable context for use by the CommandContext, in
// the event that we have to force a Process.Kill().
ctx2, cancel := context.WithCancel(context.Background())
Copy link
Contributor

@tamird tamird Oct 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment I left in #1269; we should just not use a context here and just c.Cmd.Process.Kill directly instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that's racy, unfortunately. IIRC, the code that your original PR got rid of in favor of exec.CommandContext() had all these nasty workarounds, including passing around integer address pointers for atomic ops, in order to be able to safely call c.Cmd.Process.Kill() directly without any possibility of that call being made on a nil pointer.

i'll go back and double-check, but i know that the thing i really liked about using exec.CommandContext in the first place was that it gave us a safe way of invoking Kill() by taking advantage of timing information that is otherwise only accessible from within the os/exec package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do double check - as far as I know, there's no possibility of Cmd.Process being nil after Start returns successfully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so, it was a bit more complicated than that - while i think you're correct that there's no possibility of Command.Process being nil, it was Command.ProcessState that gave problems:

// If the process doesn't exit immediately, check every 50ms, up to 3s,
// after which send a hard kill.
//
// Cannot rely on cmd.ProcessState.Exited() here, as that is not set
// correctly when the process exits due to a signal. See
// https://github.com/golang/go/issues/19798 . Also cannot rely on it
// because cmd.ProcessState will be nil before the process exits, and
// checking if nil create a data race.
if !atomic.CompareAndSwapInt32(isDone, 1, 1) {
to := time.NewTimer(3 * time.Second)
tick := time.NewTicker(50 * time.Millisecond)
defer to.Stop()
defer tick.Stop()
// Loop until the ProcessState shows up, indicating the proc has exited,
// or the timer expires and
for !atomic.CompareAndSwapInt32(isDone, 1, 1) {
select {
case <-to.C:
return cmd.Process.Kill()
case <-tick.C:
}
}
}

the goal there was avoiding sending a signal to a process that had already terminated, and that's the thing that's difficult to ascertain from anywhere other than the thread that's Wait()ing. all i can remember about why i went to those lengths is because i wanted to avoid returning an error to the broader system when that error was simply a case of our mismanagement of the process (signal-after-termination), rather than an actual error from the process. yes, we could sniff the error, but...well, it's an unexported, potentially os-specific error there. not ideal.

(and yes, that implies that this system should also probably be returning the cancelation error, rather than one from the underlying process, when cancelation is instructed)

so yeah, i'd prefer we stick with the reuse of context for the kill semantics here, as it both insulates us from having to think about any of the above, and may also provide a more useful point for understanding the causal ordering of events here (signal send start, signal send complete, process terminated, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you're saying here. The code in os/exec is no more resilient to any of these problems; it just ignores the error returned from Process.Kill.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯\_(ツ)_/¯ - maybe it's all equivalent. i don't have more time i can afford to invest in this right now, and figuring out if there was something else that motivated the original code. so - if you and @jmank88 can agree that just calling Process.Kill() is equivalent, then i'll relent, and we can just use that.

if not, this route feels less dragon-y to me, and without adequate time to dispel the fog, i've gotta follow my gut.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm with @tamird here. We could do this in another PR though, right? I can put something up after this is merged for us to stare at some more in isolation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good point, can do it in a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that's what #1269 is.

c.Cmd.Stdout = &b
c.Cmd.Stderr = &b

// Force subprocesses into their own process group, rather than being in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this in the constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair nuff


// Force subprocesses into their own process group, rather than being in the
// same process group as the dep process. Ctrl-C sent from a terminal will
// send the signal to the entire running process group, so This allows us to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/This/this/

@sdboyer sdboyer merged commit 7d9726b into golang:master Oct 14, 2017
@sdboyer
Copy link
Member Author

sdboyer commented Oct 14, 2017 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants