Break context chain and move subprocesses into separate pgroups #1270

sdboyer · 2017-10-13T13:59:15Z

What does this do / why do we need it?

This has two parts - first, we break the context connection from the incoming ctx to a cmd, such that its cancellation does not immediately propagate to the exec.ContextCommand, resulting in an immediate os.Process.Kill() (which we want to avoid).

In testing this, though, i discovered that the signals sent via ctrl-C were still being processed immediately by the e.g. git subprocesses. So, after tearing my hair out for a while, I discovered that it's because ctrl-C from a terminal, by default, sends the signal to everything in the process group, and (with default settings) our spawned subprocesses are placed in that same group.

This is a bit of just rearranging deck chairs as it still has the same effect, but I prefer to be more explicit about our subprocess controls, if we can.

What should your reviewer look out for in this PR?

The biggest question in my mind with this if this is indeed a portable use of syscall across everything that's !windows.

Which issue(s) does this PR fix?

None unfortunately (I'd hoped this would fix our corruption problems), as this achieves the same outcomes, just with greater control.

This puts us more squarely in control of subprocess signaling.

tamird · 2017-10-13T14:18:23Z

internal/gps/cmd_unix.go

-	c.cancel = cancel
+	// Create a one-off cancellable context for use by the CommandContext, in
+	// the event that we have to force a Process.Kill().
+	ctx2, cancel := context.WithCancel(context.Background())


same comment I left in #1269; we should just not use a context here and just c.Cmd.Process.Kill directly instead.

i think that's racy, unfortunately. IIRC, the code that your original PR got rid of in favor of exec.CommandContext() had all these nasty workarounds, including passing around integer address pointers for atomic ops, in order to be able to safely call c.Cmd.Process.Kill() directly without any possibility of that call being made on a nil pointer.

i'll go back and double-check, but i know that the thing i really liked about using exec.CommandContext in the first place was that it gave us a safe way of invoking Kill() by taking advantage of timing information that is otherwise only accessible from within the os/exec package.

Please do double check - as far as I know, there's no possibility of Cmd.Process being nil after Start returns successfully.

Ah, so, it was a bit more complicated than that - while i think you're correct that there's no possibility of Command.Process being nil, it was Command.ProcessState that gave problems:

dep/internal/gps/cmd_unix.go

Lines 29 to 53 in e74b613

// If the process doesn't exit immediately, check every 50ms, up to 3s,

// after which send a hard kill.

//

// Cannot rely on cmd.ProcessState.Exited() here, as that is not set

// correctly when the process exits due to a signal. See

// https://github.com/golang/go/issues/19798 . Also cannot rely on it

// because cmd.ProcessState will be nil before the process exits, and

// checking if nil create a data race.

if !atomic.CompareAndSwapInt32(isDone, 1, 1) {

to := time.NewTimer(3 * time.Second)

tick := time.NewTicker(50 * time.Millisecond)

defer to.Stop()

defer tick.Stop()

// Loop until the ProcessState shows up, indicating the proc has exited,

// or the timer expires and

for !atomic.CompareAndSwapInt32(isDone, 1, 1) {

select {

case <-to.C:

return cmd.Process.Kill()

case <-tick.C:

}

}

}

the goal there was avoiding sending a signal to a process that had already terminated, and that's the thing that's difficult to ascertain from anywhere other than the thread that's Wait()ing. all i can remember about why i went to those lengths is because i wanted to avoid returning an error to the broader system when that error was simply a case of our mismanagement of the process (signal-after-termination), rather than an actual error from the process. yes, we could sniff the error, but...well, it's an unexported, potentially os-specific error there. not ideal.

(and yes, that implies that this system should also probably be returning the cancelation error, rather than one from the underlying process, when cancelation is instructed)

so yeah, i'd prefer we stick with the reuse of context for the kill semantics here, as it both insulates us from having to think about any of the above, and may also provide a more useful point for understanding the causal ordering of events here (signal send start, signal send complete, process terminated, etc.)

I don't understand what you're saying here. The code in os/exec is no more resilient to any of these problems; it just ignores the error returned from Process.Kill.

¯\_(ツ)_/¯ - maybe it's all equivalent. i don't have more time i can afford to invest in this right now, and figuring out if there was something else that motivated the original code. so - if you and @jmank88 can agree that just calling Process.Kill() is equivalent, then i'll relent, and we can just use that.

if not, this route feels less dragon-y to me, and without adequate time to dispel the fog, i've gotta follow my gut.

I think I'm with @tamird here. We could do this in another PR though, right? I can put something up after this is merged for us to stare at some more in isolation.

yeah, good point, can do it in a follow-up.

I thought that's what #1269 is.

tamird · 2017-10-13T14:19:08Z

internal/gps/cmd_unix.go

 	c.Cmd.Stdout = &b
 	c.Cmd.Stderr = &b
+
+	// Force subprocesses into their own process group, rather than being in the


Why not do this in the constructor?

tamird · 2017-10-13T14:19:19Z

internal/gps/cmd_unix.go

+
+	// Force subprocesses into their own process group, rather than being in the
+	// same process group as the dep process. Ctrl-C sent from a terminal will
+	// send the signal to the entire running process group, so This allows us to


s/This/this/

sdboyer · 2017-10-14T23:10:01Z

we could treat it that way, and may as well - but they actually started as separate overlapping PRs when @jmank88 and I noticed this problem at the same time

…

On October 14, 2017 3:46:40 AM EDT, Tamir Duberstein ***@***.***> wrote: tamird commented on this pull request. > @@ -25,11 +25,15 @@ type cmd struct { } func commandContext(ctx context.Context, name string, arg ...string) cmd { - // Grab the caller's context and pass a derived one to CommandContext. - c := cmd{ctx: ctx} - ctx, cancel := context.WithCancel(ctx) - c.Cmd = exec.CommandContext(ctx, name, arg...) - c.cancel = cancel + // Create a one-off cancellable context for use by the CommandContext, in + // the event that we have to force a Process.Kill(). + ctx2, cancel := context.WithCancel(context.Background()) I thought that's what #1269 is. -- You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub: #1270 (comment)

sdboyer added 3 commits October 12, 2017 22:40

gps: Break ctx chain so that Kill isn't autofired

bea24e2

gps: Force subprocesses into new pgroup on unix

6bbd19d

This puts us more squarely in control of subprocess signaling.

gps: Strings for calltypes

8de86c4

sdboyer added the area: gps label Oct 13, 2017

googlebot added the cla: yes label Oct 13, 2017

sdboyer mentioned this pull request Oct 13, 2017

gps: cmd_unix: remove extra command context; use Cmd.Process.Kill() directly #1269

Merged

tamird reviewed Oct 13, 2017

View reviewed changes

gps: Set pgroup controls in cmd constructor

239acda

sdboyer merged commit 7d9726b into golang:master Oct 14, 2017

	// If the process doesn't exit immediately, check every 50ms, up to 3s,
	// after which send a hard kill.
	//
	// Cannot rely on cmd.ProcessState.Exited() here, as that is not set
	// correctly when the process exits due to a signal. See
	// https://github.com/golang/go/issues/19798 . Also cannot rely on it
	// because cmd.ProcessState will be nil before the process exits, and
	// checking if nil create a data race.
	if !atomic.CompareAndSwapInt32(isDone, 1, 1) {
	to := time.NewTimer(3 * time.Second)
	tick := time.NewTicker(50 * time.Millisecond)

	defer to.Stop()
	defer tick.Stop()

	// Loop until the ProcessState shows up, indicating the proc has exited,
	// or the timer expires and
	for !atomic.CompareAndSwapInt32(isDone, 1, 1) {
	select {
	case <-to.C:
	return cmd.Process.Kill()
	case <-tick.C:
	}
	}
	}

Break context chain and move subprocesses into separate pgroups #1270

Break context chain and move subprocesses into separate pgroups #1270

Uh oh!

Conversation

sdboyer commented Oct 13, 2017

What does this do / why do we need it?

What should your reviewer look out for in this PR?

Which issue(s) does this PR fix?

Uh oh!

tamird Oct 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sdboyer commented Oct 14, 2017 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tamird Oct 13, 2017 •

edited

Loading