-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix process termination handling for runc exec #3722
Conversation
var eg errgroup.Group | ||
egCtx, cancel := context.WithCancel(ctx) | ||
egCtx, cancel := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be:
eg, egCtx := errgroup.WithContext(context.Background)
or maybe at least
egCtx, cancel := context.WithCancel(context.Background())
eg, egCtx := errgroup.WithContext(egCtx)
executor/runcexecutor/executor.go
Outdated
ended chan struct{} | ||
} | ||
|
||
// newStartingProcess will create a startingProcess that will be monitored, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be follow-up but I wonder what a better name would be for this. Is it like runcProcessHandle
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wonder if the signature could be
h, ctx := newRuncProcessHandle(ctx, ...)
The returned context would be the one that is based on Background()
and doesn't immediately get canceled when the input context gets canceled.
runcProcess := &startingProcess{ | ||
ready: make(chan struct{}), | ||
} | ||
runcProcess, runcCtx, cancel := runcProcessHandle(ctx, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both cancel
and Release()
? If the timing is different then cancel
can still just be a method on runcProcess
(.Close()
or .Stop()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the cancel was used to stop the signal/resize loops, allowing the waitgroup to exit cleanly. I have stored the cancel in the procHandle and it will be called with a runcProcess.Shutdown()
call now.
defer runcProcess.Release() | ||
|
||
var eg errgroup.Group | ||
egCtx, cancel := context.WithCancel(ctx) | ||
eg, egCtx := errgroup.WithContext(runcCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on line 64 (can't leave comment there), is ctx
correct?
I think we should be able to pretty much just use ctx
var now, without the need for runcCtx
and egCtx
variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on line 64. I have updated to use ctx consistently now. I also had to copy over the logger from the request context to preserve that for logging in the errgroups.
This patch makes the process handling consistent between runc.Run and runc.Exec usage. Previously runc.Run would use context.Background for the runc.Run process and would monitor the request context for shutdown requests, sending a SIGKILL to the container pid1 process. This allowed runc.Run to gracefully shutdown and reap child processes. This logic was not used for runc.Exec where instead we were passing in the request context to runc.Exec, and if that request context was cancelled the runc process would immediately terminate preventing runc from reaping the child process. In this scenario the extra pid will remain forever and then when the pid1 process will get wedged in zap_pid_ns_processes syscall upon shutdown waiting fo the zombie pid to exit. With this fix both runc.Run and runc.Exec will use context.Background for runc processes and monitor the request context for shutdown request triggering a SIGKILL to the pid being monitored by runc. Signed-off-by: coryb <cbennett@netflix.com>
This patch makes the process handling consistent between runc.Run and runc.Exec usage. Previously runc.Run would use context.Background for the runc.Run process and would monitor the request context for shutdown requests, sending a SIGKILL to the container pid1 process. This allowed runc.Run to gracefully shutdown and reap child processes. This logic was not used for runc.Exec where instead we were passing in the request context to runc.Exec, and if that request context was cancelled the runc process would immediately terminate preventing runc from reaping the child process. In this scenario the extra pid will remain forever and then when the pid1 process will get wedged in zap_pid_ns_processes syscall upon shutdown waiting fo the zombie pid to exit.
With this fix both runc.Run and runc.Exec will use context.Background for runc processes and monitor the request context for shutdown request triggering a SIGKILL to the pid being monitored by runc.
This patch was split off from #3658
The tests from that PR verifies this all works as expected.