Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix process termination handling for runc exec #3722

Merged
merged 1 commit into from
Mar 17, 2023

Conversation

coryb
Copy link
Collaborator

@coryb coryb commented Mar 16, 2023

This patch makes the process handling consistent between runc.Run and runc.Exec usage. Previously runc.Run would use context.Background for the runc.Run process and would monitor the request context for shutdown requests, sending a SIGKILL to the container pid1 process. This allowed runc.Run to gracefully shutdown and reap child processes. This logic was not used for runc.Exec where instead we were passing in the request context to runc.Exec, and if that request context was cancelled the runc process would immediately terminate preventing runc from reaping the child process. In this scenario the extra pid will remain forever and then when the pid1 process will get wedged in zap_pid_ns_processes syscall upon shutdown waiting fo the zombie pid to exit.

With this fix both runc.Run and runc.Exec will use context.Background for runc processes and monitor the request context for shutdown request triggering a SIGKILL to the pid being monitored by runc.

This patch was split off from #3658
The tests from that PR verifies this all works as expected.

var eg errgroup.Group
egCtx, cancel := context.WithCancel(ctx)
egCtx, cancel := context.WithCancel(context.Background())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be:

eg, egCtx := errgroup.WithContext(context.Background)

or maybe at least

egCtx, cancel := context.WithCancel(context.Background())
eg, egCtx := errgroup.WithContext(egCtx)

ended chan struct{}
}

// newStartingProcess will create a startingProcess that will be monitored, where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be follow-up but I wonder what a better name would be for this. Is it like runcProcessHandle?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wonder if the signature could be

h, ctx := newRuncProcessHandle(ctx, ...)

The returned context would be the one that is based on Background() and doesn't immediately get canceled when the input context gets canceled.

runcProcess := &startingProcess{
ready: make(chan struct{}),
}
runcProcess, runcCtx, cancel := runcProcessHandle(ctx, id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both cancel and Release()? If the timing is different then cancel can still just be a method on runcProcess (.Close() or .Stop()).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the cancel was used to stop the signal/resize loops, allowing the waitgroup to exit cleanly. I have stored the cancel in the procHandle and it will be called with a runcProcess.Shutdown() call now.

defer runcProcess.Release()

var eg errgroup.Group
egCtx, cancel := context.WithCancel(ctx)
eg, egCtx := errgroup.WithContext(runcCtx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on line 64 (can't leave comment there), is ctx correct?

I think we should be able to pretty much just use ctx var now, without the need for runcCtx and egCtx variables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on line 64. I have updated to use ctx consistently now. I also had to copy over the logger from the request context to preserve that for logging in the errgroups.

This patch makes the process handling consistent between runc.Run and
runc.Exec usage.  Previously runc.Run would use context.Background
for the runc.Run process and would monitor the request context for
shutdown requests, sending a SIGKILL to the container pid1 process. This
allowed runc.Run to gracefully shutdown and reap child processes.  This
logic was not used for runc.Exec where instead we were passing in the
request context to runc.Exec, and if that request context was cancelled
the runc process would immediately terminate preventing runc from reaping
the child process.  In this scenario the extra pid will remain forever
and then when the pid1 process will get wedged in zap_pid_ns_processes
syscall upon shutdown waiting fo the zombie pid to exit.

With this fix both runc.Run and runc.Exec will use context.Background
for runc processes and monitor the request context for shutdown request
triggering a SIGKILL to the pid being monitored by runc.

Signed-off-by: coryb <cbennett@netflix.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants