runc worker: fix sigkill handling #3754
Conversation
(force-pushed 26c29f3 to 393db6f)
Ughh, this change is working for the test cases I was working with, but causing other issues, will keep looking...
(force-pushed db5d937 to a2c70fa)
The basic issue is that the started channel passed into runc.(Run|Exec) returns the pid of the runc process, but generally we need to send signals to the process monitored by runc in the container. We can get this process id via a pidfile that runc will write to. Unfortunately the pidfile is created some time after the runc pid is returned on the started channel, so we have to loop waiting for the pidfile to have valid contents.
There is an exception: we need to send SIGWINCH signals to runc rather than the process inside the container; I suspect that is because the runc process is controlling the tty.
This seems to be the only way that we can manage processes launched via `runc exec`. With `runc run` we can use `runc kill` to signal the in-container process, but `runc kill` does not work with processes started via `runc exec`. We could abstract the code a bit to use `runc kill` for pid1 and the pidfile + process signal for exec'd processes, but I'm not sure that adds any value here. Either way we still have to send SIGWINCH signals to the runc process (can't use `runc kill` with SIGWINCH).
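As a concrete illustration of the pidfile-polling approach described above, here is a minimal sketch (`signalViaPidfile` is a hypothetical helper, not code from this PR; the 50ms interval mirrors the diff discussed below):

```go
package runcexecutor

import (
	"context"
	"os"
	"strconv"
	"strings"
	"syscall"
	"time"

	"github.com/pkg/errors"
)

// signalViaPidfile polls the pidfile written by runc until it contains a
// valid pid, then sends sig to that process (the pid is in the host PID
// namespace, even for in-container processes).
func signalViaPidfile(ctx context.Context, pidfile string, sig syscall.Signal) error {
	for {
		pidData, err := os.ReadFile(pidfile)
		if err != nil && !os.IsNotExist(err) {
			return errors.Wrap(err, "unable to read pidfile")
		}
		if err == nil {
			if pid, err := strconv.Atoi(strings.TrimSpace(string(pidData))); err == nil && pid > 0 {
				return syscall.Kill(pid, sig)
			}
		}
		// the pidfile typically appears ~30-40ms after the runc pid is
		// returned on the started channel, so wait briefly and retry
		select {
		case <-ctx.Done():
			return errors.New("context cancelled before runc wrote pidfile")
		case <-time.After(50 * time.Millisecond):
		}
	}
}
```

Note the SIGWINCH exception still applies: that signal must go to the runc pid from the started channel, not to the pid read here.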
executor/runcexecutor/executor.go
Outdated
```go
			return errors.Errorf("context cancelled before runc wrote to pidfile")
		}
		return errors.Wrap(err, "context cancelled before we found valid runc pid")
	case <-time.After(50 * time.Millisecond):
```
Not sure what value to use here for the retry interval; in my experience it was taking ~30-40ms for the pidfile to be updated. We could shorten this duration; I don't think there is much harm in making this loop faster, maybe 5-10ms?
executor/runcexecutor/executor.go
Outdated
```go
pidData, err := os.ReadFile(p.pidfile)
if err != nil {
	return errors.Wrap(err, "unable to read pidfile after runc process started")
}
pid, err = strconv.Atoi(string(pidData))
```
In theory, runc could partially write the pidfile while we are reading it? Not sure if this is actually possible. Also not sure how we could know; perhaps we could just add the os.FindProcess call to the retry loop?
- runc creates the pidfile atomically (create + rename), so no partial read is possible.
- AFAIK os.FindProcess() is a no-op on Linux (it merely wraps a pid in a structure and returns it). Meaning, it doesn't actually check whether the process exists, and it never returns any errors. This might be changed in the future, of course. (See the sketch below for how existence is usually checked.)
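For reference, the conventional way to actually check for process existence on Linux is to probe with signal 0; a minimal sketch, not code from this PR (`pidAlive` is a hypothetical helper):

```go
import (
	"errors"
	"os"
	"syscall"
)

// pidAlive reports whether a process with the given pid exists on Linux.
// os.FindProcess never fails here, so probe with signal 0, which performs
// the existence/permission checks without delivering a signal.
func pidAlive(pid int) bool {
	proc, _ := os.FindProcess(pid) // cannot fail on Linux
	err := proc.Signal(syscall.Signal(0))
	// EPERM still means the process exists; we just may not signal it.
	return err == nil || errors.Is(err, syscall.EPERM)
}
```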
```go
go func() {
	defer close(pid2Done)
	// TODO why doesn't Exec allow for started channel?? Fake it for now
```
I have a branch that adds a started channel to exec; it was not too complex. I will follow up with another PR for this after this one is resolved.
I'm not sure I agree with this. I see no mention of signal codepath changes in the #3722 description. The new code looks quite a lot more complicated, and these retry loops are not really clean. I think it would make sense to keep the old codepath, and if it had problems with exec (which afaics does not have issues with capturing the pid) then add fixes for it. We are also dealing with a regression in the patch release; we can't take chances on complicated patches with a lot of new code. So the preference is for simpler solutions that have already proven to be stable in the past, rather than maximum code sharing between the run and exec codepaths. If the latter is the only way to fix the regression, we need to consider reverting #3722 from v0.11 instead.
@kolyshkin @AkihiroSuda any ideas if anything could've changed on that?
At this point, reverting probably makes the most sense; I will open another PR for that. I am unable to find any other way to get the pid for the in-container process via …
executor/runcexecutor/executor.go
Outdated
```go
if err != nil {
	return errors.Wrapf(err, "unable to find runc monitor process for pid %d", runcPid)
}
```
if this code is linux-specific you can safely do just `p.MonitorProcess, _ = os.FindProcess(runcPid)`. See below for an explanation.
I seriously doubt runc ever returns the in-container PID (as, except for exec, this would always be 1).

The "pid returned on the channel" bit is not about runc, since we do not have such an API. I guess this is about the github.com/containerd/go-runc package. If that's correct, then it writes the PID of the runc process to the channel.

The initial PID of an in-container process, in the host PID namespace, is written into the pid file.
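For context, a hedged sketch of where the two pids come from when driving runc through go-runc; the `Started` and `PidFile` fields reflect my reading of the discussion above and are assumptions that may not match the vendored go-runc exactly:

```go
package runcexecutor

import (
	"context"

	runc "github.com/containerd/go-runc"
)

// runContainer sketches where the two pids come from. NOTE: the Started
// and PidFile fields are assumptions about go-runc's CreateOpts; verify
// against the vendored package before relying on them.
func runContainer(ctx context.Context, r *runc.Runc, id, bundle, pidfile string) error {
	started := make(chan int, 1)
	go func() {
		for runcPid := range started {
			_ = runcPid // pid of the runc process itself, not the container's pid1
		}
	}()
	// runc writes the host-namespace pid of the in-container pid1 to the
	// file passed via --pid-file; that is the pid we need to signal.
	_, err := r.Run(ctx, id, bundle, &runc.CreateOpts{
		Started: started,
		PidFile: pidfile,
	})
	return err
}
```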
```go
pidfile, err := os.CreateTemp("", "runc.*.pid")
if err != nil {
	return errors.Wrap(err, "failed to create runc pid file")
}
pidfile.Close()
defer os.Remove(pidfile.Name())
```
This create/remove dance is racy. Perhaps it's better to create a temp directory and use it to store a pid file.
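A sketch of the suggested fix, assuming the resulting path is then handed to runc via --pid-file (uses `os`, `path/filepath`, and `github.com/pkg/errors`, as elsewhere in this file):

```go
// Use a private temp directory so nobody can claim the pidfile path
// between our os.Remove and runc re-creating the file.
dir, err := os.MkdirTemp("", "runc")
if err != nil {
	return errors.Wrap(err, "failed to create temp dir for runc pidfile")
}
defer os.RemoveAll(dir)
// runc will create this file itself (atomically, via create + rename)
pidfile := filepath.Join(dir, "runc.pid")
```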
```go
pidfile, err := os.CreateTemp("", "runc.*.pid")
if err != nil {
	return errors.Wrap(err, "failed to create runc pid file")
}
pidfile.Close()
defer os.Remove(pidfile.Name())
```
ditto
Overall, the logic in this PR seems legit, but the description needs to be fixed.
executor/runcexecutor/executor.go
Outdated
```go
// the pid reported from the started channel is the pid of the runc
// monitor process, but we will need to send signals to the process
// running in the container, which we can read the pidfile.
```
nit: s/read the pidfile/read from the pidfile/
executor/runcexecutor/executor.go
Outdated
```go
if os.IsNotExist(err) {
	select {
	case <-ctx.Done():
		return errors.Errorf("context cancelled before runc wrote pidfile")
```
nit: can use `errors.New` instead.
executor/runcexecutor/executor.go
Outdated
```go
// SIGWINCH signals to it, if we send them to the process inside the
// container nothing happens.
```
I would change that to:

```go
// SIGWINCH signals to it, as terminal resizing is done in runc.
```
I'm still confused about the overall design of this. We are using …
We can only use the …

I have updated the code to use …

I have been trying to simplify this code, but pretty clearly have not succeeded. The …
Doesn't the …
Yeah, you are correct. I keep getting confused because I naively expected there to be a single way to send all signals to the processes in the container, but it seems like we have 3 distinct use cases for signals for our runc workers, with slightly different requirements for …
I think I kept getting confused because generally signals are propagated (case …
@coryb Thanks for the explanation. It looks like the best option could be to send to the runc pid directly always, except for …

The alternative would be to use …

I'm not sure if it is completely impossible for …
(force-pushed 8c6e182 to 758706f)
I have gone ahead and refactored to send all signals to the runc process except when we try to Kill on cancel. For that I have added a killer abstraction.

I have also fixed an edge case where a user sends SIGKILL over the Signal channel for containers created by gateway NewContainer; previously this would have killed the runc process rather than the in-container process.

I have also cleaned up a few comments that were ambiguous about which process we were working with.
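Roughly, the shape of that abstraction (a hedged sketch; only `runcProcess.killer` appears in the diff below, and the `runcKiller`/`pidfileKiller` names here are illustrative):

```go
package runcexecutor

import (
	"context"
	"os"
	"strconv"
	"strings"
	"syscall"

	runc "github.com/containerd/go-runc"
	"github.com/pkg/errors"
)

// killer abstracts how a process gets terminated on kill/cancel.
type killer interface {
	Kill(ctx context.Context) error
}

// runcKiller terminates a `runc run` pid1 via `runc kill`.
type runcKiller struct {
	runc *runc.Runc
	id   string
}

func (k runcKiller) Kill(ctx context.Context) error {
	return k.runc.Kill(ctx, k.id, int(syscall.SIGKILL), &runc.KillOpts{})
}

// pidfileKiller terminates a `runc exec` process, which `runc kill`
// cannot reach: read the host-namespace pid runc wrote to the pidfile
// and signal it directly.
type pidfileKiller struct {
	pidfile string
}

func (k pidfileKiller) Kill(ctx context.Context) error {
	pidData, err := os.ReadFile(k.pidfile)
	if err != nil {
		return errors.Wrap(err, "unable to read pidfile")
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(pidData)))
	if err != nil {
		return errors.Wrap(err, "invalid pid in pidfile")
	}
	return syscall.Kill(pid, syscall.SIGKILL)
}
```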
executor/runcexecutor/executor.go
Outdated
```go
	}()

	if k.pidfile == "" {
		// for `runc runc` process we use `runc kill` to terminate the process
```
Should this be `runc run`?
Just some nitpicks, nothing blocking; LGTM. Thank you for fixing this @coryb! I confirmed that this commit fixes the problem currently impacting some Dagger users.
cc @tonistiigi, if you have time I would appreciate getting this merged into master soon if it looks good to you too. Dagger relies on some commits only in the master branch of buildkit right now, and we're trying to avoid the overhead of creating a fork of buildkit with patches applied as much as possible 🙏
executor/runcexecutor/executor.go
Outdated
```go
// never send SIGKILL directly to runc, it needs to go to the
// process in-container
if err := runcProcess.killer.Kill(ctx); err != nil {
	bklog.G(ctx).Errorf("failed to kill the process: %+v", err)
```
nit: I think this log will mostly be a duplicate of the log on line 480
thanks, fixed.
```go
if err != nil {
	if os.IsNotExist(err) {
		select {
		case <-ctx.Done():
```
I feel like it might be more robust to use a context like `waitCtx, cancel := context.WithTimeout(ctx, 10*time.Second)` or similar that's created in this function.
Reason being that we're now relying on the provided context to always have timeouts/cancellations set up properly, and even if that's true currently, it seems like something that could be missed in any future modifications. So adding one extra layer of protection that's derived from the parent context probably doesn't hurt IMO.
This is a huge nitpick though, cases where this would matter are likely quite obscure.
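In other words, something along these lines inside the wait loop (a sketch; the 10s value is arbitrary):

```go
// Fail-safe: derive a bounded context so the pidfile wait cannot block
// forever, even if a future caller passes a context without a deadline.
waitCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()

select {
case <-waitCtx.Done():
	return errors.New("timed out waiting for runc to write pidfile")
case <-time.After(50 * time.Millisecond):
	// fall through and re-read the pidfile
}
```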
Maybe I am missing something, but I think this edge case is covered. The ctx that is passed into the Kill function is created on line 564 like:

```go
killCtx, timeout := context.WithTimeout(context.Background(), 7*time.Second)
```

This is the kill that is triggered on the request context being canceled. There is also the client-requested kill, where the client sends in a SIGKILL on the Signal channel; that one will be using the cancelable Background context created in `runcProcessHandle` on line 543. So it is possible for the client-requested SIGKILL via the Signal channel to block, but the client will have a request context to cancel, which would trigger us to call kill again with the 7s timeout.
Yeah, I totally agree that at this point there wouldn't be a need for a timeout inside this function. I was just thinking that, as changes to the rest of the code happen in the future, it would be easy to miss something like this and accidentally provide a context that could potentially result in blocking.
But again, this is not important for functionality at this exact moment and a pretty obscure case to begin with, so no need to block merging on this point if you don't think it's worth it.
I see what you are saying, seems fine to me, added it as a fail-safe.
```go
@@ -33,22 +34,34 @@ func (w *runcExecutor) run(ctx context.Context, id, bundle string, process execu
}

func (w *runcExecutor) exec(ctx context.Context, id, bundle string, specsProcess *specs.Process, process executor.ProcessInfo, started func()) error {
	return w.callWithIO(ctx, id, bundle, process, started, func(ctx context.Context, started chan<- int, io runc.IO) error {
		isExec := true
```
This is somewhat weird now, as we are passing a callback that does exec but then also passing a boolean to select the related behavior. Should the callback return the killer implementation as well?
Yeah, you are correct, I think I can pass in the correct killer now, will update.
updated
This fixes the incorrect kill handling introduced in b76f8c0. We need to send the SIGKILL to the in-container process, not the runc process. This patch adds an abstraction over the kill handling:

* for `runc run` processes, use `runc kill`
* for `runc exec` processes, read the pid (in the host PID namespace) from the pidfile created by `runc exec`, then send the signal directly to that process.

Also use the kill abstraction when we receive a SIGKILL over the signal channel for containers created by gateway NewContainer.

Signed-off-by: coryb <cbennett@netflix.com>
@tonistiigi does this one need a cherry-pick label?
I think we were more on the side that, because this is a big patch, we will not pick it and instead revert the previous fix in v0.11.
This fixes the incorrect kill handling introduced in b76f8c0. We need to send the SIGKILL to the in-container process, not the `runc` process. This patch adds an abstraction over the kill handling:

* for `runc run` processes, use `runc kill`
* for `runc exec` processes, read the pid (in the host PID namespace) from the pidfile created by `runc exec`, then send the signal directly to that process.

Also use the kill abstraction when we receive a SIGKILL over the signal channel for containers created by gateway NewContainer.

The pid returned on the started channel from runc.(Run|Exec) seems to be the pid of the runc monitor process, not the pid of the process run in the container. I have confirmed the pid written to the pidfile is the host pid of the process in the container, so create tmp pid files for runc to write to so we can extract the correct pid to signal. I would have sworn that the pid returned on the channel used to be for the in-container process, so I am not sure if runc changed or I am just imagining things.

this fixes #3751

edit: @kolyshkin thanks for confirming I am just imagining things. The pids returned from go-runc are the pids of the runc process; the pidfiles written by the runc process contain the initial PID of the in-container process, in the host PID namespace.