
fix gateway exec tty cleanup on context.Canceled #3658

Merged 1 commit from gateway-exec-cancel into moby:master on Mar 22, 2023

Conversation

@coryb (Collaborator) commented Feb 21, 2023

This fixes an issue where the tty message handling loop goes into a tight loop and never exits upon context.Canceled. There is a select statement in `(*procMessageForwarder).Recv` that returns nil on ctx.Done, but the control loop in `(*container).Start` did not exit on this condition. I think the intent was to flush out any inflight messages on cancel, but this is already done in `(*procMessageForwarder).Close`.
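As a minimal, self-contained sketch of the bug pattern (the `forwarder` type and names below are hypothetical; this is not the actual buildkit code): `Recv` returns nil once the context is canceled, so the consuming loop has to treat a nil message as terminal instead of iterating again.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type forwarder struct{ msgs chan string }

// Recv mirrors the behavior described for (*procMessageForwarder).Recv:
// it returns nil (no error) once the context is done.
func (f *forwarder) Recv(ctx context.Context) *string {
	select {
	case <-ctx.Done():
		return nil
	case m := <-f.msgs:
		return &m
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	f := &forwarder{msgs: make(chan string)}

	go func() {
		time.Sleep(10 * time.Millisecond)
		cancel()
	}()

	for {
		msg := f.Recv(ctx)
		if msg == nil {
			// Without this check the loop would spin forever on nil
			// messages after cancellation -- the tight loop this PR fixes.
			fmt.Println("context canceled, exiting loop:", ctx.Err())
			return
		}
		fmt.Println("got message:", *msg)
	}
}
```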

@coryb (Collaborator Author) commented Feb 21, 2023

looking at the test failures ...

```diff
@@ -310,6 +310,7 @@ func (w *runcExecutor) Run(ctx context.Context, id string, root executor.Mount,
 		timeout()
 		select {
 		case <-time.After(50 * time.Millisecond):
+			cancelRun()
```
coryb (Collaborator Author):

This allows the tests to pass; I am a little confused about why it was necessary. The pid1 `sleep 10` was continuing to run, even though `ctx.Done` was triggered almost immediately. Then `w.runc.Kill` ran and did not return an error, but the `w.run` call below didn't terminate. So this loop waited 50ms and then ran the kill 9 again and again, until the pid1 ended after 10s. I am not sure how the kill 9 is not actually terminating the runc process. With this change we cancel the ctx passed into `w.run` after the kill 9 is ignored, and the process ends up exiting as expected.
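For context, here is a rough, self-contained sketch of the pattern this loop implements (variable names like `cmd`, `done`, and `reqCtx` are made up; this is not buildkit's actual executor code): after the request context is canceled, SIGKILL is retried every 50ms, and the added `cancelRun()` cancels the context driving the run when the kill appears to be ignored.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	// The context that actually drives the process, analogous to the ctx
	// passed into w.run; cancelRun mirrors the cancelRun() added above.
	runCtx, cancelRun := context.WithCancel(context.Background())
	defer cancelRun()

	cmd := exec.CommandContext(runCtx, "sleep", "10")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	// Simulate the request context being canceled almost immediately.
	reqCtx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	<-reqCtx.Done()

	for {
		// Ask the process to die (the "kill 9" in the discussion).
		_ = cmd.Process.Signal(syscall.SIGKILL)
		select {
		case err := <-done:
			fmt.Println("process exited:", err)
			return
		case <-time.After(50 * time.Millisecond):
			// Kill appears to have been ignored; fall back to canceling
			// the run context, as the added cancelRun() line does.
			cancelRun()
		}
	}
}
```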

Member:

This does not look quite right. I'm not sure why sigkill wouldn't work (can we replicate it outside of the test?), but even when we handle a misbehaving runc that doesn't react properly to sigkill, we should give it more than 50ms to shut down normally.

coryb (Collaborator Author):

I have tried to reproduce the steps by calling runc directly, and so far it works; I'm not sure what the difference is. I will keep looking, although I have very little time these days. The only thing I can think of is that the sigkill signal is not actually being sent to the runc or sleep process, though I'm not sure how or why.

Member:

If you can see that this works outside of the test, can you just add a counter before calling cancelRun() so that case does not get hit after 50ms but only after a couple of seconds, for example? Also add some comments describing why we are doing this.
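A small, hypothetical sketch of what that suggestion could look like (the helper name `killUntilExited`, the package name, and the constants are made up for illustration; this is not the code that was merged):

```go
package executorutil // hypothetical package, for illustration only

import (
	"context"
	"os"
	"syscall"
	"time"
)

// killUntilExited retries SIGKILL every 50ms, but only gives up and cancels
// the context driving the run after roughly two seconds rather than on the
// very first 50ms tick.
func killUntilExited(proc *os.Process, done <-chan error, cancelRun context.CancelFunc) error {
	const retryInterval = 50 * time.Millisecond
	const maxRetries = 40 // ~2s at 50ms per retry

	retries := 0
	for {
		_ = proc.Signal(syscall.SIGKILL)
		select {
		case err := <-done:
			return err
		case <-time.After(retryInterval):
			retries++
			if retries >= maxRetries {
				// runc is not reacting to SIGKILL; cancel the run context
				// so the executor does not hang forever.
				cancelRun()
			}
		}
	}
}
```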

coryb (Collaborator Author):

I spent some time debugging this; I think the problem is with the test container, not with buildkit. The sleep process gets wedged in zap_pid_ns_processes, which seems to be related to parent/child reaping. The cancelRun was just giving up and shutting things down even though the sleep process persisted. I have updated the test container to run with tini as the entrypoint to better handle reaping during the Go tests, and so far it seems to work well. Hopefully it will make it through the GitHub workflows...

coryb (Collaborator Author):

Looks like the tests all pass when run under tini without the cancelRun() hack.

coryb (Collaborator Author):

I have another idea about why the zombie processes might be happening; let me test that out. In theory runc should be doing the waitpid, so there should be no zombies for tini to handle...

coryb (Collaborator Author):

I have updated the PR again; the bug was in `(*runcExecutor).Exec`, and the tini hack put me on the right path. The runc pid1 (sleep) was getting wedged in `zap_pid_ns_processes`, which according to the docs:

> The zap_pid_ns_processes function is used in Linux to terminate all processes within a specific namespace

So the problem was not with the pid1 Kill directly; it was actually a zombie process from the Exec (the sh command) that was preventing zap_pid_ns_processes from finishing. It turns out we were using context.Background() for the runc call via `(*runcExecutor).Run` but were using the request ctx for the runc call via `(*runcExecutor).Exec`. So when ctx.Done happened, the parent runc command for the Exec was terminated immediately, before it could call waitpid on the child sh command.

I have moved the wait/kill logic into a common routine that is now called for both Run and Exec, and we use context.Background() for both runc calls.
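A rough sketch of the shape of that fix (names like `runAndWait` and `startFunc` are hypothetical; this is not the actual buildkit code): the runc invocation is driven by context.Background(), so request cancellation can never kill the parent runc before it has reaped its child, and a single shared helper handles the cancel/kill path for both Run and Exec. The request ctx only ever triggers the kill; it never terminates the runc process directly.

```go
package executor // illustrative sketch only

import (
	"context"
	"time"
)

// startFunc stands in for whatever actually invokes runc (run or exec).
type startFunc func(ctx context.Context) error

// runAndWait is a hypothetical shared helper used by both Run and Exec.
func runAndWait(ctx context.Context, kill func() error, start startFunc) error {
	// The runc process itself is driven by context.Background(), NOT the
	// request ctx: if the request is canceled, runc must stay alive long
	// enough to waitpid its child, otherwise the child is left as a zombie
	// and zap_pid_ns_processes never finishes.
	runCtx, cancelRun := context.WithCancel(context.Background())
	defer cancelRun()

	done := make(chan error, 1)
	go func() { done <- start(runCtx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		// The request was canceled: ask the container process to die,
		// but give runc itself time to reap it before giving up.
		_ = kill()
		select {
		case err := <-done:
			return err
		case <-time.After(10 * time.Second):
			// Last resort: tear down the runc invocation itself.
			cancelRun()
			return <-done
		}
	}
}
```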

@coryb force-pushed the gateway-exec-cancel branch 2 times, most recently from f7eb53c to 38e37c5 on March 3, 2023 16:46
@coryb (Collaborator Author) commented Mar 3, 2023

I got a build failure that seems unrelated, retrying:

```
Error Trace:	/src/client/client_test.go:6512
Error:      	content still exists
Test:       	TestIntegration/TestDiffCircularSymlinks/worker=containerd-1.6
```

@coryb (Collaborator Author) commented Mar 3, 2023

Okay, tests are passing; it looks like TestIntegration/TestDiffCircularSymlinks is flaky.

@jedevc (Member) commented Mar 6, 2023

The flakiness is likely from #3401.

@coryb (Collaborator Author) commented Mar 9, 2023

@tonistiigi it would be great to get your review on this one again; I think all the issues are known/fixed at this point.

@tonistiigi requested a review from jedevc on March 11, 2023 02:48
@jedevc (Member) left a review comment:

Looks good to me, the fix makes sense 🎉 I did spend some time trying to actually find the part of the PR that was relevant to the fix vs the runc issue. Could you split the busy loop fix into a separate commit from the runc fixes (and maybe from the tests as well if there's the potential to cherry-pick just the busy loop fix), to make it a bit easier to trace back in the history?

Am I right in guessing that we could just take the busy loop break for cherry-pick, and leave the tests and runc changes on master? Bit frustrating to not take the tests as well, but from my understanding we'd need the runc changes as well (which seem a bit riskier to cherry-pick to me).

@coryb (Collaborator Author) commented Mar 15, 2023

> Looks good to me, the fix makes sense 🎉 I did spend some time trying to actually find the part of the PR that was relevant to the fix vs the runc issue. Could you split the busy loop fix into a separate commit from the runc fixes (and maybe from the tests as well if there's the potential to cherry-pick just the busy loop fix), to make it a bit easier to trace back in the history?

Yeah, that is correct; we have two fixes here: one for the busy loop, one for zombie processes via runc.Exec. I can certainly split up the patch: one patch for the runc.Exec fix, one for the busy loop + test.

> Am I right in guessing that we could just take the busy loop break for cherry-pick, and leave the tests and runc changes on master? Bit frustrating to not take the tests as well, but from my understanding, we'd need the runc changes as well (which seem a bit riskier to cherry-pick to me).

I suspect we really should cherry-pick both fixes. The zombie processes accumulating from context cancel for gateway exec processes are not great and lead to solves "getting stuck". In practice, I don't think gateway containers with a TTY are used much, which is likely why we have not seen many problems with this, but both issues are pretty frustrating when trying to use gateway containers.

@coryb (Collaborator Author) commented Mar 16, 2023

I have split out the runc.Exec fix into #3722; we will need that merged first before the tests in this PR will pass.

This fixes an issue where the tty message handling loop will go into a
tight loop and never exit upon context.Canceled.  There is a select
statement in `(*procMessageForwarder).Recv` that returns nil on
ctx.Done, but the control loop in `(*container).Start` did not exit on
this condition.  I think the intent was to flush out any inflight
messages on cancel, but this is already done in
`(*procMessageForwarder).Close`.

Signed-off-by: coryb <cbennett@netflix.com>
@tonistiigi merged commit eb5a51e into moby:master on Mar 22, 2023
@crazy-max added this to the v0.11.5 milestone on Mar 22, 2023