Handle exec being called before a container is initialized correctly. #379
Conversation
Force-pushed from bc54110 to 36938ee
Force-pushed from 8f8fa52 to 03142f7
go func() {
    c.ownerPod.podStatus.AddExec(c.Id, processId, "", spec.Terminal)
    err := c.ownerPod.vm.AddProcess(c.Id, processId, spec.Terminal, spec.Args, spec.Env, spec.Cwd, p.stdio)
vm.AddProcess will block until the process is finished, so this change will block on AddProcess. And it seems we double-call WaitForFinish, which is unexpected.
IMO, EventProcessStart & EventExit are already in the right order & place. Even if vm.AddProcess fails, we will still send out EventExit.
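For reference, here is a minimal, self-contained sketch of the ordering being described. The event names, the addProcess stand-in, and the channel plumbing are illustrative only, not runv's actual API; the point is just that the start event precedes the blocking call and the exit event is always sent, even on failure.

```go
package main

import (
	"errors"
	"fmt"
)

// addProcess stands in for vm.AddProcess: it only returns once the process
// has exited (or has failed to start).
func addProcess() error {
	return errors.New("failed to start process in the VM")
}

func main() {
	events := make(chan string, 2)
	done := make(chan struct{})

	go func() {
		defer close(done)
		// Announce the process before the blocking call...
		events <- "EventProcessStart"
		err := addProcess()
		// ...and always announce the exit afterwards, even on error,
		// so nothing upstream waits forever on a process that never ran.
		events <- "EventExit"
		if err != nil {
			fmt.Println("AddProcess error:", err)
		}
	}()

	fmt.Println(<-events)
	fmt.Println(<-events)
	<-done
}
```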
This was motivated by it being unclear what containerd higher up expects if AddProcess fails. If we fail to ever start the process, then a later attempt at recovering containers isn't going to find it, since it was probably a runv/hyperstart error. So better not to notify that it was added until we're sure it was.
Unfortunately, vm.AddProcess only returns after the process has exited in the current code, so the EventProcessStart in this PR is sent out too late.
I'll see if I can concoct a test to catch this situation so I can avoid it (my testing was based on the noted ticket, which of course exits quite quickly). Once #385 is merged, that will bring AddContainer and AddProcess in line with each other in behavior, which should make the fix here more obvious.
supervisor/container.go
Outdated
func (c *Container) run(p *Process) error {
    // Receives the early result of the attempt to start the container,
    // which ensures that we do not return before EventContainerStart
    // is emitted.
Yes, you are right, this part looks good to me. One tip: we can move c.start out of the goroutine and leave c.wait inside the goroutine to get rid of the channel.
The bad news is that vm.NewContainer in c.start doesn't guarantee the container is started in the VM; it just passes out the new-container message, so we still need another patch to fix this.
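A rough, self-contained sketch of that tip follows. Only the start/wait/run names echo the discussion above; the container type and everything else are stand-ins, not the real supervisor code.

```go
package main

import (
	"fmt"
	"time"
)

type container struct{ id string }

// start stands in for the synchronous part: it returns once the create
// request has been issued (or has failed), so no extra channel is needed
// to hand the early result back to run's caller.
func (c *container) start() error {
	fmt.Println("created container", c.id)
	return nil
}

// wait stands in for the long-running part that only returns once the
// container exits; it is the only piece that needs its own goroutine.
func (c *container) wait() {
	time.Sleep(10 * time.Millisecond) // pretend the container runs for a while
	fmt.Println("container exited:", c.id)
}

func (c *container) run() error {
	// Call start synchronously so the caller sees failures immediately...
	if err := c.start(); err != nil {
		return err
	}
	// ...and push only the blocking wait into a goroutine.
	go c.wait()
	return nil
}

func main() {
	c := &container{id: "demo"}
	if err := c.run(); err != nil {
		fmt.Println("run failed:", err)
	}
	time.Sleep(50 * time.Millisecond) // let the demo's wait goroutine finish
}
```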
I've noticed that - this fix is still racy in that regard, but it seems to mostly work. In general the problem seems to be that there's a missing "container started" message that hyperstart needs to return.
We have the "container started" message in hyperstart as the "ack to new container" message, but runv doesn't wait for it.
Looks like you fixed that in #385 :)
Force-pushed from 03142f7 to ed42f25
c.run(p) is called under the supervisor RWMutex; I would mind if it is blocked for too long, otherwise I would not have used a goroutine to create the container. Is there any other solution?
Seems unavoidable to me - we can't return from the supervisor until we know the process has actually launched. Under all normal circumstances this is quick - and certainly much better behavior than now, where you block indefinitely if it fails.
You can add a 'started chan error' argument to hp.createContainer() and c.run() and receive from the chan after the lock is unlocked. WDYT?
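A hedged sketch of that idea, assuming the handler currently holds the supervisor lock while creating the container: only the 'started chan error' parameter comes from the suggestion above; the supervisor type and function names are illustrative stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

type supervisor struct{ mu sync.RWMutex }

// createContainer stands in for hp.createContainer()/c.run(): it kicks off
// the container start in the background and reports the early result on
// started, instead of blocking a caller that still holds the supervisor lock.
func (s *supervisor) createContainer(id string, started chan<- error) {
	go func() {
		// ... issue the create/start request to the VM here ...
		started <- nil // or the start error
		// ... then keep waiting for the container to exit ...
	}()
}

func (s *supervisor) handleCreate(id string) error {
	started := make(chan error, 1)

	s.mu.Lock()
	s.createContainer(id, started)
	s.mu.Unlock()

	// Only receive the start result after the lock has been released, so
	// other requests are not serialized behind a slow VM start.
	return <-started
}

func main() {
	s := &supervisor{}
	fmt.Println("create:", s.handleCreate("demo"))
}
```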
I looked into doing this, and it feels like a very leaky abstraction at the Supervisor level. I'm going to look into splitting the global supervisor lock into a finer-grained pod/container lock so concurrent requests can start containers and add processes to them.
@wrouesnel The runv CLI is being improved continuously and has just been refactored in #537. As a result, it seems the changes in this PR are outdated. Closing this PR. If the problem mentioned still exists, a new PR could be created for it. Thank you for your contribution.
Running a command like the following:
would block indefinitely because we would return success to docker, but fail because the exec is attempted before the container is started up.
This leads to the docker daemon blocking indefinitely, waiting on an exec command that containerd reported as successful.
This patch solves the problem by blocking the containerd responses until the VM startup has returned and moved to waiting on the process (or else returning an error if it fails at this stage). This resolves the deadlock in docker and allows the above command to succeed.
Closes #279