Implement create and start #827

crosbymichael · 2016-05-16T23:06:31Z

This implements create and start in runc without the need for a unix socket or other complexity. It does so by blocking the init process, which waits on a SIGCONT before the user's process is started.
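As a rough illustration of the mechanism described above (not the code in this PR), a process can park itself until SIGCONT arrives using Go's `os/signal` package. The names `waitForStart` and `signalSelfAfter` are hypothetical, and the timeout exists only so the sketch cannot hang; a real init process would exec the user's process once the signal is seen.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// waitForStart blocks until SIGCONT is delivered to this process or the
// timeout expires. It reports whether the signal arrived.
// (Hypothetical helper; not a libcontainer function.)
func waitForStart(timeout time.Duration) bool {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCONT)
	defer signal.Stop(sigs)

	select {
	case <-sigs:
		return true
	case <-time.After(timeout):
		return false
	}
}

// signalSelfAfter stands in for a separate "start" invocation: it sends
// SIGCONT to the current process after the given delay.
func signalSelfAfter(delay time.Duration) {
	go func() {
		time.Sleep(delay)
		_ = syscall.Kill(os.Getpid(), syscall.SIGCONT)
	}()
}

func main() {
	signalSelfAfter(50 * time.Millisecond)
	if waitForStart(2 * time.Second) {
		fmt.Println("SIGCONT received; the user's process would be exec'd here")
	} else {
		fmt.Println("timed out waiting for SIGCONT")
	}
}
```

In this sketch the sender and the waiter live in one process purely for demonstration; in the create/start workflow they would be separate invocations.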

This does not remove hooks; that can be done separately. This also updates the libcontainer stats to correctly report whether the container is created or running (user code).

It does not bind mount namespaces. If you want namespaces to be bind mounted, you can write code to bind them after create returns and before calling start.
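A minimal sketch of what such caller-side code might look like on Linux, assuming the caller already knows the container's init PID. The helpers `nsPath` and `bindNamespace` are hypothetical names, not part of runc or libcontainer, and the mount call needs sufficient privileges (typically CAP_SYS_ADMIN).

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// nsPath returns the procfs path for one namespace of a process,
// for example /proc/1234/ns/net.
func nsPath(pid int, ns string) string {
	return fmt.Sprintf("/proc/%d/ns/%s", pid, ns)
}

// bindNamespace bind-mounts a namespace file of pid onto target so that the
// namespace stays referenced independently of the process.
// (Hypothetical helper; requires privileges to mount.)
func bindNamespace(pid int, ns, target string) error {
	// A bind mount of a file needs an existing file as its mount point.
	f, err := os.OpenFile(target, os.O_CREATE|os.O_RDONLY, 0444)
	if err != nil {
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return syscall.Mount(nsPath(pid, ns), target, "", syscall.MS_BIND, "")
}

func main() {
	// Only print the source path here; calling bindNamespace would be done
	// between "create" returning and "start" being called.
	fmt.Println("would bind:", nsPath(os.Getpid(), "net"))
}
```

The sketch deliberately does not mount anything from `main`, since the target path and the PID are choices for the caller.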

It retains today's start functionality by adding a runc run command that follows the same workflow.

Closes #506

julz · 2016-05-16T23:16:08Z

Nice.

crosbymichael · 2016-05-16T23:18:06Z

@julz thanks, plz take the time to play with it, runc is fully operational. lets make sure that it will fit and work with all of our needs but I think this implementation is simple and clean and will work fine.

wking · 2016-05-17T03:50:40Z

start.go

-
-		status, err := startContainer(context, spec)
-		if err != nil {
+		if err := container.Signal(syscall.SIGCONT); err != nil {


This probably needs at least a container.Status() guard like you added to delete in 572055d, since you only want to send SIGCONT to runtime code (and not send it after user-specified code has been executed). That's not quite enough though, since “the pending signal set is preserved across an execve(2)”. So “check the state and send SIGCONT if it was ‘created’” is going to be racy, and that race may result in user code seeing the extra SIGCONTs. A more robust solution would lock a resource (setting a flag in the state registry? Hold a Unix socket open?) when triggering code execution to avoid racing between two ‘start’ calls.

We can also simply reset the signal handler before doing the execve.

On Mon, May 16, 2016 at 09:00:22PM -0700, Kenfe-Mickaël Laventure wrote:

> We can also simply reset the signal handler before doing the execve.

That happens automatically, no? Also from signal(7):

“During an execve(2), the dispositions of handled signals are reset
to the default; the dispositions of ignored signals are left
unchanged.”

What needs to happen is that after a SIGCONT is received, you block
(somehow) ‘create’ from sending further SIGCONTs, then consume any
SIGCONTs from the pending queue, and then execve the user code.
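The "only one 'start' may trigger execution" idea from this review thread can be sketched with an atomic state transition. This is an in-process illustration only: real 'start' invocations are separate processes, so an actual guard would need a cross-process resource such as the state registry or a lock file. The type `container` and the helpers `tryStart` and `countWinners` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	statusCreated int32 = iota
	statusRunning
)

// container is a stand-in for the runtime's record of one container.
type container struct {
	status int32
}

// tryStart atomically moves the container from created to running.
// It returns true for exactly one caller, the one that wins the race.
func (c *container) tryStart() bool {
	return atomic.CompareAndSwapInt32(&c.status, statusCreated, statusRunning)
}

// countWinners runs n concurrent tryStart calls against one container and
// returns how many of them succeeded.
func countWinners(n int) int {
	c := &container{status: statusCreated}
	var wins int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if c.tryStart() {
				// Only the winner would go on to send the single SIGCONT.
				atomic.AddInt32(&wins, 1)
			}
		}()
	}
	wg.Wait()
	return int(wins)
}

func main() {
	fmt.Println("starts that would send SIGCONT:", countWinners(10))
}
```

Because the compare-and-swap succeeds only once, at most one caller reaches the point where the signal would be sent, which is the property the racy "check the state, then signal" sequence lacks.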

hqhq · 2016-05-20T08:54:24Z

So if host rebooted, all created containers would be gone? Is that acceptable?

wking · 2016-05-20T16:39:53Z

On Fri, May 20, 2016 at 01:54:26AM -0700, Qiang Huang wrote:

> So if host rebooted, all created containers would be gone? Is that
> acceptable?

That's fine for me (it's how all my other processes work ;). If you
want to restore a container, have an init system set it up on boot
(possibly using checkpoint/restore, depending on how much you want to
preserve). But that all seems like it's out of scope for the runtime
layer.

mrunalp · 2016-05-20T21:34:39Z

@hqhq I think that is okay as this isn't the same as docker create. As long as we are clear in the spec, it should be fine.

vishh · 2016-05-20T22:31:05Z

One of the use-cases for hooks was to customize mounts on demand. Would it be possible to not pivot root as part of create and switch root only on start?

wking · 2016-05-20T23:00:29Z

On Fri, May 20, 2016 at 03:31:07PM -0700, Vish Kannan wrote:

> Would it be possible to not pivot root as part of create and switch
> root only on start?

I don't think that's a good idea, because you may be using the
container process to hold open a namespace. In that case, there's no
reason to call ‘start’, but you still want the pivoted root in the
mount namespace.

I'm still not clear on why the pre-pivot mounts need to be dynamic,
instead of setting them up in config.json's mounts during pre-create
processing.

crosbymichael · 2016-05-24T17:29:29Z

FYI

This change makes us require Go 1.6, because previous versions of Go would not let you handle SIGCONT; the runtime just ignores it and blocks forever.

mrunalp · 2016-05-24T19:07:09Z

libcontainer/integration/template_test.go

-				Device:      "mqueue",
-				Flags:       defaultMountFlags,
-			},
+			/*


Was this temporarily commented out?

This is because of the issue with mqueue not working on Debian kernels on the CI in userns.

LK4D4 · 2016-05-25T17:44:58Z

libcontainer/container.go

+	// ContainerDestroyed - Container no longer exists,
+	// ConfigInvalid - config is invalid,
+	// ContainerPaused - Container is paused,
+	// Systemerror - System error.


SystemError

The comment for Start should probably be changed to describe why it's waiting on a signal.

LK4D4 · 2016-05-25T18:10:06Z

I see that all tests were changed so that they use the old version. Maybe a couple of tests for create/start are needed.

duglin · 2016-05-25T18:39:52Z

Would it be bad if we had the console flag default to /dev/pts/ptmx for runc create? From a UX perspective it would be easier to do that than to force all users to remember to supply it. Without some value for that flag I think runc fails anyway.

crosbymichael · 2016-05-25T18:40:59Z

@duglin No, because not all containers use a TTY.

duglin · 2016-05-25T18:42:07Z

@crosbymichael how do I do a runc create w/o a --console flag? I can't seem to get it to not error out.

crosbymichael · 2016-05-25T18:42:34Z

@duglin does your spec have terminal true?

duglin · 2016-05-25T18:42:54Z

ah that was it - thanks

duglin · 2016-05-25T18:43:25Z

:-) we could still have it default to that value when terminal is true

LK4D4 · 2016-05-25T19:38:25Z

@crosbymichael I see timeout on CI and somehow it hangs now :/

crosbymichael · 2016-06-02T23:33:54Z

@cyphar thanks!

mrunalp · 2016-06-03T00:12:27Z

Tested. LGTM.

hqhq · 2016-06-03T06:51:18Z

LGTM

cyphar · 2016-06-03T10:14:03Z

One of the biggest issues I can see right off the bat is that runc create doesn't work for user namespaced containers because of #814 (you can't set any console path). While I'm okay with that being the case (for the moment while I work on fixing #814 -- which should be considered as blocking 1.0), we should definitely make it an explicit error rather than "here's some cryptic error message" (we can remove the message once we fix the console bug).

As an aside, it's quite annoying that you can't even specify --console /dev/null ... We need to fix this.

/cc @crosbymichael

crosbymichael · 2016-06-03T17:37:45Z

@cyphar that is unrelated to this PR.

I really don't think you all understand what --console does or is for if you are trying to pass it /dev/null. I'll show you what to do in another PR.

cyphar · 2016-06-06T15:44:07Z

@crosbymichael explained on IRC that --console doesn't take a master, it takes a slave path and does no setup (which makes sense if you read the code). So the complaints about --console are not accurate, and our integration tests' usage of --console is incorrect.

This gives us a more portable way to discover the container exit code (vs. requiring callers to use subreapers [1] or other platform-specific approaches which require knowledge of the runtime implementation).

[1]: opencontainers/runc#827 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>

I added this as an option in 5033c59 (Add an --id option to 'start', 2015-09-15), because some callers might want to leave ID generation to the runtime. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [1], and the coming create/start split will follow the early-exit 'create' from [2], so require an ID here. We can revisit this if we regain a long-running 'create' process.

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'start' (which will become 'create').

[1]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[2]: opencontainers/runc#827 Summary: Implement create and start

Signed-off-by: W. Trevor King <wking@tremily.us>

Catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers/runtime-spec#384).

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

I still like the long-running 'create' API because it makes collecting the exit code easier. I've proposed an 'event' operation [1] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after the 2016-07-13 meeting was to table that while we land docs for the runC API [2], and runC has an early-exit create [3].

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that.

The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

The ptrace idea in this commit is from Mrunal [4].

[1]: opencontainers/runtime-spec#508 Subject: runtime: Add an 'event' operation for subscribing to pushes
[2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[3]: opencontainers/runc#827 Summary: Implement create and start
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54

Signed-off-by: W. Trevor King <wking@tremily.us>

Address a previous TODO. And now that we are using --bundle, we no longer need to set cmd.Dir. The TODO mentions a lack of runc support, but runc supports --bundle since opencontainers/runc@3fe7d7f3 (Add create and start command for container lifecycle, 2016-05-13, opencontainers/runc#827). Signed-off-by: W. Trevor King <wking@tremily.us>

This avoids a panic for containers that do not set Process. And even if Process was set, there is no reason to require the executable to be available *at create time* [1]. Subsequent activity could be scheduled to get a binary in place at the configured location before 'start' is called. [1]: opencontainers#827 (comment) Signed-off-by: W. Trevor King <wking@tremily.us>

GordonTheTurtle added the status/0-triage label May 16, 2016

wking reviewed May 17, 2016

crosbymichael added this to the 0.2.0 milestone May 17, 2016

crosbymichael force-pushed the create-start branch from 572055d to 627c980 Compare May 20, 2016 21:07

crosbymichael force-pushed the create-start branch from 627c980 to 285afd3 Compare May 20, 2016 22:22

crosbymichael force-pushed the create-start branch from 285afd3 to 72b4127 Compare May 23, 2016 22:51

mrunalp reviewed May 24, 2016

marcosnils mentioned this pull request May 24, 2016

libcontainer not building. #840

Closed

LK4D4 reviewed May 25, 2016

crosbymichael force-pushed the create-start branch from 41a5d5a to 1f7eef6 Compare May 25, 2016 20:18

wking mentioned this pull request Jun 3, 2016

state's status detection (based on PID lookups) is imprecise #871

Closed

crosbymichael merged commit c5060ff into opencontainers:master Jun 3, 2016

crosbymichael deleted the create-start branch June 3, 2016 17:38

wking mentioned this pull request Jun 5, 2016

config: Make 'process.args' optional opencontainers/runtime-spec#489

Closed

wking mentioned this pull request Jun 7, 2016

Use fifo for create / start instead of signal handling #886

Merged

wking mentioned this pull request Jun 20, 2016

validation: add args validation for process opencontainers/runtime-tools#116

Closed

wking mentioned this pull request Jul 11, 2016

Add initial pass at a cmd line spec opencontainers/runtime-spec#511

Closed

wking mentioned this pull request Oct 10, 2016

fix issue #228 replace start to run opencontainers/runtime-tools#232

Closed

wking mentioned this pull request Dec 1, 2016

Carry 232: fix issue #228 replace start to run opencontainers/runtime-tools#285

Closed

wking mentioned this pull request Jan 17, 2017

validate: add args validation opencontainers/runtime-tools#301

Merged

wking mentioned this pull request Feb 3, 2017

runtime: Add 'exit' to state for collecting the container exit code opencontainers/runtime-spec#677

Closed

wking mentioned this pull request Feb 27, 2017

config: Make process optional opencontainers/runtime-spec#701

Merged

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017

Merge pull request opencontainers#827 from wking/implicit-link-name-s…

0239d87

…hortcut config-linux: Use the implicit link name shortcut

wking mentioned this pull request Jan 11, 2018

validation/util/container: Use --bundle (and stop requiring BundleDir) opencontainers/runtime-tools#551

Merged

wking mentioned this pull request Feb 20, 2018

*: Avoid creation panics when 'process' and 'linux' are unset #1726

Open
