[RFC] containerdexecutor: add network namespace callback #3254
To preempt the obvious question: runcexecutor could also implement the same behaviour, albeit with a larger diff and increased overhead, by switching from |
// [network.Namespace] which also implements this interface, the containerd
// executor will run the callback at the appropriate point in the container
// lifecycle.
type NetworkNamespaceHooker interface {
If we're not sure yet whether we need this, perhaps un-export it for now?
I would prefer it be exported so that implementations can assert at compile time that they implement the interface. If the interface is later changed or removed, the assertion will fail with a compile error:
var _ containerdexecutor.OnCreateRuntimer = (*MyNamespaceImpl)(nil)
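To make the assertion idiom above concrete, here is a minimal, self-contained sketch. The interface is declared locally as a stand-in, and its method name and signature (`OnCreateRuntime(pid uint32) error`) are assumptions made for illustration, not the definition from this PR. `MyNamespaceImpl` and its field are likewise hypothetical.

```go
package main

import "fmt"

// OnCreateRuntimer is a local, hypothetical stand-in for the exported
// interface under discussion. The method set is an assumption.
type OnCreateRuntimer interface {
	OnCreateRuntime(pid uint32) error
}

// MyNamespaceImpl is an illustrative implementation that only records the
// PID it was asked to configure.
type MyNamespaceImpl struct {
	configuredPID uint32
}

// OnCreateRuntime satisfies the stand-in interface.
func (n *MyNamespaceImpl) OnCreateRuntime(pid uint32) error {
	n.configuredPID = pid
	return nil
}

// Compile-time assertion: if OnCreateRuntimer is renamed, removed, or its
// method set changes, this line stops compiling. No value is allocated,
// because the right-hand side is a typed nil pointer.
var _ OnCreateRuntimer = (*MyNamespaceImpl)(nil)

func main() {
	ns := &MyNamespaceImpl{}
	if err := ns.OnCreateRuntime(1234); err != nil {
		fmt.Println("hook failed:", err)
		return
	}
	fmt.Println("configured namespace for pid", ns.configuredPID)
}
```

An un-exported interface could not be named from another package, so downstream implementations would have no way to write this assertion and would only notice a mismatch at runtime, when the type assertion in the executor silently stops matching.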
As discussed in the maintainers meeting:
Note that we are about to close |
Update: I missed the comment about I think best performing solution for the 95% case for this is to use |
It's going to be more invasive than I'd thought to make this work with the runc executor. It currently uses foreground mode, but would have to use detached mode in order to separate |
In order to support identity mapping and user namespaces, the Moby project needs to defer the creation of a container's network namespace to the runtime and hook into the container lifecycle to configure the network namespace before the user binary is started. The standard way to do so is by configuring a `createRuntime` OCI lifecycle hook, in which the OCI runtime executes a specified process in the runtime environment after the container has been created and before it is started. In the case of Moby the network namespace needs to be configured from the daemon process, which necessitates that the hook process communicate with the daemon process. This is complicated and slow. All the hook process does is inform the daemon of the container's PID and wait until the daemon has finished applying the network namespace configuration.

There is an alternative to the `createRuntime` OCI hook which containerd clients can take advantage of. The `container.NewTask` method is directly analogous to the OCI create operation, and the `task.Start` method is directly analogous to the OCI start operation. Any operations performed between the `NewTask` and `Start` calls are therefore directly analogous to `createRuntime` OCI hooks, without needing to execute any external processes!

Provide a mechanism for network.Namespace instances to register a callback function which can be used to configure a container's network namespace instead of, or in addition to, `createRuntime` OCI hooks.

Signed-off-by: Cory Snider <csnider@mirantis.com>
bf1c3d1 to b5fdf90
As discussed in the maintainers meeting, on the moby side this should be followed up with:
- if NOT userns: keep runcexecutor, remove the libnetwork hook and write the netns directly to the OCI spec. In the future the netns can be pooled as well, like the CNI bridge does today.
- if userns: switch to containerdexecutor. Remove the libnetwork hook and replace it with detection of this new interface, which gets the netns path between create and start.
In order to support identity mapping and user namespaces, dockerd needs to defer the creation of a container's network namespace to the runtime and hook into the container lifecycle to configure the network namespace before the user binary is started. The standard way to do so is by configuring a `createRuntime` OCI lifecycle hook, in which the OCI runtime executes a specified process in the runtime environment after the container has been created and before it is started. In the case of dockerd the network namespace needs to be configured from the daemon process, which necessitates that the hook process communicate with the daemon process. This is complicated and slow. All the hook process does is inform the daemon of the container's PID and wait until the daemon has finished applying the network namespace configuration, but this requires IPC and synchronization.

There is an alternative to the `createRuntime` OCI hook which containerd clients can take advantage of. The `container.NewTask` method is directly analogous to the OCI create operation, and the `task.Start` method is directly analogous to the OCI start operation. Any operations performed between the `NewTask` and `Start` calls are therefore directly analogous to `createRuntime` OCI hooks, without needing to execute any external processes!

Provide a mechanism for network.Namespace instances to register a callback function which can be used to configure a container's network namespace instead of, or in addition to, `createRuntime` OCI hooks.

(RFC because containerdexecutor does not have ID mapping wired up, though AIUI that will need to change as dockerd needs to be migrated over to containerdexecutor as part of the containerd snapshotter integration project and ID mapping is supported with the runcexecutor currently integrated into dockerd.)
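The ordering described above (create, then callback, then start) can be sketched without any containerd dependency. Everything in this example is a hypothetical stand-in: `Namespace`, `CreateRuntimeHooker`, `fakeTask`, `runContainer` and `daemonNamespace` are illustrative names, and the callback signature is an assumption rather than the API added by this PR. The point is only to show an optional interface being detected with a type assertion and invoked in-process between the create and start steps.

```go
package main

import (
	"errors"
	"fmt"
)

// Namespace is a hypothetical stand-in for network.Namespace.
type Namespace interface {
	Close() error
}

// CreateRuntimeHooker is a hypothetical optional interface that a Namespace
// may also implement to be called back after create and before start.
type CreateRuntimeHooker interface {
	OnCreateRuntime(pid uint32) error
}

// fakeTask simulates a task that has been created (it has a PID) but whose
// user process has not been started yet.
type fakeTask struct {
	pid     uint32
	started bool
}

// newFakeTask plays the role of the create operation.
func newFakeTask(pid uint32) *fakeTask { return &fakeTask{pid: pid} }

// Start plays the role of the start operation.
func (t *fakeTask) Start() error {
	if t.started {
		return errors.New("task already started")
	}
	t.started = true
	return nil
}

// runContainer runs the optional callback between create and start. If the
// callback fails, the task is never started.
func runContainer(ns Namespace, task *fakeTask) error {
	if hooker, ok := ns.(CreateRuntimeHooker); ok {
		if err := hooker.OnCreateRuntime(task.pid); err != nil {
			return fmt.Errorf("configuring network namespace: %w", err)
		}
	}
	return task.Start()
}

// daemonNamespace is an illustrative Namespace that configures networking
// in-process instead of through an external hook binary.
type daemonNamespace struct {
	fail   bool
	events []string
}

func (d *daemonNamespace) Close() error { return nil }

func (d *daemonNamespace) OnCreateRuntime(pid uint32) error {
	if d.fail {
		return errors.New("simulated configuration failure")
	}
	d.events = append(d.events, fmt.Sprintf("configured netns for pid %d", pid))
	return nil
}

func main() {
	ns := &daemonNamespace{}
	task := newFakeTask(4242)
	if err := runContainer(ns, task); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ns.events, "started:", task.started)
}
```

Because the callback is an ordinary function call in the daemon's own process, the PID handoff and the wait for completion that the `createRuntime` hook process performs over IPC collapse into a single call and its return value.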