Set libnetwork sandbox key w/o OCI hooks #44385
base: master
84bf8df to 18222cb
I realized that builder-next can utilize user namespaces via the
That leaves us with a few options:
@corhere Can you explain more about why the @thaJeztah How does one trigger userns CI for this PR?
For reasons I do not fully understand, runC fails to create a container with a spec that both sets
I wonder if we should have separate code paths for userns and non-userns in this case. Eventually I'd like buildkit in dockerd to have the same netns pooling as upstream, since it is the most performant, but based on your description that does not seem possible. I assume that having dockerd create the user namespace fd itself and pass it to runc (which I would assume could then create a netns associated with the same userns) isn't an option either?
Multithreaded processes cannot change their user namespace. The kernel will refuse to setns or unshare the user namespace of any task which shares its virtual memory space with any other task. So while dockerd technically could make the user namespace fd itself, it would have to fork a whole new process to do so. I don't see any advantages over deferring user-namespace creation to the runtime. We could pool and reuse runtime-created user and network namespaces if we wanted to; the daemon merely has to persist them through the usual mechanisms (holding open an fd, bind-mounting) before the container is stopped.
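As a rough illustration of the "holding open an fd" mechanism mentioned above, here is a minimal sketch. The helper names `nsPath` and `holdNamespace` are hypothetical and are not part of dockerd or libnetwork; the sketch assumes a Linux host with procfs mounted.

```go
package main

import (
	"fmt"
	"os"
)

// nsPath returns the procfs path of a namespace of the given type
// ("net", "user", ...) for the process with the given pid.
// Hypothetical helper, for illustration only.
func nsPath(pid int, nsType string) string {
	return fmt.Sprintf("/proc/%d/ns/%s", pid, nsType)
}

// holdNamespace opens the namespace file. The kernel keeps a namespace
// alive for as long as a file descriptor referring to it stays open,
// even after every process inside the namespace has exited.
func holdNamespace(pid int, nsType string) (*os.File, error) {
	return os.Open(nsPath(pid, nsType))
}

func main() {
	// Hold our own network namespace as a stand-in for a container's.
	f, err := holdNamespace(os.Getpid(), "net")
	if err != nil {
		fmt.Println("could not open namespace (non-Linux host?):", err)
		return
	}
	defer f.Close()
	fmt.Println("holding", f.Name())
}
```

In a pooling scheme, the daemon would open the container's namespace files this way before stopping the container and release them when the pool entry is discarded. Bind-mounting the same procfs path onto a regular file would be the alternative that survives a daemon restart.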
4c4b641 to 96e4118
I made progress towards #44690, only to discover that
96e4118 to ead1361
Signed-off-by: Cory Snider <csnider@mirantis.com>
The options required by the executor depend on the platform, and soon will also depend on the values of other options. Give the executor constructor the flexibility to pull whatever options it needs out of the Opts struct. Signed-off-by: Cory Snider <csnider@mirantis.com>
Have libnetwork create the network namespace and pass the path to the namespace to runC. Switch to buildkit's containerd executor when userns-remapping is enabled so the in-process OnCreateRuntime callback can be used in place of the OCI hook. Signed-off-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Cory Snider <csnider@mirantis.com>
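To make "pass the path to the namespace to runC" from the commit message above more concrete: in an OCI runtime spec, a `linux.namespaces` entry that carries a `path` asks the runtime to join that existing namespace instead of creating a new one. The sketch below uses small local types that mirror only the relevant JSON fields. The helper `withNetNSPath` and the example path are assumptions for illustration, not the code in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// namespace mirrors one entry of "linux.namespaces" in an OCI config.json.
type namespace struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// withNetNSPath returns a copy of namespaces in which the "network" entry
// points at path, adding the entry if it is missing. Hypothetical helper.
func withNetNSPath(namespaces []namespace, path string) []namespace {
	out := make([]namespace, 0, len(namespaces)+1)
	found := false
	for _, ns := range namespaces {
		if ns.Type == "network" {
			ns.Path = path
			found = true
		}
		out = append(out, ns)
	}
	if !found {
		out = append(out, namespace{Type: "network", Path: path})
	}
	return out
}

func main() {
	spec := []namespace{{Type: "mount"}, {Type: "pid"}, {Type: "network"}}
	// Illustrative path only; the real location is chosen by libnetwork.
	spec = withNetNSPath(spec, "/run/example/netns/abc123")
	b, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Because the namespace already exists and is already configured when the runtime joins it, no hook is needed to tell the daemon where the namespace lives.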
ead1361 to 6700cc1
Unfortunately, runc invokes the prestart OCI hooks before it applies the sysctls in the container spec, contrary to what the OCI runtime spec says MUST be done. Moving the step that sets the libnetwork sandbox key to after the create operation completes is therefore a breaking change.
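The ordering difference can be shown with a toy model. This is only a sketch of the sequence described in the comment above, not runc's actual code: `toyRuntime`, `setKey`, `runWithHook` and `runWithCallback` are all hypothetical names.

```go
package main

import "fmt"

// toyRuntime is a hypothetical stand-in for an OCI runtime. It only
// records the order in which the steps happen.
type toyRuntime struct {
	log []string
}

// create models the ordering described above: prestart hooks run
// before the sysctls from the spec are applied.
func (r *toyRuntime) create(prestartHooks []func(*toyRuntime)) {
	r.log = append(r.log, "namespaces created")
	for _, hook := range prestartHooks {
		hook(r)
	}
	r.log = append(r.log, "sysctls applied")
}

// start models starting the user process after create has returned.
func (r *toyRuntime) start() {
	r.log = append(r.log, "user process started")
}

// setKey returns a step that records when the sandbox key is set.
func setKey(label string) func(*toyRuntime) {
	return func(r *toyRuntime) {
		r.log = append(r.log, "sandbox key set ("+label+")")
	}
}

// runWithHook sets the sandbox key from a prestart hook.
func runWithHook() []string {
	r := &toyRuntime{}
	r.create([]func(*toyRuntime){setKey("hook")})
	r.start()
	return r.log
}

// runWithCallback sets the sandbox key after create returns,
// but before the user process starts.
func runWithCallback() []string {
	r := &toyRuntime{}
	r.create(nil)
	setKey("after create")(r)
	r.start()
	return r.log
}

func main() {
	fmt.Println(runWithHook())
	fmt.Println(runWithCallback())
}
```

In the hook variant the sandbox key is set before the sysctls are applied; in the callback variant it is set after them. Anything that depends on the relative order of those two steps would observe the change.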
- What I did
Made another reexec go away while at the same time potentially improving compatibility with some OCI runtimes.
- How I did it
I made it possible for consumers of libcontainerd to create tasks without immediately starting the user process so that they can run arbitrary code before the process starts. Anything that could be done with a `prestart` or `createProcess` OCI hook can now also be implemented without one. I then replaced the `libnetwork-setkey` OCI hook with an in-daemon equivalent.
- How to verify it
CI
- Description for the changelog