
setns: support user namespaces #13359

Merged
copybara-service[bot] merged 1 commit into master from test/cl926411968 on Jun 4, 2026

@copybara-service

setns: support user namespaces

User namespace entries under `/proc/[pid]/ns` currently render as fake
namespace symlinks. They look like the other namespace files, but opening
them does not produce an `nsfs` file that `setns(2)` can use. Rootless
container tools such as `buildah` and `podman` rely on that file when they
re-enter the user namespace of the pause process, so the second lifecycle
command fails with `EINVAL`.

Make `UserNamespace` implement `vfs.Namespace` and give each user namespace
an `nsfs` inode when it is created. `/proc/[pid]/ns/user` now uses the
regular namespace symlink path, so opening it returns a joinable namespace
file instead of a fake link target.
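On Linux, the link target of a namespace file has the form `user:[<inode>]`, and opening the file yields an nsfs descriptor whose `fstat(2)` inode number matches the one in the link. The sketch below checks that property for `/proc/<pid>/ns/user` on a Linux host. It is an illustration, not gVisor code, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Builds "/proc/<pid>/ns/user" into buf (hypothetical helper). */
static void userns_path(pid_t pid, char *buf, size_t len) {
    snprintf(buf, len, "/proc/%d/ns/user", (int)pid);
}

/* Reads the link target, rendered by Linux as "user:[<inode>]", and parses
   the inode number out of it. Returns 0 on success, -1 otherwise. */
static int userns_link_inode(pid_t pid, unsigned long long *ino) {
    assert(ino != NULL);
    char path[64], target[64];
    userns_path(pid, path, sizeof path);
    ssize_t n = readlink(path, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';
    return sscanf(target, "user:[%llu]", ino) == 1 ? 0 : -1;
}

/* Opens the namespace file itself and reports the inode number that
   fstat(2) returns for the resulting nsfs descriptor. Returns 0 on
   success, -1 otherwise. */
static int userns_fd_inode(pid_t pid, unsigned long long *ino) {
    assert(ino != NULL);
    char path[64];
    userns_path(pid, path, sizeof path);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    if (rc != 0)
        return -1;
    *ino = (unsigned long long)st.st_ino;
    return 0;
}
```

The `open` step is the one that, for the user entry, did not yield an nsfs file before this change.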

`Setns` now accepts `CLONE_NEWUSER` from both `nsfd`s and `pidfd`s. It
follows the Linux restrictions for user namespace joins by rejecting the
caller's current user namespace, requiring `CAP_SYS_ADMIN` in the target
user namespace, rejecting multithreaded callers, and rejecting callers with
`fs` state shared outside the thread group. The capability checks for any
other namespaces in the same `setns` call use the credentials the caller
would have after joining the user namespace.
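The join rules above can be captured as a pure decision function. The sketch below is a simplified, hypothetical model: the types, fields and function names are invented for illustration, the capability walk is shaped after Linux's `cap_capable()`, and the check order is assumed to follow Linux's `userns_install()`. It is not gVisor's `Setns` implementation. It also models the last rule: after a successful join the caller holds a full capability set in the target namespace (as on Linux), and the check for another namespace in the same call runs against those credentials.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration only. */
struct userns {
    const struct userns *parent; /* NULL for the initial user namespace */
    int level;                   /* nesting depth; 0 for the initial namespace */
    int owner_euid;              /* effective UID of the namespace's creator */
};

struct cred {
    const struct userns *userns; /* the caller's current user namespace */
    int euid;
    bool cap_sys_admin;          /* CAP_SYS_ADMIN in the effective set */
};

struct caller {
    struct cred cred;
    int nr_threads; /* threads in the caller's thread group */
    bool fs_shared; /* fs state shared outside the thread group */
};

/* Does c hold CAP_SYS_ADMIN in ns? Shaped after Linux's cap_capable(). */
static bool has_sys_admin(const struct cred *c, const struct userns *ns) {
    for (;;) {
        assert(ns != NULL);
        if (ns == c->userns)
            return c->cap_sys_admin;
        if (ns->level <= c->userns->level)
            return false; /* ns is not a descendant of the caller's namespace */
        if (ns->parent == c->userns && ns->owner_euid == c->euid)
            return true;  /* the owner of a child namespace has every capability in it */
        ns = ns->parent;
    }
}

/* Admission checks for joining target; returns 0 or a negative errno.
   The order is assumed to match Linux's userns_install(). */
static int userns_join_check(const struct caller *t, const struct userns *target) {
    if (target == t->cred.userns)
        return -EINVAL; /* already in this user namespace */
    if (t->nr_threads > 1)
        return -EINVAL; /* multithreaded caller */
    if (t->fs_shared)
        return -EINVAL; /* fs state shared outside the thread group */
    if (!has_sys_admin(&t->cred, target))
        return -EPERM;  /* no CAP_SYS_ADMIN in the target */
    return 0;
}

/* Credentials after a successful join: the caller moves into target and
   receives a full capability set there. */
static struct cred creds_after_join(const struct cred *c, const struct userns *target) {
    struct cred next = *c;
    next.userns = target;
    next.cap_sys_admin = true;
    return next;
}

/* setns(fd, CLONE_NEWUSER | <other>): the check for the other namespace,
   simplified to CAP_SYS_ADMIN in its owning user namespace, uses the
   post-join credentials rather than the caller's current ones. */
static int setns_user_plus_other(const struct caller *t, const struct userns *target,
                                 const struct userns *other_owner) {
    int err = userns_join_check(t, target);
    if (err != 0)
        return err;
    struct cred next = creds_after_join(&t->cred, target);
    return has_sys_admin(&next, other_owner) ? 0 : -EPERM;
}
```

In this model a root caller in the initial namespace can join a child namespace created by another UID, but a combined request whose other namespace is owned by the initial user namespace then fails with `EPERM`, because the post-join credentials only carry capabilities inside the child.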

Add a syscall regression test that creates a child user namespace, opens
`/proc/<pid>/ns/user`, and verifies that `setns(CLONE_NEWUSER)` succeeds.
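The regression test described above can be approximated on a Linux host as follows. This is a hypothetical sketch, not the test added in this change: a holder process calls `unshare(CLONE_NEWUSER)`, and a freshly forked joiner opens `/proc/<pid>/ns/user` and calls `setns(fd, CLONE_NEWUSER)`. The helper names and the `JOIN_SKIPPED` convention are assumptions; the skip path covers hosts that refuse to create user namespaces at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define JOIN_SKIPPED (-1) /* hypothetical: no child user namespace could be created */

/* Holder: create a child user namespace, report the result, then block
   until every write end of the hold pipe is closed. */
static void hold_child_userns(int ready_fd, int hold_fd) {
    int err = unshare(CLONE_NEWUSER) == 0 ? 0 : errno;
    if (write(ready_fd, &err, sizeof err) != (ssize_t)sizeof err)
        _exit(1);
    char c;
    ssize_t n = read(hold_fd, &c, 1); /* returns 0 at EOF */
    (void)n;
    _exit(0);
}

/* Joiner: open /proc/<pid>/ns/user and setns(2) into it. Exits with 0 on
   success or with the errno of the failing call. */
static void join_userns_of(pid_t pid) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/ns/user", (int)pid);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        _exit(errno);
    _exit(setns(fd, CLONE_NEWUSER) == 0 ? 0 : errno);
}

/* Returns 0 if the join succeeded, JOIN_SKIPPED if unshare failed, or a
   positive errno describing the failure. */
static int try_join_child_userns(void) {
    int ready[2], hold[2];
    if (pipe(ready) != 0)
        return errno;
    if (pipe(hold) != 0) {
        int e = errno;
        close(ready[0]);
        close(ready[1]);
        return e;
    }
    pid_t holder = fork();
    if (holder == 0) {
        close(ready[0]);
        close(hold[1]);
        hold_child_userns(ready[1], hold[0]);
    }
    int fork_err = holder < 0 ? errno : 0;
    close(ready[1]);
    close(hold[0]);

    int err = 0, result;
    if (fork_err != 0) {
        result = fork_err;
    } else if (read(ready[0], &err, sizeof err) != (ssize_t)sizeof err) {
        result = EIO; /* holder died before reporting */
    } else if (err != 0) {
        result = JOIN_SKIPPED;
    } else {
        /* A freshly forked joiner is single-threaded and has its own fs state. */
        pid_t joiner = fork();
        if (joiner == 0)
            join_userns_of(holder);
        int status = 0;
        if (joiner < 0 || waitpid(joiner, &status, 0) != joiner || !WIFEXITED(status))
            result = EIO;
        else
            result = WEXITSTATUS(status);
    }
    close(ready[0]);
    close(hold[1]); /* lets the holder exit */
    if (holder > 0)
        waitpid(holder, NULL, 0);
    return result;
}

/* Regression-style check: skip where user namespaces are unavailable. */
static void check_setns_into_child_userns(void) {
    int r = try_join_child_userns();
    if (r == JOIN_SKIPPED)
        return;
    assert(r == 0);
}
```

The join runs in a separate forked process so that it passes the single-threaded and unshared-`fs` restrictions and so that the calling process stays in its original user namespace. The joiner has the same effective UID as the holder that created the namespace, which on Linux is what grants it `CAP_SYS_ADMIN` there.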

Fixes #13314

FUTURE_COPYBARA_INTEGRATE_REVIEW=#13323 from shayonj:issue-13314-userns-setns 8060b5f

copybara-service[bot] added the exported label (Issue was exported automatically) on Jun 4, 2026
copybara-service[bot] force-pushed the test/cl926411968 branch 3 times, most recently from 4a60d6a to 0eb7e73, on June 4, 2026 20:02
COPYBARA_INTEGRATE_REVIEW=#13323 from shayonj:issue-13314-userns-setns 8060b5f
PiperOrigin-RevId: 926855389
copybara-service[bot] merged commit 3f949a4 into master on Jun 4, 2026
0 of 2 checks passed
copybara-service[bot] deleted the test/cl926411968 branch on June 4, 2026 20:28

Labels: exported (Issue was exported automatically)


Development: merging this pull request may close the following issue:

/proc/[pid]/ns/user is not usable with setns
