Description
What version of Go are you using (go version)?
devel 2770c50
(but it's not a Go problem)
What operating system and processor architecture are you using (go env)?
linux amd64
What did you do?
You can only test this as root, so I'm putting the simple program here. It's dead simple. It starts a bash with Unshareflags set to CLONE_NEWNS. This worked several years ago, but it stopped working on my Ubuntu 16 with systemd.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	c := exec.Command("/bin/bash")
	c.Stdin, c.Stdout, c.Stderr = os.Stdin, os.Stdout, os.Stderr
	c.SysProcAttr = &syscall.SysProcAttr{Unshareflags: syscall.CLONE_NEWNS}
	c.Run()
}
What did you expect to see?
Once you're in the bash, do this:
mkdir x
mount -t proc none x
ls x
You should see what you see in /proc.
Now exit the shell
^D
and look at x again.
ls x
On all systems without systemd, the mount is gone.
What did you see instead?
And, if you're on a machine with systemd, you'll see that x is ... still there.
So, you carefully set up CLONE_NEWNS, Go packages follow all the rules, but systemd broke it.
More discussion here: http://gittup.org/blog/2015/10/16-linux-namespacing-pitfalls/
Key point:
"It turns out that the systemd developers decided to override the kernel's default setting of 'private' to their own default setting of 'shared'. This means that on Linux machines with systemd, the default is shared (filesystem namespaces don't work out of the box), while on Linux machines without systemd, the default is private (filesystem namespaces work out of the box). Essentially, systemd decided to make it so that there is no default that end programs can rely on. All programs must instead mark the root filesystem as private if they want private namespaces, or as shared if they want shared namespaces if they want to work across all Linux distributions. I'm pretty sure this was done to frustrate as many people as possible."
From my POV, this is really a Linux mistake, as it added a knob that should not exist. Sure, systemd is our current culprit for using that knob, but it could have been anything. By adding that knob, Linux created a situation that was going to bite us sooner or later.
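To see which default a given machine is using, one option is to look at the optional fields for the / mount in /proc/self/mountinfo: a "shared:N" tag there means mounts made in a new namespace propagate back out. The following is only a rough sketch; the helper name rootPropagation is made up for illustration, and it assumes the mountinfo field layout described in proc(5).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// rootPropagation (hypothetical helper) inspects one line of
// /proc/self/mountinfo. If the line describes the mount point "/",
// it returns the propagation type and ok == true.
func rootPropagation(line string) (prop string, ok bool) {
	fields := strings.Fields(line)
	// Layout: ID, parent ID, major:minor, root, mount point, options,
	// zero or more optional fields, then a "-" separator.
	if len(fields) < 7 || fields[4] != "/" {
		return "", false
	}
	prop = "private" // no optional tag means private
	for _, f := range fields[6:] {
		if f == "-" {
			break
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			return "shared", true
		case strings.HasPrefix(f, "master:"):
			prop = "slave"
		case f == "unbindable":
			prop = "unbindable"
		}
	}
	return prop, true
}

func main() {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	result := "unknown"
	s := bufio.NewScanner(f)
	for s.Scan() {
		// Several mounts may be stacked on "/"; keep the last one listed.
		if p, ok := rootPropagation(s.Text()); ok {
			result = p
		}
	}
	fmt.Println("propagation of /:", result)
}
```

If this reports "shared", the test program above will leak its mounts into the parent namespace.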
There is a fix, which I'll try to get in today. Basically, when Unshareflags includes CLONE_NEWNS, we have to call
mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL)
i.e. change the attributes of /. Note that the standard unshare command does this today.