Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support new pid namespace or ability to kill all children #65

Closed
ibuildthecloud opened this issue May 2, 2019 · 12 comments · Fixed by #67
Closed

Support new pid namespace or ability to kill all children #65

ibuildthecloud opened this issue May 2, 2019 · 12 comments · Fixed by #67

Comments

@ibuildthecloud
Copy link
Contributor

This is a follow up to our discussion at DockerCon.

Rootlesskit (and the usernetes work) has been integrated it k3s but results in several usability issues. For now it seems the simplest approach to making rootless with k3s more user friendly would be if rootkit could ensure some way that if the children of executed process died when the process died. This could be done by creating a new pid namespace or possibly with cgroup (although I don't know if rootless can modify cgroups).

@AkihiroSuda
Copy link
Member

I'm just curious, why Pdeathsig doesn't work?

@AkihiroSuda
Copy link
Member

AkihiroSuda commented May 14, 2019

Minimal reproducer for the issue

// Package main provides a reproducer for the PdeathSig issue.
//
// ```
// Terminal1$ ./pdeathsig-demo tail -f /dev/null
//
// Terminal2$ ps u
// ...
// suda       5705  0.0  0.0 102776  1844 pts/1    Sl+  02:47   0:00 ./pdeathsig-demo tail -f /dev/null
// suda       5709  0.0  0.0 102776  1520 pts/1    Sl+  02:47   0:00 /proc/self/exe tail -f /dev/null
// suda       5714  0.0  0.0   8068   820 pts/1    S+   02:47   0:00 tail -f /dev/null
// ...
// Terminal2$ kill 5705
// Terminal2$ ps u
// ...
// suda       5714  0.0  0.0   8068   820 pts/1    S    02:47   0:00 tail -f /dev/null
// ...
// ```
//
// The `tail -f /dev/null` process is expected to die, but actually it doesn't, due to some reason
// related to the newuidmap binary (SUID).
//
// When SKIP_NEWUIDMAP=1 is set, the `tail -f /dev/null` process dies as expected.
package main

import (
        "os"
        "os/exec"
        "strconv"
        "syscall"
)

func main() {
        if err := xmain(); err != nil {
                panic(err)
        }
}

func xmain() error {
        if os.Getenv("child") != "" {
                return child()
        }
        return parent()
}

func parent() error {
        cmd := exec.Command("/proc/self/exe", os.Args[1:]...)
        cmd.Env = append(os.Environ(), "child=1")
        cmd.Stdin = os.Stdin
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{
                Pdeathsig:  syscall.SIGKILL,
                Cloneflags: syscall.CLONE_NEWUSER,
        }

        if err := cmd.Start(); err != nil {
                return err
        }

        if os.Getenv("SKIP_NEWUIDMAP") != "1" {
                if err := setupUIDMap(cmd.Process.Pid); err != nil {
                        return err
                }
        }
        return cmd.Wait()
}

func child() error {
        cmd := exec.Command(os.Args[1], os.Args[2:]...)
        cmd.Stdin = os.Stdin
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{
                Pdeathsig: syscall.SIGKILL,
        }
        return cmd.Run()
}

func setupUIDMap(pid int) error {
        // newuidmap typically has SUID bit
        cmd := exec.Command("newuidmap", strconv.Itoa(pid), "0", strconv.Itoa(os.Geteuid()), "1")
        return cmd.Run()
}

@cyphar @giuseppe
Is this by design or bug (of Go? kernel?) ?
Is there any workaround?

AkihiroSuda added a commit to AkihiroSuda/rootlesskit that referenced this issue May 14, 2019
The parent calls child with Pdeathsig, but it is cleared when newuidmap SUID binary is called
rootless-containers#65 (comment)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Copy link
Member

PR for the Pdeathsig issue: #66

@ibuildthecloud
Does it work for you?

@giuseppe
Copy link
Contributor

@cyphar @giuseppe
Is this by design or bug (of Go? kernel?) ?
Is there any workaround?

it seems the kernel resets the flag when the credentials for the process are changed. The prctl(PR_SET_PDEATHSIG) should be done once the user namespace is configured and the setresuid completed. It is trivial in C, but I am not sure how to achieve it in Go, maybe re-execing the process once in the namespace?

@AkihiroSuda
Copy link
Member

#66 seems working, though not sure correct

@AkihiroSuda
Copy link
Member

Released v0.4.1.

I'm closing this issue but let me know if there is still problem.

@AkihiroSuda
Copy link
Member

reopening, as we need a way to kill grandchildren under the rootlesskit child process...

@AkihiroSuda AkihiroSuda reopened this May 29, 2019
AkihiroSuda added a commit to AkihiroSuda/rootlesskit that referenced this issue May 29, 2019
Fix rootless-containers#65

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Copy link
Member

PR for creating PIDNS: #67

@magikid
Copy link

magikid commented Mar 15, 2020

With the merging of #67 for the --pidns flag, does that mean that this section of the docs is no longer valid?

Once you kill K3s and then start a new instance of K3s it will create a new network namespace, but it doesn’t kill the old pods. So you are left with a fairly broken setup.

@gldraphael
Copy link

@magikid I'm wondering the same too, since they link to this issue.

@virtuald
Copy link

The usage of Pdeathsig in #66 isn't quite right. It will guarantee that the child dies, but it won't ensure that the parent doesn't accidentally kill the child. See golang/go#27505 for details.

The correct way to use Pdeathsig would be to launch a goroutine, lock it to the OS thread, start the child in that goroutine, and don't exit the goroutine until the child exits. There's a more complete example in the issue I linked above.

FWIW, if you're launching lots of children, it might make sense to use a single goroutine locked to an OS thread to launch all of the potential children.

@AkihiroSuda
Copy link
Member

Thanks, opened #182 for tracking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants