New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: OS thread appears to be re-used despite never releasing runtime.LockOSThread() #28979

Open
sipsma opened this Issue Nov 28, 2018 · 8 comments

Comments

Projects
None yet
5 participants
@sipsma

sipsma commented Nov 28, 2018

What did you do?

In Go 1.11.2, spawned a goroutine, called runtime.LockOSThread() and entered a new mount namespace via syscall.Unshare. runtime.UnlockOSThread() was never called, which according to the docs should mean that the locked thread is never re-used.

However, it appears that a separate goroutine started at the beginning of the process (which never calls unshare or does anything with mounts) will after some time be executing in a mount namespace that was created by the independent goroutine mentioned above (which is not expected to be leaking OS threads). This seems to indicate that the OS thread is somehow leaking for re-use by other goroutines despite never calling runtime.UnlockOSThread().

This behavior was originally observed in a larger piece of software, but I was able to reproduce it using the code below. The code:

  • Creates two files
    • /tmp/test-go-thread-leak/source, with file contents "source"
    • /tmp/test-go-thread-leak/dest, with file contents "dest"
  • From main, starts a goroutine that checks every second that the contents of /tmp/test-go-thread-leak/dest have not changed. If the contents changed from the expected value "dest" to "source", it panics, crashing the program.
  • From main, runs in an infinite loop a function that spins off a separate goroutine that:
    • Calls runtime.LockOSThread()
    • Enters a new mount namespace with Unshare
    • Bind mounts /tmp/test-go-thread-leak/source to /tmp/test-go-thread-leak/dest. This means that any threads/processes in the newly created mount namespace will now see the contents of "dest" have changed to be "source".

The goroutine that checks the contents of /tmp/test-go-thread-leak/dest should never read the contents as "source" unless it has suddenly begun executing on an OS thread using one of the mount namespaces created by the other goroutine. However, after 1-10s of execution, the program will crash due to the goroutine reading "source" instead of destination, indicating the OS thread leaked.

Code was compiled with just go build main.go and then run with sudo ./main (sudo needed to make the unshare call).

package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
    "runtime"
    "syscall"
    "time"
)

var (
    // Files created will be under this tmpRoot.
    // Any existing path at tmpRoot will be deleted
    // on start.
    tmpRoot = "/tmp/test-go-thread-leak"

    sourcePath = filepath.Join(tmpRoot, "source")
    sourceContents = "source"

    destPath = filepath.Join(tmpRoot, "dest")
    destContents = "dest"
)

func init() {
    // Set up the directories used to test and print some debugging information
    if err := os.RemoveAll(tmpRoot); err != nil {
        panic(err.Error())
    }

    if err := os.Mkdir(tmpRoot, 0777); err != nil {
        panic(err.Error())
    }

    if err := ioutil.WriteFile(sourcePath, []byte(sourceContents), 0777); err != nil {
        panic(err.Error())
    }

    if err := ioutil.WriteFile(destPath, []byte(destContents), 0777); err != nil {
        panic(err.Error())
    }

    fmt.Printf("Running on Go version: %s\n", runtime.Version())
    fmt.Printf("Initial mount namespace: %s\n\n", currentMountNsName())
}

// Returns the name of the mount namespace used by the thread of the calling go routine
func currentMountNsName() string {
    mountNsName, err := os.Readlink("/proc/thread-self/ns/mnt")
    if err != nil {
        panic(fmt.Sprintf("failed to get mount ns name: %v", err))
    }

    return mountNsName
}

// The function that appears to cause unexpected leaks of threads.
// It creates a separate go routine, locks to its OS thread, enters
// a new mount namespace and creates a bind mount in that new namespace.
//
// runtime.UnlockOSThread is never called, so it's expected that the OS
// thread on which this go routine executes is never re-used by other
// go-routines. That appears to not be the case though.
func tryLeak() {
    done := make(chan interface{})
    go func() {
        defer close(done)

        runtime.LockOSThread()

        // create the new namespace after locking to the os thread
        err := syscall.Unshare(syscall.CLONE_NEWNS)
        if err != nil {
            panic(err.Error())
        }

        // make sure mounts don't propagate to the previous mount namespace
        err = syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, "")
        if err != nil {
            panic(err.Error())
        }

        // Bind mount the source file to the dest file. This means any thread in
        // this mount namespace will now see that "destPath" has the same
        // contents as "sourcePath"
        err = syscall.Mount(sourcePath, destPath, "", syscall.MS_BIND|syscall.MS_PRIVATE, "")
        if err != nil {
            panic(err.Error())
        }
    }()

    <-done
}

func main() {
    // This go routine checks in a loop that the contents of "destPath" have not
    // changed to the contents of "sourcePath". If the contents did change, that
    // means this go routine is now executing on an OS thread created by the
    // tryLeak function above (even though tryLeak never calls UnlockOSThread
    // and should thus never leak threads for re-use).
    go func() {
        ticker := time.Tick(time.Second)
        for range ticker {
            runtime.LockOSThread()
            readDestContents, err := ioutil.ReadFile(destPath)
            if err != nil {
                panic(err.Error())
            }

            if string(readDestContents) == sourceContents {
                panic(fmt.Sprintf("unexpectedly able to view bind mounted file. current mount namespace: %s", currentMountNsName()))
            } else {
                fmt.Printf("found file contents: %s\n", readDestContents)
            }
            runtime.UnlockOSThread()
        }
    }()

    // Try leaking an os thread every 50ms
    ticker := time.Tick(50 * time.Millisecond)
    for range ticker {
        tryLeak()
    }
}

What did you expect to see?

That the program executed indefinitely, never crashing due to the a goroutine unexpectedly executing in a mount namespace created by a different goroutine locked to its os thread.

What did you see instead?

A mount namespace leaks to the separate go routine, causing the program to crash. The time it takes for this to occur is variable but between 1 and 10 seconds. One example:

Running on Go version: go1.11.2
Initial mount namespace: mnt:[4026531840]

found file contents: dest
found file contents: dest
found file contents: dest
panic: unexpectedly able to view bind mounted file. current mount namespace: mnt:[4026532292]

goroutine 18 [running, locked to thread]:
main.main.func1()
        /home/sipsma/tmp/goleak/main.go:112 +0x31b
created by main.main
        /home/sipsma/tmp/goleak/main.go:102 +0x39

System details

go version go1.11.2 linux/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GOPROXY=""
GORACE=""
GOTMPDIR=""
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOROOT/bin/go version: go version go1.11.2 linux/amd64
GOROOT/bin/go tool compile -V: compile version go1.11.2
uname -sr: Linux 4.9.124-0.1.ac.198.71.329.metal1.x86_64
/lib64/libc.so.6: GNU C Library stable release version 2.12, by Roland McGrath et al.
gdb --version: GNU gdb (GDB) Amazon Linux (7.2-50.11.amzn1)
@sipsma

This comment has been minimized.

sipsma commented Nov 28, 2018

Additionally, it seems that if I make a seemingly unrelated change, the issue is no longer reproducible. The change is to just remove the done channel from tryLeak, i.e. use the same code as above but with tryLeak written as:

  func tryLeak() {
      go func() {
          runtime.LockOSThread()

          // create the new namespace after locking to the os thread
          err := syscall.Unshare(syscall.CLONE_NEWNS)
          if err != nil {
              panic(err.Error())
          }

          // make sure mounts don't propagate to the previous mount namespace
          err = syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, "")
          if err != nil {
              panic(err.Error())
          }

          // Bind mount the source file to the dest file. This means any thread in
          // this mount namespace will now see that "destPath" has the same
          // contents as "sourcePath"
          err = syscall.Mount(sourcePath, destPath, "", syscall.MS_BIND|syscall.MS_PRIVATE, "")
          if err != nil {
              panic(err.Error())
          }
      }()
  }

I obviously can't know whether removing the channel changes the behavior due to a timing difference or whether there could be a bug related to a locked go routine sharing a channel with an unlocked go routine, but seems worth mentioning.

@ianlancetaylor ianlancetaylor changed the title from OS thread appears to be re-used despite never releasing runtime.LockOSThread() to runtime: OS thread appears to be re-used despite never releasing runtime.LockOSThread() Nov 28, 2018

@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Nov 28, 2018

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 28, 2018

@sipsma

This comment has been minimized.

sipsma commented Nov 28, 2018

If helpful, I got some strace output showing a pid, 22828, that did unshare+mount calls followed up by a clone(2) to make a new thread (right before it calls _exit()). The thread it spawned with that clone, 22833, was the one that eventually found itself in the wrong mount namespace and panicked.

    [pid 22828] <... futex resumed> )       = 0
    [pid 22820] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22821] <... nanosleep resumed> NULL) = 0
    [pid 22828] unshare(CLONE_NEWNS <unfinished ...>
    [pid 22820] <... clock_gettime resumed> {11367, 575113001}) = 0
    [pid 22821] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22820] futex(0x571ce0, FUTEX_WAIT_PRIVATE, 0, {0, 52273938} <unfinished ...>
    [pid 22821] <... clock_gettime resumed> {11367, 575147924}) = 0
    [pid 22828] <... unshare resumed> )     = 0
    [pid 22832] <... nanosleep resumed> NULL) = 0
    [pid 22828] mount("", "/", "", MS_REC|MS_PRIVATE, NULL <unfinished ...>
    [pid 22821] nanosleep({0, 20000},  <unfinished ...>
    [pid 22832] futex(0xc000033640, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid 22828] <... mount resumed> )       = 0
    [pid 22828] mount("/tmp/test-go-thread-leak/source", "/tmp/test-go-thread-leak/dest", 0xc00007a06b, MS_BIND|MS_PRIVATE, NULL) = 0
    [pid 22828] futex(0xc000033640, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 22832] <... futex resumed> )       = 0
    [pid 22832] nanosleep({0, 3000},  <unfinished ...>
    [pid 22828] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
    [pid 22821] <... nanosleep resumed> NULL) = 0
    [pid 22828] <... rt_sigprocmask resumed> NULL, 8) = 0
    [pid 22821] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22828] sigaltstack({ss_sp=0, ss_flags=SS_DISABLE, ss_size=0} <unfinished ...>
    [pid 22821] <... clock_gettime resumed> {11367, 575500977}) = 0
    [pid 22828] <... sigaltstack resumed> , NULL) = 0
    [pid 22821] nanosleep({0, 20000},  <unfinished ...>
    [pid 22828] clock_gettime(CLOCK_MONOTONIC, {11367, 575600016}) = 0
    [pid 22832] <... nanosleep resumed> NULL) = 0
    [pid 22828] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
    [pid 22832] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
    [pid 22828] <... rt_sigprocmask resumed> ~[KILL STOP], 8) = 0
    [pid 22832] <... rt_sigprocmask resumed> [], 8) = 0
    [pid 22828] clone( <unfinished ...>
    [pid 22832] clone( <unfinished ...>
    [pid 22821] <... nanosleep resumed> NULL) = 0
    Process 22833 attached
    Process 22834 attached
    [pid 22828] <... clone resumed> child_stack=0xc00006c000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 22833
    [pid 22833] gettid( <unfinished ...>
    [pid 22821] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22834] gettid( <unfinished ...>
    [pid 22833] <... gettid resumed> )      = 22833
    [pid 22832] <... clone resumed> child_stack=0xc00006a000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 22834
    [pid 22828] rt_sigprocmask(SIG_SETMASK, ~[KILL STOP],  <unfinished ...>
    [pid 22834] <... gettid resumed> )      = 22834
    [pid 22833] arch_prctl(ARCH_SET_FS, 0xc00005ab10 <unfinished ...>
    [pid 22832] rt_sigprocmask(SIG_SETMASK, [],  <unfinished ...>
    [pid 22828] <... rt_sigprocmask resumed> NULL, 8) = 0
    [pid 22821] <... clock_gettime resumed> {11367, 575837370}) = 0
    [pid 22834] arch_prctl(ARCH_SET_FS, 0xc000033910 <unfinished ...>
    [pid 22833] <... arch_prctl resumed> )  = 0
    [pid 22832] <... rt_sigprocmask resumed> NULL, 8) = 0
    [pid 22828] _exit(0)                    = ?
    [pid 22834] <... arch_prctl resumed> )  = 0
    [pid 22833] sigaltstack(NULL <unfinished ...>
    [pid 22821] nanosleep({0, 20000},  <unfinished ...>
    [pid 22834] sigaltstack(NULL <unfinished ...>
    [pid 22833] <... sigaltstack resumed> , {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
    [pid 22832] futex(0xc000033640, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid 22828] +++ exited with 0 +++

... (skip ahead)

    [pid 22833] write(2, "panic: ", 7panic: )      = 7
    [pid 22833] write(2, "unexpectedly able to view bind mounted file. current mount namespace: 
@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 28, 2018

I think the problem is that goexit0 sets _g_.m.lockedExt = 0, then jumps to mstart, which calls mexit which calls handoffp which calls startm which calls newm. The code in newm does the right thing if the g is locked, but it's not, because the lock was cleared by goexit0.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 28, 2018

@gopherbot Please open a backport issue for 1.10 and 1.11.

@gopherbot

This comment has been minimized.

gopherbot commented Nov 28, 2018

Backport issue(s) opened: #28985 (for 1.10), #28986 (for 1.11).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot

This comment has been minimized.

gopherbot commented Dec 7, 2018

Change https://golang.org/cl/153078 mentions this issue: runtime: don't clear lockedExt on locked M when G exits

@mknyszek

This comment has been minimized.

Contributor

mknyszek commented Dec 7, 2018

@ianlancetaylor, thanks for finding the problem!

I think the solution is to just keep _g_.m.lockedExt as-is when the g exits. It feels a little naive, but I looked at all of the places _g_.m.lockedExt is actually used, and it's really just checked in newm and updated as part of LockOSThread and UnlockOSThread. At the point where the g exits and it's locked to its m, that m is really just either going to exit or get wedged if it's the main thread. Also, for the main thread case, it makes sense that it should retain its state indefinitely. The only concern I had was that the m struct still lives in the free list and that it could be used again in some meaningful way, but as it turns out, it's not.

Anyway, I put up a change. I was able to reproduce the original issue and can confirm that with my change, it's fixed.

@mknyszek mknyszek assigned mknyszek and unassigned aclements Dec 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment