
runtime: on Linux, better do not treat the initial thread/task group leader as any other thread/task #53210

Closed
thediveo opened this issue Jun 2, 2022 · 2 comments


@thediveo

thediveo commented Jun 2, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.2 linux/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/harald/.cache/go-build"
GOENV="/home/harald/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/harald/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/harald/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/9768"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/9768/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.18.2"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1573233552=/tmp/go-build -gno-record-gcc-switches"

Note

This issue is based on the discussion "LockOSThread, switching (Linux kernel) namespaces: what happens to the main thread...?" in golang-nuts, where Ian Lance Taylor asked me to file an issue here.

The service as well as the namespace-switching Go modules use cgo, in case this is relevant to the discussion.

What did you do?

A Go service enters multiple Linux-kernel network namespaces in order to gather various networking data and statistics. It does so from an HTTP service handler in two different ways: first, by locking the OS thread of the current (handler) goroutine, switching the thread's network namespace, gathering information, switching back into the thread's original network namespace, and unlocking the OS thread. It uses the open source Go module github.com/thediveo/lxkns for this, and ops.Visit in particular. Corresponding unit tests (using github.com/thediveo/namspill) ensure that afterwards all threads are actually back in their original states regarding their network namespace attachments.

Second, for some operations as part of the service handler, the handler creates new goroutines that also get OS-thread locked and switched into other network namespaces, but never unlocked: when those goroutines have gathered their data, they simply end, which should terminate the associated tainted threads. This uses the ops.Execute function in the aforementioned lxkns Go module. Again, a series of unit tests with namespace checks ensures that no network namespace changes leak into unlocked threads and thus into unsuspecting goroutines.

What did you expect to see?

After the service handler finishes and the goroutines it had created as part of its service have all wound down, every thread of the service program/process should still be attached to the same network namespace it was attached to when the program/process originally started.

What did you see instead?

The initial thread -- the task group leader that represents the whole program/process -- has its network namespace switched and is no longer attached to the network namespace it started in. The Go module used for namespace switching reports every switching error, and the service logs it (or a unit test catches it). No namespace switching errors are logged in production, nor while under test. The fault with the initial thread, however, is reproducible in production, but unfortunately not (yet) under test.

What happens when a Goroutine locks the initial thread and terminates while under lock?

Based on the above-mentioned discussion in golang-nuts, the Go scheduler seems to assume that all threads of a process are equal. However, with respect to Linux, this isn't "fully true". The initial task of a process is the (what I think is called) "task group leader". Terminating this thread, such as when a thread-locked goroutine terminates on it, has consequences which other threads that aren't task group leaders do not exhibit. For instance, some process-related elements in the /proc filesystem become unavailable when the task group leader terminates, while the process with the other tasks still lives on. According to Michael Kerrisk's excellent man page for proc(5):

  • /proc/[pid]/cwd: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
  • /proc/[pid]/exe: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))." (Michael is copy and pasting here)
  • /proc/[pid]/fd/: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))." (me envisioning Michael starting to wear out Ctrl-V)
    (I think he forgot /proc/[pid]/fdinfo/ as one is the other's evil twin)
  • /proc/[pid]/root: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
  • /proc/[pid]/task: guess what ... yes ... "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."

The vast majority of Go programs will never need to worry about this situation; however, some system-level tools might need to.

Should the Go scheduler handle the initial task differently in order to avoid pulling part of a process' /proc information out from under its feet? If I'm not mistaken, one of the side effects of terminating a goroutine that has been scheduled to the initial thread and locked is a crash, based on comments in the runtime sources (IIRC).

Or would the workaround be to advise developers to lock the initial goroutine to the initial thread immediately in an init function, and then to make sure that all potentially "deadly" activities always happen on new goroutines that are locked to other threads, never to the already-locked initial thread? Would this be reliable?

@prattmic
Member

prattmic commented Jun 2, 2022

To paraphrase, what you are saying is that the runtime should never exit the thread group leader? (This is already true, see below)

More precisely, it perhaps shouldn't allow locking to the thread group leader in the first place (force the goroutine to switch threads first) so that it never needs to exit. Is my understanding correct?

If I'm not mistaken one of the side effects of terminating a goroutine that has been scheduled to the initial thread and locked is a crash

We actually already don't allow the thread group leader to exit, see here. So is your concern with /proc still relevant? Are you worried that tools will look at /proc/[pid] files and think that the entire thread group is inside of some namespace because the thread group leader is still in that namespace after a locked goroutine exited without leaving the namespace?

cc @golang/runtime

@thediveo
Author

thediveo commented Jun 2, 2022

@prattmic many thanks for the quick reply! More so as it explains what I was seeing in production. I completely overlooked how mexit handles m0 specially.

So, my concern isn't relevant (anymore).

And under this condition my "workaround" by locking the initial goroutine to the initial thread/task/m0 looks like the correct thing to do.

Not allowing the thread group leader to be locked would actually make it impossible to avoid the situation I've seen in production, as there would otherwise be no way to prevent m0 from (accidentally) getting scheduled to a goroutine that wants to go to its grave with its personal thread in hand.

Thank you very much again for the quick answer and satisfying explanation, so I am closing this issue fully satisfied 😀

@thediveo thediveo closed this as completed Jun 2, 2022