Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runc doesn't work with go1.22 #4233

Open
Tracked by #4114
cyphar opened this issue Mar 29, 2024 · 1 comment · May be fixed by #4193, #4247 or sylabs/singularity#2837
Open
Tracked by #4114

runc doesn't work with go1.22 #4233

cyphar opened this issue Mar 29, 2024 · 1 comment · May be fixed by #4193, #4247 or sylabs/singularity#2837

Comments

@cyphar
Copy link
Member

cyphar commented Mar 29, 2024

I'm opening this to act as a tracking issue for the go1.22 issue.

I suspect the only bullet-proof solution is going to be adding another re-exec after the C code finishes to re-exec the Go side of runc init (which will make runc even slower...). It really is quite unfortunate that Go's stdlib doesn't provide a lot of knobs we need to remove nsenter, and I suspect we will never be able to full switch away from nsenter...


It seems that cgo may be broken with clone(2) in go1.22.0?
golang/go#65625

So, to summarize the investigation done there -- it's a glibc bug, in fact, two bugs:

  1. pthread_self() returns wrong info after we do what we do in libct/nsenter
  2. pthread_getattr_np(pthread_self(), &attr) (which Go 1.22 calls internally) does a NULL pointer dereference, so the app gets SIGABRT.

These two bugs are apparently specific to glibc used by Ubuntu 20.04 (libc6 2.31-0ubuntu9.14) and maybe also Debian 10 (libc6 2.28-10+deb10u2), as I was able to reproduce on both. With Debian 10, it even prints error from free: free(): invalid pointer, maybe due to some extra Debian-specific patches, but still gets SIGABRT.

For some reason I was unable to repro on older Fedora (F32, glibc-2.31-2.fc32, F33, glibc-2.32-10.fc33) and Debian 11 (libc6 2.31-7).

The bad news is, every version of glibc has the bug 1 above, and https://go-review.googlesource.com/c/go/+/563379 may make it so go 1.22.x will fail runc init on every version of glibc.

Meaning, we need a workaround for that. Perhaps changing runc libct/nsenter logic in some radical way, so that pthread_self works.

Originally posted by @kolyshkin in #4193 (comment)

@AkihiroSuda AkihiroSuda mentioned this issue Apr 3, 2024
11 tasks
saschagrunert added a commit to saschagrunert/kubernetes that referenced this issue Apr 3, 2024
The glibc bug used by go 1.22 is breaking runc 1.2.0 in some cases. This
does not directly influence Kubernetes, but downstream may rely on
Kubernetes as well as runc in their build environments. This means that
forcing users to build Kubernetes with go 1.22 may break their build
toolchain, which is especially the case for container runtimes. We now
relax that restriction to allow building with go 1.21 until the runc
issue has been resolved.

Refers to: opencontainers/runc#4233

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 3, 2024
… nsexec

As the description in opencontainers#4233, there is a bug in glibc, pthread_self() will
return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 3, 2024
… nsexec

As the description in opencontainers#4233, there is a bug in glibc, pthread_self() will
return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 3, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 4, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
@lifubang lifubang linked a pull request Apr 4, 2024 that will close this issue
lifubang added a commit to lifubang/runc that referenced this issue Apr 4, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
@cyphar
Copy link
Member Author

cyphar commented Apr 4, 2024

Copying my comment from #4234 (comment), which explains my current thinking on the issue.

I think the issue is more likely related to the thread-local storage used by pthreads. From golang/go#65625 it seems that the old TID is stored in the TLS and glibc then tries to do sched_getaffinity on that TID -- this just so happens to work in runc's case because the parent process is still alive and so pthreads gets the wrong process's information (oops). I'm a little surprised this is the case -- gettid() works correctly after fork() and while all of the pthread_* stuff is theoretically unsafe to call after fork() it's very weird that they would allow the pthread_* stuff to get outdated if they already put the work into making gettid() work (not to mention IIRC gettid() is cached in thread-local storage). Also the debugging work in golang/go#65625 indicated that it was an issue with CLONE_PARENT in particular, which makes less sense -- if the problem is outdated TLS data, I would expect the issue to occur regardless of what the parent PID is.

Ultimately, our usage of clone is technically not "safe" from a UB perspective because we aren't meant to call any async-unsafe code in a child after a fork. I suspect this is the core issue.

lifubang added a commit to lifubang/runc that referenced this issue Apr 4, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 4, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 4, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 4, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
bcressey added a commit to bcressey/bottlerocket-sdk that referenced this issue Apr 4, 2024
Downgrade to avoid a known incompatibility with runc:
  opencontainers/runc#4233

Signed-off-by: Ben Cressey <bcressey@amazon.com>
bcressey added a commit to bcressey/bottlerocket-sdk that referenced this issue Apr 4, 2024
Downgrade to avoid a known incompatibility with runc:
  opencontainers/runc#4233

Signed-off-by: Ben Cressey <bcressey@amazon.com>
lifubang added a commit to lifubang/runc that referenced this issue Apr 13, 2024
As the description in opencontainers#4233, there is a bug in glibc, pthread_self()
will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter,
it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace
clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child
process in libct/nsenter.

Signed-off-by: lifubang <lifubang@acmcoder.com>
dtrudg added a commit to dtrudg/singularity that referenced this issue Apr 19, 2024
Adapted from: opencontainers/runc#4247

Execution of a container using a PID namespace can fail on certain
versions of glibc when Singularity is built with Go 1.22.

This is due to Go 1.22 performing calls using pthread_self which,
from glibc 2.25, is not updated for the current TID on clone.

Fixes sylabs#2677

-----

Original runc explanation:

Since glibc 2.25, the thread-local cache of the current TID is no
longer updated in the child when calling clone(2). This results in
very unfortunate behaviour when Go does pthread calls using
pthread_self(), which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value.
Unfortunately (and unsurprisingly), the layout of "struct pthread"
is strictly private and can change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks
(with the child_tid set to the cached &PTHREAD_SELF->tid), meaning
that as long as runc is using glibc, when "runc init" is spawned
the child process will have a pointer directly to the cached value
we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on
Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS).
For older kernels we need to memory scan the TLS structure
(pthread_self() returns a pointer to the start of the structure
so we can "just" scan it for a field containing the current TID
and assume that it is the correct field).

Obviously this is all very horrific, and if you are reading this
in the future, it almost certainly has caused some horrific bug
that I did not forsee. Sorry about that. As far as I can tell,
there is no other workable solution that doesn't also depend on the
CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot
"just" do a re-exec after clone(2) for security reasons.

Fixes opencontainers/runc#4233
Signed-off-by: Aleksa Sarai cyphar@cyphar.com
dtrudg added a commit to dtrudg/singularity that referenced this issue Apr 19, 2024
Adapted from: opencontainers/runc#4247

Execution of a container using a PID namespace can fail on certain
versions of glibc when Singularity is built with Go 1.22.

This is due to Go 1.22 performing calls using pthread_self which,
from glibc 2.25, is not updated for the current TID on clone.

Fixes sylabs#2677

-----

Original runc explanation:

Since glibc 2.25, the thread-local cache of the current TID is no
longer updated in the child when calling clone(2). This results in
very unfortunate behaviour when Go does pthread calls using
pthread_self(), which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value.
Unfortunately (and unsurprisingly), the layout of "struct pthread"
is strictly private and can change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks
(with the child_tid set to the cached &PTHREAD_SELF->tid), meaning
that as long as runc is using glibc, when "runc init" is spawned
the child process will have a pointer directly to the cached value
we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on
Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS).
For older kernels we need to memory scan the TLS structure
(pthread_self() returns a pointer to the start of the structure
so we can "just" scan it for a field containing the current TID
and assume that it is the correct field).

Obviously this is all very horrific, and if you are reading this
in the future, it almost certainly has caused some horrific bug
that I did not forsee. Sorry about that. As far as I can tell,
there is no other workable solution that doesn't also depend on the
CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot
"just" do a re-exec after clone(2) for security reasons.

Fixes opencontainers/runc#4233
Signed-off-by: Aleksa Sarai cyphar@cyphar.com
dtrudg added a commit to dtrudg/singularity that referenced this issue Apr 19, 2024
Adapted from: opencontainers/runc#4247

Execution of a container using a PID namespace can fail on certain
versions of glibc when Singularity is built with Go 1.22.

This is due to Go 1.22 performing calls using pthread_self which,
from glibc 2.25, is not updated for the current TID on clone.

Fixes sylabs#2677

-----

Original runc explanation:

Since glibc 2.25, the thread-local cache of the current TID is no
longer updated in the child when calling clone(2). This results in
very unfortunate behaviour when Go does pthread calls using
pthread_self(), which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value.
Unfortunately (and unsurprisingly), the layout of "struct pthread"
is strictly private and can change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks
(with the child_tid set to the cached &PTHREAD_SELF->tid), meaning
that as long as runc is using glibc, when "runc init" is spawned
the child process will have a pointer directly to the cached value
we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on
Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS).
For older kernels we need to memory scan the TLS structure
(pthread_self() returns a pointer to the start of the structure
so we can "just" scan it for a field containing the current TID
and assume that it is the correct field).

Obviously this is all very horrific, and if you are reading this
in the future, it almost certainly has caused some horrific bug
that I did not forsee. Sorry about that. As far as I can tell,
there is no other workable solution that doesn't also depend on the
CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot
"just" do a re-exec after clone(2) for security reasons.

Fixes opencontainers/runc#4233
Signed-off-by: Aleksa Sarai cyphar@cyphar.com
dtrudg added a commit to dtrudg/singularity that referenced this issue Apr 19, 2024
Adapted from: opencontainers/runc#4247

Execution of a container using a PID namespace can fail on certain
versions of glibc when Singularity is built with Go 1.22.

This is due to Go 1.22 performing calls using pthread_self which,
from glibc 2.25, is not updated for the current TID on clone.

Fixes sylabs#2677

-----

Original runc explanation:

Since glibc 2.25, the thread-local cache of the current TID is no
longer updated in the child when calling clone(2). This results in
very unfortunate behaviour when Go does pthread calls using
pthread_self(), which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value.
Unfortunately (and unsurprisingly), the layout of "struct pthread"
is strictly private and can change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks
(with the child_tid set to the cached &PTHREAD_SELF->tid), meaning
that as long as runc is using glibc, when "runc init" is spawned
the child process will have a pointer directly to the cached value
we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on
Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS).
For older kernels we need to memory scan the TLS structure
(pthread_self() returns a pointer to the start of the structure
so we can "just" scan it for a field containing the current TID
and assume that it is the correct field).

Obviously this is all very horrific, and if you are reading this
in the future, it almost certainly has caused some horrific bug
that I did not forsee. Sorry about that. As far as I can tell,
there is no other workable solution that doesn't also depend on the
CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot
"just" do a re-exec after clone(2) for security reasons.

Fixes opencontainers/runc#4233
Signed-off-by: Aleksa Sarai cyphar@cyphar.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant