Conversation

@xianzhe-databricks
Contributor

@xianzhe-databricks xianzhe-databricks commented Mar 27, 2025

Introduction

When the sandboxed application is killed externally (for example, when it requests more memory than the cgroup allows), the whole sandbox should exit as well. We found that commit 16dba7f#diff-06bbaf439d32bed64b39754ca405a6b8fe4227925bfa92c11428b6ddfc91b962 changed this behavior: after the application is killed, the sandbox keeps hanging around.

This PR fixes it.

Steps to reproduce the hang

The hang can be reproduced in any release that includes the commit above. Here we use release-20250319.0 as an example.

Application:

package main

import (
	"fmt"
	"time"
)

func main() {
	fmt.Println("Application started")
	var loop = 0
	for {
		time.Sleep(2 * time.Second)
		loop++
		fmt.Printf("Application is running %d\n", loop)
	}
	fmt.Println("Shutting down")
}

Run this application with gVisor, e.g. runsc --debug-log=/tmp/runsc/ --debug --strace run test.
Then, while the application is running, kill its process with sudo kill -9 <pid> to mimic an OOM kill. Afterwards the sandbox is still hanging.

runsc command line output

Application started
Application is running 1
Application is running 2
Application is running 3
Application is running 4
Application is running 5
Application is running 6
Application is running 7
Application is running 8
Application is running 9

Then it gets stuck after the kill command (but does not exit).

gVisor log

runsc.log.20250327-105656.496989.boot.txt (only the last relevant parts)

I0327 10:57:28.770718       1 strace.go:576] [   2:   6] client E futex(0xc000100148, FUTEX_WAIT|FUTEX_PRIVATE_FLAG, 0x0, null, 0x0, 0x0)
I0327 10:57:28.770718       1 strace.go:576] [   2:   2] client E epoll_pwait(0x3 anon_inode:[eventpoll], 0x7eb16a44c51c {}, 0x80, 0x7cf, null, 0x0)
I0327 10:57:28.770740       1 strace.go:614] [   2:   2] client X epoll_pwait(0x3 anon_inode:[eventpoll], 0x7eb16a44c51c {{events=EPOLLOUT data=[0x65b00001, 0x7ed7610c]}{events=EPOLLOUT data=[0x66b80001, 0x7ed7610c]}}, 0x80, 0x7cf, null, 0x0) = 2 (0x2) (6.004µs)
I0327 10:57:28.770767       1 strace.go:576] [   2:   2] client E epoll_pwait(0x3 anon_inode:[eventpoll], 0x7eb16a44c51c {}, 0x80, 0x7cf, null, 0x0)
I0327 10:57:28.770768       1 strace.go:602] [   2:   3] client X nanosleep(0xc00001ff18 {sec=0 nsec=20000}, null) = 0 (0x0) (62.461µs)
I0327 10:57:28.770791       1 strace.go:576] [   2:   3] client E futex(0x5571c0, FUTEX_WAIT|FUTEX_PRIVATE_FLAG, 0x0, 0xc00001feb8 {sec=1 nsec=999872650}, 0x0, 0x0)
I0327 10:57:30.770005       1 strace.go:614] [   2:   2] client X epoll_pwait(0x3 anon_inode:[eventpoll], 0x7eb16a44c51c {}, 0x80, 0x7cf, null, 0x0) = 0 (0x0) (1.999216401s)
I0327 10:57:30.771140       1 strace.go:614] [   2:   3] client X futex(0x5571c0, FUTEX_WAIT|FUTEX_PRIVATE_FLAG, 0x0, 0xc00001feb8 {sec=1 nsec=999872650}, 0x0, 0x0) = 0 (0x0) errno=110 (connection timed out) (2.000334244s)
W0327 10:57:30.780225       1 subprocess_unsafe.go:130] the subprocess 38 exited (status: %!s(unix.WaitStatus=9), err %!s(<nil>))
W0327 10:57:30.780252       1 task_run.go:358] [   2:   2] Unexpected SwitchToApp error: systrap corrupted memory: subprocess died
D0327 10:57:30.780267       1 task_exit.go:215] [   2:   2] Transitioning from exit state TaskExitNone to TaskExitInitiated
D0327 10:57:30.780294       1 task_exit.go:215] [   2:   2] Transitioning from exit state TaskExitInitiated to TaskExitZombie
W0327 10:57:30.781250       1 task_run.go:358] [   2:   3] Unexpected SwitchToApp error: systrap corrupted memory: subprocess died
D0327 10:57:30.781266       1 task_exit.go:215] [   2:   3] Transitioning from exit state TaskExitNone to TaskExitInitiated
D0327 10:57:30.781292       1 task_exit.go:215] [   2:   3] Transitioning from exit state TaskExitInitiated to TaskExitZombie
D0327 10:57:30.781297       1 task_exit.go:215] [   2:   3] Transitioning from exit state TaskExitZombie to TaskExitDead
I0327 10:57:41.715145       1 watchdog.go:294] Watchdog starting loop, tasks: 5, discount: 0s
D0327 10:58:04.664761       1 sampler.go:162] Time: Adjusting syscall overhead up to 692
D0327 10:58:08.665378       1 sampler.go:191] Time: Adjusting syscall overhead down to 606
D0327 10:58:09.664941       1 sampler.go:191] Time: Adjusting syscall overhead down to 531
D0327 10:58:10.665540       1 sampler.go:191] Time: Adjusting syscall overhead down to 465
D0327 10:58:11.664765       1 sampler.go:191] Time: Adjusting syscall overhead down to 407
D0327 10:58:15.664958       1 sampler.go:191] Time: Adjusting syscall overhead down to 357
I0327 10:58:26.715574       1 watchdog.go:294] Watchdog starting loop, tasks: 5, discount: 0s
D0327 10:58:35.665620       1 sampler.go:191] Time: Adjusting syscall overhead down to 313

gofer.txt and run.txt do not reveal anything interesting.

@xianzhe-databricks
Contributor Author

xianzhe-databricks commented Mar 27, 2025

@avagin could you take a look? Thanks

The changes in commit 16dba7f#diff-06bbaf439d32bed64b39754ca405a6b8fe4227925bfa92c11428b6ddfc91b962 do not correspond to the comment:

// When creating the new child process, we specify SIGKILL as the
// signal to deliver when the child exits. We never expect a subprocess
// to exit; they are pooled and reused. This is done to ensure that if
// a subprocess is OOM-killed, this process (and all other stubs,
// transitively) will be killed as well. It's simply not possible to
// safely handle a single stub getting killed: the exact state of
// execution is unknown and not recoverable.
return attachedThread(unix.CLONE_FILES|uintptr(unix.SIGCHLD), linux.SECCOMP_RET_TRAP)
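
For context, the low byte of the clone(2) flags is the termination signal the parent receives when the child dies. Below is a minimal sketch, not gVisor code, that only illustrates the flag layout: the call above passes SIGCHLD there, so a dying stub is merely reaped, while passing SIGKILL, as the comment describes, would make the kernel kill the parent (the sentry) whenever a stub dies.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// The low byte of the clone flags is the signal sent to the parent when the child terminates.
	withSIGCHLD := uintptr(unix.CLONE_FILES) | uintptr(unix.SIGCHLD) // what the call above passes
	withSIGKILL := uintptr(unix.CLONE_FILES) | uintptr(unix.SIGKILL) // what the comment describes
	fmt.Printf("termination signal (SIGCHLD variant): %v\n", unix.Signal(withSIGCHLD&0xff))
	fmt.Printf("termination signal (SIGKILL variant): %v\n", unix.Signal(withSIGKILL&0xff))
}
```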

@xianzhe-databricks changed the title from "Sentry: exit the parent process when the subprocess is dead" to "Sentry systrap: exit the parent process when the subprocess is dead" on Mar 27, 2025
@avagin
Collaborator

avagin commented Mar 27, 2025

first commit

We do not squash commits; therefore, all commits will be present in the upstream repository after merging. Please take this into account and write proper commit messages.

@avagin
Collaborator

avagin commented Mar 27, 2025

Then while the application is running, we kill this process with sudo kill -9 <pid> to mimic OOM. Afterwards the sandbox is still hanging.

What process did you kill? Was it a stub process? In the strace log, we see that a guest process associated with the killed stub process exited. The sandbox didn't exit because the init process was still there. Did it wait for the killed process? Why didn't it exit? Could you attach the full sentry log?

The changes in the commit 16dba7f#diff-06bbaf439d32bed64b39754ca405a6b8fe4227925bfa92c11428b6ddfc91b962 do not correspond to the comments

If we fork a stub process with SIGKILL as its termination signal, the entire sentry will be killed if a stub process crashes. The commit message says that we handle stub process crashes instead of just killing the sentry.
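
For intuition, here is a minimal sketch of the "handle the crash" approach in plain Go, not gVisor's actual systrap code; sleep 60 is just a stand-in for a stub process. The parent reaps the child, notices it was SIGKILLed (as the OOM killer would do), and shuts down deliberately instead of relying on the kernel to kill the parent.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Stand-in for a stub subprocess; kill it externally with `kill -9 <pid>`.
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	fmt.Println("child pid:", cmd.Process.Pid)
	_ = cmd.Wait() // returns an error if the child was signaled
	if ws, ok := cmd.ProcessState.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
		fmt.Printf("child terminated by %v\n", ws.Signal())
		if ws.Signal() == syscall.SIGKILL {
			// The "handle it" path: escalate to a deliberate shutdown of the
			// whole process instead of carrying on with a dead subprocess.
			os.Exit(128 + int(syscall.SIGKILL))
		}
	}
}
```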

@xianzhe-databricks
Contributor Author

sentry.log
@avagin The full sentry log is really long, so I can only attach it as a file.

What process did you kill? Was it a stub process?

The process is a subprocess of runsc-sandbox; I think it is just the Go application with the infinite loop I showed in the PR description. Here are my steps:

$ ps aux | grep "runsc"                   
root      222911  0.0  0.0  11352  5284 pts/5    S+   17:15   0:00 sudo /home/xianzhe.ma/gvisor/bin/runsc --debug-log=/home/xianzhe.ma/uds_test/logs/runsc/ --debug --strace run test
root      222912  0.1  0.0 1257208 24056 pts/5   Sl+  17:15   0:00 /home/xianzhe.ma/gvisor/bin/runsc --debug-log=/home/xianzhe.ma/uds_test/logs/runsc/ --debug --strace run test
root      222919  0.1  0.0 1256696 22972 ?       Ssl  17:15   0:00 runsc-gofer --strace=true --root=/var/run/runsc --debug=true --debug-log=/home/xianzhe.ma/uds_test/logs/runsc/ --debug-log-fd=3 gofer --bundle=/home/xianzhe.ma/uds_test/container --gofer-mount-confs=lisafs:none,lisafs:none --io-fds=7,8 --mounts-fd=5 --rpc-fd=6 --spec-fd=4 --sync-chroot-fd=-1 --sync-userns-fd=-1 --proc-mount-sync-fd=16 --apply-caps=false --setup-root=false
root      222924  3.2  0.0 3433556 36324 ?       Ssl  17:15   0:00 runsc-sandbox --root=/var/run/runsc --debug=true --debug-log=/home/xianzhe.ma/uds_test/logs/runsc/ --strace=true --debug-log-fd=3 boot --apply-caps=false --attached=true --bundle=/home/xianzhe.ma/uds_test/container --controller-fd=8 --cpu-num=32 --dev-io-fd=-1 --gofer-mount-confs=lisafs:none,lisafs:none --io-fds=4,5 --mounts-fd=6 --setup-root=false --spec-fd=9 --start-sync-fd=7 --stdio-fds=10,11,12 --total-host-memory=132941541376 --total-memory=132941541376 --product-name=m6i.8xlarge --host-thp-shmem-enabled=never --host-thp-defrag=madvise --proc-mount-sync-fd=20 test
xianzhe+  223375  0.0  0.0   7964  2680 pts/9    S+   17:15   0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox --exclude-dir=.venv --exclude-dir=venv runsc

$ pstree -p 222924
exe(222924)─┬─exe(222962)
            ├─exe(222972)───exe(222973)
            ├─exe(222979)───exe(222980)
            ├─exe(222981)─┬─exe(222982)
            │             ├─exe(222984)
            │             └─exe(222997)
            ├─{exe}(222947)
            ├─{exe}(222948)
            ├─{exe}(222949)
            ├─{exe}(222950)
            ├─{exe}(222956)
            ├─{exe}(222957)
            ├─{exe}(222958)
            ├─{exe}(222959)
            ├─{exe}(222960)
            ├─{exe}(222961)
            ├─{exe}(222963)
            ├─{exe}(222964)
            ├─{exe}(222965)
            ├─{exe}(222966)
            ├─{exe}(222967)
            ├─{exe}(222968)
            ├─{exe}(222969)
            ├─{exe}(222970)
            ├─{exe}(222971)
            └─{exe}(222983)

$ sudo kill -9 222981

@xianzhe-databricks
Contributor Author

@avagin regardless of the above, this is to mimic the case in production where the application running in gVisor requests more memory than the cgroup allows and gets OOM-killed. We observed that the sandbox processes (runsc-sandbox, runsc-gofer, etc.) are still running, and the sentry logs are similar to what I showed in the toy reproduction above.

@avagin
Collaborator

avagin commented Mar 28, 2025

In the attached log, only the few threads that detected that their stub process had died were killed. We actually need to kill the entire process.
Could you try out this patch:

diff --git a/google3/third_party/gvisor/pkg/sentry/kernel/task_run.go b/google3/third_party/gvisor/pkg/sentry/kernel/task_run.go
--- a/google3/third_party/gvisor/pkg/sentry/kernel/task_run.go
+++ b/google3/third_party/gvisor/pkg/sentry/kernel/task_run.go
@@ -356,7 +356,7 @@ func (app *runApp) execute(t *Task) task
        default:
                // What happened? Can't continue.
                t.Warningf("Unexpected SwitchToApp error: %v", err)
-               t.PrepareExit(linux.WaitStatusExit(int32(ExtractErrno(err, -1))))
+               t.PrepareGroupExit(linux.WaitStatusTerminationSignal(linux.SIGKILL))
                return (*runExit)(nil)
        }
 }
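
Roughly, PrepareExit ends only the task that hit the error, while PrepareGroupExit tears down the whole thread group, which is what lets the sandbox exit. A toy analogy in plain Go, nothing gVisor-specific: if only the worker that notices the failure stops, its siblings keep the process alive; escalating to a process-wide exit ends everything.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func worker(id int, broken bool) {
	for i := 0; ; i++ {
		time.Sleep(time.Second)
		if broken && i == 2 {
			fmt.Printf("worker %d: backing subprocess died\n", id)
			// return        // per-task exit: the other workers keep the process alive
			os.Exit(128 + 9) // group-style exit: the whole process terminates
		}
		fmt.Printf("worker %d is running\n", id)
	}
}

func main() {
	go worker(1, false)
	go worker(2, true)
	select {} // block forever; only the escalation above ends the process
}
```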

@avagin avagin self-requested a review March 28, 2025 00:11
@xianzhe-databricks force-pushed the fix-hang-with-subprocess-death branch from 5c6ea8c to 162de7b on March 28, 2025 10:29
@xianzhe-databricks changed the title from "Sentry systrap: exit the parent process when the subprocess is dead" to "Sentry systrap: exit the entire process when one stubprocess is dead" on Mar 28, 2025
@xianzhe-databricks force-pushed the fix-hang-with-subprocess-death branch from 162de7b to e7520af on March 28, 2025 10:31
@xianzhe-databricks
Contributor Author

@avagin it works as well! Thanks. I revised the PR as well as the commit message. Could we merge this PR?

copybara-service bot pushed a commit that referenced this pull request Mar 28, 2025
copybara-service bot pushed a commit that referenced this pull request Mar 28, 2025
copybara-service bot pushed a commit that referenced this pull request Mar 28, 2025
@copybara-service copybara-service bot merged commit 7a07e27 into google:master Mar 28, 2025
5 checks passed
