pty.spawn'd sh shell doesn't exit via 'exit' or CTRL-D (^D) #9333

Closed
thundergolfer opened this issue Sep 1, 2023 · 7 comments · Fixed by #9979
Labels
type: bug Something isn't working

Comments

@thundergolfer
Contributor

thundergolfer commented Sep 1, 2023

Description

On a Linux host, the shell started by the following command can be exited by running exit or by pressing Ctrl + D:

python3 -c "import pty; pty.spawn('sh')"

In Docker when using the runc runtime, the same is possible:

docker run -it python:3.9.9-slim-bullseye python3 -c "import pty; pty.spawn('sh')"

But in gVisor:

docker run -it --runtime=runsc python:3.9.9-slim-bullseye python3 -c "import pty; pty.spawn('sh')"

On issuing exit it hangs:

docker run -it --runtime=runsc python:3.9.9-slim-bullseye python3 -c "import pty; pty.spawn('sh')"
# exit

and Ctrl + d does nothing.

Steps to reproduce

1. Install gVisor

ARCH="x86_64"  # all workers are x86
URL="https://storage.googleapis.com/gvisor/releases/master/latest/${ARCH}"
wget -nc "${URL}/runsc" "${URL}/runsc.sha512"
chmod +x runsc
cp runsc /usr/local/bin/runsc
sudo /usr/local/bin/runsc install
sudo systemctl reload docker

2. Set up the gVisor runtime in the Docker config

/etc/docker/daemon.json

{
    "features": {
        "buildkit": true
    },
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        },
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["-debug-log=/tmp/runsc/", "-debug", "-strace"]
        }
    }
}

Run sudo systemctl reload docker
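
A syntax error in daemon.json (a missing comma between keys, for example) can keep Docker from picking up the runsc runtime. A minimal sketch for checking the file before reloading Docker follows; the helper name runtime_paths is hypothetical, and it only verifies that the text is valid JSON and lists the configured runtimes:

```python
import json


def runtime_paths(config_text):
    """Parse daemon.json text and map each runtime name to its binary path.

    json.loads raises json.JSONDecodeError if the text is not valid JSON.
    """
    config = json.loads(config_text)
    runtimes = config.get("runtimes", {})
    return {name: entry.get("path") for name, entry in runtimes.items()}


if __name__ == "__main__":
    # Path taken from step 2 above; reading it may require elevated permissions.
    with open("/etc/docker/daemon.json") as f:
        print(runtime_paths(f.read()))
```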

3. Run the reproducing Docker command

docker run -it --runtime=runsc python:3.9.9-slim-bullseye python3 -c "import pty; pty.spawn('sh')"

Then try to exit the shell.

4. Clean up

Kill the container with docker kill $(docker ps | grep bulls | awk '{print $1;}')

runsc version

runsc version release-20230717.0-12-g0244c8c19fb7
spec: 1.1.0-rc.1

docker version (if using docker)

Client: Docker Engine - Community
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb  9 19:46:56 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       bc3805a
  Built:            Thu Feb  9 19:46:56 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

uname

Linux ip-172-31-70-51 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

runsc.log.20230901-231408.518416.delete.txt

runsc.log.20230901-231408.550463.delete.txt

runsc.log.20230901-231329.146610.create.txt
runsc.log.20230901-231329.180715.gofer.txt
runsc.log.20230901-231329.182201.boot.txt
runsc.log.20230901-231329.462248.start.txt
runsc.log.20230901-231408.386346.kill.txt

@thundergolfer thundergolfer added the type: bug Something isn't working label Sep 1, 2023
@thundergolfer
Contributor Author

I think the failure to exit has something to do with an ignored SIGCHLD. I can see this in the debug logs:

D0901 23:13:30.747875  529435 task_signals.go:443] [   1:   1] Discarding ignored signal 17

and if I modify the Python program to handle SIGCHLD, I can get the shell to exit using the exit program.

import os
import pty
import signal


def handler(signum, frame):
    print('Signal handler called with signal', signum)
    os.waitpid(-1, 0)
    raise KeyboardInterrupt()

# gVisor is ignoring the SIGCHLD: 'Discarding ignored signal 17'
signal.signal(signal.SIGCHLD, handler)

try:
    pty.spawn("sh")
except KeyboardInterrupt:
    print("child exited")

$ docker run -it --runtime=runsc python:3.9.9-slim-bullseye python3 -c "$(cat minimal_exit.py)"
# exit
Signal handler called with signal 17
                                    child exited
$
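
For reference, the 17 in the log line is a raw signal number, which is SIGCHLD on x86-64 Linux. SIGCHLD is ignored by default unless a handler is installed, which may be why the sentry reports it as an ignored signal before the workaround above registers one. A small sketch for decoding the number and checking whether Python has a handler installed (both helper names are hypothetical):

```python
import signal


def describe_signal(signum):
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(signum).name


def has_python_handler(signum):
    """Return True if a Python-level handler function is installed.

    signal.getsignal returns SIG_DFL or SIG_IGN (or None) when no
    Python function is registered; those values are not callable.
    """
    return callable(signal.getsignal(signum))


if __name__ == "__main__":
    print(describe_signal(int(signal.SIGCHLD)))
    print(has_python_handler(signal.SIGCHLD))
```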

@thundergolfer
Contributor Author

The issue with CTRL-D handling appears to be that inputQueueRead in line_discipline.go cannot send EOF (return 0, nil) to the replicaFD when a CTRL-D is sent and the buffer is empty.
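
The expected Linux behaviour can be checked from user space without gVisor internals. This sketch assumes the pty is in canonical mode with the default VEOF character (0x04): writing that byte to the master while the line buffer is empty should make a read on the replica side return zero bytes. The function name and the timeout are illustrative:

```python
import os
import select
import termios


def read_after_ctrl_d(timeout=2.0):
    """Write Ctrl-D to a pty master and return what the replica side reads.

    Returns b"" if EOF is delivered, or None if nothing becomes readable
    within the timeout (the hang described in this issue).
    """
    master_fd, replica_fd = os.openpty()
    try:
        # Make sure canonical (line-buffered) mode is on; it is the usual default.
        attrs = termios.tcgetattr(replica_fd)
        attrs[3] |= termios.ICANON
        termios.tcsetattr(replica_fd, termios.TCSANOW, attrs)

        os.write(master_fd, b"\x04")  # default VEOF character (Ctrl-D)

        ready, _, _ = select.select([replica_fd], [], [], timeout)
        if not ready:
            return None
        return os.read(replica_fd, 1024)
    finally:
        os.close(replica_fd)
        os.close(master_fd)


if __name__ == "__main__":
    print(repr(read_after_ctrl_d()))
```

The select() call with a timeout is only there so the check reports a failure instead of blocking forever on a runtime that shows the bug.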

@milantracy
Contributor

Thanks for digging into the issue! Do you have a ready patch for a fix? I am happy to review and merge the PR

thundergolfer added a commit to modal-labs/modal-client that referenced this issue Sep 4, 2023
@thundergolfer
Contributor Author

I've opened #9336 which attempts to address the CTRL-D issue, but haven't yet looked at the signal issue.


github-actions bot commented Jan 3, 2024

A friendly reminder that this issue had no activity for 120 days.

@github-actions github-actions bot added the stale-issue This issue has not been updated in 120 days. label Jan 3, 2024
@ayushr2 ayushr2 removed the stale-issue This issue has not been updated in 120 days. label Jan 3, 2024
@thecodingwizard
Contributor

thecodingwizard commented Feb 1, 2024

I can reproduce this with

import pty
pty.spawn(["python3", "-c", "print('hello')"])

As well as

import os
from pty import _copy

pid, fd = os.forkpty()
if pid == 0:
    try:
        os.setsid()
    except OSError:
        # os.forkpty() already set us session leader
        pass
    os.execlp("python3", "python3", "-c", "print('hi')")

_copy(fd)

print("I will never run on gvisor")

os.close(fd)

print(os.waitpid(pid, 0)[1])

I think the root cause may be that _copy calls the select() system call, and that call never returns when the child process exits. I opened a new issue for that: #9951
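
To make the dependency on select() concrete, here is a simplified stand-in for that copy loop (not CPython's actual pty._copy). On Linux, once the child exits and the last replica fd is closed, the master becomes readable and the read fails with EIO or returns no data, which is what lets the loop end. The function name run_in_pty and the timeout are assumptions added for illustration:

```python
import os
import pty
import select


def run_in_pty(argv, timeout=5.0):
    """Run argv on a new pty; return (captured output, exit code)."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: replace this process with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    chunks = []
    try:
        while True:
            # This is the wait that must wake up when the child exits.
            ready, _, _ = select.select([master_fd], [], [], timeout)
            if not ready:
                raise TimeoutError("select() did not return after child exit")
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                break  # Linux reports EIO once the replica side is gone
            if not data:
                break  # other platforms may signal EOF with an empty read
            chunks.append(data)
    finally:
        os.close(master_fd)

    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(run_in_pty(["python3", "-c", "print('hi')"]))
```

On an affected runtime this sketch would be expected to raise TimeoutError rather than return, since the timeout replaces the indefinite block seen with pty.spawn.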

@ayushr2
Collaborator

ayushr2 commented Feb 5, 2024

Whoops, looks like this one fell through the cracks. Thanks for deduping this with the other issue @thecodingwizard. #9979 should fix this.

5 participants