os: (*Cmd).Wait hangs on macOS when setctty is used after commit b15c399a3 #61779
Comments
I'm sure I'm doing something dumb but I can't find the sources for the
If the cause is https://go.dev/cl/420334 then my first guess would be that Darwin kqueue does not permit adding a pty. But I don't understand what the pty is being used for. It looks like the subcommand stdin, stdout, and stderr will all be pipes. |
@ianlancetaylor that testdata file is added in the PR https://github.com/rogpeppe/go-internal/pull/172/files, so you can find it at the branch https://github.com/FiloSottile/go-internal/tree/filippo/pty. |
But I don't understand what the pty is being used for. It looks like the subcommand stdin, stdout, and stderr will all be pipes.
Yes, fds 0, 1, and 2 will be pipes, but the ctty, passed as an extra fd and accessed as /dev/tty by the subcommand, will be a pty.
(This is testing the situation where stdin and stdout/err are redirected, but the command can use the tty to prompt the user.)
|
I don't fully understand what is happening. It's clear that macOS does not permit opening
Before CL 420334 we would first try to add the descriptor to kqueue, and if that succeeded put the descriptor into non-blocking mode. After that CL we would first put the descriptor into non-blocking mode, and then try to add it to kqueue. We made that change because of #54100, in which we could sometimes add a descriptor to kqueue and then fail to make it non-blocking, which is bad. So now we first set the descriptor to non-blocking, and then try to add it to kqueue. If we fail to add it to kqueue, we put the descriptor back into blocking mode. That is the sequence that happens for macOS with
It is also possible that the problem arises because we put /dev/ptmx or the /dev/ttysNNN device into non-blocking mode.
I have not been able to write my own test case for this, although I can see that the test described above fails. I'm not sure what is different, other than that I am using the internal/testpty package from the standard library.
I have a simple fix that fixes this issue. Maybe we should just go with it. I'm not sure. |
Change https://go.dev/cl/517555 mentions this issue: |
I don't have access to the hardware, but Darwin has a test suite which sets the returned
I'd love to see some system call traces showing whether the pair of opened ptys in the parent fail to be added to kqueue or it is the opening of
FreeBSD man page on tty devices has this section regarding
Maybe the child OSX process needs to set
This does not explain the blocking of _exit of course. |
The next time someone sees a failure mode matching this issue, could you paste a goroutine dump from the timed-out process directly into the GitHub issue? The CI links from the original post seem to have expired. |
This is still an issue in Go 1.22. https://github.com/FiloSottile/age/tree/tmp/61779 has tests that fail on macOS.
|
Does anyone want to pick up https://go.dev/cl/517555 and complete it? |
Has it been tested on various iterations of Apple Silicon? I've observed some rare instances of different behaviour on M1 Max vs M2 Pro with the same macOS version (and stock SSH client), but the code is using several layers of abstraction over this low-level aspect and I don't have direct access to those computers.
I just checked that test, which @FiloSottile posted (
This disproves my theory about differences between M1 and M2, as I checked it on the same CPUs that were giving me the original issues with SSH. However, something interesting came up: Apple changed something in XNU which aligned PTY behaviour with Linux. Nothing was posted in the release notes, but those kinds of internals would only be visible once they push the code to the XNU OSS repo (I believe they do so after RC). If you have any specific code and dumps, I can help :) |
A test that involves setting the command's TTY with the code below, and then calling (*Cmd).Wait, started timing out on macOS in Go 1.20. I bisected it down to b15c399.
You can see the failing tests in the CI of rogpeppe/go-internal#172 (https://github.com/rogpeppe/go-internal/actions/runs/5005273594?pr=172) or in age's tests (https://github.com/FiloSottile/age/actions/runs/5633140339/job/15261730363).
/cc @paulzhol @ianlancetaylor @dmgk #54100