-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WSL "Could not send process" bug when redirecting and piping #4235
Comments
If the file being (e.g. this issue is not triggered by |
When this error is experienced: |
I'm not sure if this happens on all platforms, but under WSL with the existing codebase, processes in the job chain that pipe their stdout/stderr to the next process in the job could terminate before the next job started (on fast enough machines for quick enough jobs). This caused issues like fish-shell#4235 and possibly fish-shell#3952, at least for external commands. What was happening is that the first process was finishing before the second process was fully set up. fish would then try to assign (in both the child and the parent) the process group id belonging to the process group leader to the new process; and if the first process had already terminated, it would have ended its process group with it as well before that happened. I'm not sure if there was already a mechanism in place for ensuring that a process remains running at least as long as it takes for the next process in the chain to join its group, etc., but if that code was there, it wasn't working in my test setup (WSL). This patch definitely needs some review; I'm not sure how I should handle non-external commands (and external commands executed via posix_spawn). I don't know if they are affected by the race condition in the first place, but when I tried to add the same "wait for next command in chain to run before unblocking" that would cause black screens requiring ctrl+c to bypass. The "unblock previous command" code was originally run by the next child to be forked, but was then moved to the shell code instead, making it more-centrally located and less error-prone. Note that additional headers may be required for the mmap system call on other platforms.
I'm not sure if this happens on all platforms, but under WSL with the existing codebase, processes in the job chain that pipe their stdout/stderr to the next process in the job could terminate before the next job started (on fast enough machines for quick enough jobs). This caused issues like #4235 and possibly #3952, at least for external commands. What was happening is that the first process was finishing before the second process was fully set up. fish would then try to assign (in both the child and the parent) the process group id belonging to the process group leader to the new process; and if the first process had already terminated, it would have ended its process group with it as well before that happened. I'm not sure if there was already a mechanism in place for ensuring that a process remains running at least as long as it takes for the next process in the chain to join its group, etc., but if that code was there, it wasn't working in my test setup (WSL). This patch definitely needs some review; I'm not sure how I should handle non-external commands (and external commands executed via posix_spawn). I don't know if they are affected by the race condition in the first place, but when I tried to add the same "wait for next command in chain to run before unblocking" that would cause black screens requiring ctrl+c to bypass. The "unblock previous command" code was originally run by the next child to be forked, but was then moved to the shell code instead, making it more-centrally located and less error-prone. Note that additional headers may be required for the mmap system call on other platforms.
PR#4268 that should fix this has been merged. |
I'm hoping to get this addressed in WSL, filed microsoft/WSL#2786 with a reduced test case. |
fish, version 2.6.0-208-gc577d01
, similar to #3454 and possibly related to the issues in #3952I'm able to reproduce this issue almost every time under the latest final release of Windows 10 (15063.483), filed as an issue with the WSL team here: microsoft/WSL#1210
It seems that there is a race condition in fish somewhere when both piping and redirecting, but I'm unable to reproduce this issue when fish is running under strace.
Here's an example command that always fails:
If you look carefully, you can see the text is being interlaced with itself, indicating a synchronization issue. But that may just be a separate bug where fish errors are not synchronized to the console output alongside the regular console output.
I've been using fish under WSL for over a year, and have experienced this from the start across many versions of Windows 10/WSL and fish. I've never experienced similar errors under bash, fwiw.
Attempting to debug this by running
strace -f -o fish.log fish -c "cat src/structs.rs | less 2>/dev/null"
results in no issues whatsoever, so I'm not able to pin down what's happening. I'll probably try my hand at debugging this by adding a bunch of debug prints to the source, but I'm wondering if anyone has any ideas.Thanks.
EDIT
I should add that the console hangs thereafter and the terminal window must be forcefully exited.
The text was updated successfully, but these errors were encountered: