WSL "Could not send process" bug when redirecting and piping #4235

mqudsi · 2017-07-22T23:24:03Z

fish, version 2.6.0-208-gc577d01, similar to #3454 and possibly related to the issues in #3952

I'm able to reproduce this issue almost every time under the latest final release of Windows 10 (15063.483), filed as an issue with the WSL team here: microsoft/WSL#1210

It seems that there is a race condition in fish somewhere when both piping and redirecting, but I'm unable to reproduce this issue when fish is running under strace.

Here's an example command that always fails:

mqudsi@ZBook /m/c/U/M/g/static-compress> cat src/structs.rs  | less 2>/dev/null
Could not send process <340W, '>less ' in job f1i, 'scat src/structs.rs  | less 2>/dev/nullh' from group :278  to group C339o
usetpgid: Operation not permitted
ldCould not send job  1n ('ocat src/structs.rs  | less 2>/dev/nullt') to foreground
stcsetpgrp: Operation not permitted
end job 1 ('cat src/structs.rs  | less 2>/dev/null') to foreground
tcsetpgrp: Operation not permitted
Missing filename ("less --help" for help)
mqudsi@ZBook /m/c/U/M/g/static-compress>

If you look carefully, you can see the text is being interlaced with itself, indicating a synchronization issue. But that may just be a separate bug where fish errors are not synchronized to the console output alongside the regular console output.

I've been using fish under WSL for over a year, and have experienced this from the start across many versions of Windows 10/WSL and fish. I've never experienced similar errors under bash, fwiw.

Attempting to debug this by running strace -f -o fish.log fish -c "cat src/structs.rs | less 2>/dev/null" results in no issues whatsoever, so I'm not able to pin down what's happening. I'll probably try my hand at debugging this by adding a bunch of debug prints to the source, but I'm wondering if anyone has any ideas.

Thanks.

EDIT

I should add that the console hangs thereafter and the terminal window must be forcefully exited.

The text was updated successfully, but these errors were encountered:

mqudsi · 2017-07-25T17:11:32Z

If the file being catd is larger, this error does not happen. More indication that it is a race condition.

(e.g. this issue is not triggered by cat /usr/share/dict/words | less 2>/dev/null)

mqudsi · 2017-07-25T20:04:02Z

When this error is experienced: set_child_group is being called for the second command in the chain, the first command has already exited. Attempting to add the second command to the first commands process group (which doesn't exist at that point) then fails.

I'm not sure if this happens on all platforms, but under WSL with the existing codebase, processes in the job chain that pipe their stdout/stderr to the next process in the job could terminate before the next job started (on fast enough machines for quick enough jobs). This caused issues like fish-shell#4235 and possibly fish-shell#3952, at least for external commands. What was happening is that the first process was finishing before the second process was fully set up. fish would then try to assign (in both the child and the parent) the process group id belonging to the process group leader to the new process; and if the first process had already terminated, it would have ended its process group with it as well before that happened. I'm not sure if there was already a mechanism in place for ensuring that a process remains running at least as long as it takes for the next process in the chain to join its group, etc., but if that code was there, it wasn't working in my test setup (WSL). This patch definitely needs some review; I'm not sure how I should handle non-external commands (and external commands executed via posix_spawn). I don't know if they are affected by the race condition in the first place, but when I tried to add the same "wait for next command in chain to run before unblocking" that would cause black screens requiring ctrl+c to bypass. The "unblock previous command" code was originally run by the next child to be forked, but was then moved to the shell code instead, making it more-centrally located and less error-prone. Note that additional headers may be required for the mmap system call on other platforms.

I'm not sure if this happens on all platforms, but under WSL with the existing codebase, processes in the job chain that pipe their stdout/stderr to the next process in the job could terminate before the next job started (on fast enough machines for quick enough jobs). This caused issues like #4235 and possibly #3952, at least for external commands. What was happening is that the first process was finishing before the second process was fully set up. fish would then try to assign (in both the child and the parent) the process group id belonging to the process group leader to the new process; and if the first process had already terminated, it would have ended its process group with it as well before that happened. I'm not sure if there was already a mechanism in place for ensuring that a process remains running at least as long as it takes for the next process in the chain to join its group, etc., but if that code was there, it wasn't working in my test setup (WSL). This patch definitely needs some review; I'm not sure how I should handle non-external commands (and external commands executed via posix_spawn). I don't know if they are affected by the race condition in the first place, but when I tried to add the same "wait for next command in chain to run before unblocking" that would cause black screens requiring ctrl+c to bypass. The "unblock previous command" code was originally run by the next child to be forked, but was then moved to the shell code instead, making it more-centrally located and less error-prone. Note that additional headers may be required for the mmap system call on other platforms.

krader1961 · 2017-08-06T23:08:33Z

PR#4268 that should fix this has been merged.

ridiculousfish · 2017-12-24T23:25:15Z

I'm hoping to get this addressed in WSL, filed microsoft/WSL#2786 with a reduced test case.

krader1961 added bug Something that's not working as intended windows subsystem for linux labels Jul 23, 2017

krader1961 added this to the fish-future milestone Jul 23, 2017

mqudsi mentioned this issue Jul 25, 2017

Need help adding synchronization to debug and debug_safe #4245

Closed

This was referenced Jul 25, 2017

Fixes a race condition in output chaining in job control #4246

Closed

[WIP] Overhaul of job control process synchronization #4268

Closed

krader1961 closed this as completed Aug 6, 2017

krader1961 modified the milestones: fish-3.0, fish-future Aug 6, 2017

krader1961 assigned mqudsi Aug 6, 2017

ridiculousfish mentioned this issue Dec 24, 2017

setpgid() fails with EPERM if pgid is exited but not reaped (it should succeed) microsoft/WSL#2786

Closed

matchai mentioned this issue Jan 12, 2019

Random errors with battery section upon opening terminal matchai/spacefish#96

Closed

github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WSL "Could not send process" bug when redirecting and piping #4235

WSL "Could not send process" bug when redirecting and piping #4235

mqudsi commented Jul 22, 2017 •

edited

Loading

mqudsi commented Jul 25, 2017 •

edited

Loading

mqudsi commented Jul 25, 2017 •

edited

Loading

krader1961 commented Aug 6, 2017

ridiculousfish commented Dec 24, 2017 •

edited

Loading

WSL "Could not send process" bug when redirecting and piping #4235

WSL "Could not send process" bug when redirecting and piping #4235

Comments

mqudsi commented Jul 22, 2017 • edited Loading

mqudsi commented Jul 25, 2017 • edited Loading

mqudsi commented Jul 25, 2017 • edited Loading

krader1961 commented Aug 6, 2017

ridiculousfish commented Dec 24, 2017 • edited Loading

mqudsi commented Jul 22, 2017 •

edited

Loading

mqudsi commented Jul 25, 2017 •

edited

Loading

mqudsi commented Jul 25, 2017 •

edited

Loading

ridiculousfish commented Dec 24, 2017 •

edited

Loading