New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`read | cat` crashes fish #4540

Closed
JohnMH opened this Issue Nov 17, 2017 · 12 comments

Comments

Projects
None yet
8 participants
@JohnMH
Copy link

JohnMH commented Nov 17, 2017

fish 2.3.1
operating system: Irrelevant, running under Linux kernel (Also tested on FreeBSD with unknown fish version)
term: xterm-256color

@JohnMH

This comment has been minimized.

Copy link

JohnMH commented Nov 17, 2017

Sometimes this also closes the terminal emulator.

@zanchey zanchey added the bug label Nov 18, 2017

@zanchey

This comment has been minimized.

Copy link
Member

zanchey commented Nov 18, 2017

It sure does, even with 2.7b1 - in a subshell the terminal hangs, in it's own window the emulator just closes!

@floam

This comment has been minimized.

Copy link
Member

floam commented Nov 19, 2017

Doesn't happen here (macOS), for some reason.

@gyroninja

This comment has been minimized.

Copy link

gyroninja commented Nov 27, 2017

Running fish in gdb results with:
Thread 1 fish received signal SIGTTIN, Stopped (tty input). 0x00007ffff776e15d in read () from /lib/libpthread.so.0

From the glibc documentation on SIGTTIN:

A process cannot read from the user’s terminal while it is running as a background job. When any process in a background job tries to read from the terminal, all of the processes in the job are sent a SIGTTIN signal. The default action for this signal is to stop the process. For more information about how this interacts with the terminal driver, see Access to the Terminal.

@ridiculousfish

This comment has been minimized.

Copy link
Member

ridiculousfish commented Nov 27, 2017

I can't repro the crash but I can repro the hang. Attaching in the debugger, fish just gets SIGTTIN repeatedly. I wonder if this broken when fish stopped blocking signals (#3691).

@ridiculousfish ridiculousfish added this to the fish-3.0 milestone Nov 27, 2017

@Asternitix

This comment has been minimized.

Copy link

Asternitix commented Nov 29, 2017

I can repro the hang as well but the text in the terminal goes black but still works.

@gyroninja

This comment has been minimized.

Copy link

gyroninja commented Nov 29, 2017

I'm using xterm 330 and fish 2.7.0.
The terminal does not seem to be crashing, but rather closing due to an error:
https://github.com/joejulian/xterm/blob/master/ptydata.c#L213
Essentially, the read call is failing with a errno of EIO. From the man page:

EIO I/O error. This will happen for example when the process is in a background process group, tries to read from its controlling terminal, and either it is ignoring or blocking SIGTTIN or its process group is orphaned.
It may also occur when there is a low-level I/O error while reading from a disk or tape.

@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 26, 2017

So, I've checked this a bit, and this is the issue as I understand it:

For certain cases, fish creates a "keepalive" process. This is used to prevent reuse of the foreground pgroup leader PID, which would grant that new process access to the terminal. (Which in itself is an edgecase's edgecase, but theoretically more sound, so okay)

However, the check for a keepalive process is this:

    if (j->get_flag(JOB_CONTROL) && !exec_error) {
        for (const process_ptr_t &p : j->processes) {
            if (p->type != EXTERNAL && (!p->is_last_in_job || !p->is_first_in_job)) {
                needs_keepalive = true;
                break;
            }
        }
    }

So, if the job contains any process that isn't "EXTERNAL" (e.g. a builtin) and that process isn't both the first and last in the job (which would itself mean the job has only one process), create a keepalive process.

This keepalive process then becomes the foreground pgroup leader via set_child_group(j, keepalive.pid).

But in this case - read is executed in the fish process, so the pgroup that reads isn't the foreground pgroup. So it gets SIGTTIN/EIO/whatever, and that presents as this issue.

As I understand it, the process that should be allowed to read() from the terminal here is the first in the pipe - the read. Which is fish itself. So we don't actually need set_child_group, and we don't actually need the keepalive process (since fish itself survives).

And the process that should be allowed to write() to the terminal should be the cat. However IIUC the cat is actually started later since we buffer the read output (I can't actually see a running cat process here), so we don't need to explicitly do anything more.

TBD is the actual conditions that a keepalive process is necessary.

And now I'm down the job control rabbit hole again...

faho added a commit to faho/fish-shell that referenced this issue Jan 10, 2018

Don't start keepalive process if the first or last process is internal
Currently, fish spawns a keepalive process when a non-external
process (a block/builtin/function) is inside a pipeline.

This keepalive process is then made the process group leader, which
theoretically stops the process group from being coopted by PID
reuse (which I'm still not sure can actually happen).

But if a non-external process is first or last in the pipeline, then
fish itself needs to read/write to the terminal, so it should remain
pgroup leader, and therefore the keepalive process is not needed.

Fixes fish-shell#4540.
@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Feb 9, 2018

The minimal read | cat test case appears to be fixed with the recent changes:

image

But I'd have to try on a non-WSL machine too, for sure.

@faho

This comment has been minimized.

Copy link
Member

faho commented Feb 10, 2018

Definitely not fixed on Linux.

@JohnMH

This comment has been minimized.

Copy link

JohnMH commented Feb 11, 2018

@mqudsi I'd be willing to bet that's more due to some issue with signals on Windows.

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Mar 5, 2018

@JohnMH it was actually due to the extraneous setpgid calls. The changes introduced to fix #4778 cause WSL to similarly hang now.

@s417-lama s417-lama referenced this issue Mar 25, 2018

Merged

wait for processes by their name #4849

1 of 3 tasks complete

ridiculousfish added a commit to ridiculousfish/fish-shell that referenced this issue Aug 5, 2018

Acquire tty if interactive when running builtins
When running a builtin, if we are an interactive shell and stdin is a tty,
then acquire ownership of the terminal via tcgetpgrp() before running the
builtin, and set it back after.

Fixes fish-shell#4540

ridiculousfish added a commit that referenced this issue Aug 12, 2018

Acquire tty if interactive when running builtins
When running a builtin, if we are an interactive shell and stdin is a tty,
then acquire ownership of the terminal via tcgetpgrp() before running the
builtin, and set it back after.

Fixes #4540
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment