Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

external commands launched from a function may not be killed when fish receives a SIGHUP #3737

Closed
krader1961 opened this issue Jan 16, 2017 · 21 comments
Assignees
Labels
bug Something that's not working as intended release notes Something that is or should be mentioned in the release notes
Milestone

Comments

@krader1961
Copy link
Contributor

krader1961 commented Jan 16, 2017

This issue is derived from issue #3644. The core complaint is that programs like ranger and fzf may not be killed if launched from a function if fish dies because it's controlling tty becomes invalid.

Notably, @Ambrevar, reports that when he attempt to kill the xterm that fails with fish writing this to the terminal:

> sh
There are still jobs active (use the jobs command to see them).
A second attempt to exit will terminate them.
test: Missing argument at index 2
> exit
<E> fish: More than one job in foreground: job 1: “test (id -u) -eq 0” job 2: “id -u”
<E> fish: Errors detected, shutting down. Break on sanity_lose() to debug.
<E> fish: More than one job in foreground: job 1: “id -u” job 2: “sh”
<E> fish: Errors detected, shutting down. Break on sanity_lose() to debug.

Note that @Ambrevar is doing exec fish from their ~/.bashrc rather than make fish their login shell. Recent PR #3658 might be relevant but note that change was made several days after issue #3644 was opened.

One known problem that is that show_stacktrace() cannot be used safely on the child side of a fork(). And we have at least one stack trace showing it being called in that situation (see the final comments on issue #3644). That should be fixed. That isn't going to be the root cause of this issue but is a contributing factor that at least makes it harder to debug.

It turns out my previous paragraph was in error. The show_stacktrace() call in question was made in the parent process which should be okay. That appears to be another manifestation of the glibc stdio bug when the fd is returning EIO.

@krader1961 krader1961 added the bug Something that's not working as intended label Jan 16, 2017
@krader1961 krader1961 added this to the fish-future milestone Jan 16, 2017
@krader1961 krader1961 self-assigned this Jan 16, 2017
krader1961 added a commit that referenced this issue Jan 16, 2017
@krader1961
Copy link
Contributor Author

@Ambrevar: None of the functions shipped by this project include the statement that fish is complaining about; i.e., test (id -u) -eq 0. Please run command grep -F 'test (id -u) -eq 0' against the fish scripts on your system. A good place to start is every file in or below ~/.config/fish. I'd like to know if that statement is in a prompt function or something else. As well as any other context you can provide regarding that statement including a copy of the script as an attachment to this issue.

Please understand that I am not claiming the fish scripts on your system are broken. Fish should never report the errors you saw which implies a bug in fish. But understanding the conditions that trigger the problem is the first, and most important, step to fixing the problem.

@krader1961
Copy link
Contributor Author

Googling fish "test (id -u) -eq 0" returned only three results. The first being an ancient fish issue already resolved. The second was a custom fish_prompt function. The third was an ancient python script written by @siteshwar.

@Ambrevar
Copy link
Contributor

Yes, test (id -u) -eq 0 comes from my prompt:

function fish_prompt
	set HOST (hostname)
	printf "%s%s%s%s%s" (set_color -o)'(' \
	(set_color normal) \
	(set_color $fish_color_user)$USER \
	(set_color normal) \
	(set_color -o)@ \
	(set_color normal) \
	(set_color $fish_color_hostname)$HOST \
	(set_color normal) \
	(set_color -o)'}'

	## Path
	set -l cwd_color $fish_color_cwd
	if test (id -u) -eq 0
		set cwd_color $fish_color_cwd_root
	end
	set PROMPT_PWD (prompt_pwd)
	printf "%s%s%s%s" \
	(set_color -o)'[' \
	(set_color -o $cwd_color)$PROMPT_PWD \
	(set_color normal) \
	(set_color -o)']'

	echo
	echo '> '
end

If I comment

	if test (id -u) -eq 0
		set cwd_color $fish_color_cwd_root
	end

Then I either get the same error message with (hostname) instead, or just

There are still jobs active (use the jobs command to see them).
A second attempt to exit will terminate them.

Pressing any key will close the terminal.

If I comment set HOST (hostname) in turn, then I just get the above 2 lines. But then pressing any keys will not close the terminal.

Same thing if I reset my .config/fish.

Command substitutions / jobs seem to have a role here.

@Ambrevar
Copy link
Contributor

Ambrevar commented Jan 16, 2017

fzf just released version 0.16 (and 0.16.1) which does not require ncurses anymore, and thus does not trigger the issue. So let's focus on ncdu to simplify the matter.

Let me repeat the requirements and steps:

  • Arch Linux
  • glibc 2.24 (stock package)
  • ncurses 6.0+ (stock package)
  • urxvt 9.22 (stock package)
  • ncdu 1.12 (but any ncurses application will do).
    Run the following function:
function ncdu-wrapper
	ncdu
	echo
	echo
	echo
	# ... I suppose you should add more echo if your machine is fast.
end

And close the terminal or send SIGTERM to fish while ncdu is running. ncdu gets reparented to pid 1 and starts spinning (i.e. hogs a full CPU).

I just found out that if you send SIGHUP to fish, you get the same behaviour as for xterm. So the difference between the 2 is that urxvt closes its children with SIGTERM while xterm does it with SIGHUP.

@krader1961
Copy link
Contributor Author

Good news. I installed Ubuntu 16.10 which has glibc 2.24 and ncurses 6.0 (plus Ubuntu changes for both packages). I was able to reproduce the problem using @Ambrevar's instructions. I did not need urxvt. I simply ssh'd into the system, ran the function which invoked ncdu, then did a kill $pid where $pid was that of the parent fish shell for the ncdu process.

It shouldn't take much effort now to determine the root cause of this behavior. Which is not the same thing as saying a fix will be easy.

@krader1961
Copy link
Contributor Author

Also, sending SIGTERM produces the problem where ncdu is reattached to pid 1 and spins. Sending SIGHUP produces a result very similar to the behavior described in the original comment of this issue.

@krader1961
Copy link
Contributor Author

Also, it isn't necessary to invoke ncdu indirectly via a function. If I run ncdu directly then send SIGTERM to the parent fish shell the same behavior occurs. To be determined is whether fish should kill child processes when it receives a SIGTERM.

@krader1961
Copy link
Contributor Author

Argh! The problem with SIGHUP is due to the change I recently introduced to warn the user about exiting when external commands are running. That warning, which also means not killing those child processes, should only be done when the shell is exiting for reasons other than having received a signal. If we're exiting because we received SIGHUP we should silently kill the child processes.

@krader1961
Copy link
Contributor Author

krader1961 commented Jan 22, 2017

I have a fix for the SIGHUP aspect of this problem. I'll merge it as soon as I've created a unit test to keep it from regressing in the future.

Bash does not kill background tasks when it receives SIGTERM. So neither should fish. On the other hand it doesn't exit when interactive and it receives SIGTERM while running another program in the foreground. Whereas fish does exit which leaves the foreground process reparented to PID 1. That is wrong.

@krader1961
Copy link
Contributor Author

Note that bash does exit when it receives SIGTERM when not interactive (e.g., bash -c 'sleep 666; echo done') and leaves the foreground process running and, of course, reparented to PID 1. Which sort of makes sense since interactive mode is fundamentally different from non-interactive mode.

krader1961 added a commit that referenced this issue Jan 24, 2017
This is a partial fix for issue #3737. It only addresses the SIGHUP
aspect of the problem. Fixing SIGTERM is TBD.
krader1961 added a commit that referenced this issue Jan 25, 2017
This is a partial fix for issue #3737. It only addresses the SIGHUP
aspect of the problem. Fixing SIGTERM is TBD.

(cherry picked from commit 31adc22)
@krader1961 krader1961 modified the milestones: fish 2.5.0, fish-future Jan 25, 2017
@krader1961
Copy link
Contributor Author

I'm closing this because the core problem, SIGHUP handling, is fixed. Changing how SIGTERM is handled should be dealt with in a separate issue since the current behavior is quite old and upon further reflect it isn't obvious we should change it.

@ridiculousfish
Copy link
Member

Thanks for doing this!

@ridiculousfish ridiculousfish added the release notes Something that is or should be mentioned in the release notes label Jan 26, 2017
@zanchey
Copy link
Member

zanchey commented Jan 29, 2017

I'm not sure what we need to say in the release notes - has the behaviour here changed since 2.4.0?

@zanchey
Copy link
Member

zanchey commented Jan 31, 2017

@ridiculousfish @krader1961 any guidance?

@krader1961
Copy link
Contributor Author

@zanchey: I think the only thing needed is to mention that attempting to exit when jobs are in the background will now result in a warning and tell you to type exit a second time if you really want to exit. This issue was really just about fixing an unexpected interaction between that change in behavior and handling SIGHUP. No reason to mention that fix in the release notes since it existed for just a couple of weeks in unreleased code.

@ridiculousfish
Copy link
Member

Did we change the behavior of sending SIGHUPs to children on shell exit?

@krader1961
Copy link
Contributor Author

krader1961 commented Jan 31, 2017

Did we change the behavior of sending SIGHUPs to children on shell exit?

Yes, issue #3155.

This issue is in response to the warning introduced by issue #3497 and PR #3658. That change (and this fix) didn't change our sending of SIGHUP to child process but did change how fish reacts to receiving SIGHUP.

@ridiculousfish
Copy link
Member

Sorry, still trying to wrap my head around this...

I think the only thing needed is to mention that attempting to exit when jobs are in the background will now result in a warning and tell you to type exit a second time if you really want to exit

Wasn't this the behavior already? What's changed here?

@zanchey
Copy link
Member

zanchey commented Feb 1, 2017

The new behaviour is that running, as well as stopped jobs, will get terminated - this was noted in the 2.5b1 release notes.

@krader1961
Copy link
Contributor Author

...this was noted in the 2.5b1 release notes.

Ah, yes it is. When I glanced at them last night I didn't see that line item. I think we're good to go.

@ridiculousfish
Copy link
Member

Cool, thanks for the clarification

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something that's not working as intended release notes Something that is or should be mentioned in the release notes
Projects
None yet
Development

No branches or pull requests

4 participants