New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only waitpid() for processes that we care about #8737

Merged
merged 1 commit into from Jul 13, 2018

Conversation

Projects
None yet
4 participants
@dimbleby
Contributor

dimbleby commented Jul 12, 2018

Fixes #8104.

It seems that when running in an appimage there are bonus SIGCHLDs flying around. I haven't figured out the exact steps that lead to hangs - but it seems likely that if we're catching SIGCHLD that we don't really need, then someone else is not catching one that they do...

Anyway, fix is tested manually and the neovim appimage doesn't hang any more.

@dimbleby

This comment has been minimized.

Contributor

dimbleby commented Jul 12, 2018

}
proc->internal_exit_cb(proc);
break;

This comment has been minimized.

@oni-link

oni-link Jul 13, 2018

Contributor

What if three or more children are terminated? Would the first signal be handled here, the second signal be marked pending and every further (of the same) signal be dropped (we miss a child termination)? Would that be fixed if we remove the break here?

This comment has been minimized.

@dimbleby

dimbleby Jul 13, 2018

Contributor

I expect we get a SIGCHLD for each terminated child, and therefore enter this function once for each.

At any rate my intention was to preserve the existing behaviour here - if there is a bug of the sort that you describe, then it ought to be handled separately I think.

This comment has been minimized.

@jamessan

jamessan Jul 13, 2018

Member

Although in both the original code and this PR we only process one job, I can understand oni-link's concern that we may be handling the wrong job because we end up reaping a different child than the one that delivered the signal.

In order to truly be deterministic, though, we would likely need to use sigaction() so we can be directly told which child delivered the signal. Since we're using libuv's abstraction, that's not really an option here.

I wonder if a simpler fix is to change the original code to use waitpid(0, &stat, WNOHANG);. It looks like that should still fix the problem. If that does work, we probably want to actually loop until waitpid() tells us none of our children are waitable.

This comment has been minimized.

@dimbleby

dimbleby Jul 13, 2018

Contributor

If I understand correctly the concern is not that we handle jobs in the 'wrong' order - that really shouldn't matter. (A dies, B dies, we handle B, we handle A - so what? B might just as well have died first).

Rather the worry was that we might somehow miss terminations. I don't think that'll happen, though I'm not 100% sure and I can't see either how removing the break would be unsafe. (But I still say, if you want to do that, let's do it in a new issue).

Either way, I'd favour sticking with being completely explicit about what PIDs we're waiting for as the most foolproof approach.

This comment has been minimized.

@dimbleby

dimbleby Jul 13, 2018

Contributor

I've done some more googling and I now think that @oni-link is right all along - SIGCHLDs are not queued, so we might see fewer of them than terminated processes.

I'll raise a new issue and propose that the fix is simply to remove the break

@justinmk

This comment has been minimized.

Member

justinmk commented Jul 13, 2018

@dimbleby This is a huge help, thanks for investigating. Could you please include the full explanation from AppImage/AppImageKit#812 (comment) in the commit message, including the link to https://patchwork.kernel.org/patch/9949491/ ?

Only waitpid() for processes that we care about
It seems as though in an AppImage there's an extra child process that
dies at some early point, before we have set up a SIGCHLD handler.  So
when we later get a SIGCHLD from a child that we do care about,
waitpid(-1, ...) tells us about the extra child - and we don't notice
that the interesting child has exited.

Or something like that!

See also:

* https://patchwork.kernel.org/patch/9949491/ in which perf hit
something similar
* discussion at the AppImage repository:
AppImage/AppImageKit#812 (comment).

Fix is to be explicit about which process we are waitpid()'ing for, so
we never need be distracted by children that we don't know about.

Fixes #8104
@dimbleby

This comment has been minimized.

Contributor

dimbleby commented Jul 13, 2018

re-pushed with a bigger comment

@jamessan jamessan merged commit 4874214 into neovim:master Jul 13, 2018

1 of 3 checks passed

QuickBuild Build pr-8737 finished with status FAILED
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details

@dimbleby dimbleby deleted the dimbleby:overly-general-waitpid branch Jul 14, 2018

justinmk added a commit that referenced this pull request Jul 18, 2018

NVIM v0.3.1
FEATURES:
07499a8 #8709 man.vim: C highlighting for EXAMPLES section
07f82ad #8699 TUI: urxvt: also send xterm focus-reporting seqs
40911e4 #8616 API: emit nvim_buf_lines_event from :terminal
c46997a #8546 fillchars: Add "eob" flag

FIXES:
74d19f6 #8576 startup: avoid blank stdin buffer if other files were opened
4874214 #8737 Only waitpid() for processes that we care about
cd6e7e8 #8743 Check all child processes for exit in SIGCHLD handler
c230ef2 #8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff
0ed8b12 #8681 transstr_buf: fix length comparison
d241f27 #8708 TUI: Fix standout mode
9afed40 #8698 man.vim: fix for mandoc
e889640 #8682 provider/node: npm --loglevel silent
1cbc830 #8613 API: nvim_win_set_cursor: set curswant
bf6048e #8628 checkhealth: Python: fix VIRTUAL_ENV check
3cc3506 #8528 checkhealth: node.js: also search yarn

CHANGES:
b751449 #8619 defaults: shortmess+=F
1248178 #8578 highlight: high-priority CursorLine if fg is set.
01570f1 #8726 terminal: handle &confirm and :confirm on unloading
56065bb #8721 screen: truncate showmode messages
bf2460e #7551 buffer: fix copying :setlocal options
c1c14fa #8520 Ex mode: always "improved" (gQ)
050f397 #7992 options: remove 'maxcombine` option (always 6)

INTERNAL:
463da84 #7992 screen: use UTF-8 representation

UtkarshMe added a commit to UtkarshMe/neovim that referenced this pull request Jul 28, 2018

NVIM v0.3.1
FEATURES:
07499a8 neovim#8709 man.vim: C highlighting for EXAMPLES section
07f82ad neovim#8699 TUI: urxvt: also send xterm focus-reporting seqs
40911e4 neovim#8616 API: emit nvim_buf_lines_event from :terminal
c46997a neovim#8546 fillchars: Add "eob" flag

FIXES:
74d19f6 neovim#8576 startup: avoid blank stdin buffer if other files were opened
4874214 neovim#8737 Only waitpid() for processes that we care about
cd6e7e8 neovim#8743 Check all child processes for exit in SIGCHLD handler
c230ef2 neovim#8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff
0ed8b12 neovim#8681 transstr_buf: fix length comparison
d241f27 neovim#8708 TUI: Fix standout mode
9afed40 neovim#8698 man.vim: fix for mandoc
e889640 neovim#8682 provider/node: npm --loglevel silent
1cbc830 neovim#8613 API: nvim_win_set_cursor: set curswant
bf6048e neovim#8628 checkhealth: Python: fix VIRTUAL_ENV check
3cc3506 neovim#8528 checkhealth: node.js: also search yarn

CHANGES:
b751449 neovim#8619 defaults: shortmess+=F
1248178 neovim#8578 highlight: high-priority CursorLine if fg is set.
01570f1 neovim#8726 terminal: handle &confirm and :confirm on unloading
56065bb neovim#8721 screen: truncate showmode messages
bf2460e neovim#7551 buffer: fix copying :setlocal options
c1c14fa neovim#8520 Ex mode: always "improved" (gQ)
050f397 neovim#7992 options: remove 'maxcombine` option (always 6)

INTERNAL:
463da84 neovim#7992 screen: use UTF-8 representation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment