job-control: avoid kill race #8273
src/nvim/event/process.c
Outdated
@@ -228,29 +228,26 @@ void process_stop(Process *proc) FUNC_ATTR_NONNULL_ALL
}

Loop *loop = proc->loop;
if (!loop->children_stop_requests++) {
Small cleanup: I removed loop->children_stop_requests because it seems useless. It will always be 0 here, because process_stop() is guarded by proc->stopped_time.
Hmm, I think I get it now: loop->children_stop_requests is "global" for all processes, and we only want one children_kill_timer to reap all processes.
But the libuv doc for uv_timer_start says:
"If the timer is already active, it is simply updated."
so it's harmless to just call uv_timer_start every time a process-stop is attempted.
static void children_kill_cb(uv_timer_t *handle)
{
Loop *loop = handle->loop->data;
uint64_t now = os_hrtime();

kl_iter(WatcherPtr, loop->children, current) {
Process *proc = (*current)->data;
if (!proc->stopped_time) {
bool exited = (proc->status >= 0);
if (exited || !proc->stopped_time) {
Attempt to avoid the race in #8269
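The guard in this hunk can be looked at in isolation. The following is a minimal sketch, not the Nvim source: `FakeProc` and `should_skip_kill` are hypothetical names, and the convention that `status` stays negative while the process is running is inferred from the `proc->status >= 0` test in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the Process fields that the guard reads.
typedef struct {
  int status;             // assumed: < 0 while running, >= 0 once exited
  uint64_t stopped_time;  // 0 until a stop was requested
} FakeProc;

// Returns true when the kill-timer callback should leave this process alone:
// either it already exited (its PID may have been reused by the OS), or
// nobody asked for it to be stopped.
static bool should_skip_kill(const FakeProc *proc)
{
  assert(proc != NULL);
  bool exited = (proc->status >= 0);
  return exited || !proc->stopped_time;
}
```

Only a process that is still running and has a pending stop request remains a candidate for a signal.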
runtime/doc/eval.txt
Outdated
nvim process. The process will not get killed
when nvim exits. If the process dies before
nvim exits, "on_exit" will still be invoked.
detach : (non-pty only) Detach the job process, so it will
I would focus on "The process will not get killed when nvim exits." because that is the main effect this option now has regardless of platform.
Looks sane to me on a quick glance |
src/nvim/event/process.c
Outdated
&& !--loop->children_stop_requests) {
// Stop the timer if no more stop requests are pending
DLOG("Stopping process kill timer");
ILOG("exited: pid=%d status=%d stoptime=%" PRId64, proc->pid,
proc->stopped_time is a uint64_t, so PRIu64 instead of PRId64?
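For illustration, a small sketch of the format-macro point. `format_stoptime` is a hypothetical helper, not part of Nvim; it only shows `PRIu64` paired with a `uint64_t` argument.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: formats a stop timestamp into buf.
// PRIu64 is the printf conversion that matches uint64_t. PRId64 is meant
// for int64_t and would typically show very large values as negative.
static int format_stoptime(char *buf, size_t len, uint64_t stopped_time)
{
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "stoptime=%" PRIu64, stopped_time);
}
```

With `PRIu64`, a value such as `UINT64_MAX` is printed as its full unsigned decimal representation.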
src/nvim/event/process.c
Outdated
DLOG("Stopping process kill timer");
ILOG("exited: pid=%d status=%d stoptime=%" PRId64, proc->pid,
proc->status, proc->stopped_time);
if (proc->stopped_time) {
What if we already have a process that is waiting to be killed, but the next ending process stops the kill-timer now? Then the first process would be waiting until another process starts the kill timer.
284dfc1
to
b123f98
Compare
@oni-link The last commit changes the timer to be non-repeating, and |
src/nvim/event/process.c
Outdated
uv_timer_start(&loop->children_kill_timer, children_kill_cb,
KILL_TIMEOUT_MS, KILL_TIMEOUT_MS);
uv_timer_start(&proc->loop->children_kill_timer, children_kill_cb,
KILL_TIMEOUT_MS, 0);
Is the process loop here also the main loop? If so all processes use the same timer and a sequence of stopped processes could reset the timer without ever being killed.
> Is the process loop here also the main loop? If so all processes use the same timer
Yes, that's how it has always been.
Based on libuv docs and my testing, each uv_timer_start call resets the timer. (So the minimum window is 2s, with unlimited "maximum".)
And the timer callback always iterates the processes.
So I think this is fine?
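The "minimum 2s, unlimited maximum" window can be illustrated with a toy model. This is not libuv: `ToyTimer`, `toy_timer_start` and `fire_time_after_stops` are hypothetical names that only model the documented behaviour that starting an active timer updates it, and the sketch assumes every stop request arrives before the previous deadline expires. `KILL_TIMEOUT_MS` is taken as 2000, matching the 2-second window mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KILL_TIMEOUT_MS 2000

// Toy one-shot timer: NOT libuv, only a model of "starting an active timer
// replaces the pending deadline".
typedef struct {
  bool active;
  uint64_t deadline_ms;
} ToyTimer;

// Arm (or re-arm) the timer relative to now_ms.
static void toy_timer_start(ToyTimer *t, uint64_t now_ms, uint64_t timeout_ms)
{
  assert(t != NULL);
  t->active = true;
  t->deadline_ms = now_ms + timeout_ms;
}

// Simulate stop requests arriving at the given times, each restarting the
// shared timer, and return the time at which the callback finally fires.
static uint64_t fire_time_after_stops(const uint64_t *stop_times_ms, size_t n)
{
  ToyTimer timer = { false, 0 };
  for (size_t i = 0; i < n; i++) {
    toy_timer_start(&timer, stop_times_ms[i], KILL_TIMEOUT_MS);
  }
  return timer.deadline_ms;
}
```

In this model, stop requests at 0 ms, 1500 ms and 3000 ms push the single callback out to 5000 ms, so the first process waits 5 seconds instead of 2 before the callback iterates over it.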
src/nvim/event/process.c
Outdated
continue;
}
uint64_t elapsed = (now - proc->stopped_time) / 1000000 + 1;

if (elapsed >= KILL_TIMEOUT_MS) {
int sig = proc->type == kProcessTypePty && elapsed < KILL_TIMEOUT_MS * 2
Should this code be changed so that a SIGKILL is always sent?
- Calling process_stop() for a pty that ignores SIGTERM would probably never reach an elapsed time of 2*KILL_TIMEOUT_MS on its own. The pty would have to wait until another process is stopped so that the kill timer is restarted.
- If the timer fires too early, no signal would be sent and the process would again have to wait for another stopped process.
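The elapsed-time logic under discussion can be sketched as two small functions. The diff line is truncated in this view, so the `SIGTERM`/`SIGKILL` branches below are an assumed completion of the ternary, and `elapsed_ms` and `pick_signal` are hypothetical names. `KILL_TIMEOUT_MS` is taken as 2000 to match the 2-second window mentioned in this thread.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

#define KILL_TIMEOUT_MS 2000

// Convert a nanosecond interval to milliseconds, adding one as in the hunk.
static uint64_t elapsed_ms(uint64_t now_ns, uint64_t stopped_ns)
{
  assert(now_ns >= stopped_ns);
  return (now_ns - stopped_ns) / 1000000 + 1;
}

// Returns the signal to send, or 0 when the timeout has not elapsed yet.
// Assumed completion of the truncated ternary: SIGTERM for a pty that is
// still inside the second timeout window, SIGKILL otherwise.
static int pick_signal(bool is_pty, uint64_t elapsed)
{
  if (elapsed < KILL_TIMEOUT_MS) {
    return 0;
  }
  return (is_pty && elapsed < KILL_TIMEOUT_MS * 2) ? SIGTERM : SIGKILL;
}
```

Under this reading, a pty only ever receives SIGKILL if the callback runs again after 2*KILL_TIMEOUT_MS, and a callback that runs slightly early returns 0 and sends nothing, which are the two cases raised in the list above.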
It is not uncommon for there to be special handling of SIGHUP, e.g. to reload config or show progress. SIGTERM, on the other hand, is unlikely to have an overloaded meaning. Therefore issuing SIGTERM still gives an opportunity for graceful shutdown before the sledgehammer of SIGKILL.
@jamessan Ok, |
@justinmk, in what situation can a |
@oni-link good call :) I was thinking of the situation where "elapsed time" was used as the guard. But now that's removed. |
It serves no purpose because process_stop() is already guarded by `proc->stopped_time`.
children_kill_cb() is racy. One obvious problem is that process_close_handles() is *queued* by on_process_exit(), so when children_kill_cb() is invoked, the dead process might still be in the `loop->children` list. If the OS already reclaimed the dead PID, Nvim may try to SIGKILL it. Avoid that by checking `proc->status`.

Vim doesn't have this problem because it doesn't attempt to kill processes that ignored SIGTERM after a timeout.

closes neovim#8269
Before f31c26f the timer was used to try SIGTERM *and* SIGKILL, so a repeating timer was needed. After f31c26f process_stop() sends SIGTERM immediately, and the timer only sends SIGKILL. So we don't need a repeating timer.

- Simplifies the logic: don't need to call uv_timer_stop() explicitly.
- Avoids a problem: if process_stop() is called more than once in the 2-second window, the first on_process_exit() would call uv_timer_stop() which stops the timer for all stopped processes.
1. Don't check elapsed time in children_kill_cb(), it's already implied by the start-time of the timer itself.
2. Restart timer from children_kill_cb() for PTY jobs, to send SIGKILL after SIGTERM.

There is an edge case where SIGKILL might follow SIGTERM too quickly, if jobstop() is called near the 2-second timer window. But this edge case is not worth code complication.
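One way to read the two points above is as a callback that escalates PTY jobs in two steps and asks for the one-shot timer to be re-armed in between. The sketch below is a toy model under that reading, not the committed code: `ToyProc`, its `term_sent` flag and `toy_kill_cb` are hypothetical, and signals are recorded in the struct rather than delivered.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical process record for the model.
typedef struct {
  bool is_pty;
  bool exited;
  bool stop_requested;
  bool term_sent;   // hypothetical flag: the timer already sent SIGTERM
  int last_signal;  // records the signal that would be sent (0 = none)
} ToyProc;

// One firing of the one-shot kill timer. Returns true when the timer should
// be started again, i.e. a PTY job just received SIGTERM and still needs a
// SIGKILL on the next firing.
static bool toy_kill_cb(ToyProc *procs, size_t n)
{
  assert(procs != NULL || n == 0);
  bool restart = false;
  for (size_t i = 0; i < n; i++) {
    ToyProc *p = &procs[i];
    if (p->exited || !p->stop_requested) {
      continue;  // nothing to kill, or the PID may already be reused
    }
    if (p->is_pty && !p->term_sent) {
      p->last_signal = SIGTERM;
      p->term_sent = true;
      restart = true;
    } else {
      p->last_signal = SIGKILL;
    }
  }
  return restart;
}
```

No elapsed-time check appears in the callback: that the timer fired at all is taken as proof that the timeout passed, which is also where the "SIGKILL might follow SIGTERM too quickly" edge case comes from.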
FEATURES:
3cc7ebf #7234 built-in VimL expression parser
6a7c904 #4419 implement <Cmd> key to invoke command in any mode
b836328 #7679 'startup: treat stdin as text instead of commands'
58b210e :digraphs : highlight with hl-SpecialKey #2690
7a13611 #8276 'startup: Let `-s -` read from stdin'
1e71978 events: VimSuspend, VimResume #8280
1e7d5e8 #6272 'stdpath()'
f96d99a #8247 server: introduce --listen
e8c39f7 #8226 insert-mode: interpret unmapped META as ESC
98e7112 msg: do not scroll entire screen (#8088)
f72630b #8055 let negative 'writedelay' show all redraws
5d2dd2e win: has("wsl") on Windows Subsystem for Linux #7330
a4f6cec cmdline: CmdlineEnter and CmdlineLeave autocommands (#7422)
207b7ca #6844 channels: support buffered output and bytes sockets/stdio

API:
f85cbea #7917 API: buffer updates
418abfc #6743 API: list information about all channels/jobs.
36b2e3f #8375 API: nvim_get_commands
273d2cd #8329 API: Make nvim_set_option() update `:verbose set …`
8d40b36 #8371 API: more reliable/descriptive VimL errors
ebb1acb #8353 API: nvim_call_dict_function
9f994bb #8004 API: nvim_list_uis
3405704 #7520 API/UI: forward option updates to UIs
911b1e4 #7821 API: improve nvim_command_output

WINDOWS OS:
9cefd83 #8084, #8516 build/win: support MSVC
ee4e1fd win: Fix reading content from stdin (#8267)

TUI:
ffb8904 #8309 TUI: add support for mouse release events in urxvt
8d5a46e #8081 TUI: implement "standout" attribute
6071637 TUI: support TERM=konsole-256color
67848c0 #7653 TUI: report TUI info with -V3 ('verbose' >= 3)
3d0ee17 TUI/rxvt: enable focus-reporting
d109f56 #7640 TUI: 'term' option: reflect effective terminal behavior

FIXES:
ed6a113 #8273 'job-control: avoid kill-timer race'
4e02f1a #8107 'jobs: separate process-group'
451c48a terminal: flush vterm output buffer on pty output #8486
5d6732f :checkhealth fixes #8335
53f11dc #8218 'Fix errors reported by PVS'
d05712f inccommand: pause :terminal redraws (#8307)
51af911 inccommand: do not execute trailing commands #8256
84359a4 terminal: resize to the max dimensions (#8249)
d49c1dd #8228 Make vim_fgets() return the same values as in Vim
60e96a4 screen: winhl=Normal:Background should not override syntax (#8093)
0c59ac1 #5908 'shada: Also save numbered marks'
ba87a2c cscope: ignore EINTR while reading the prompt (#8079)
b1412dc #7971 ':terminal Enter/Leave should not increment jumplist'
3a5721e TUI: libtermkey: force CSI driver for mouse input #7948
6ff13d7 #7720 TUI: faster startup
1c6e956 #7862 TUI: fix resize-related segfaults
a58c909 #7676 TUI: always hide cursor when flushing, never flush buffers during unibilium output
303e1df #7624 TUI: disable BCE almost always
249bdb0 #7761 mark: Make sure that jumplist item will not have zero lnum
6f41ce0 #7704 macOS: Set $LANG based on the system locale
a043899 #7633 'Retry fgets on EINTR'

CHANGES:
ad60927 #8304 default to 'nofsync'
f3f1970 #8035 defaults: 'fillchars'
a6052c7 #7984 defaults: sidescroll=1
b69fa86 #7888 defaults: enable cscopeverbose
7c4bb23 defaults: do :filetype stuff unless explicitly "off"
2aa308c #5658 'Apply :lmap in macros'
8ce6393 terminal: Leave 'relativenumber' alone (#8360)
e46534b #4486 refactor: Remove maxmem, maxmemtot options
131aad9 win: defaults: 'shellcmdflag', 'shellxquote' #7343
c57d315 #8031 jobwait(): return -2 on interrupt also with timeout
6452831 clipboard: macOS: fallback to tmux if pbcopy is broken #7940
300d365 #7919 Make 'langnoremap' apply directly after a map
ada1956 #7880 'lua/executor: Remove lightuserdata'

INTERNAL:
de0a954 #7806 internal statistics for list impl
dee78a4 #7708 rewrite internal list impl
see #8269
@oni-link @bfredl @jamessan @blueyed sanity-check appreciated.
children_kill_cb() is racy. One obvious problem is that process_close_handles() is queued by on_process_exit(), so when children_kill_cb() is invoked, the dead process might still be in the loop->children list. If the OS already reclaimed the dead PID, Nvim may try to SIGKILL it.

Avoid that by checking proc->status.

Vim doesn't have this problem because it doesn't attempt to kill processes that ignored SIGTERM after a timeout. (Future note: maybe we should not attempt this, too ...)