proc: fix race between proxy calls and process termination
When a ustack() or similar action is executed, DTrace's main thread grabs
the process and makes a proxy call into its process control thread.  Now
that waitfd() is gone, this involves dodging a race by arming a timer that,
when it fires, hammers the process control thread with a dedicated realtime
signal.  Unfortunately, the process can die at any point, and proxy_call()
includes two potentially high-latency points (around the actual proxy call,
and around the call to get the return value) during which the process might
have terminated and the timer been freed.  Everything else that far down
proxy_call() checks dpr->dpr_done to avoid this causing trouble, but the
timer disarm does not.  Fix this.

(Spotted under valgrind, whose usual massive slowdown widened this race
until it was wide enough for the already-deleted state of the timer to be
detectable.)

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
nickalcock authored and kvanhees committed Mar 6, 2024
1 parent 2c72eff commit 09725ac
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion libdtrace/dt_proc.c
@@ -652,8 +652,14 @@ proxy_call(dt_proc_t *dpr, long (*proxy_rq)(), int exec_retry)
      * dt_proc_waitpid_lock() so that the signal stops as soon as the
      * waitpid() is done: but if the control thread was not waiting at
      * waitpid() at all, we'll want to disarm it regardless.
+     *
+     * From this point on, a substantial delay may have happened, so we need
+     * to consider that the process may have terminated, in which case dpr
+     * will still be allocated but most other things will be freed (like the
+     * timer).
      */
-    if (timer_settime(dpr->dpr_proxy_timer, 0, &nonpinger, NULL) < 0)
+    if (!dpr->dpr_done &&
+        timer_settime(dpr->dpr_proxy_timer, 0, &nonpinger, NULL) < 0)
         dt_proc_error(dpr->dpr_hdl, dpr,
             "Cannot disarm fallback wakeup timer: %s\n",
             strerror(errno));
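
The guard added in this hunk can be illustrated outside libdtrace with a minimal, single-threaded sketch. Everything below is hypothetical scaffolding, not DTrace code: `struct proc_state` stands in for the two relevant fields of `dt_proc_t`, the `proc_state_*` helpers are invented for illustration, and the timer uses `SIGEV_NONE` rather than a realtime signal so the sketch never delivers anything. It also does not model whatever locking the real code relies on. Only the shape of the check (test the "done" flag before touching a timer that termination may already have deleted) mirrors the change.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the relevant fields of dt_proc_t. */
struct proc_state {
	int done;		/* nonzero once the process has terminated */
	timer_t proxy_timer;	/* only valid while done == 0 */
};

/* Create the timer. SIGEV_NONE: this sketch never delivers a signal. */
int proc_state_init(struct proc_state *ps)
{
	struct sigevent sev;

	assert(ps != NULL);
	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE;
	ps->done = 0;
	return timer_create(CLOCK_MONOTONIC, &sev, &ps->proxy_timer);
}

/* Arm the timer to expire every millisecond (interval chosen arbitrarily). */
int proc_state_arm(struct proc_state *ps)
{
	struct itimerspec pinger;

	memset(&pinger, 0, sizeof(pinger));
	pinger.it_value.tv_nsec = 1000000;
	pinger.it_interval.tv_nsec = 1000000;
	return timer_settime(ps->proxy_timer, 0, &pinger, NULL);
}

/* Simulate process termination: the timer is freed and the flag is set. */
void proc_state_terminate(struct proc_state *ps)
{
	if (!ps->done) {
		timer_delete(ps->proxy_timer);
		ps->done = 1;
	}
}

/*
 * Disarm the timer, but only if the process is still alive. After
 * termination the timer ID is stale, so it must not be passed to
 * timer_settime(). Returns 0 if disarmed or skipped, -1 on error.
 */
int proc_state_disarm(struct proc_state *ps)
{
	struct itimerspec nonpinger;

	memset(&nonpinger, 0, sizeof(nonpinger));	/* all-zero value disarms */
	if (!ps->done &&
	    timer_settime(ps->proxy_timer, 0, &nonpinger, NULL) < 0) {
		fprintf(stderr, "Cannot disarm timer: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

A caller would run `proc_state_init()`, then `proc_state_arm()`, and later `proc_state_disarm()`. If `proc_state_terminate()` ran in between, the disarm becomes a no-op instead of operating on a deleted timer, which is the situation the commit message describes.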
