Linux STL XTaskQueue wait timer teardown can race in-flight delayed callback dispatch #974

@jhugard

Description

Summary

The Linux STL wait-timer backend allows delayed callback teardown to race the
worker thread after a due timer entry has already been popped from the heap.
The implementation stores raw WaitTimerImpl* pointers in the timer heap,
unlocks the queue, and then invokes the callback through that raw pointer.

If teardown destroys the underlying timer object in that window, the worker
can invoke the callback through a freed timer object. In the current
implementation, WaitTimer::Terminate() funnels through Cancel() and then
destroys the impl, but Cancel() cannot neutralize an entry that has already
been popped from the heap.

This is a lifetime and teardown correctness bug. The external symptom is
typically a crash or other undefined behavior during queue teardown,
especially when delayed callbacks run on immediate or otherwise synchronous
dispatch paths.

Public API affected

STDAPI XTaskQueueSubmitDelayedCallback(
    _In_ XTaskQueueHandle queue,
    _In_ XTaskQueuePort port,
    _In_ uint32_t delayMs,
    _In_opt_ void* callbackContext,
    _In_ XTaskQueueCallback* callback);

STDAPI XTaskQueueTerminate(
    _In_ XTaskQueueHandle queue,
    _In_ bool wait,
    _In_opt_ void* callbackContext,
    _In_opt_ XTaskQueueCallback* callback);

No signature change is required. The issue is an internal lifetime race in the
Linux STL wait-timer backend used to service delayed callbacks.

Expected behavior

Once delayed callback teardown begins, the timer backend must either:

  • prevent the callback from being dispatched at all, or
  • wait for any already-started dispatch to quiesce before destroying the timer
    object and its callback context.

At no point should the worker thread retain a raw timer pointer that can outlive
the owning timer object.
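
One way to meet both options is sketched below. It is a minimal illustration,
not the code in Source/Task/WaitTimer_stl.cpp: TimerState, Dispatch and
Terminate are hypothetical names, and the sketch assumes the heap entry would
hold a std::shared_ptr to the state instead of a raw WaitTimerImpl*. The worker
checks a cancelled flag before dispatching, and teardown waits for an
in-flight dispatch to finish unless it is running inside that same callback.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical shared state; the real WaitTimerImpl layout may differ.
struct TimerState {
    std::mutex mutex;
    std::condition_variable idle;
    bool cancelled = false;
    bool inFlight = false;
    std::thread::id dispatcher;      // thread currently running the callback
    std::function<void()> callback;
};

// Worker side: called after the entry was popped and the queue lock released.
// The shared_ptr keeps the state alive for the whole call.
// Returns true if the callback ran, false if teardown had already begun.
bool Dispatch(const std::shared_ptr<TimerState>& state) {
    assert(state != nullptr);
    std::function<void()> cb;
    {
        std::lock_guard<std::mutex> lock(state->mutex);
        if (state->cancelled) {
            return false;            // option 1: never dispatch
        }
        state->inFlight = true;
        state->dispatcher = std::this_thread::get_id();
        cb = state->callback;
    }
    cb();                            // runs without holding the state lock
    {
        std::lock_guard<std::mutex> lock(state->mutex);
        state->inFlight = false;
    }
    state->idle.notify_all();
    return true;
}

// Owner side: after this returns, no new dispatch starts, and any dispatch on
// another thread has finished (option 2).
void Terminate(const std::shared_ptr<TimerState>& state) {
    assert(state != nullptr);
    std::unique_lock<std::mutex> lock(state->mutex);
    state->cancelled = true;
    if (state->inFlight && state->dispatcher == std::this_thread::get_id()) {
        return;                      // called from inside the callback: do not self-wait
    }
    state->idle.wait(lock, [&] { return !state->inFlight; });
}
```

The self-termination check matters for the immediate-port case: without it, a
callback that tears down its own timer would wait on itself. Whether the real
fix uses shared ownership, an in-flight counter, or another scheme is left
open here.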

Actual behavior

The current STL backend allows the following race:

  1. The worker thread pops a due timer entry containing a raw
    WaitTimerImpl*.
  2. The worker releases the queue lock before invoking the callback.
  3. Another thread begins teardown: WaitTimer::Terminate() calls Cancel(),
    which no longer finds the popped entry, and then destroys the timer object.
  4. The worker resumes and invokes the callback through the stale raw pointer.

This can produce use-after-free behavior during delayed callback dispatch.
The risk is highest when callback execution can synchronously trigger teardown,
such as immediate-port usage or callback-driven queue destruction.
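
The window in steps 1 to 3 can be modeled with a simplified stand-in. This is
an illustrative sketch only: FakeTimer and FakeTimerQueue are hypothetical
types, a vector replaces the real heap, and nothing here is taken from
WaitTimer_stl.cpp. It shows only that once an entry has been popped, a
cancel-by-pointer lookup has nothing left to remove.

```cpp
#include <algorithm>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the timer object referenced by the heap.
struct FakeTimer {
    int id;
};

// Simplified model of a queue that stores raw, non-owning timer pointers.
class FakeTimerQueue {
public:
    void Add(FakeTimer* timer) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_entries.push_back(timer);
    }

    // Worker side: removes an entry and returns it. The lock is released on
    // return, so the caller holds a raw pointer with no lifetime guarantee.
    FakeTimer* PopDue() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_entries.empty()) {
            return nullptr;
        }
        FakeTimer* timer = m_entries.back();
        m_entries.pop_back();
        return timer;
    }

    // Teardown side: returns true only if the entry was still queued.
    bool Cancel(FakeTimer* timer) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = std::find(m_entries.begin(), m_entries.end(), timer);
        if (it == m_entries.end()) {
            return false;            // already popped: nothing to neutralize
        }
        m_entries.erase(it);
        return true;
    }

private:
    std::mutex m_mutex;
    std::vector<FakeTimer*> m_entries;
};
```

In this model, a Cancel() that returns false is the signal that a worker may
still hold the pointer. Destroying the timer at that point without further
synchronization is what opens the use-after-free window described above.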

Reproduction conditions

The race is timing-sensitive and becomes easier to trigger under:

  • Linux or other STL wait-timer builds that use Source/Task/WaitTimer_stl.cpp
  • delayed callbacks whose execution path can synchronously terminate or destroy
    the owning queue or timer
  • immediate-port or other dispatch configurations where callback teardown can
    happen on or near the timer worker thread
  • workloads with frequent queue creation and teardown while delayed callbacks are
    pending or due

Impact

  • Can crash the process or produce other undefined behavior during delayed
    callback teardown.
  • Violates the expected lifetime rules for internal delayed callback dispatch.
  • Is difficult to isolate because it depends on a narrow interleaving between
    worker-thread pop/unlock and timer teardown.

Affected area

The root cause is in the Linux STL timer backend implementation:

  • Source/Task/WaitTimer_stl.cpp — raw timer pointer lifetime, worker-thread
    dispatch after unlock, and teardown ordering
