Skip to content

Update TaskQueue with changes from Windows repo#980

Merged
brianpepin merged 4 commits into
mainfrom
user/brianpe/fromwindows
May 19, 2026
Merged

Update TaskQueue with changes from Windows repo#980
brianpepin merged 4 commits into
mainfrom
user/brianpe/fromwindows

Conversation

@brianpepin
Copy link
Copy Markdown
Contributor

This PR is the result of a round trip between libhttpclient and the windows codebase. There are three areas of change:

  1. General cleanup / "fit" into windows codebase (no behavioral changes)
  2. Address very minor race condition during CAS operation
  3. Eat task queue submits during process detach

Changes

General Cleanup

I removed the HC_UNITTEST_API #ifdefs. In the Windows codebase the task queue library is built and then merged into either the runtime DLL or the unit test DLL. This means the same library needs to link to both (we don't build it twice). The code add for unit tests is small enough I think this is fine.

I also renamed SubmitPendingCallback to be plural (since now its behavior is any callback that is ready) and unwrapped the test-specific API cover fn.

CAS Race

In SubmitPendingCallbacks there was a very narrow window where the CAS is successfully exchanged and then the timer is set. There is a narrow gap here where the CAS could have been updated by another thread after this thread "wins" but before this thread changes the timer. My fix for this is to only return if the m_timerDue is still what we expect after the SetTimer call.

Process Detach

The PlayFab team and I were researching a game crash that looked like we were submitting a threadpool callback after the task queue handle was released. This turned out to be the wrong analysis. What was happening was the game was hitting some failure and calling ExitProcess while other parts of the game were still running. This ended up submitting task queue callbacks in late shutdown during process detach, and the Windows thread pool handles this by throwing an exception and crashing the process. The intent of this change is to stop masking the real problem with a process crash. This uses RtlDllShutdownInProgress, which while documented, is not exposed in any header. It does work if you just declare it and link to ntdll, but that causes a link time break for all consumers of libHttpClient since ntdll is not in the current linkage requirements. Instead, this code does a GetModuleHandle of ntdll (which is loaded into every process on Windows). We don't do this for UWP because GMH is not defined for that API set. We don't bother releasing the module handle because a) we don't want to re-acquire it on every task queue submit and b) it is never unloaded anyway, so a leaked reference doesn't matter.

Testing / Verification

  • All unit tests pass both in libHttpClient and Windows repos.
  • Verified xCode builds.
  • Deployed runtime to Xbox console and verified RtlDllShutdownInProgress code works correctly in that environment.

@brianpepin brianpepin merged commit 91bbfaa into main May 19, 2026
15 checks passed
@jhugard
Copy link
Copy Markdown
Collaborator

jhugard commented May 20, 2026

@brianpepin: Good catch! I took a deep look at this last night and there are still two other, similar TOCTOU races between updating the next start time atomic and setting the OS timer with Start. I'll have a PR up later today with a fix that consolidates the logic in one method (I have the code written but need to verify before pushing).

@brianpepin
Copy link
Copy Markdown
Contributor Author

@brianpepin: Good catch! I took a deep look at this last night and there are still two other, similar TOCTOU races between updating the next start time atomic and setting the OS timer with Start. I'll have a PR up later today with a fix that consolidates the logic in one method (I have the code written but need to verify before pushing).

I'd like to take credit but an automated code review AI actually caught it :)

jhugard added a commit that referenced this pull request May 21, 2026
Follow-up to #975 and #980: the CAS-then-Start pattern for retargeting
m_timerDue had a TOCTOU window where thread A could win the CAS but
before calling Start(), thread B could CAS+Start an earlier deadline,
which thread A's Start() would then overwrite — stranding the earlier
callback until independent traffic arrived.

The fix in #980 added post-Start verification in SubmitPendingCallbacks,
but the same unguarded pattern existed in QueueItem and
PromoteReadyPendingCallbacks.

This change extracts the CAS+Start+verify logic into three helpers
(ArmTimerIfEarlier, ArmNextPendingCallback, RearmObservedDueTime) so
the post-Start verification is applied uniformly at every call site.
~67 lines of duplicated inline CAS logic removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants