Update TaskQueue with changes from Windows repo#980
Conversation
|
@brianpepin: Good catch! I took a deep look at this last night and there are still two other, similar TOCTOU races between updating the next start time atomic and setting the OS timer with Start. I'll have a PR up later today with a fix that consolidates the logic in one method (I have the code written but need to verify before pushing). |
I'd like to take credit but an automated code review AI actually caught it :) |
Follow-up to #975 and #980: the CAS-then-Start pattern for retargeting m_timerDue had a TOCTOU window where thread A could win the CAS but before calling Start(), thread B could CAS+Start an earlier deadline, which thread A's Start() would then overwrite — stranding the earlier callback until independent traffic arrived. The fix in #980 added post-Start verification in SubmitPendingCallbacks, but the same unguarded pattern existed in QueueItem and PromoteReadyPendingCallbacks. This change extracts the CAS+Start+verify logic into three helpers (ArmTimerIfEarlier, ArmNextPendingCallback, RearmObservedDueTime) so the post-Start verification is applied uniformly at every call site. ~67 lines of duplicated inline CAS logic removed.
This PR is the result of a round trip between libhttpclient and the windows codebase. There are three areas of change:
Changes
General Cleanup
I removed the HC_UNITTEST_API #ifdefs. In the Windows codebase the task queue library is built and then merged into either the runtime DLL or the unit test DLL. This means the same library needs to link to both (we don't build it twice). The code add for unit tests is small enough I think this is fine.
I also renamed SubmitPendingCallback to be plural (since now its behavior is any callback that is ready) and unwrapped the test-specific API cover fn.
CAS Race
In SubmitPendingCallbacks there was a very narrow window where the CAS is successfully exchanged and then the timer is set. There is a narrow gap here where the CAS could have been updated by another thread after this thread "wins" but before this thread changes the timer. My fix for this is to only return if the m_timerDue is still what we expect after the SetTimer call.
Process Detach
The PlayFab team and I were researching a game crash that looked like we were submitting a threadpool callback after the task queue handle was released. This turned out to be the wrong analysis. What was happening was the game was hitting some failure and calling ExitProcess while other parts of the game were still running. This ended up submitting task queue callbacks in late shutdown during process detach, and the Windows thread pool handles this by throwing an exception and crashing the process. The intent of this change is to stop masking the real problem with a process crash. This uses RtlDllShutdownInProgress, which while documented, is not exposed in any header. It does work if you just declare it and link to ntdll, but that causes a link time break for all consumers of libHttpClient since ntdll is not in the current linkage requirements. Instead, this code does a GetModuleHandle of ntdll (which is loaded into every process on Windows). We don't do this for UWP because GMH is not defined for that API set. We don't bother releasing the module handle because a) we don't want to re-acquire it on every task queue submit and b) it is never unloaded anyway, so a leaked reference doesn't matter.
Testing / Verification