Data Race in AsyncBlockTests::VerifyAsyncBlockReuse #938

@jhugard

Summary

Access violation crash in AsyncBlockTests::VerifyAsyncBlockReuse caused by concurrent
std::vector::push_back operations from overlapping async call lifecycle phases.

Environment

  • Platform: Windows x64
  • Configuration: Debug build with page heap (gflags /p /enable)
  • Test Framework: TAEF (Test Authoring and Execution Framework)
  • Detection: 6-hour soak test under Windows CDB debugger

Reproduction

Frequency: Heisenbug - intermittent crash after extended stress testing
Trigger: Rapid async block reuse with shared FactorialCallData context

Test Case

AsyncBlockTests::VerifyAsyncBlockReuse - Tests XAsyncBlock reuse scenario where:

  1. First async call completes with result 120 (factorial of 5)
  2. Same XAsyncBlock and FactorialCallData immediately reused for second call
  3. Second call completes with result 720 (factorial of 6)

Race Condition Window

As soon as XAsyncGetStatus(&async, true) returns for the first call, the main thread
starts the second async operation, while the first call's cleanup may still be executing
on the completion callback thread.

Thread 1 (Completion):

CompletionCallback
  → AsyncState::Release
    → AsyncState::~AsyncState
      → provider(XAsyncOp::Cleanup)
        → FactorialWorkerSimple(Cleanup)
          → opCodes.push_back(XAsyncOp::Cleanup)  ← RACE

Thread 2 (Main/Test):

VerifyAsyncBlockReuse
  → FactorialAsync (second call)
    → XAsyncBegin
      → provider(XAsyncOp::Begin)
        → FactorialWorkerSimple(Begin)
          → opCodes.push_back(XAsyncOp::Begin)  ← RACE
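
The window described above can be modeled in isolation. The sketch below is not the library's or the test's code: the function name, the event strings, and the two std::promise gates are all hypothetical, and the gates deliberately force the worst-case ordering (completion observed, second Begin recorded, first Cleanup recorded last) so that the overlap is visible and deterministic. In the real test nothing enforces any ordering between the two push_back calls, which is why they can run at the same time.

```cpp
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical model of the reuse window. The promise/future pairs order
// every access to `events`, so this model itself contains no data race.
std::vector<std::string> model_reuse_interleaving() {
    std::vector<std::string> events;

    std::promise<void> firstCompleted;  // models XAsyncGetStatus(&async, true) returning
    std::promise<void> secondBegun;     // gate that forces the worst-case ordering
    std::future<void> firstDone = firstCompleted.get_future();
    std::future<void> secondStarted = secondBegun.get_future();

    // Models the completion callback thread of the first call.
    std::thread completion([&] {
        events.push_back("first: completion signaled");
        firstCompleted.set_value();
        secondStarted.wait();           // cleanup is still pending at this point
        events.push_back("first: Cleanup");
    });

    // Models the main/test thread: it resumes as soon as completion is
    // visible and immediately begins the second call.
    firstDone.wait();
    events.push_back("second: Begin");
    secondBegun.set_value();

    completion.join();
    return events;
}
```

Under these assumptions the recorded order is completion, then the second call's Begin, then the first call's Cleanup, which matches the two call chains shown above.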

Crash Details

Stack Trace

# 17  Id: 3914.22c4 Suspend: 1 Teb: 000000b8`cec33000 Unfrozen
Child-SP          RetAddr               Call Site
000000b8`cf7ff0b0 00007ffa`a5024208     ucrtbased!_free_dbg+0x2e
000000b8`cf7ff0f0 00007ffa`a5023788     libHttpClient_UnitTest_TAEF!operator delete+0x18
000000b8`cf7ff120 00007ffa`a4f363e9     libHttpClient_UnitTest_TAEF!operator delete+0x18
000000b8`cf7ff150 00007ffa`a4f8e6df     libHttpClient_UnitTest_TAEF!std::_Deallocate<16>+0x39
000000b8`cf7ff180 00007ffa`a4f8c868     libHttpClient_UnitTest_TAEF!std::allocator<XAsyncOp>::deallocate+0x8f
000000b8`cf7ff1c0 00007ffa`a4f806c6     libHttpClient_UnitTest_TAEF!std::vector<XAsyncOp>::_Change_array+0xb8
000000b8`cf7ff220 00007ffa`a4f80393     libHttpClient_UnitTest_TAEF!std::vector<XAsyncOp>::_Emplace_reallocate+0x296
000000b8`cf7ff330 00007ffa`a4f8e9ee     libHttpClient_UnitTest_TAEF!std::vector<XAsyncOp>::_Emplace_one_at_back+0x83
000000b8`cf7ff380 00007ffa`a4f852a1     libHttpClient_UnitTest_TAEF!std::vector<XAsyncOp>::push_back+0x1e
000000b8`cf7ff3b0 00007ffa`a4f60237     libHttpClient_UnitTest_TAEF!AsyncBlockTests::FactorialWorkerSimple+0x51
000000b8`cf7ff430 00007ffa`a4f60468     libHttpClient_UnitTest_TAEF!AsyncState::~AsyncState+0x67

Debugger Analysis

EXCEPTION_CODE: c0000005 (Access violation)
READ_ADDRESS: 000001c01c088fdc
FAULTING_SOURCE_LINE: minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp:1026
FAILURE_BUCKET_ID: INVALID_POINTER_READ_AVRF_c0000005_ucrtbased.dll!_free_dbg

Root Cause

The FactorialCallData structure contains:

std::vector<XAsyncOp> opCodes;  // Shared, unsynchronized

Both FactorialWorkerSimple and FactorialWorkerDistributed record opcodes via:

d->opCodes.push_back(opCode);  // NOT thread-safe

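std::vector::push_back is not safe to call concurrently on the same vector. When the two
calls overlap during a reallocation, both threads can operate on the same backing buffer,
and one may free it while the other is still using it or free it a second time. With page
heap enabled, this surfaces as the access violation in _free_dbg shown in the stack trace.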

Impact

  • Severity: Test flakiness under stress conditions
  • Scope: Test code only, no production impact
  • Workaround: None (intermittent failure)

Proposed Fix

Replace std::vector<XAsyncOp> with a lock-free, fixed-capacity buffer built from
std::array<XAsyncOp, N> and a std::atomic<size_t> counter. This eliminates:

  • Dynamic allocation/reallocation
  • Need for mutex synchronization
  • Race condition in concurrent append operations

This maintains the test semantics while aligning with the library's philosophy of avoiding
synchronization primitives.
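
A minimal sketch of such a buffer follows, under stated assumptions. The names OpCodeLog and concurrent_append_demo are hypothetical, the XAsyncOp enum here is a local stand-in so the example compiles on its own, and the capacity N would have to be chosen large enough for the opcodes a test records. Each append reserves a unique slot with a single atomic increment, so two threads never write the same element and no allocation takes place.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>

// Stand-in for the library's XAsyncOp enum (assumption, for self-containment).
enum class XAsyncOp { Begin, DoWork, GetResult, Cancel, Cleanup };

// Hypothetical fixed-capacity, append-only opcode log.
template <std::size_t N>
class OpCodeLog {
public:
    // Reserves a slot with one atomic increment, then writes into it.
    // Returns false and drops the opcode once the buffer is full.
    bool push_back(XAsyncOp op) noexcept {
        const std::size_t slot = m_next.fetch_add(1, std::memory_order_relaxed);
        if (slot >= N) {
            return false;
        }
        m_ops[slot] = op;  // distinct slot per caller, so writes do not overlap
        return true;
    }

    // Number of stored opcodes, clamped to the capacity.
    std::size_t size() const noexcept {
        const std::size_t n = m_next.load(std::memory_order_relaxed);
        return n < N ? n : N;
    }

    // Only valid once the writers have been synchronized with
    // (for example, after their threads have been joined).
    XAsyncOp at(std::size_t i) const noexcept { return m_ops[i]; }

private:
    std::array<XAsyncOp, N> m_ops{};
    std::atomic<std::size_t> m_next{0};
};

// Two threads append concurrently, mirroring the Cleanup/Begin overlap.
std::size_t concurrent_append_demo() {
    OpCodeLog<4096> log;
    auto writer = [&log](XAsyncOp op) {
        for (int i = 0; i < 1000; ++i) {
            log.push_back(op);
        }
    };
    std::thread cleanup(writer, XAsyncOp::Cleanup);
    std::thread begin(writer, XAsyncOp::Begin);
    cleanup.join();
    begin.join();
    return log.size();
}
```

One design caveat: the counter only makes the appends safe. The relative order of opcodes appended by different threads is still nondeterministic, and reading the log is only well-defined after the writers have finished, so assertions on the recorded sequence would still need to account for a late Cleanup from the previous call.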
