
Additional test stabilization (#935): Drain Queue Before Verifying Results in AsyncBlockTests #940

Merged
jasonsandlin merged 3 commits into main from additional-test-stabilization-935
Feb 20, 2026

@jhugard (Collaborator) commented Feb 20, 2026

Summary

This PR fixes intermittent test failures in AsyncBlockTests by ensuring the task queue is fully drained before verifying async opcodes. The race condition occurred because the Cleanup opcode is recorded asynchronously, and tests were taking opcode snapshots before that work completed. This fixes an issue introduced in #935 as well as an existing issue in other tests.

Changes

Apply a consistent queue drain pattern to 8 test methods that verify async completion state. A comprehensive audit found both opcode verification races (4 tests) and queue-empty verification races (4 additional tests).

Root Cause

The async framework's Cleanup operation is initiated by the provider but completed asynchronously through the task queue. Tests that checked queue state or opcode snapshots immediately after XAsyncGetStatus() could race with the pending Cleanup opcode write, resulting in:

  • Opcode snapshot race: Cleanup opcode missing from the snapshot (8 opcodes recorded instead of the expected 9)
  • Queue empty race: Pending cleanup work leaves queue non-empty
  • Intermittent failures in CI (heisenbug-like behavior)
  • Confusing debug output showing inconsistent test results
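
To make the two races concrete, here is a toy model only. `ToyQueue`, `ToyAsyncOp`, and the placeholder opcode names are hypothetical and are not types from the framework or the test suite. The model assumes eight opcodes are recorded synchronously and that Cleanup is queued as the ninth. Calling `Drain()` stands in for the real queue's threads finishing their pending work.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a task queue: holds callbacks that have not run yet.
struct ToyQueue
{
    std::deque<std::function<void()>> pending;

    bool IsEmpty() const { return pending.empty(); }

    // Run everything that is still queued, in submission order.
    void Drain()
    {
        while (!pending.empty())
        {
            auto work = std::move(pending.front());
            pending.pop_front();
            work();
        }
    }
};

// Toy stand-in for an async operation that logs opcodes.
struct ToyAsyncOp
{
    std::vector<std::string> opcodes;

    // Records eight placeholder opcodes immediately, then queues the
    // Cleanup record so that it is written later, when the queue runs it.
    void Complete(ToyQueue& queue)
    {
        for (int i = 0; i < 8; ++i)
        {
            opcodes.push_back("Op" + std::to_string(i));
        }
        queue.pending.push_back([this] { opcodes.push_back("Cleanup"); });
    }
};

struct Snapshots
{
    std::size_t early;    // opcode count before the queue drained
    std::size_t drained;  // opcode count after the queue drained
    bool queueEmptyEarly; // queue state observed before draining
};

// Takes one snapshot too early and one after draining.
inline Snapshots SnapshotBeforeAndAfterDrain()
{
    ToyQueue queue;
    ToyAsyncOp op;
    op.Complete(queue);

    Snapshots s{};
    s.early = op.opcodes.size();
    s.queueEmptyEarly = queue.IsEmpty();
    queue.Drain();
    s.drained = op.opcodes.size();
    return s;
}
```

In this model the early snapshot sees 8 opcodes and a non-empty queue, while the post-drain snapshot sees all 9. The real tests hit the early case only some of the time, because the Cleanup work usually wins the race.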

Testing

✅ Comprehensive test audit completed:

  • Identified 8 tests vulnerable to timing races
  • Applied consistent queue drain pattern to all 8
  • Verified 2 tests use MANUAL mode (no drain needed)
  • Confirmed 13 other tests have no vulnerabilities

✅ All 23 AsyncBlockTests pass after fixes
✅ No regressions in other test suites
✅ Validated with extended soak testing (743 full test suite passes under page heap)
✅ Pattern aligns with existing drain waits in VerifyCleanupWaitsForWork and VerifyCleanupWaitsForWorkDistributed

Related Work

This is part of the async test stabilization effort:

Notes

  • The queue drain pattern (busy-wait with 10ms sleep, 2000ms timeout) is already used in VerifyCleanupWaitsForWork tests
  • No API changes or behavioral modifications to the async framework itself
  • Purely a test reliability improvement in the verification layer
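
For reviewers unfamiliar with the pattern, a minimal sketch of a drain wait with the same 10ms sleep and 2000ms timeout follows. It is not the code from this PR. `WaitForDrain`, `FakePorts`, and `DrainDemo` are hypothetical names, and the atomic counters only stand in for the Work and Completion ports. In the actual tests the predicate would presumably query the real task queue for each port instead.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: polls `isDrained` until it returns true or `timeout`
// elapses, sleeping `interval` between checks. Returns whether the predicate
// was observed to be true.
template <typename Predicate>
bool WaitForDrain(
    Predicate isDrained,
    std::chrono::milliseconds timeout = std::chrono::milliseconds(2000),
    std::chrono::milliseconds interval = std::chrono::milliseconds(10))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!isDrained())
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return isDrained(); // one last check before giving up
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}

// Stand-in for a two-port queue: number of items still pending on each port.
struct FakePorts
{
    std::atomic<int> work{0};
    std::atomic<int> completion{0};

    bool BothEmpty() const
    {
        return work.load() == 0 && completion.load() == 0;
    }
};

// Demo: a background thread finishes the pending "cleanup" after
// `cleanupDelay`; the caller waits up to `timeout` for both ports to empty.
inline bool DrainDemo(
    std::chrono::milliseconds cleanupDelay,
    std::chrono::milliseconds timeout)
{
    FakePorts ports;
    ports.work = 1;
    ports.completion = 1;

    std::thread worker([&] {
        std::this_thread::sleep_for(cleanupDelay);
        ports.work = 0;
        ports.completion = 0;
    });

    const bool drained = WaitForDrain([&] { return ports.BothEmpty(); }, timeout);
    worker.join();
    return drained;
}
```

A test would call the wait just before its final queue or opcode assertions and fail if it returns false. Returning a bool rather than asserting inside the helper keeps the timeout failure visible at the call site.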

Checklist

  • Code follows existing patterns in the test suite
  • All tests pass locally
  • No new warnings or errors introduced
  • Consistent with upstream test stabilization work
  • Soak testing validates fix is robust

Apply queue drain timing fix to tests that verify async opcodes.
These tests were checking opcodes immediately after async completion
without ensuring all cleanup work had been recorded to the opcode log,
causing intermittent test failures.

Refactor cdb test script to capture stacks independently, as well as
output log, stacks, and dmp for all abnormal exits (including Ctrl+C).

Apply consistent queue drain pattern to 8 AsyncBlockTests before final queue
verification to eliminate timing races where cleanup work completes asynchronously
after XAsyncGetStatus() returns.

Root Cause:
  The async framework's Cleanup operation is initiated by the provider but
  completed asynchronously through the task queue. Tests checking queue state
  or opcode snapshots immediately after XAsyncGetStatus() could race with the
  pending Cleanup work, resulting in intermittent failures (heisenbug-like
  behavior with "8 vs 9 opcodes" or "queue not empty" errors).

Solution:
  All queue verification is now preceded by an explicit drain loop:
    - Checks both Completion and Work ports
    - 10ms sleep granularity, 2000ms timeout
    - Ensures all async cleanup completes before verification
@jhugard jhugard marked this pull request as ready for review February 20, 2026 17:00
@jasonsandlin jasonsandlin merged commit e0be0f0 into main Feb 20, 2026
15 checks passed
@jasonsandlin jasonsandlin deleted the additional-test-stabilization-935 branch February 20, 2026 22:29