Skip to content

test(driver): wait for wasm db sleep recovery#4905

Closed
NathanFlurry wants to merge 1 commit intographite-base/4905from
driver-test-complaints/wait-wasm-db-sleep-recovery
Closed

test(driver): wait for wasm db sleep recovery#4905
NathanFlurry wants to merge 1 commit intographite-base/4905from
driver-test-complaints/wait-wasm-db-sleep-recovery

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 3, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 3, 2026

Review: test(driver): wait for wasm db sleep recovery

PR #4905 — Draft PR improving WASM sleep recovery reliability in driver tests for SQLite DB.


Overview

This PR makes two key changes to actor-db.test.ts:

  1. Replaces the misuse of vi.waitFor (as a retry-until-success loop) with a cleaner retryActorStoppingDbError helper that uses explicit deadline-based polling.
  2. Adds lifecycle observer wiring to the integrity check test so the code waits for the actual sleep event before attempting integrity checks, rather than triggering sleep and immediately polling.

The second change is the right architectural direction. The first is an improvement in clarity but still a polling approach.


What's Good

  • Removing vi.waitFor misuse: The old code used vi.waitFor as a retry-until-success mask, which CLAUDE.md explicitly prohibits. Replacing it with an honest retryActorStoppingDbError loop with a clear deadline is more honest about what's happening.

  • Event-driven sleep detection in the integrity test: The integrity check test now sets up a lifecycle observer and waits for the actual sleep event before calling expectIntegrityCheckOk. This is the right approach — event-ordered coordination instead of racing a fixed wait.

  • Fixed dead code in old expectIntegrityCheckOk: The original catch block re-threw all non-stopping errors unchanged, making the entire catch block for those errors a no-op. The new version cleanly removes that dead path.

  • Fixed unused _driverTestConfig parameters: Both runWithActorStoppingRetry and expectIntegrityCheckOk previously declared the parameter as _driverTestConfig (unused). Now that waitFor(driverTestConfig, 100) is called inside the retry loop, the parameter is actually used. Without this, the retry would have broken in fake-timer test paths.


Issues and Suggestions

1. retryActorStoppingDbError is still a polling loop — consider a comment explaining necessity

Per CLAUDE.md: "When a test flakes, root-cause the race, fix the underlying ordering in core/napi/typescript." The actor stopping: database accessed window exists because the driver doesn't surface a "ready again" event. This retry loop is a pragmatic necessity while that signal is absent, but it should have a comment:

// The driver does not expose a "ready" signal after leaving the stopping window.
// Poll until the actor accepts DB access again, or until the deadline expires.
async function retryActorStoppingDbError<T>(

A follow-up todo would also be worth capturing: add a "ready after sleep" lifecycle event so this retry can be replaced with event-driven coordination.

2. WASM_REMOTE_SLEEP_RECOVERY_TIMEOUT_MS = 30_000 needs a comment

30 seconds vs the native ~1.2 seconds (SLEEP_WAIT_MS * 8) is a 25x difference. This should explain why WASM remote SQLite needs this much more time:

// WASM remote SQLite sleep recovery involves a round-trip to the remote depot,
// which is significantly slower than the native in-process SQLite path.
const WASM_REMOTE_SLEEP_RECOVERY_TIMEOUT_MS = 30_000;

3. await waitFor(driverTestConfig, SLEEP_WAIT_MS) after the sleep event (line ~280)

After waitForDbLifecycleEvent(..., "sleep", ...) returns, there's a fixed 150ms wait before reading countAfterWake. This suggests the actor is in the stopping window after the sleep event fires. Is the sleep event emitted before or after the actor finishes its stop sequence? If it's before, this wait is a race. A comment (or ideally waiting for a "woke" lifecycle event if one exists) would clarify this. The same pattern appears at lines ~343, 357, 577, 593.

4. waitForDbLifecycleEvent still uses a fixed iteration count

for (let i = 0; i < LIFECYCLE_POLL_ATTEMPTS; i++) {  // 40 * 25ms = 1 second max

This is fine for native but may be insufficient for WASM where sleep recovery takes up to 30 seconds. Currently the WASM path would still hit this 1-second timeout when waiting for the initial sleep event in waitForDbLifecycleEvent. Should LIFECYCLE_POLL_ATTEMPTS be scaled by runtime, or should waitForDbLifecycleEvent take a timeout parameter similar to retryActorStoppingDbError?


Minor

  • The vi import removal is correct since no vi.* calls remain.
  • runWithActorStoppingRetry is now a thin wrapper over retryActorStoppingDbError with a hardcoded SLEEP_WAIT_MS * 4 timeout. Consider a named ACTOR_STOPPING_RETRY_TIMEOUT_MS constant for clarity.

Summary

The direction is right: event-driven sleep observation for the integrity test, honest retry semantics instead of vi.waitFor abuse. The main gaps before merging from draft are: (1) a comment on why 30 seconds for WASM, (2) a comment on why the retry loop exists rather than an event, and (3) verifying that waitForDbLifecycleEvent's fixed iteration count is sufficient for the WASM path.

@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/restore-serverful-hibernating-requests branch from 3444072 to 4ade2d1 Compare May 3, 2026 07:12
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/wait-wasm-db-sleep-recovery branch from a6ba640 to 014ac10 Compare May 3, 2026 07:12
@NathanFlurry NathanFlurry marked this pull request as draft May 3, 2026 07:19
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/wait-wasm-db-sleep-recovery branch from 014ac10 to 1899baf Compare May 3, 2026 07:34
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/restore-serverful-hibernating-requests branch from 4ade2d1 to 4931c2e Compare May 3, 2026 07:34
This was referenced May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant