Skip to content

Race condition in TestRunner timeout timer: this.#timeout!.reject fires against nulled #timeout under event-loop contention #90

@mjn298

Description

@mjn298

Summary

Under a saturated event loop (e.g. a parallel test runner with many CPU-bound test files sharing a worker), TestRunner emits 8–9 TypeError: Cannot read properties of undefined (reading 'reject') as unhandled errors at the tail of a run — even when every test passes. In consumers that use @japa/runner, this flips exceptionsManager.hasErrors = true and sets process.exitCode = 1, failing CI despite all green tests.

Root cause

The timer callback created in #createTimeoutTimer reads this.#timeout!.reject at fire time, not at capture time:

#createTimeoutTimer(duration: number) {
  return new Promise((_, reject) => {
    this.#timeout = {
      reject,
      timer: setTimeout(() => this.#timeout!.reject(this.#createError('Test timeout')), duration),
      //                      ^^^^^^^^^^^^^^^^^^^^ read at fire time
    }
  })
}

#clearTimer() nulls this.#timeout in the finally block of #wrapTestInTimeout:

#clearTimer() {
  if (this.#timeout) {
    clearTimeout(this.#timeout.timer)
    this.#timeout = undefined
  }
}

clearTimeout cannot un-queue a callback that Node has already dispatched to the macrotask queue. When a test resolves while the event loop is saturated, Node's timer-processing tick can pop the timer off the heap and queue its callback before the finally block's clearTimeout runs. The callback then fires against this.#timeout === undefined, and the non-null assertion (!.) becomes a runtime crash.

The same pattern exists in #resetTimer.

Reproduction conditions

  • 5 parallel worker processes (~78 test files each)
  • 2 of those files use group.each.timeout(20_000) (so a 20 s setTimeout is created for each group test)
  • Mix of CPU-bound tests (forecasting / seeding / heavy mocking) on the same worker saturates the loop
  • Consistently produces 8–9 unhandled errors per run, all on the worker carrying the timeout-using files

Running the same timeout-using files in isolation produces zero errors — the event loop is clean, clearTimeout wins the race every time. This is what narrows it to contention + queued-callback timing.

Versions verified

  • @japa/core@10.4.0 — reproduces
  • @japa/core@10.3.0 — reproduces (race code bit-identical)
  • Node v24.11.1
  • macOS (darwin 25.4.0)

Fix

Replace the non-null assertion with optional chaining in both methods:

   #createTimeoutTimer(duration: number) {
     return new Promise((_, reject) => {
       this.#timeout = {
         reject,
-        timer: setTimeout(() => this.#timeout!.reject(this.#createError('Test timeout')), duration),
+        timer: setTimeout(() => this.#timeout?.reject(this.#createError('Test timeout')), duration),
       }
     })
   }

   #resetTimer(duration: number) {
     if (this.#timeout) {
       clearTimeout(this.#timeout.timer)
-      this.#timeout.timer = setTimeout(
-        () => this.#timeout!.reject(this.#createError('Test timeout')),
-        duration
-      )
+      this.#timeout.timer = setTimeout(
+        () => this.#timeout?.reject(this.#createError('Test timeout')),
+        duration
+      )
     }
   }

Two characters per method. The callback becomes a no-op if #clearTimer has already nulled #timeout, which is the correct semantic — the test has already resolved, nobody needs to know the timer fired.

Happy to open a PR with these changes + a regression test that reproduces the race with a loaded event loop if helpful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions