test(ci): log unhandledRejection in backend test bootstrap to debug ~22% silent CI flake by JohnMcLear · Pull Request #7663 · ether/etherpad

JohnMcLear · 2026-05-03T05:42:07Z

Summary

Backend tests on develop fail silently ~22% of the time (13 of the last 60 runs). Mocha exits with code 1 mid-suite, producing no test-failure marker, no error, and no Mocha summary. Different exit points each run, mostly on Windows runners but occasionally Linux too.

Root cause discovery is blocked by src/tests/backend/common.ts:33, which rethrows unhandled Promise rejections as uncaught exceptions but never logs the reason first:

process.on('unhandledRejection', (reason: string) => { throw reason; });

When the rethrow fires between specs (e.g. a delayed DirtyDB write rejecting after a test completed), mocha exits with code 1 and the original rejection is lost — especially on Windows, where stderr is not always flushed before abrupt exit. CI logs end with ELIFECYCLE Test failed and a clean ✔ from the previous test, with no clue what rejected.

What this PR does

Pure diagnostic patch — no behavior change on success:

unhandledRejection handler now writes the reason (or stack) to stderr before rethrowing.
Adds a matching uncaughtException handler so non-Promise crashes also leave a trace.

The next CI failure will surface what is actually rejecting (DirtyDB write? plugin lifecycle? socket cleanup?), at which point we can fix the real cause.

Why this isn't using the test logger

The 'test' logger goes through log4js, which uses console layouts on top of process.stdout/stderr — same flushing problem. process.stderr.write is the most deterministic synchronous-ish path, and matters specifically on Windows.

Test plan

pnpm exec tsc --noEmit — type-checks cleanly (no new errors in `src/tree)
pnpm exec mocha --import=tsx tests/backend/specs/anonymizeIp.ts — 15/15 passing locally
After merge: wait for the next Backend tests CI failure and inspect the surfaced rejection reason; open follow-up PR to fix the real culprit

🤖 Generated with Claude Code

…bootstrap Backend tests on develop have a ~22% silent failure rate (mostly Windows, sometimes Linux) where mocha exits with code 1 mid-suite, producing no test failure marker, no error, and no Mocha summary. Different exit points each run. Root cause discovery is blocked by src/tests/backend/common.ts:33, which rethrows unhandled Promise rejections as uncaught exceptions but never logs the reason first. When the rethrow happens between specs, mocha exits with code 1 and the original rejection is lost - especially on Windows, where stderr is not always flushed before abrupt exit. This patch is purely diagnostic: it writes the reason (or stack) to stderr before rethrowing, and adds a matching uncaughtException handler for the same purpose. Behavior on success is unchanged. The next CI failure will surface what is actually rejecting (DirtyDB write? plugin lifecycle? socket cleanup?), so we can fix the real cause. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

qodo-code-review · 2026-05-03T05:42:12Z

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

qodo-free-for-open-source-projects · 2026-05-03T05:42:21Z

Review Summary by Qodo

Log unhandledRejection and uncaughtException in backend test bootstrap

🧪 Tests 🐞 Bug fix

Walkthroughs

Description

• Add diagnostic logging for unhandled Promise rejections in backend tests
• Log rejection reason/stack to stderr before rethrowing to prevent silent failures
• Add uncaughtException handler to surface uncaught errors with stack traces
• Improve CI debugging for ~22% silent test failure rate on Windows/Linux runners

Diagram

flowchart LR
  A["Unhandled Promise Rejection"] -->|"log reason to stderr"| B["process.stderr.write"]
  B -->|"then rethrow"| C["throw reason"]
  D["Uncaught Exception"] -->|"log error to stderr"| E["process.stderr.write"]
  E -->|"then exit"| F["process.exit 1"]
  C -->|"prevents silent mocha exit"| G["CI logs show rejection reason"]
  F -->|"prevents silent mocha exit"| G

File Changes

1. src/tests/backend/common.ts 🐞 Bug fix +21/-1

Add diagnostic logging for unhandled rejections and exceptions

• Enhanced unhandledRejection handler to log rejection reason/stack to stderr before rethrowing
• Changed parameter type from string to any to handle Error objects with stack traces
• Added new uncaughtException handler that logs errors to stderr and explicitly exits with code 1
• Added detailed comments explaining the diagnostic purpose and Windows stderr flushing issues

src/tests/backend/common.ts

qodo-free-for-open-source-projects · 2026-05-03T05:42:22Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1)

1. No regression test for logging 📘 Rule violation ☼ Reliability

Description

This PR changes backend test bootstrap behavior for unhandledRejection/uncaughtException
handling but does not add a regression test that would fail if this logging behavior were removed.
Without a test, the diagnostic fix can be reverted or broken without detection.

Code

src/tests/backend/common.ts[R33-53]

+//
+// Log via process.stderr.write before throwing: when the rethrown rejection
+// kills mocha between specs, the runner exits with code 1 and no summary.
+// Without this, the rejection reason is lost (worst on Windows runners,
+// where stderr is not always flushed before abrupt exit) and CI shows a
+// silent ELIFECYCLE with no clue what rejected.
+process.on('unhandledRejection', (reason: any) => {
+  process.stderr.write(`[backend tests] unhandledRejection: ${
+    reason && reason.stack ? reason.stack : String(reason)}\n`);
+  throw reason;
+});
+
+// Surface uncaught exceptions for the same reason. Node's default behavior
+// (exit non-zero) is preserved by the explicit process.exit below — without
+// the handler, Node would write to stderr and exit; with the handler we have
+// to do it ourselves.
+process.on('uncaughtException', (err: any) => {
+  process.stderr.write(`[backend tests] uncaughtException: ${
+    err && err.stack ? err.stack : String(err)}\n`);
+  process.exit(1);
+});

Evidence

PR Compliance ID 2 requires bug fixes to include a regression test; the diff only adds/changes
process-level error handlers in src/tests/backend/common.ts and includes no accompanying test/spec
changes that assert the new stderr logging behavior.

src/tests/backend/common.ts[33-53]
Best Practice: Repository guidelines

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The change adds stderr logging + rethrow/exit behavior for `unhandledRejection` and `uncaughtException`, but there is no regression test ensuring these handlers log the error reason/stack. This makes the diagnostic fix easy to regress without noticing.

## Issue Context
Compliance requires bug fixes to include a regression test that would fail if the fix were reverted.

## Fix Focus Areas
- src/tests/backend/common.ts[33-53]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Exit bypasses server cleanup 🐞 Bug ☼ Reliability

Description

src/tests/backend/common.ts now calls process.exit(1) in an uncaughtException handler, which
can interrupt src/node/server.ts's own uncaughtException handler that invokes the async
exports.exit() cleanup/logging path. This can truncate shutdown diagnostics (for example,
metrics-at-fatal-error logging) and prevent graceful teardown during CI flakes.

Code

src/tests/backend/common.ts[R49-53]

+process.on('uncaughtException', (err: any) => {
+  process.stderr.write(`[backend tests] uncaughtException: ${
+    err && err.stack ? err.stack : String(err)}\n`);
+  process.exit(1);
+});

Evidence
The backend test bootstrap starts the Etherpad server (server.start()), and the server startup
registers its own uncaughtException handler that logs and calls exports.exit(err) for cleanup.
With the new bootstrap handler exiting the process immediately, the server’s async exit path may not
get a chance to run or finish, reducing diagnostics and skipping cleanup work.
src/tests/backend/common.ts[11-13]
src/tests/backend/common.ts[87-105]
src/tests/backend/common.ts[49-53]
src/node/server.ts[139-155]
src/node/server.ts[247-307]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`src/tests/backend/common.ts` installs an `uncaughtException` handler that always calls `process.exit(1)`. Because the Etherpad server (`src/node/server.ts`) also installs an `uncaughtException` handler that runs async `exports.exit()` cleanup/logging, the unconditional `process.exit(1)` can cut off that path.

### Issue Context
- Backend tests call `server.start()` from `common.init()`.
- `server.start()` registers its own `uncaughtException` handler and uses `exports.exit(err)` to log diagnostics (including metrics) and shut down gracefully.

### Fix Focus Areas
- src/tests/backend/common.ts[39-53]
- src/node/server.ts[139-155]
- src/node/server.ts[247-307]

### Suggested change
Adjust the `uncaughtException` handler in `common.ts` to *not* always `process.exit(1)`.

One practical pattern for tests is:
- Always write the diagnostic line to stderr.
- If there are other `uncaughtException` listeners (for example, the server’s), **do not** call `process.exit(1)` and let the other handler manage shutdown.
- Only force `process.exit(1)` when this handler is effectively the only one (to preserve default “exit non-zero” behavior for early bootstrap failures).

(Implement with a named handler function so it can compare `process.rawListeners('uncaughtException')` / `process.listenerCount('uncaughtException')` and decide whether to force-exit.)

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Duplicate failure logging 🐞 Bug ◔ Observability

Description

The unhandledRejection handler logs and then throws, which will immediately trigger the
uncaughtException handler and log the same underlying failure a second time. This can add noise
and make CI logs harder to scan when diagnosing flakes.

Code

src/tests/backend/common.ts[R39-53]

+process.on('unhandledRejection', (reason: any) => {
+  process.stderr.write(`[backend tests] unhandledRejection: ${
+    reason && reason.stack ? reason.stack : String(reason)}\n`);
+  throw reason;
+});
+
+// Surface uncaught exceptions for the same reason. Node's default behavior
+// (exit non-zero) is preserved by the explicit process.exit below — without
+// the handler, Node would write to stderr and exit; with the handler we have
+// to do it ourselves.
+process.on('uncaughtException', (err: any) => {
+  process.stderr.write(`[backend tests] uncaughtException: ${
+    err && err.stack ? err.stack : String(err)}\n`);
+  process.exit(1);
+});

Evidence

Both handlers are registered in src/tests/backend/common.ts, and the unhandledRejection handler
explicitly throws the rejection reason, which results in an uncaught exception in the same process
and therefore triggers the uncaughtException handler too.

src/tests/backend/common.ts[39-53]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Unhandled rejections will be logged twice: once in `unhandledRejection` and again in `uncaughtException` after the rethrow.

### Issue Context
This is test-only diagnostic code; the goal is to keep the *first* useful reason/stack without clutter.

### Fix Focus Areas
- src/tests/backend/common.ts[39-53]

### Suggested change
Add a small guard to avoid double-printing for the specific “rethrow from unhandledRejection” path (for example, a module-level boolean set just before throwing, and checked/reset in `uncaughtException`, or a WeakSet/marker on Error objects where possible).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

…p to #7663) (#7665) * test(ci): stronger diagnostics for silent backend-test exit PR #7663 added unhandledRejection / uncaughtException handlers in common.ts. The next failure after merge (run 25279692065 - Windows without plugins, Node 24) showed mocha exiting with code 1 mid-suite 261ms after the last passing test, with NEITHER handler firing. So something more drastic is killing the process - SIGKILL, OOM, fatal native error - or mocha itself called process.exit before the JS handlers in common.ts could run. Two issues with the previous attempt: 1. Handlers in common.ts only register when a spec imports common.ts. Only 27 of 47 specs do. If a non-common spec triggers the death, handlers may never have been registered. 2. process.stderr.write is asynchronous on Windows when stderr is piped (which it is under GitHub Actions). On a hard kill the buffered line never reaches the runner log. This patch: - Moves diagnostic handlers to a dedicated tests/backend/diagnostics.ts loaded via mocha --require, so they register at startup before any spec runs. - Uses fs.writeSync(2, ...) for synchronous stderr writes that the kernel completes before returning - the line lands in the log even if the process is killed milliseconds later. - Adds beforeExit / exit / signal handlers so we can discriminate the exit mechanism: clean drain vs process.exit vs SIGKILL vs signal. - Tracks last-seen test via mocha root afterEach hook so the death point is visible in the log. The next CI failure should print enough context to identify the cause, after which we can fix the real bug and drop this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(diagnostics): exit(1) on uncaughtException so fatal errors fail fast Qodo flagged on PR #7665: the uncaughtException handler in tests/backend/diagnostics.ts only logged and returned. Once a handler is registered, Node no longer exits on its own. Specs that don't import tests/backend/common.ts (20 of 47) have only this handler — so a fatal error would have been swallowed and tests would limp along instead of failing fast. Mirror common.ts and call process.exit(1) after logging. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

JohnMcLear merged commit a0b85dd into ether:develop May 3, 2026
15 of 16 checks passed

JohnMcLear mentioned this pull request May 3, 2026

test(ci): stronger diagnostics for silent backend-test exit (follow-up to #7663) #7665

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(ci): log unhandledRejection in backend test bootstrap to debug ~22% silent CI flake#7663

test(ci): log unhandledRejection in backend test bootstrap to debug ~22% silent CI flake#7663
JohnMcLear merged 1 commit intoether:developfrom
JohnMcLear:fix/ci-backend-test-flake-diagnostics

JohnMcLear commented May 3, 2026

Uh oh!

qodo-code-review Bot commented May 3, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented May 3, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented May 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

JohnMcLear commented May 3, 2026

Summary

What this PR does

Why this isn't using the test logger

Test plan

Uh oh!

qodo-code-review Bot commented May 3, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented May 3, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-free-for-open-source-projects Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qodo-free-for-open-source-projects Bot commented May 3, 2026 •

edited

Loading