Skip to content

test(ci): log unhandledRejection in backend test bootstrap to debug ~22% silent CI flake#7663

Merged
JohnMcLear merged 1 commit intoether:developfrom
JohnMcLear:fix/ci-backend-test-flake-diagnostics
May 3, 2026
Merged

test(ci): log unhandledRejection in backend test bootstrap to debug ~22% silent CI flake#7663
JohnMcLear merged 1 commit intoether:developfrom
JohnMcLear:fix/ci-backend-test-flake-diagnostics

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

Backend tests on develop fail silently ~22% of the time (13 of the last 60 runs). Mocha exits with code 1 mid-suite, producing no test-failure marker, no error, and no Mocha summary. Different exit points each run, mostly on Windows runners but occasionally Linux too.

Root cause discovery is blocked by src/tests/backend/common.ts:33, which rethrows unhandled Promise rejections as uncaught exceptions but never logs the reason first:

process.on('unhandledRejection', (reason: string) => { throw reason; });

When the rethrow fires between specs (e.g. a delayed DirtyDB write rejecting after a test completed), mocha exits with code 1 and the original rejection is lost — especially on Windows, where stderr is not always flushed before abrupt exit. CI logs end with ELIFECYCLE Test failed and a clean ✔ from the previous test, with no clue what rejected.

What this PR does

Pure diagnostic patch — no behavior change on success:

  • unhandledRejection handler now writes the reason (or stack) to stderr before rethrowing.
  • Adds a matching uncaughtException handler so non-Promise crashes also leave a trace.

The next CI failure will surface what is actually rejecting (DirtyDB write? plugin lifecycle? socket cleanup?), at which point we can fix the real cause.

Why this isn't using the test logger

The 'test' logger goes through log4js, which uses console layouts on top of process.stdout/stderr — same flushing problem. process.stderr.write is the most deterministic synchronous-ish path, and matters specifically on Windows.

Test plan

  • pnpm exec tsc --noEmit — type-checks cleanly (no new errors in `src/tree)
  • pnpm exec mocha --import=tsx tests/backend/specs/anonymizeIp.ts — 15/15 passing locally
  • After merge: wait for the next Backend tests CI failure and inspect the surfaced rejection reason; open follow-up PR to fix the real culprit

🤖 Generated with Claude Code

…bootstrap

Backend tests on develop have a ~22% silent failure rate (mostly Windows,
sometimes Linux) where mocha exits with code 1 mid-suite, producing no
test failure marker, no error, and no Mocha summary. Different exit
points each run.

Root cause discovery is blocked by src/tests/backend/common.ts:33, which
rethrows unhandled Promise rejections as uncaught exceptions but never
logs the reason first. When the rethrow happens between specs, mocha
exits with code 1 and the original rejection is lost - especially on
Windows, where stderr is not always flushed before abrupt exit.

This patch is purely diagnostic: it writes the reason (or stack) to
stderr before rethrowing, and adds a matching uncaughtException handler
for the same purpose. Behavior on success is unchanged. The next CI
failure will surface what is actually rejecting (DirtyDB write?
plugin lifecycle? socket cleanup?), so we can fix the real cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Log unhandledRejection and uncaughtException in backend test bootstrap

🧪 Tests 🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Add diagnostic logging for unhandled Promise rejections in backend tests
• Log rejection reason/stack to stderr before rethrowing to prevent silent failures
• Add uncaughtException handler to surface uncaught errors with stack traces
• Improve CI debugging for ~22% silent test failure rate on Windows/Linux runners
Diagram
flowchart LR
  A["Unhandled Promise Rejection"] -->|"log reason to stderr"| B["process.stderr.write"]
  B -->|"then rethrow"| C["throw reason"]
  D["Uncaught Exception"] -->|"log error to stderr"| E["process.stderr.write"]
  E -->|"then exit"| F["process.exit 1"]
  C -->|"prevents silent mocha exit"| G["CI logs show rejection reason"]
  F -->|"prevents silent mocha exit"| G
Loading

Grey Divider

File Changes

1. src/tests/backend/common.ts 🐞 Bug fix +21/-1

Add diagnostic logging for unhandled rejections and exceptions

• Enhanced unhandledRejection handler to log rejection reason/stack to stderr before rethrowing
• Changed parameter type from string to any to handle Error objects with stack traces
• Added new uncaughtException handler that logs errors to stderr and explicitly exits with code 1
• Added detailed comments explaining the diagnostic purpose and Windows stderr flushing issues

src/tests/backend/common.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 3, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1)

Grey Divider


Remediation recommended

1. No regression test for logging 📘 Rule violation ☼ Reliability
Description
This PR changes backend test bootstrap behavior for unhandledRejection/uncaughtException
handling but does not add a regression test that would fail if this logging behavior were removed.
Without a test, the diagnostic fix can be reverted or broken without detection.
Code

src/tests/backend/common.ts[R33-53]

+//
+// Log via process.stderr.write before throwing: when the rethrown rejection
+// kills mocha between specs, the runner exits with code 1 and no summary.
+// Without this, the rejection reason is lost (worst on Windows runners,
+// where stderr is not always flushed before abrupt exit) and CI shows a
+// silent ELIFECYCLE with no clue what rejected.
+process.on('unhandledRejection', (reason: any) => {
+  process.stderr.write(`[backend tests] unhandledRejection: ${
+    reason && reason.stack ? reason.stack : String(reason)}\n`);
+  throw reason;
+});
+
+// Surface uncaught exceptions for the same reason. Node's default behavior
+// (exit non-zero) is preserved by the explicit process.exit below — without
+// the handler, Node would write to stderr and exit; with the handler we have
+// to do it ourselves.
+process.on('uncaughtException', (err: any) => {
+  process.stderr.write(`[backend tests] uncaughtException: ${
+    err && err.stack ? err.stack : String(err)}\n`);
+  process.exit(1);
+});
Evidence
PR Compliance ID 2 requires bug fixes to include a regression test; the diff only adds/changes
process-level error handlers in src/tests/backend/common.ts and includes no accompanying test/spec
changes that assert the new stderr logging behavior.

src/tests/backend/common.ts[33-53]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The change adds stderr logging + rethrow/exit behavior for `unhandledRejection` and `uncaughtException`, but there is no regression test ensuring these handlers log the error reason/stack. This makes the diagnostic fix easy to regress without noticing.

## Issue Context
Compliance requires bug fixes to include a regression test that would fail if the fix were reverted.

## Fix Focus Areas
- src/tests/backend/common.ts[33-53]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Exit bypasses server cleanup 🐞 Bug ☼ Reliability
Description
src/tests/backend/common.ts now calls process.exit(1) in an uncaughtException handler, which
can interrupt src/node/server.ts's own uncaughtException handler that invokes the async
exports.exit() cleanup/logging path. This can truncate shutdown diagnostics (for example,
metrics-at-fatal-error logging) and prevent graceful teardown during CI flakes.
Code

src/tests/backend/common.ts[R49-53]

+process.on('uncaughtException', (err: any) => {
+  process.stderr.write(`[backend tests] uncaughtException: ${
+    err && err.stack ? err.stack : String(err)}\n`);
+  process.exit(1);
+});
Evidence
The backend test bootstrap starts the Etherpad server (server.start()), and the server startup
registers its own uncaughtException handler that logs and calls exports.exit(err) for cleanup.
With the new bootstrap handler exiting the process immediately, the server’s async exit path may not
get a chance to run or finish, reducing diagnostics and skipping cleanup work.

src/tests/backend/common.ts[11-13]
src/tests/backend/common.ts[87-105]
src/tests/backend/common.ts[49-53]
src/node/server.ts[139-155]
src/node/server.ts[247-307]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`src/tests/backend/common.ts` installs an `uncaughtException` handler that always calls `process.exit(1)`. Because the Etherpad server (`src/node/server.ts`) also installs an `uncaughtException` handler that runs async `exports.exit()` cleanup/logging, the unconditional `process.exit(1)` can cut off that path.

### Issue Context
- Backend tests call `server.start()` from `common.init()`.
- `server.start()` registers its own `uncaughtException` handler and uses `exports.exit(err)` to log diagnostics (including metrics) and shut down gracefully.

### Fix Focus Areas
- src/tests/backend/common.ts[39-53]
- src/node/server.ts[139-155]
- src/node/server.ts[247-307]

### Suggested change
Adjust the `uncaughtException` handler in `common.ts` to *not* always `process.exit(1)`.

One practical pattern for tests is:
- Always write the diagnostic line to stderr.
- If there are other `uncaughtException` listeners (for example, the server’s), **do not** call `process.exit(1)` and let the other handler manage shutdown.
- Only force `process.exit(1)` when this handler is effectively the only one (to preserve default “exit non-zero” behavior for early bootstrap failures).

(Implement with a named handler function so it can compare `process.rawListeners('uncaughtException')` / `process.listenerCount('uncaughtException')` and decide whether to force-exit.)

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

3. Duplicate failure logging 🐞 Bug ◔ Observability
Description
The unhandledRejection handler logs and then throws, which will immediately trigger the
uncaughtException handler and log the same underlying failure a second time. This can add noise
and make CI logs harder to scan when diagnosing flakes.
Code

src/tests/backend/common.ts[R39-53]

+process.on('unhandledRejection', (reason: any) => {
+  process.stderr.write(`[backend tests] unhandledRejection: ${
+    reason && reason.stack ? reason.stack : String(reason)}\n`);
+  throw reason;
+});
+
+// Surface uncaught exceptions for the same reason. Node's default behavior
+// (exit non-zero) is preserved by the explicit process.exit below — without
+// the handler, Node would write to stderr and exit; with the handler we have
+// to do it ourselves.
+process.on('uncaughtException', (err: any) => {
+  process.stderr.write(`[backend tests] uncaughtException: ${
+    err && err.stack ? err.stack : String(err)}\n`);
+  process.exit(1);
+});
Evidence
Both handlers are registered in src/tests/backend/common.ts, and the unhandledRejection handler
explicitly throws the rejection reason, which results in an uncaught exception in the same process
and therefore triggers the uncaughtException handler too.

src/tests/backend/common.ts[39-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Unhandled rejections will be logged twice: once in `unhandledRejection` and again in `uncaughtException` after the rethrow.

### Issue Context
This is test-only diagnostic code; the goal is to keep the *first* useful reason/stack without clutter.

### Fix Focus Areas
- src/tests/backend/common.ts[39-53]

### Suggested change
Add a small guard to avoid double-printing for the specific “rethrow from unhandledRejection” path (for example, a module-level boolean set just before throwing, and checked/reset in `uncaughtException`, or a WeakSet/marker on Error objects where possible).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@JohnMcLear JohnMcLear merged commit a0b85dd into ether:develop May 3, 2026
15 of 16 checks passed
JohnMcLear added a commit that referenced this pull request May 3, 2026
…p to #7663) (#7665)

* test(ci): stronger diagnostics for silent backend-test exit

PR #7663 added unhandledRejection / uncaughtException handlers in
common.ts. The next failure after merge (run 25279692065 - Windows
without plugins, Node 24) showed mocha exiting with code 1 mid-suite
261ms after the last passing test, with NEITHER handler firing. So
something more drastic is killing the process - SIGKILL, OOM, fatal
native error - or mocha itself called process.exit before the JS
handlers in common.ts could run.

Two issues with the previous attempt:

1. Handlers in common.ts only register when a spec imports common.ts.
   Only 27 of 47 specs do. If a non-common spec triggers the death,
   handlers may never have been registered.

2. process.stderr.write is asynchronous on Windows when stderr is
   piped (which it is under GitHub Actions). On a hard kill the buffered
   line never reaches the runner log.

This patch:

- Moves diagnostic handlers to a dedicated tests/backend/diagnostics.ts
  loaded via mocha --require, so they register at startup before any
  spec runs.
- Uses fs.writeSync(2, ...) for synchronous stderr writes that the
  kernel completes before returning - the line lands in the log even
  if the process is killed milliseconds later.
- Adds beforeExit / exit / signal handlers so we can discriminate the
  exit mechanism: clean drain vs process.exit vs SIGKILL vs signal.
- Tracks last-seen test via mocha root afterEach hook so the death
  point is visible in the log.

The next CI failure should print enough context to identify the cause,
after which we can fix the real bug and drop this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(diagnostics): exit(1) on uncaughtException so fatal errors fail fast

Qodo flagged on PR #7665: the uncaughtException handler in
tests/backend/diagnostics.ts only logged and returned. Once a handler is
registered, Node no longer exits on its own. Specs that don't import
tests/backend/common.ts (20 of 47) have only this handler — so a fatal
error would have been swallowed and tests would limp along instead of
failing fast.

Mirror common.ts and call process.exit(1) after logging.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant