Skip to content

fix: gracefulShutdown awaits async cleanup after server.close drain (v0.0.4)#7

Merged
y1o1 merged 4 commits into
developfrom
worktree-graceful-shutdown-rewrite
May 8, 2026
Merged

fix: gracefulShutdown awaits async cleanup after server.close drain (v0.0.4)#7
y1o1 merged 4 commits into
developfrom
worktree-graceful-shutdown-rewrite

Conversation

@y1o1
Copy link
Copy Markdown
Contributor

@y1o1 y1o1 commented May 8, 2026

Summary

  • Cleanup callback now runs inside the server.close() callback (after in-flight requests drain) and is awaited, instead of the previous synchronous fire-before-close shape that silently dropped any returned Promise.
  • cleanup parameter type widens from () => void to () => void | Promise<void> — structurally backward-compatible.
  • Cleanup errors are caught and logged via console.error("gracefulShutdown: cleanup error", err); process.exit(0) still runs.
  • server.closeAllConnections() invoked alongside server.close() to drain keep-alive sockets (Node.js >= 18.2.0).

Spec: auth/.claude/superpowers/specs/2026-05-05-d5-builder-context-lifecycle.md — cross-repo gracefulShutdown semantics fix section.

This release (v0.0.4) is a publish prerequisite for o3co/auth.provider v0.5.2 — the templates/standalone/src/app.mts:117 call gracefulShutdown(server, () => handle.dispose()) was already passing an async dispose() that the previous sync impl silently dropped. With this fix, the lifecycle drain in auth.provider v0.5.x is properly awaited before the process exits.

TDD evidence

  • 4 new RED tests (all failed against current sync impl, all pass against new impl):
    • invokes cleanup inside server.close callback (after in-flight requests drain) — verifies call order ["close", "cleanup", "exit"]
    • awaits an async cleanup before exiting — verifies process.exit only fires after cleanup's Promise resolves
    • logs cleanup errors via console.error and still exits — verifies error path is caught + logged
    • calls server.closeAllConnections to drain keepalive sockets
  • 2 existing tests updated (async + closeAllConnections mock + microtask flush) to keep working under the new ordering.

Test plan

  • pnpm test — all 27 tests pass (5 test files)
  • pnpm typecheck — 0 errors
  • pnpm build — clean
  • pnpm lint — clean (after biome --write formatting pass)

Migration

The signature widening is structurally backward-compatible — every previous cleanup: () => void shape still satisfies () => void | Promise<void>. No code change required at call sites unless callers want to take advantage of awaited async cleanup. Behavior change: cleanup now runs after request drain and is awaited; fire-and-forget ordering is gone.

🤖 Generated with Claude Code

…v0.0.4)

The pre-existing handler ran cleanup synchronously *before* server.close,
which silently dropped any Promise returned by cleanup. The process then
exited before async resource releases (Redis disconnects, DB pools, file
handles) could complete.

New shape:
1. SIGTERM/SIGINT triggers handler
2. server.close() drains in-flight requests
3. server.closeAllConnections() drains keep-alive sockets (Node >= 18.2.0)
4. await cleanup?.() — caught and logged on rejection, does not abort exit
5. process.exit(0)

Signature widening: cleanup type goes from () => void to
() => void | Promise<void>. This is structurally backward-compatible —
every previous caller's cleanup shape still satisfies the new union.

Cleanup errors are caught with console.error("gracefulShutdown: cleanup
error", err) and do NOT abort process.exit(0). The contract is
"best-effort cleanup, then guaranteed exit" — operators relying on
hung-cleanup-blocks-exit semantics had no such guarantee in the previous
implementation either (because cleanup was synchronous, exceptions would
have crashed the handler before exit).

Tests: 4 new RED → GREEN tests (call order, async await, error logging,
closeAllConnections invocation) + 2 existing tests updated to be async
and mock closeAllConnections. All 27 tests pass; lint, typecheck, and
build all green.

Spec: .claude/superpowers/specs/2026-05-05-d5-builder-context-lifecycle.md
(cross-repo gracefulShutdown semantics fix section, lines 475-520).

Cross-repo coordination:
This release is a publish prerequisite for o3co/auth.provider v0.5.2 —
the standalone template's gracefulShutdown(server, () => handle.dispose())
call was already passing an async cleanup that the previous sync impl
silently dropped; v0.0.4 properly awaits it before process.exit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 8, 2026 08:00
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the gracefulShutdown utility to run cleanup only after server.close() completes, await async cleanup, and add keep-alive connection draining behavior; it also expands test coverage and documents the behavior change for the upcoming v0.0.4 release.

Changes:

  • Run cleanup inside the server.close() callback and await it (with error logging) before calling process.exit(0).
  • Invoke server.closeAllConnections() during shutdown to address lingering keep-alive sockets.
  • Add/adjust Vitest coverage for ordering, async cleanup awaiting, error logging, and closeAllConnections() usage; add a v0.0.4 changelog entry.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
src/shutdown.mts Moves cleanup into the server.close() callback, awaits it, logs cleanup errors, and calls closeAllConnections().
src/tests/shutdown.test.mts Adds new tests for shutdown ordering/awaiting/error logging and updates existing tests to account for the new ordering and closeAllConnections().
CHANGELOG.md Documents the v0.0.4 behavior change, including awaited cleanup and keep-alive draining.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/shutdown.mts Outdated
Comment thread src/shutdown.mts Outdated
Comment thread src/shutdown.mts
Comment thread src/__tests__/shutdown.test.mts
Comment thread CHANGELOG.md Outdated
Comment thread CHANGELOG.md Outdated
…ines)

Round 1 multi-agent-review converged on a Critical issue + 2 Important
items. This fixup addresses C1+C2+C3+C4+C6.

C1 (Critical, Codex P1 + Claude I-2 convergence) — replace
`server.closeAllConnections()` with `server.closeIdleConnections()`.
Per Node.js docs:
  closeAllConnections — "Closes all established HTTP(S) connections...
    including active connections connected to this server which are
    sending a request or waiting for a response."
  closeIdleConnections — "Closes all connections... which are not
    sending a request or waiting for a response."

The original spec called for `closeAllConnections`, but that destroys
in-flight request connections too — exactly the regression a graceful
shutdown is meant to avoid. `closeIdleConnections` is the correct
keep-alive-reaping primitive.

(Note: Node.js >= 19.0.0 reaps idle keep-alives in `server.close`
itself, so the explicit call is harmless there and necessary on 18.x.)

C2 (Important, Claude I-1) — fix CHANGELOG v0.0.3 release date
2026-04-18 → 2026-04-14 (verified against tag commit df1b2ed).

C3 (Important, Claude I-2 paragraph 1) — rewrite the CHANGELOG
"new shape" narrative so it reflects the actual concurrent execution:
`server.close(...)` is registered with an async callback while
`server.closeIdleConnections()` runs synchronously in parallel.
The previous numbered-list framing implied sequential 1→2→3→4→5 which
mismatched the implementation.

C4 (Minor, Claude M-1) — declare `engines.node >= 18.19.0` in
package.json. Matches the auth.provider consumer floor, fails fast on
pre-18.2 installers where `closeIdleConnections` is undefined.

C6 (Minor, Claude M-3) — rename "calls cleanup function when handler is
invoked" test to "works with a synchronous cleanup function" so it
clearly differentiates from the new "invokes cleanup inside server.close
callback (after in-flight requests drain)" ordering test, and rename
"calls server.closeAllConnections to drain keepalive sockets" to
"calls server.closeIdleConnections to drain idle keep-alive sockets
without aborting in-flight requests" for the same clarity reason.

Deferred (per user-confirmed scope):
- C5 M-2 JSDoc — pre-existing absence; out of scope for this PR
- C7 M-4 SIGINT behavioral test — structural argument suffices (same
  handler reference is registered for both signals)
- C8 M-5 cleanup-hang timeout — would be a separate contract decision,
  out of scope

Verification: 27/27 tests pass, typecheck/lint/build all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread src/shutdown.mts
Comment thread src/shutdown.mts
Comment thread CHANGELOG.md
Comment thread src/__tests__/shutdown.test.mts Outdated
y1o1 and others added 2 commits May 8, 2026 17:17
…sion defense)

Round 2 multi-agent-review (Codex 0 / Claude 3 Minor) — apply all 3.

M-1 (Claude) — CHANGELOG line 50 reworded from "in parallel with
server.close()" to "synchronously after server.close() registers its
callback" so the Added section header matches the lifecycle described in
the Changed section above (which already gets it right).

M-2 (Claude) — add a negative `expect(closeAllSpy).not.toHaveBeenCalled()`
assertion to the closeIdleConnections drain test. Defends against a
future contributor flipping back to the pre-Round 1 spec premise where
closeAllConnections aborts in-flight requests; the test suite now fails
loudly instead of passing silently if that regression is reintroduced.

M-3 (Claude) — rename test variable closeAllSpy → closeIdleSpy in the
6 sites that mock closeIdleConnections. The old name was a leftover from
the closeAllConnections shape and read inconsistently with the production
API.

Verification: 27/27 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 3 Copilot review: 10 threads. 7 stale (already addressed by Round 1
fixup 3641ba7 — Copilot's first review was on the pre-fixup commit). 3 new:

T3 (Important) — idempotent guard for repeated signal delivery.
RED test "is idempotent under repeated signal delivery" added first
(asserts 3 handler() calls → cleanup/close/closeIdle/exit each invoked
exactly once). Pre-impl: failed with "called 3 times". Post-impl: passes.

Implementation: a closure-local `shuttingDown` flag short-circuits the
second handler() onward, plus `process.removeListener` for both SIGTERM
and SIGINT on first entry. Operators sending multiple SIGTERMs (k8s
pre-SIGKILL retry, Ctrl+C-spam) no longer cause duplicated cleanup or
multiple process.exit(0).

(As a side benefit, removeListener also fixes a pre-existing
test-order-pollution flake where the previous "works without cleanup
function" test occasionally observed a stale handler on the real
process due to listeners not being torn down between tests.)

T4 + T10 (Minor, dupe across two Copilot reviews) — rename test helper
`flushMicrotasks` → `flushEventLoop`. The helper uses `setImmediate`,
which queues macrotasks (event-loop turns), not microtasks. Pure
naming clarity.

Stale 7 (resolved as already-addressed):
- T1, T7: closeAllConnections feature-detect → covered by R1's
  closeIdleConnections + engines.node
- T2, T8: closeAllConnections aborts in-flight → R1 fixed it
- T5, T9: CHANGELOG order mismatch → R1 rewrote CHANGELOG narrative
- T6: engines.node not declared → R1 added engines.node ≥ 18.19.0

Verification: 28/28 tests pass (27 prior + 1 new RED→GREEN), lint clean,
typecheck clean, build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@y1o1 y1o1 merged commit 2a5241c into develop May 8, 2026
1 check passed
@y1o1 y1o1 deleted the worktree-graceful-shutdown-rewrite branch May 8, 2026 19:46
y1o1 added a commit to o3co/auth.provider that referenced this pull request May 8, 2026
… block (#145)

- Bump `@o3co/auth.utils` minimum in `templates/standalone/package.json` from `^0.0.3` to `^0.0.4` to pick up the cross-repo `gracefulShutdown` rewrite (cleanup awaited inside `server.close` callback, signature widened to `() => void | Promise<void>`, idempotent under repeated SIGTERM/SIGINT, uses `closeIdleConnections` not `closeAllConnections`).
- Convert CHANGELOG `[Unreleased]` → `[0.5.2] — 2026-05-09` so the release tag captures all of F1–F9 + SC residual + this auth.utils bump as one v0.5.2 train.
- Round 1 fixup: replaced 37 hard-coded `v0.5.1` references inside the `[0.5.2]` block with `v0.5.2` so the migration narrative is internally consistent (the [0.5.2] block authored entries had inline v0.5.1 markers from when these were intended to ship as v0.5.1).
- Round 2 fixup (Copilot): renamed `### Dependencies` to `### Changed (auth.utils dependency bump, v0.5.2)` to match Keep a Changelog standard taxonomy.

Cross-repo: `o3co/auth.utils#7` merged at `2a5241cf` and `@o3co/auth.utils@0.0.4` published on npm at 2026-05-08T19:46:34Z via Trusted Publisher OIDC.

Why v0.5.2 (not v0.5.1): v0.5.1 was tagged early at commit `04132c69` as the `create-auth-provider` ENOENT hotfix off `v0.5.0`; the F-series module-system redesign + supply-chain residuals + this dep bump all ship as v0.5.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants