Skip to content

fix: remove duplicate diagnostics error, add recovery guidance, fix stale comments#607

Merged
Sleepful merged 3 commits intomainfrom
wal-slot-phase-fix
Apr 21, 2026
Merged

fix: remove duplicate diagnostics error, add recovery guidance, fix stale comments#607
Sleepful merged 3 commits intomainfrom
wal-slot-phase-fix

Conversation

@Sleepful
Copy link
Copy Markdown
Contributor

Summary

Follow-up cleanup to PR #606. Three small fixes:

  1. Remove duplicate slot-lost error from diagnosticslast_fatal_error already reports [PSYNC_S1146] when the slot is lost. The diagnostics WAL budget check was adding a second fatal error with a different message. Removed the duplicate; the budget check now only fires for the warning case (budget depleting, not yet lost).

  2. Add "delete the existing slot to recover" to initSlot() error — The recovery guidance ("delete the slot") was only in the now-removed diagnostics error. Moved it into the initSlot() error message so it's visible in last_fatal_error.

  3. Fix stale red-test comments — Updated comments that said "this test should FAIL" (leftover from the red-green development cycle).

Also: clamp negative safe_wal_size to 0% in the budget percentage calculation (previously produced -71596% when WAL consumed exceeded the limit but the slot wasn't yet checkpointed as lost).

Files changed

File Change
packages/service-core/src/api/diagnostics.ts Remove lost fatal, keep only budget warning. Add wal_status !== 'lost' guard. Clamp negative percentage.
packages/service-core/test/src/diagnostics.test.ts Replace lost → fatal test with lost → no WAL budget error. Add negative clamp test.
modules/module-postgres/src/replication/WalStream.ts Add "delete the existing slot to recover" to initSlot() error message
modules/module-postgres/test/src/wal_stream.test.ts Fix stale "should FAIL" comments

Test exercises initSlot() with a lost slot and snapshotDone === false
(Case 1). Currently fails: initSlot() hardcodes phase as streaming
instead of deriving it from snapshotDone. The test expects phase to
be snapshot so retry is blocked for snapshot failures requiring
operator intervention to recover.
When a slot is found lost in initSlot(), use the persisted snapshotDone
flag to determine the phase instead of hardcoding streaming. If the
snapshot was not completed (snapshotDone === false), report phase as
snapshot to block retries for snapshot failures requiring operator
intervention to recover.

Also add WAL budget warnings to diagnostics API errors array: fatal
error when slot is lost, warning when budget at or below 50%. Guard
getSlotWalBudget call with slot_name check for validation endpoint.
Clamp negative safe_wal_size to 0% in budget percentage calculation.
…guidance to initSlot

Remove the wal_status=lost fatal error from diagnostics warnings since
last_fatal_error already reports it. Move "delete the existing slot to
recover" guidance into the initSlot() error message so the recovery
step is visible in the primary error. Clamp negative safe_wal_size to
0% in budget percentage. Fix stale red-test comments.
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Apr 15, 2026

⚠️ No Changeset found

Latest commit: e208a51

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@Sleepful Sleepful marked this pull request as ready for review April 15, 2026 08:35
@Sleepful Sleepful requested a review from rkistner April 15, 2026 18:39
@Sleepful Sleepful merged commit 2d32d2a into main Apr 21, 2026
44 checks passed
@Sleepful Sleepful deleted the wal-slot-phase-fix branch April 21, 2026 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants