fix: remove duplicate diagnostics error, add recovery guidance, fix stale comments #607
Merged
Test exercises initSlot() with a lost slot and snapshotDone === false (Case 1). Currently fails: initSlot() hardcodes the phase as streaming instead of deriving it from snapshotDone. The test expects the phase to be snapshot, so that retries are blocked for snapshot failures, which require operator intervention to recover.
When a slot is found lost in initSlot(), use the persisted snapshotDone flag to determine the phase instead of hardcoding streaming. If the snapshot was not completed (snapshotDone === false), report the phase as snapshot to block retries for snapshot failures that require operator intervention to recover. Also add WAL budget warnings to the diagnostics API errors array: a fatal error when the slot is lost, and a warning when the budget is at or below 50%. Guard the getSlotWalBudget call with a slot_name check for the validation endpoint. Clamp negative safe_wal_size to 0% in the budget percentage calculation.
…guidance to initSlot

Remove the wal_status=lost fatal error from the diagnostics warnings, since last_fatal_error already reports it. Move the "delete the existing slot to recover" guidance into the initSlot() error message so the recovery step is visible in the primary error. Clamp negative safe_wal_size to 0% in the budget percentage. Fix stale red-test comments.
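A minimal sketch of the initSlot() behaviour these commits describe, assuming hypothetical names (PersistedSlotState, SlotLostError, phaseForLostSlot, isRetryable, lostSlotError) and illustrative message wording; the real WalStream.ts code is not shown on this page and is likely structured differently. It derives the reported phase from the persisted snapshotDone flag and puts the recovery step directly in the error message.

```typescript
// Hypothetical sketch: names and message wording are illustrative, not the real WalStream.ts code.
type ReplicationPhase = 'snapshot' | 'streaming';

interface PersistedSlotState {
  slotName: string;
  // Persisted flag: true once the initial snapshot has completed.
  snapshotDone: boolean;
}

class SlotLostError extends Error {
  readonly phase: ReplicationPhase;

  constructor(message: string, phase: ReplicationPhase) {
    super(message);
    this.name = 'SlotLostError';
    this.phase = phase;
  }
}

// Derive the phase from snapshotDone instead of hardcoding 'streaming'.
function phaseForLostSlot(state: PersistedSlotState): ReplicationPhase {
  return state.snapshotDone ? 'streaming' : 'snapshot';
}

// In this sketch, only snapshot-phase failures block automatic retry,
// because they need operator intervention to recover.
function isRetryable(err: SlotLostError): boolean {
  return err.phase !== 'snapshot';
}

// The recovery step is part of the primary error message, so it surfaces in last_fatal_error.
function lostSlotError(state: PersistedSlotState): SlotLostError {
  return new SlotLostError(
    `[PSYNC_S1146] Replication slot ${state.slotName} is lost (wal_status=lost). ` +
      'Delete the existing slot to recover.',
    phaseForLostSlot(state),
  );
}
```

For example, a slot found lost before snapshotDone is set yields phase 'snapshot', which this sketch treats as non-retryable, matching the Case 1 test expectation.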
rkistner approved these changes on Apr 16, 2026
Summary
Follow-up cleanup to PR #606. Three small fixes:
1. Remove duplicate slot-lost error from diagnostics: last_fatal_error already reports [PSYNC_S1146] when the slot is lost. The diagnostics WAL budget check was adding a second fatal error with a different message. Removed the duplicate; the budget check now only fires for the warning case (budget depleting, not yet lost).
2. Add "delete the existing slot to recover" to the initSlot() error: the recovery guidance ("delete the slot") was only in the now-removed diagnostics error. Moved it into the initSlot() error message so it's visible in last_fatal_error.
3. Fix stale red-test comments: updated comments that said "this test should FAIL" (leftover from the red-green development cycle).
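A minimal sketch of the diagnostics WAL budget check after fix 1. The names walBudgetErrors, SlotWalBudget, DiagnosticsError and BUDGET_WARNING_PERCENT are hypothetical, and modelling the missing-slot_name case (the validation endpoint) as null is an assumption; only the behaviour comes from this PR: no entry when wal_status is lost, and a warning when the budget is at or below 50%.

```typescript
// Hypothetical sketch of the diagnostics WAL budget check; real types and messages may differ.
type WalStatus = 'reserved' | 'extended' | 'unreserved' | 'lost';

interface SlotWalBudget {
  walStatus: WalStatus;
  // Remaining WAL budget as a percentage, already clamped to >= 0.
  budgetPercent: number;
}

interface DiagnosticsError {
  level: 'warning' | 'fatal';
  message: string;
}

const BUDGET_WARNING_PERCENT = 50;

function walBudgetErrors(budget: SlotWalBudget | null): DiagnosticsError[] {
  // null models "no slot_name", e.g. the validation endpoint: skip the check.
  if (budget === null) {
    return [];
  }
  // A lost slot is already reported through last_fatal_error; don't add a duplicate.
  if (budget.walStatus === 'lost') {
    return [];
  }
  // Only the depleting-but-not-lost case produces a diagnostics entry.
  if (budget.budgetPercent <= BUDGET_WARNING_PERCENT) {
    return [
      {
        level: 'warning',
        message: `Replication slot WAL budget is at ${Math.round(budget.budgetPercent)}%.`,
      },
    ];
  }
  return [];
}
```

Returning an empty array for the lost case, rather than skipping the call site, keeps the "lost → no WAL budget error" behaviour easy to test in isolation.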
Also: clamp negative safe_wal_size to 0% in the budget percentage calculation (previously this produced -71596% when consumed WAL exceeded the limit but the slot wasn't yet checkpointed as lost).

Files changed
- packages/service-core/src/api/diagnostics.ts: Drop the lost fatal error; keep only the budget warning. Add a wal_status !== 'lost' guard. Clamp negative percentage.
- packages/service-core/test/src/diagnostics.test.ts: Replace the lost → fatal test with lost → no WAL budget error. Add a negative clamp test.
- modules/module-postgres/src/replication/WalStream.ts
- modules/module-postgres/test/src/wal_stream.test.ts
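The negative clamp in diagnostics.ts can be illustrated with a small helper. It assumes, hypothetically, that the budget percentage is safe_wal_size divided by max_slot_wal_keep_size with both in bytes; the name walBudgetPercent is illustrative and not taken from the PR. The NULL handling follows PostgreSQL's documented behaviour for safe_wal_size in pg_replication_slots.

```typescript
// Hypothetical helper: assumes budget = safe_wal_size / max_slot_wal_keep_size, both in bytes.
function walBudgetPercent(
  safeWalSizeBytes: number | null,
  maxSlotWalKeepSizeBytes: number,
): number | null {
  // safe_wal_size is NULL for lost slots and when max_slot_wal_keep_size = -1 (no limit).
  if (safeWalSizeBytes === null || maxSlotWalKeepSizeBytes <= 0) {
    return null;
  }
  const raw = (safeWalSizeBytes / maxSlotWalKeepSizeBytes) * 100;
  // safe_wal_size turns negative once consumed WAL exceeds the limit but before the slot
  // is marked lost at a checkpoint; clamp so the report reads 0% instead of e.g. -71596%.
  return Math.max(0, raw);
}
```

Returning null rather than 0 when no limit is configured keeps "no limit" distinct from "budget exhausted"; that is a design choice of this sketch, not necessarily of diagnostics.ts.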