Reduce work for spock.progress and remove checkpoint hook#450
Conversation
Replace the Checkpoint_hook + resource.dat + custom WAL RMGR machinery with (1) replication origins as the source of truth for remote_commit_lsn, (2) a shutdown-only resource.dat snapshot with LSN-validated load, and (3) a forced keepalive on apply-worker reconnect. Removes the Postgres core patch and the extra per-commit XLogFlush.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRemoves the checkpoint hook and per-commit WAL flushes; retargets RMGR to emit per-entry resource-dump records at dump sites; adds apply-worker startup reconciliation and a bounded backward scan of pg_commit_ts to recover per-origin remote_commit_ts; seeds LSNs with an initial keepalive and moves resource.dat dumps to supervisor-controlled shutdown and sync events. ChangesProgress State Recovery & Persistence Refactor
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Category | Results |
|---|---|
| Complexity | 4 medium |
🟢 Metrics 29 complexity · 4 duplication
Metric Results Complexity 29 Duplication 4
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md`:
- Line 11: Update the broken Spec breadcrumb: in the "**Spec:**" line replace
the incorrect path
`docs/internals-doc/specs/2026-04-30-recover-remote-commit-ts-design.md` with
the actual spec filename added in this PR (i.e., the shipped spec file name),
ensuring the "**Spec:**" entry points to the real file in
docs/internals-doc/specs so readers are not sent to a dead path.
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Line 101: Update the documentation to point to the new implementation module
and header: replace references that say the constants/enum/helper live in
src/spock_apply.c with the moved locations in src/spock_progress_recovery.c and
include/spock_progress_recovery.h, and likewise update any other mentions (e.g.,
the three constants SPOCK_TS_RECOVERY_MIN_SEEN_PER_ORIGIN,
SPOCK_TS_RECOVERY_SCAN_LIMIT, SPOCK_TS_RECOVERY_BATCH_SIZE and related
enum/helper) so the doc accurately reflects the shipped implementation.
In `@include/spock_rmgr.h`:
- Around line 43-50: The SpockResourceDumpRec uses 16-bit fields entry_seq and
entry_total which will wrap for >65535 entries; update the struct
(SpockResourceDumpRec) to use uint32 for entry_seq and entry_total and update
serialization/deserialization and any readers/writers (notably
spock_rmgr_log_resource_dump and WAL emit/parse code) to use 32-bit widths, or
alternatively add a guard in spock_rmgr_log_resource_dump to detect counts >
UINT16_MAX and fail emission with a clear error; ensure all references to these
fields (packing/unpacking, size calculations, compatibility checks) are updated
to the chosen 32-bit type or to handle the failure path consistently.
In `@src/spock_group.c`:
- Around line 757-763: The current code calls
init_progress_fields(&entry->progress) then
progress_update_struct(&entry->progress, sap), but progress_update_struct only
copies remote_commit_lsn when sap->remote_commit_ts beats the existing
timestamp, so after reconcile_progress_with_origin this can zero out a valid
sap->remote_commit_lsn; instead, after init_progress_fields(&entry->progress)
assign the resume-related fields on entry->progress directly from sap (e.g.,
entry->progress.remote_commit_lsn = sap->remote_commit_lsn and
entry->progress.remote_commit_ts = sap->remote_commit_ts, plus any other resume
LSN/TS fields) rather than calling progress_update_struct so the authoritative
resume LSN from sap is preserved.
In `@src/spock_progress_recovery.c`:
- Around line 74-103: The code must short-circuit when there's no durable origin
progress or shared state to avoid dereferencing SpockCtx/SpockGroupHash or
scanning commit_ts on an InvalidXLogRecPtr; after calling
reconcile_progress_with_origin(origin_lsn) and handling RECONCILE_FILE_IN_SYNC,
also return immediately if result == RECONCILE_FILE_ABSENT or origin_lsn ==
InvalidXLogRecPtr or SpockCtx == NULL or SpockGroupHash == NULL so you don't
acquire SpockCtx->apply_group_master_lock or call
hash_search/recover_progress_timestamps_from_commit_ts when state is
uninitialized; update the block using those symbols (result,
reconcile_progress_with_origin, RECONCILE_FILE_ABSENT, origin_lsn, SpockCtx,
SpockGroupHash, SpockCtx->apply_group_master_lock, SpockGroupHash, hash_search,
recover_progress_timestamps_from_commit_ts) to perform this early return.
In `@src/spock_rmgr.c`:
- Around line 189-190: entry_seq is declared as uint16 and entry_total is being
truncated to uint16, which will overflow for dumps >65535 entries and corrupt
the WAL forensic stream; fix by widening the counters (e.g., use uint32_t or
uint64_t for entry_seq and entry_total) or, if protocol requires 16-bit on-wire
values, validate and reject/clamp large totals before truncation: update
declarations for entry_seq/entry_total, add a pre-check that if original_total >
UINT16_MAX then log/error and abort the dump (or emit a protocol error), and
ensure any assignment that currently casts to uint16 (the places handling
entry_total and seq) use the new wider type or explicit guarded cast with error
handling. Ensure references to entry_seq and entry_total across the file are
updated to the new type (or the validation logic is added wherever truncation
occurs).
In `@tests/tap/t/023_forced_keepalive_timing.pl`:
- Around line 74-80: The test restart uses a clean stop/start so
spock.progress.remote_insert_lsn may be pre-seeded from resource.dat causing the
poll (lines around wait_for_pg_ready and the check loop) to pass without the
forced keepalive exercising request_initial_status_update(); to fix, change the
restart path or assertions so the test verifies an actual post-reconnect LSN
change: either (a) remove or rename/ignore resource.dat before starting the
subscriber so remote_insert_lsn is not pre-seeded, or (b) read and store the
persisted spock.progress.remote_insert_lsn value before restart and after
reconnect assert that the new remote_insert_lsn (fetched from the database) is
strictly greater than the persisted value, ensuring
request_initial_status_update() ran. Ensure the code touches the restart
sequence around wait_for_pg_ready and the check loop to implement one of these
two fixes.
In `@tests/tap/t/027_remote_commit_ts_recovery.pl`:
- Around line 69-74: The test currently asserts a non-NULL remote_commit_ts too
early because a stale resource.dat can mask whether
recover_progress_timestamps_from_commit_ts() actually ran; change the check so
that after restarting the subscriber you wait until either the subscriber's
remote_commit_lsn has reached the pre-crash/origin LSN or the recovery log line
"ts recovery: scanned ... recovered remote_commit_ts" appears before asserting
the timestamp (use an existing log-wait helper or poll the subscriber via SQL to
read remote_commit_lsn and compare to the expected LSN, then assert
remote_commit_ts is set), referencing remote_commit_ts, remote_commit_lsn and
recover_progress_timestamps_from_commit_ts() in the test change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4fde131b-6342-4aff-a56b-45d0d3d12282
📒 Files selected for processing (19)
docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.mddocs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.mdinclude/spock_group.hinclude/spock_progress_recovery.hinclude/spock_rmgr.hinclude/spock_worker.hsrc/spock.csrc/spock_apply.csrc/spock_group.csrc/spock_progress_recovery.csrc/spock_rmgr.csrc/spock_shmem.csrc/spock_worker.ctests/tap/t/008_rmgr.pltests/tap/t/022_rmgr_progress_post_checkpoint_crash.pltests/tap/t/023_forced_keepalive_timing.pltests/tap/t/026_no_double_wal_flush.pltests/tap/t/027_remote_commit_ts_recovery.pltests/tap/t/029_remote_commit_ts_recovery_clean_shutdown.pl
💤 Files with no reviewable changes (2)
- include/spock_group.h
- src/spock_shmem.c
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 207-208: The docs currently state that on startup with origin LSN
== InvalidXLogRecPtr we "keep resource.dat value if any", but the implementation
reconciles first, then skips the pg_commit_ts scan so any existing resource.dat
entry with file_lsn > InvalidXLogRecPtr is actually rewritten to the origin LSN
with cleared timestamps; update the sentence to describe this real behavior
(mention reconciliation step, pg_commit_ts scan being skipped,
InvalidXLogRecPtr, resource.dat and that entries are rewritten to origin LSN
with cleared timestamps) so the spec matches the code.
In `@src/spock_progress_recovery.c`:
- Around line 187-191: When initializing a new entry with a valid origin_lsn,
seed the other LSN fields so the invariant "remote_insert_lsn/received_lsn >=
remote_commit_lsn" holds: after setting entry->progress.remote_commit_lsn =
origin_lsn, also set entry->progress.remote_insert_lsn and
entry->progress.received_lsn to origin_lsn (or to
entry->progress.remote_commit_lsn) instead of leaving them as InvalidXLogRecPtr;
update the initialization in the same block that assigns origin_lsn to
remote_commit_lsn (referencing entry->progress.remote_commit_lsn,
entry->progress.remote_insert_lsn, entry->progress.received_lsn and origin_lsn).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f7407f1b-f75a-4b69-86e9-7cb0474023ca
📒 Files selected for processing (6)
docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.mddocs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.mdsrc/spock_group.csrc/spock_progress_recovery.ctests/tap/t/023_forced_keepalive_timing.pltests/tap/t/027_remote_commit_ts_recovery.pl
✅ Files skipped from review due to trivial changes (1)
- docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/tap/t/023_forced_keepalive_timing.pl
- tests/tap/t/027_remote_commit_ts_recovery.pl
- src/spock_group.c
b3e0786 to
2e0333b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 79-80: Update the docs bullet for the "Entry absent" case to state
that non-timestamp LSN fields remote_insert_lsn and received_lsn are seeded from
origin_lsn (instead of left zero/NULL) so that insert/received >= commit holds
on first read; refer to the behavior implemented in
src/spock_progress_recovery.c and mention the resulting state is
RECONCILE_FILE_ABSENT while keeping remote_commit_lsn = origin_lsn and
timestamps zero/NULL as before.
In `@src/spock_progress_recovery.c`:
- Around line 117-124: The recovery scan currently only filters by the publisher
origin id (call site uses recover_progress_timestamps_from_commit_ts(entry,
(RepOriginId) MySubscription->origin->id)) which can return stale rows from
prior/recreated subscriptions; change recover_progress_timestamps_from_commit_ts
to accept an additional subscription-specific fence (for example the local
subscription identifier MySubscription->id or the subscription's
roident/roident+remote_commit_lsn boundary) and update both call sites (the one
shown and the one around lines 315–318) to pass that fence; inside
recover_progress_timestamps_from_commit_ts apply the extra filter so you only
consider pg_commit_ts rows that match the current subscription/progress window
(and optionally constrain by remote_commit_lsn) before updating remote_commit_ts
/ prev_remote_ts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d812d38a-ffe2-4a27-a6e3-030f4440c91b
📒 Files selected for processing (6)
docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.mddocs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.mdsrc/spock_group.csrc/spock_progress_recovery.ctests/tap/t/023_forced_keepalive_timing.pltests/tap/t/027_remote_commit_ts_recovery.pl
✅ Files skipped from review due to trivial changes (1)
- docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
🚧 Files skipped from review as they are similar to previous changes (3)
- src/spock_group.c
- tests/tap/t/027_remote_commit_ts_recovery.pl
- tests/tap/t/023_forced_keepalive_timing.pl
2e0333b to
3b4a10a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md (1)
79-82:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument the seeded
remote_insert_lsn/received_lsnvalues in the ABSENT path.The implementation no longer leaves all non-timestamp fields zero/NULL here: new entries seed
remote_insert_lsnandreceived_lsnfromorigin_lsnto preserveinsert/received >= commitfrom the first read. This wording is now inaccurate, and the same fix is needed again at Line 203.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` around lines 79 - 82, Update the ABSENT case description to state that new entries seed remote_insert_lsn and received_lsn from origin_lsn (not left zero/NULL) in addition to initializing remote_commit_lsn = origin_lsn and zeroing timestamps, so the invariant insert/received >= commit holds on first read; apply the same wording change to the other ABSENT description elsewhere in the document that currently says non-timestamp fields are zero/NULL (ensure references to remote_commit_lsn, remote_insert_lsn, received_lsn, and RECONCILE_FILE_ABSENT are updated).src/spock_progress_recovery.c (1)
319-322:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftFence the
pg_commit_tsscan to this subscription's progress window.This still recovers timestamps from any commit with the same
target_origin. If a subscription is recreated, or a sibling subscription shares the same publisher node id,running_max_tscan come from unrelated history and pushremote_commit_ts/prev_remote_tsahead of this entry's durable state after restart. Filtering on origin id alone is too broad here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 41-45: Update the opening description that currently states
"resource.dat as a shutdown-only snapshot" to accurately reflect the as-built
behavior by listing all dump paths: the shutdown snapshot plus explicit runtime
dumps via spock_group_resource_dump() (used on add_node) and the table-sync dump
path; reference the symbol resource.dat and the function
spock_group_resource_dump(), and mention add_node and table-sync so readers can
find the corresponding runtime sections that document those dump triggers.
---
Duplicate comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 79-82: Update the ABSENT case description to state that new
entries seed remote_insert_lsn and received_lsn from origin_lsn (not left
zero/NULL) in addition to initializing remote_commit_lsn = origin_lsn and
zeroing timestamps, so the invariant insert/received >= commit holds on first
read; apply the same wording change to the other ABSENT description elsewhere in
the document that currently says non-timestamp fields are zero/NULL (ensure
references to remote_commit_lsn, remote_insert_lsn, received_lsn, and
RECONCILE_FILE_ABSENT are updated).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd8160e0-381d-44c0-9421-a9f355c47c1b
📒 Files selected for processing (6)
docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.mddocs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.mdsrc/spock_group.csrc/spock_progress_recovery.ctests/tap/t/023_forced_keepalive_timing.pltests/tap/t/027_remote_commit_ts_recovery.pl
✅ Files skipped from review due to trivial changes (4)
- tests/tap/t/027_remote_commit_ts_recovery.pl
- tests/tap/t/023_forced_keepalive_timing.pl
- docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
- src/spock_group.c
There was a problem hiding this comment.
♻️ Duplicate comments (1)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md (1)
79-79:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the entry-absent state description to match the shipped reconcile code.
Line 79 still says “other fields zero/NULL,” but
reconcile_progress_with_origin()seedsremote_insert_lsnandreceived_lsntoorigin_lsnon absent entries. Please update this bullet so the spec matches runtime behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` at line 79, Update the "Entry absent" bullet to reflect runtime behavior: instead of saying "other fields zero/NULL", state that reconcile_progress_with_origin() seeds remote_insert_lsn and received_lsn to origin_lsn while remote_commit_lsn = origin_lsn and the remaining fields remain zero/NULL, and keep the resulting state RECONCILE_FILE_ABSENT; reference reconcile_progress_with_origin(), remote_commit_lsn, remote_insert_lsn, received_lsn, and RECONCILE_FILE_ABSENT when making the change.
🧹 Nitpick comments (2)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md (2)
219-219: ⚡ Quick winAdd explicit origin semantics (
origin=0vs unavailable) in this edge-case note.This section discusses origin filtering and cross-subscription bleed-through; please explicitly state that
origin=0means local writes, while NULL/unknown means unavailable origin metadata, so readers don’t conflate those cases during audits/reconciliation.Based on learnings: in Spock docs, distinguish
origin=0(local change) from NULL/unknown (origin unavailable), including implications for audit/reconciliation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` at line 219, Update the edge-case note to explicitly state origin semantics: clarify that origin=0 denotes local writes while NULL/unknown denotes unavailable origin metadata, and explain how this affects the filter using roident == MySubscription->origin->id and the computation of running_max_ts and its forensic impact on prev_remote_ts; mention that sibling subscriptions or dropped/recreated subscriptions can bleed in commits when origin metadata is unavailable and note implications for audit/reconciliation and the planned parallel-apply rework.
68-69: ⚡ Quick winAvoid calling RMGR records a “marker” here; they carry full snapshots.
Line 68 currently describes a dump-event marker, but
SPOCK_RMGR_RESOURCE_DUMPrecords embed fullSpockApplyProgresspayloads per entry. Tightening this wording will prevent confusion with lightweight marker-only events.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` around lines 68 - 69, The wording incorrectly calls the RMGR record a "marker" — update the text to state that RMGR (id 144) implements SPOCK_RMGR_RESOURCE_DUMP records which embed full SpockApplyProgress payloads per entry (not lightweight markers); revise the sentence describing the redo handler to say it emits a LOG line and that each dump record contains the full snapshot payload (SpockApplyProgress) rather than being a marker-only event, referencing SPOCK_RMGR_RESOURCE_DUMP and SpockApplyProgress to make the distinction explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Line 79: Update the "Entry absent" bullet to reflect runtime behavior: instead
of saying "other fields zero/NULL", state that reconcile_progress_with_origin()
seeds remote_insert_lsn and received_lsn to origin_lsn while remote_commit_lsn =
origin_lsn and the remaining fields remain zero/NULL, and keep the resulting
state RECONCILE_FILE_ABSENT; reference reconcile_progress_with_origin(),
remote_commit_lsn, remote_insert_lsn, received_lsn, and RECONCILE_FILE_ABSENT
when making the change.
---
Nitpick comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Line 219: Update the edge-case note to explicitly state origin semantics:
clarify that origin=0 denotes local writes while NULL/unknown denotes
unavailable origin metadata, and explain how this affects the filter using
roident == MySubscription->origin->id and the computation of running_max_ts and
its forensic impact on prev_remote_ts; mention that sibling subscriptions or
dropped/recreated subscriptions can bleed in commits when origin metadata is
unavailable and note implications for audit/reconciliation and the planned
parallel-apply rework.
- Around line 68-69: The wording incorrectly calls the RMGR record a "marker" —
update the text to state that RMGR (id 144) implements SPOCK_RMGR_RESOURCE_DUMP
records which embed full SpockApplyProgress payloads per entry (not lightweight
markers); revise the sentence describing the redo handler to say it emits a LOG
line and that each dump record contains the full snapshot payload
(SpockApplyProgress) rather than being a marker-only event, referencing
SPOCK_RMGR_RESOURCE_DUMP and SpockApplyProgress to make the distinction
explicit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6516baea-a752-40fd-a769-6be8f182af47
📒 Files selected for processing (1)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md
… reconcile Replaces the WAL-RMGR + checkpoint-hook persistence for spock.progress with a file snapshot + replication-origin reconcile. What's removed: - The WAL RMGR's apply-progress recovery records (no per-commit SPOCK_RMGR_APPLY_PROGRESS, no redo-into-shmem). - spock_checkpoint_hook() and its registration. - Per-commit XLogFlush() in handle_commit. What's added: - reconcile_progress_with_origin() at apply-worker startup: reads the durable origin LSN, advances remote_commit_lsn if resource.dat is stale, clears timestamp fields pending later refresh. - A retargeted RMGR (id 144) emitting one WAL record per SpockGroupHash entry at each resource.dat dump event. Records are forensic only (visible via pg_waldump); state recovery does not depend on them. - spock_supervisor_on_exit(): the spock supervisor's before_shmem_exit callback. On clean shutdown, polls (~5 s ceiling) for sibling apply/sync workers to detach via spock_any_apply_worker_running() (new helper), then runs spock_group_resource_dump() and the SHUTDOWN forensic WAL emit. XLogInsert requires a backend context, so the supervisor (a bgworker) is the right owner -- not postmaster. Both file and WAL dumps share one owner and reflect the same drained state. What's changed: - add_node and table-sync paths: drop per-entry WAL writes, dump resource.dat once after the loop via durable_rename. Performance: removes one fsync per applied remote-origin commit on the apply hot path. Under synchronous_commit=off (the apply worker's default), measured via 026_no_double_wal_flush.pl as ~2 fsyncs over 30 applied commits vs >=30 before. Test suite updated for the new layout.
After START_REPLICATION, send an 'r' status update with replyRequested=1 so the publisher immediately responds with a 'k' keepalive. The receive loop will then populate remote_insert_lsn/received_lsn in shmem within milliseconds, rather than waiting for wal_sender_timeout/2. Also remove the pg_attribute_unused() marker from the helper's definition now that it has a caller.
After a crash, resource.dat may hold a stale snapshot relative to the durable replication origin. reconcile_progress_with_origin clears remote_commit_ts in that case, leaving spock.progress NULL until the next applied commit -- a freshness regression. Add recover_progress_timestamps_from_commit_ts(): when reconcile returns anything other than RECONCILE_FILE_IN_SYNC, walk pg_commit_ts backward from the latest xid filtered on the publisher's origin id, populating remote_commit_ts with the recovered max-by-ts. Bounded termination at 1000 commits seen for the origin or 1M xids total scanned. reconcile_progress_with_origin returns a ReconcileResult enum so the caller can gate the scan on staleness. last_updated_ts is set to the recovered remote_commit_ts (lower bound on local apply time; under-reports lag rather than zeroing it; refreshes on next applied commit). The recovered prev_remote_ts is also intended to be useful for the planned parallel-apply rework. New test 027 verifies post-crash remote_commit_ts recovery; gates on (commit_lsn >= pre_crash_lsn AND ts IS NOT NULL) to prove both reconcile and recovery scan executed.
CodeRabbit + reviewer follow-up. Substantive items: - spock_group_progress_force_set_list: replace spock_init_progress_fields() + progress_update_struct() with a struct copy + insert_lsn/received_lsn invariant clamps. progress_update_struct's max-by-ts merge skips remote_commit_lsn when sap->remote_commit_ts is 0 (a legal post-reconcile shape), which could leave a force-set entry with InvalidXLogRecPtr commit_lsn. - spock_init_progress_state: add NULL shmem and InvalidXLogRecPtr origin_lsn short-circuits (defense in depth + correctness guard against backfilling stale historical ts on fresh subscriptions). - reconcile new-entry branch: pin remote_insert_lsn/received_lsn to origin_lsn so the insert >= commit invariant holds immediately. - reconcile anomaly branch: WARNING now carries all discarded file values for postmortem recovery. - Export spock_init_progress_fields (renamed from static init_progress_fields); reconcile's new-entry block calls it instead of inlining a field-reset block that had to stay in lockstep. - Document protocol-version behavior of request_initial_status_update: 'r'/'k' is the libpqwalreceiver wire protocol (works on all spock proto versions); 'k' advances received_lsn on v4 and v5+; remote_insert_lsn refresh paths differ (v5+ from 'w' header, v4 from COMMIT payload). - 023_forced_keepalive_timing.pl: remove resource.dat between stop and start (otherwise the test passes via reseed); assert received_lsn (the field 'k' actually populates) instead of remote_insert_lsn. - 027_remote_commit_ts_recovery.pl: gate assertion on (commit_lsn >= pre_crash_lsn AND ts IS NOT NULL) to prove both reconcile and recovery scan ran (otherwise stale resource.dat reseed would mask). Spec updates: - Enumerate all three resource.dat dump sites in the design overview. - Document recovery-scan limitations (origin-id filter is not subscription-unique; sibling subs or drop/recreate can contribute unrelated commits). - Soften overconfident "switch to commit-LSN" references; reword "Required for parallel-apply ordering" to "intended for the planned parallel-apply rework."
334f1cf to
71fb10a
Compare
Replaces Spock's WAL-RMGR + checkpoint-hook persistence for
spock.progresswith a file snapshot + replication-origin reconcile +pg_commit_tsrecovery scan. Removes the per-commitXLogFlushfrom the apply hot path and the dependence on a 4-line Postgres core patch.With the changes in this PR, the ~3.3x slowdown in
mainin the apply worker performance is reversed.What's removed
SPOCK_RMGR_APPLY_PROGRESSrecords.spock_checkpoint_hook()and its registration. The Postgres core patch that exposesCheckpoint_hookcan now be reverted (separate PR).XLogFlush()inhandle_commit.What's added
spock_init_progress_state(XLogRecPtr)— public entry point in a newsrc/spock_progress_recovery.cmodule, called once fromspock_apply_mainbetweenreplorigin_session_setupandspock_start_replication. Internally:resource.dat's entry is stale or absent, advanceremote_commit_lsnfrom the origin, clear stale timestamps, preserve theinsert_lsn ≥ commit_lsninvariant.pg_commit_ts(only when reconcile detected staleness): walk backward from the latest xid, filtered on the publisher's spock node id, repopulatingremote_commit_ts. Bounded termination at 1000 commits seen for the origin or 1M xids total. Restores accurate values inspock.progresspost-crash; the recoveredprev_remote_tsis also intended to be useful for the planned parallel-apply rework.request_initial_status_update(): forced 'r' message at apply-worker reconnect, prompting the publisher to send a 'k' immediately soremote_insert_lsnpopulates within milliseconds instead of waiting forwal_sender_timeout/2.SpockGroupHashentry at eachresource.datdump event, each carrying that entry's fullSpockApplyProgresssnapshot (key + LSNs + timestamps). Apg_waldumptrace shows the exact progress snapshot persisted at each event LSN — useful for incident reconstruction over time.SHUTDOWNandADD_NODEwalk the full hash (mesh-wide effect).TABLE_SYNCemits only the changed entries (per-subscription effect — the helper takes the same listupdate_listworks on).Performance
synchronous_commit=off(Spock's default for the apply worker, designed to let walwriter batch flushes), test026_no_double_wal_flush.plmeasures the delta as ~2 fsyncs over 30 applied commits, vs. at least 30 before.remote_insert_lsnpopulates within ~ms, asserted by023_forced_keepalive_timing.plwithwal_sender_timeout=10min.Bug found and fixed during this work
progress_update_structenforcesremote_insert_lsn >= remote_commit_lsn. Reconcile now bumpsremote_insert_lsnandreceived_lsnto at leastorigin_lsnwhenever it advancesremote_commit_lsn. Caught by the new test 022.Approximation, post-crash only
On a clean restart the file's actual
last_updated_tsis preserved exactly. Only on the post-crash path, where reconcile clears stale timestamps and thepg_commit_tsscan repopulates them, is an approximation needed: Postgres doesn't store the local apply time whenreplorigin_session_originis set during apply (only the publisher's commit_ts ends up inpg_commit_ts). For those entries we setlast_updated_tsto the recoveredremote_commit_ts— a lower bound (apply happened at or after the publisher's commit) that under-reports lag by typical publisher→subscriber delay rather than hiding it as zero. Refreshes to the real value on the next applied commit.Test reliability fix
008_rmgr.plnow issues aCHECKPOINTbefore SIGKILL so the apply worker's batch-2 commits are durable past walwriter cadence. Without this the test was racy with a high fail rate; deterministic now.Deferred (not in this PR)
Checkpoint_hookpatch — landing it in the spock branch is the prerequisite, the core revert ships separately.add_node,drop_node, subscription state transitions, sync start/end. Sketched indocs/internals-doc/specs/2026-05-01-spock-audit-rmgr-design.md.Layout
src/spock_progress_recovery.c+include/spock_progress_recovery.h(single public functionspock_init_progress_state); reconcile + scan + their helpers all file-static.request_initial_status_updatestays insrc/spock_apply.c(wire-protocol kin tosend_feedback).SPOC-436