Skip to content

Use non-transactional message for sync_event to eliminate WAL send delay#403

Merged
ibrarahmad merged 1 commit intomainfrom
sync_event_lsn
Mar 31, 2026
Merged

Use non-transactional message for sync_event to eliminate WAL send delay#403
ibrarahmad merged 1 commit intomainfrom
sync_event_lsn

Conversation

@mason-sharp
Copy link
Copy Markdown
Member

sync_event() previously used a transactional LogLogicalMessage which required the reorder buffer to process a full BEGIN/COMMIT cycle before delivery to subscribers. This caused the sync event LSN to not appear in pg_replication_origin_status until another unrelated COMMIT advanced the WAL sender.

Switch to a non-transactional message with flush=true, which:

  • Bypasses the reorder buffer for immediate delivery
  • Explicitly flushes WAL to wake the walsender via WalSndWakeupProcessRequests
  • Uses (LogLogicalMessage)() to bypass the 4-arg compat macro and pass the 5th flush parameter directly on PG17+; calls XLogFlush on PG15/16

On the output plugin side, send the startup message before the sync event if it hasn't been sent yet. This is necessary because the startup message is normally sent during pg_decode_begin_txn, but a non-transactional message can arrive before any transaction is decoded on a newly-created subscription, causing a protocol version mismatch that crashes the apply worker.

On the apply side, accept non-transactional messages and explicitly call replorigin_session_advance() so pg_replication_origin_status reflects the sync position immediately. Diagnostic logging added to trace sync_event flow across provider, output plugin, and subscriber.

…AL send delay

sync_event() previously used a transactional LogLogicalMessage which
required the reorder buffer to process a full BEGIN/COMMIT cycle before
delivery to subscribers. This caused the sync event LSN to not appear
in pg_replication_origin_status until another unrelated COMMIT advanced
the WAL sender.

Switch to a non-transactional message with flush=true, which:
- Bypasses the reorder buffer for immediate delivery
- Explicitly flushes WAL to wake the walsender via WalSndWakeupProcessRequests
- Uses (LogLogicalMessage)() to bypass the 4-arg compat macro and pass
  the 5th flush parameter directly on PG17+; calls XLogFlush on PG15/16

On the output plugin side, send the startup message before the sync
event if it hasn't been sent yet. This is necessary because the startup
message is normally sent during pg_decode_begin_txn, but a
non-transactional message can arrive before any transaction is decoded
on a newly-created subscription, causing a protocol version mismatch
that crashes the apply worker.

On the apply side, accept non-transactional messages and explicitly
call replorigin_session_advance() so pg_replication_origin_status
reflects the sync position immediately. Diagnostic logging added to
trace sync_event flow across provider, output plugin, and subscriber.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

The changes modify sync event message handling to support non-transactional operations. Validation rules are relaxed, sync markers are emitted non-transactionally with earlier WAL flush, and the output plugin treats transaction context as optional for non-transactional messages.

Changes

Cohort / File(s) Summary
Sync event emission
src/spock_functions.c
Modified spock_create_sync_event to emit sync markers as non-transactional logical replication messages with explicit WAL flush (version-specific: direct flag for PG 17+, XLogFlush() call for older versions). Added DEBUG1 logging of emitted sync event LSN.
Sync event receiving and processing
src/spock_apply.c, src/spock_output_plugin.c
Relaxed validation in handle_message() to allow non-transactional sync events; replication origin advancement now occurs immediately for non-transactional events. Decoder treats transaction context as optional and conditionally sends startup message before sync event. Added DEBUG1 logging throughout.

Poem

🐰 A sync event hops through the wire,
No longer bound by transaction's fire,
Replication origin advances with grace,
Non-transactional messages find their place!
~Cheerfully, CodeRabbit's fluffy friend

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: converting sync_event from transactional to non-transactional messaging to eliminate WAL send delay.
Description check ✅ Passed The description comprehensively explains the problem, solution, and implementation details across all three modified files, directly relating to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync_event_lsn

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mason-sharp mason-sharp requested a review from ibrarahmad March 31, 2026 03:08
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_functions.c`:
- Around line 3296-3312: spock_create_sync_event currently emits a
non-transactional logical message which can be observed before surrounding
transactional changes are committed; add an explicit guard at the start of
spock_create_sync_event (immediately after the existing check_local_node() call)
that rejects being called inside an explicit transaction block or a
subtransaction (use the same check/ereport pattern used in
alter_subscription_disable() and alter_subscription_enable() to detect
IsInTransactionBlock()/IsInSubTransaction() and raise an ERROR with a clear
message), so the function can only be run in a top-level non-transactional
context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2f914511-eb93-486c-9da0-e331b31bac28

📥 Commits

Reviewing files that changed from the base of the PR and between a116148 and 14f223d.

📒 Files selected for processing (3)
  • src/spock_apply.c
  • src/spock_functions.c
  • src/spock_output_plugin.c

Comment on lines +3296 to +3312
/*
* Use a non-transactional message with flush. This bypasses the reorder
* buffer so the walsender delivers it immediately, and the explicit WAL
* flush wakes the walsender via WalSndWakeupProcessRequests.
*
* PG17+ has a 5th flush parameter; on PG15/16 we call XLogFlush manually.
* Parentheses around the function name bypass the 4-arg compat macro on
* PG17/18 so we can pass the 5th parameter directly.
*/
#if PG_VERSION_NUM >= 170000
lsn = (LogLogicalMessage)(SPOCK_MESSAGE_PREFIX, (char *) &message,
sizeof(message), false, true);
#else
lsn = LogLogicalMessage(SPOCK_MESSAGE_PREFIX, (char *) &message,
sizeof(message), true);
sizeof(message), false);
XLogFlush(lsn);
#endif
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and examine the spock_create_sync_event function around the flagged lines
fd -t f spock_functions.c | head -5

Repository: pgEdge/spock

Length of output: 77


🏁 Script executed:

# Get context around lines 3296-3312
head -3320 src/spock_functions.c | tail -100 | cat -n

Repository: pgEdge/spock

Length of output: 4001


🏁 Script executed:

# Search for IsTransactionBlock in the codebase and PostgreSQL
rg -i "IsTransactionBlock|IsSubTransaction" --max-count=20

Repository: pgEdge/spock

Length of output: 215


🏁 Script executed:

# Find callers of spock_create_sync_event to understand usage patterns
rg "spock_create_sync_event" -A 3 -B 3

Repository: pgEdge/spock

Length of output: 1492


🏁 Script executed:

# Check the function definition and any comments about barrier semantics
ast-grep --pattern $'Datum
spock_create_sync_event($_) {
  $$$
}'

Repository: pgEdge/spock

Length of output: 2102


🏁 Script executed:

# Find the exact lines with IsTransactionBlock usage
rg -n "IsTransactionBlock|IsSubTransaction" src/spock_functions.c

Repository: pgEdge/spock

Length of output: 179


🏁 Script executed:

# Get context around those lines to see the pattern
rg -B 10 -A 5 "IsTransactionBlock" src/spock_functions.c

Repository: pgEdge/spock

Length of output: 857


🏁 Script executed:

# Check if there's any documentation or design notes about sync_event in comments
rg -B 5 -A 15 "spock_create_sync_event|sync_event|synchronization event" src/spock_functions.c | head -80

Repository: pgEdge/spock

Length of output: 2697


Add transaction-block guard to spock_create_sync_event().

The function documents barrier semantics ("ensuring that all changes up to and including the event have been replicated"), but uses non-transactional LogLogicalMessage() which bypasses the reorder buffer and is decoded immediately. This creates a correctness hole: BEGIN; INSERT ...; SELECT spock_create_sync_event(); COMMIT; exposes the sync marker before the transactional changes are replayed on the subscriber, violating the barrier contract.

Add a guard at the start of the function (lines 3284–3290, after check_local_node()) to reject explicit transaction blocks and subtransactions, matching the pattern already used in alter_subscription_disable() and alter_subscription_enable() at lines 781 and 817:

Suggested guard
 Datum
 spock_create_sync_event(PG_FUNCTION_ARGS)
 {
 	SpockLocalNode *node;
 	SpockSyncEventMessage message;
 	XLogRecPtr	lsn;
 
 	node = check_local_node(true);
+	if (IsTransactionBlock() || IsSubTransaction())
+		ereport(ERROR,
+				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+				 errmsg("spock_create_sync_event() cannot be run inside a transaction block")));
+
 	message.mtype = SPOCK_SYNC_EVENT_MSG;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_functions.c` around lines 3296 - 3312, spock_create_sync_event
currently emits a non-transactional logical message which can be observed before
surrounding transactional changes are committed; add an explicit guard at the
start of spock_create_sync_event (immediately after the existing
check_local_node() call) that rejects being called inside an explicit
transaction block or a subtransaction (use the same check/ereport pattern used
in alter_subscription_disable() and alter_subscription_enable() to detect
IsInTransactionBlock()/IsInSubTransaction() and raise an ERROR with a clear
message), so the function can only be run in a top-level non-transactional
context.

Copy link
Copy Markdown
Contributor

@ibrarahmad ibrarahmad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ibrarahmad ibrarahmad merged commit fdd9587 into main Mar 31, 2026
10 checks passed
@ibrarahmad ibrarahmad deleted the sync_event_lsn branch March 31, 2026 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants