TRANSDISCARD/SUB_DISABLE: switch from subtransaction replay to read-only exception row capture #400
mason-sharp merged 9 commits into main
Conversation
…-run replay

In TRANSDISCARD and SUB_DISABLE modes, the second (retry) pass replays the transaction as read-only. Each DML handler logs a row to spock.exception_log instead of attempting the operation. The record that originally caused the error carries the real error message; other records get an empty string. The failed record is identified by xact_action_counter (saved as failed_action in SpockExceptionLog), which is unique per DML within a transaction. In DISCARD mode, each DML still runs inside a subtransaction; only the failed ones are logged.

Add a dry_run_logging regression test covering TRANSDISCARD (absent table), DISCARD (truncated table), and SUB_DISABLE (deleted row), with an INSERT_EXISTS conflict on drl_t1 to verify that dry-run modes leave spock.resolutions empty while DISCARD populates it.
…producer

Replace wait_slot_confirm_lsn with sync_event/wait_for_sync_event to eliminate replication race conditions when querying exception_log on the subscriber.

Add a TRANSDISCARD sub-test that mixes DML with a replicate_ddl call inside a single failing transaction. This reproduces a duplicate-DDL logging bug in handle_queued_ddl detected by CodeRabbit during review: the TRANSDISCARD/SUB_DISABLE branch logs via add_entry_to_exception_log, then falls through to the generic logging block, which logs again.

Remove test section numbering per project conventions.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/regress/sql/dry_run_logging.sql`:
- Around line 107-110: Update the test comment that currently reads "Expect
entries with duplicated DDL record" to reflect the intended behavior of no
duplicate DDL logging; modify the comment above the SELECT querying
spock.exception_log (the query using table_name, operation, (error_message <>
'') AS has_error, ddl_statement ordered by command_counter) to state that the
test expects NO duplicate DDL entries (i.e., a single DDL record per command).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc49989d-5676-4a35-890d-8275efe21d33
⛔ Files ignored due to path filters (1)
tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
📒 Files selected for processing (2)
src/spock_apply.c, tests/regress/sql/dry_run_logging.sql
…ISABLE

The TRANSDISCARD/SUB_DISABLE branch logged via add_entry_to_exception_log, then fell through to the generic logging block, which called should_log_exception and logged again, producing duplicate DDL entries. Merge both call sites into a single add_entry_to_exception_log after the if/else branches. The TRANSDISCARD/SUB_DISABLE branch now only extracts the SQL text; actual logging is done in the shared block.

Add Assert(my_exception_log_index >= 0) before accessing exception_log_ptr, since use_try_block guarantees the index was set in handle_begin.

Bug reported by CodeRabbit during review.
🧹 Nitpick comments (3)
src/spock_apply.c (3)
1287-1308: Consider adding a defensive assertion before accessing exception_log_ptr.

The code at line 2302 includes Assert(my_exception_log_index >= 0) before accessing exception_log_ptr, but the DML handlers (INSERT, UPDATE, DELETE) don't have this check. While my_exception_log_index should be valid when use_try_block is true (set during handle_begin), adding the assertion would provide consistent defensive validation.

🛡️ Suggested defensive assertion

```diff
 if (exception_behaviour == TRANSDISCARD || exception_behaviour == SUB_DISABLE)
 {
+    Assert(my_exception_log_index >= 0);
+
     /*
      * TRANSDISCARD and SUB_DISABLE: skip the DML and log the
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 1287 - 1308, Add a defensive Assert that my_exception_log_index is valid before dereferencing exception_log_ptr in the DML handlers (the INSERT/UPDATE/DELETE branches where exception_log_ptr[...] is accessed and log_insert_exception(...) is called). Specifically, in the block using exception_log_ptr[my_exception_log_index] (and before setting exception_log_ptr[my_exception_log_index].local_tuple or reading .initial_error_message), insert Assert(my_exception_log_index >= 0) to mirror the check used in handle_begin/other code paths so the handlers (where use_try_block is expected) validate the index before use.
1299-1304: Optional: Extract the repeated error_msg pattern to a helper function.

The same logic for computing error_msg is repeated in four handlers (INSERT, UPDATE, DELETE, SQL):

```c
char *error_msg =
    (xact_action_counter == exception_log_ptr[my_exception_log_index].failed_action &&
     exception_log_ptr[my_exception_log_index].initial_error_message[0]) ?
        exception_log_ptr[my_exception_log_index].initial_error_message : NULL;
```

This could be extracted to a helper function to improve maintainability and ensure consistent behavior if the logic ever needs to change.

♻️ Suggested helper function

```c
/*
 * Get the error message to log for the current action during dry-run replay.
 * Returns the initial error message only for the action that caused the
 * original failure; returns NULL for all other actions.
 */
static inline char *
get_dry_run_error_msg(void)
{
    Assert(my_exception_log_index >= 0);

    if (xact_action_counter == exception_log_ptr[my_exception_log_index].failed_action &&
        exception_log_ptr[my_exception_log_index].initial_error_message[0] != '\0')
        return exception_log_ptr[my_exception_log_index].initial_error_message;

    return NULL;
}
```

Also applies to: 1477-1482, 1594-1599, 2307-2312
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 1299 - 1304, Extract the repeated error_msg computation into a small static helper (e.g. static inline char *get_dry_run_error_msg(void)) that reads my_exception_log_index, xact_action_counter and exception_log_ptr and returns exception_log_ptr[my_exception_log_index].initial_error_message when xact_action_counter == exception_log_ptr[my_exception_log_index].failed_action and the initial_error_message is non-empty, otherwise returns NULL; replace the duplicated ternary in the four handlers (the blocks around the places that set error_msg in INSERT/UPDATE/DELETE/SQL handlers) with a call to get_dry_run_error_msg() and keep the Assert(my_exception_log_index >= 0) in the helper.
871-878: Consider clearing initial_error_message for consistency.

Lines 840-843 (TRANSDISCARD path) clear initial_error_message[0] = '\0', but lines 871-876 (SUB_DISABLE path) do not. While this is harmless since the worker exits after spock_disable_subscription(), adding the clear would maintain consistency between the two paths.

♻️ Suggested change for consistency

```diff
 exception_log = &exception_log_ptr[my_exception_log_index];
 exception_log->commit_lsn = InvalidXLogRecPtr;
+exception_log->initial_error_message[0] = '\0';
 MySpockWorker->restart_delay = 0;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 871 - 878, In the SUB_DISABLE path add the same clearing of the saved initial error string as done in the TRANSDISCARD path: after assigning exception_log = &exception_log_ptr[my_exception_log_index] and before calling spock_disable_subscription()/elog(ERROR...), set exception_log->initial_error_message[0] = '\0' so the SpockExceptionLog (exception_log) is consistent with the TRANSDISCARD handling; keep the rest (commit_lsn, MySpockWorker->restart_delay, spock_disable_subscription(), elog) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 031139ab-7009-4644-ab84-545c7b0c6bb2
⛔ Files ignored due to path filters (1)
tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
📒 Files selected for processing (2)
src/spock_apply.c, tests/regress/sql/dry_run_logging.sql
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/regress/sql/dry_run_logging.sql
Also, harden the PG_CATCH block: explicitly switch to ApplyOperationContext before CopyErrorData so edata lands in a known context, call FlushErrorState before RollbackAndReleaseCurrentSubTransaction to clear the error stack, and move ReleaseCurrentSubTransaction into PG_TRY so errors during release are caught.
mason-sharp left a comment:
I like this, it is much better than what we were doing before. I will revisit after the read-only PR.
RollbackAndReleaseCurrentSubTransaction() switches CurrentMemoryContext to the parent transaction context via AtSubCleanup_Memory(). The PG_CATCH blocks in handle_update, handle_delete, and handle_sql_or_exception called CopyErrorData() after the rollback, allocating the error copy in the wrong context. They also did not call FlushErrorState().

Standardize all four DML/DDL exception handlers (INSERT, UPDATE, DELETE, SQL) to follow the same sequence:

1. MemoryContextSwitchTo(ApplyOperationContext)
2. CopyErrorData()
3. FlushErrorState()
4. RollbackAndReleaseCurrentSubTransaction()
5. MemoryContextSwitchTo(ApplyOperationContext) -- restore after rollback

The post-rollback context restore ensures the Assert at the end of handle_insert (CurrentMemoryContext == ApplyOperationContext) passes on the error path.

Also change the exception_log error_message from an empty string to "unavailable" when the message is NULL, to avoid user confusion between empty strings and NULL values when querying.
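The standardized sequence can be outlined as follows. This is an illustrative sketch over PostgreSQL's PG_TRY/PG_CATCH and memory-context APIs, not the exact Spock handler code; the exception_log write is elided:

```c
PG_CATCH();
{
    ErrorData  *edata;

    /* 1. Switch first so the error copy survives subtransaction cleanup */
    MemoryContextSwitchTo(ApplyOperationContext);

    /* 2. Copy the error data before anything can clobber it */
    edata = CopyErrorData();

    /* 3. Clear the error stack so later elog/ereport calls start clean */
    FlushErrorState();

    /*
     * 4. Roll back the failed subtransaction; note this switches
     *    CurrentMemoryContext to the parent context via AtSubCleanup_Memory().
     */
    RollbackAndReleaseCurrentSubTransaction();

    /* 5. Restore the expected context so the handler's exit Assert holds */
    MemoryContextSwitchTo(ApplyOperationContext);

    /* ... record edata->message in spock.exception_log, then FreeErrorData(edata) ... */
}
PG_END_TRY();
```

Doing the context switch before CopyErrorData() is what fixes the wrong-context allocation; doing it again after the rollback is what keeps the exit-path invariant intact.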
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/spock_apply.c (1)
546: ⚠️ Potential issue | 🟠 Major — Potential off-by-one array access in exception_log slot search.

The loop conditions at lines 546 and 584 use i <= SpockCtx->total_workers, iterating up to and including index total_workers. However, exception_log_shmem_size() allocates sizeof(SpockExceptionLog) * nworkers where nworkers = SpockCtx->total_workers, meaning valid indices are 0 to total_workers - 1. If a match is found at i == total_workers, my_exception_log_index will be set to an out-of-bounds index, causing memory corruption when accessed later (lines 668, 741, 820, 873, 881, 889, and throughout DML handlers).

🐛 Proposed fix

Change both loops from <= to <:

```diff
-for (int i = 0; i <= SpockCtx->total_workers; i++)
+for (int i = 0; i < SpockCtx->total_workers; i++)
```

Apply at line 546 and line 584.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` at line 546, The loops that search the exception_log array iterate with "for (int i = 0; i <= SpockCtx->total_workers; i++)" and can index past the allocated range because exception_log_shmem_size() allocates SpockCtx->total_workers entries; change those loop conditions to "i < SpockCtx->total_workers" so iteration is 0..total_workers-1. Update both occurrences that set/use my_exception_log_index when scanning SpockCtx->exception_log (and any similar searches) to use "<" instead of "<=" to avoid out-of-bounds access and memory corruption.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ab67b0c4-411b-464f-be0d-ff348173ab60
⛔ Files ignored due to path filters (2)
tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (2)
src/spock_apply.c, src/spock_exception_handler.c
✅ Files skipped from review due to trivial changes (1)
- src/spock_exception_handler.c
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/spock_apply.c (1)
1197-1221: ⚠️ Potential issue | 🟠 Major — failed_action can still snapshot the wrong operation metadata.

This field only works if xact_action_counter is advanced before anything that can fail. handle_insert() still reaches spock_read_insert() before bumping the counter, and handle_sql_or_exception() never bumps it at all. A missing-table INSERT or a transaction with multiple queued SQL/DDL messages can therefore record the previous action number, so replay may attach the initial error to the wrong exception_log row(s).

Also applies to: 2304-2315, 3297-3303
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 2307-2313: The code currently uses JsonbToCString on
queued_message->message->root which produces JSON-formatted text; replace this
with the same extraction used in handle_sql(): validate the JSONB root with
JB_ROOT_IS_SCALAR, extract the scalar text value and duplicate it with pnstrdup
(as handle_sql() does around JB_ROOT_IS_SCALAR and pnstrdup), and assign that
raw SQL string to sql before incrementing exception_command_counter so logged
SQL is the original statement text rather than JSON.
- Around line 1524-1545: The handlers handle_update, handle_delete, and
handle_sql_or_exception copy ErrorData into ApplyOperationContext during
exception handling but never free that context; add a call to
MemoryContextReset(ApplyOperationContext) at the end of each of these functions
(just before their closing brace) mirroring the pattern used in handle_insert to
release allocations from CopyErrorData and related structures so they don't
accumulate for the worker lifetime; locate the exception code paths that call
CopyErrorData() and ensure MemoryContextReset(ApplyOperationContext) is executed
unconditionally at function exit.
- Around line 446-453: handle_truncate() must mirror the skip-and-log behavior
used by handle_insert/handle_update/handle_delete/handle_sql for TRANSDISCARD
and SUB_DISABLE modes: detect if exception_behaviour is TRANSDISCARD or
SUB_DISABLE (same condition used in begin_replication_step() that sets
transaction_read_only) and instead of calling ExecuteTruncateGuts(), log the
truncated relation(s) as skipped and return without executing; ensure the log
message clearly identifies the TRUNCATE and the mode (TRANSDISCARD/SUB_DISABLE)
and avoid side effects by not invoking ExecuteTruncateGuts() when in those
modes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7be7ea71-025a-41c9-9b88-8e95255e532c
⛔ Files ignored due to path filters (2)
tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (2)
src/spock_apply.c, src/spock_exception_handler.c
✅ Files skipped from review due to trivial changes (1)
- src/spock_exception_handler.c
Add the same skip-and-log guard used by the DML handlers so that TRUNCATE is recorded as discarded and the table contents are preserved.
…ql_or_exception

In DISCARD mode, these handlers call CopyErrorData() into ApplyOperationContext inside their PG_CATCH blocks but never reset that context afterward. This leaks ErrorData across every failed operation within a transaction replay. Add MemoryContextReset(ApplyOperationContext) at function exit, matching the existing pattern in handle_insert.
Up to standards ✅

🟢 Issues

| Metric | Results |
|---|---|
| Duplication | 5 |
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/spock_apply.c (2)
3319-3340: ⚠️ Potential issue | 🔴 Critical — Guard my_exception_log_index before writing retry metadata.

exception_log_ptr != NULL is not a sufficient guard here. If anything throws before handle_begin() finds or creates the worker slot, my_exception_log_index is still -1, and lines 3321-3340 write before the start of the shared-memory array.

🛡️ Minimal guard

```diff
-if (exception_log_ptr != NULL)
+if (exception_log_ptr != NULL && my_exception_log_index >= 0)
 {
     ...
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 3319 - 3340, The writes to exception_log_ptr[...] must only occur when my_exception_log_index is valid; add a guard checking that my_exception_log_index >= 0 (and exception_log_ptr != NULL) before writing initial_error_message, initial_operation, and failed_action, or defer these writes until handle_begin() has been called and a slot allocated; specifically update the block around my_exception_log_index, exception_log_ptr, and the use of errcallback_arg.action_name in src/spock_apply.c so you only access exception_log_ptr[my_exception_log_index] when my_exception_log_index is non-negative (or ensure handle_begin() sets it first).
1197-1221: ⚠️ Potential issue | 🟠 Major — Set INSERT retry metadata before relation lookup.

spock_read_insert() is still the first failure point here, but errcallback_arg.action_name and xact_action_counter are only assigned afterwards. A missing-table INSERT will therefore capture the previous action in initial_operation/failed_action, and the retry path falls back to the generic relation error instead of marking the triggering INSERT as the failed row.

🛠️ Suggested direction

```diff
-rel = spock_read_insert(s, RowExclusiveLock, &newtup);
+errcallback_arg.action_name = "INSERT";
+xact_action_counter++;
+rel = spock_read_insert(s, RowExclusiveLock, &newtup);
 if (unlikely(rel == NULL))
 {
     ...
-    log_insert_exception(true, "Spock can't find relation", NULL,
+    log_insert_exception(true,
+                         (xact_action_counter ==
+                          exception_log_ptr[my_exception_log_index].failed_action &&
+                          exception_log_ptr[my_exception_log_index].initial_error_message[0] != '\0') ?
+                             exception_log_ptr[my_exception_log_index].initial_error_message :
+                             "Spock can't find relation",
+                         NULL,
                          NULL, NULL, "INSERT");
     ...
 }
-errcallback_arg.action_name = "INSERT";
 errcallback_arg.rel = rel;
-xact_action_counter++;
```

Also applies to: 1225-1227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 1197 - 1221, Before calling spock_read_insert move the assignment of retry metadata so the failing INSERT is recorded: set errcallback_arg.action_name to "INSERT" (or the appropriate action string) and bump xact_action_counter (and assign initial_operation/failed_action as your retry bookkeeping expects) prior to invoking spock_read_insert; this ensures a missing-table INSERT records the correct failed_action and errcallback data used by the retry path instead of inheriting a previous action. Apply the same change to the analogous block referenced at the other occurrence (lines ~1225-1227) so UPDATE/other operations also set their action metadata before doing relation lookups.
♻️ Duplicate comments (1)
src/spock_apply.c (1)
2335-2347: ⚠️ Potential issue | 🟠 Major — Log the raw queued statement here, not its JSON wrapper.

JsonbToCString() stores a quoted/escaped JSON rendering, not the same plain SQL text that handle_sql() executes, and this path no longer enforces the JB_ROOT_IS_SCALAR validation from the execution path. Dry-run ddl_statement values will diverge from normal logging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2335 - 2347, The code is logging a JSON-quoted SQL string via JsonbToCString for the TRANSDISCARD/SUB_DISABLE branch, which diverges from the execution path; instead, validate that the queued_message->message->root is a scalar (use JB_ROOT_IS_SCALAR) and extract the raw SQL text the same way handle_sql() does (reuse its extraction helper or replicate its logic to pull the scalar string without JSON quoting) before incrementing exception_command_counter so dry-run logs match executed SQL.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 63f85652-7c76-4585-964c-e121cf5bf7ac
⛔ Files ignored due to path filters (2)
tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (2)
src/spock_apply.c, src/spock_exception_handler.c
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spock_exception_handler.c
-- Common scenario: three tables on provider, one broken on subscriber.
-- A single transaction with DMLs on all three tables triggers an error
-- on the broken table. The first DML (INSERT into drl_t1) also creates
-- a conflict (INSERT_EXISTS) to verify that dry-run modes do not log
Somewhat nitpicky, but I think we should use a different term, since "dry-run" implies "what would happen". How about "log-only replay"?

After changing it throughout, and once the read-only PR gets in, we can get this merged.
@mason-sharp: It's up to you as a native speaker. I've never seen "log-only replay"; I used "dry-run mode" as a standard industry term. Feel free to commit your change at the top of the branch.
The TRANSDISCARD and SUB_DISABLE modes step through each operation in a failed transaction without executing DML, capturing every row to exception_log. "Exception row capture" describes this behavior more precisely than "dry-run", which implies a trial execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (2)
tests/regress/sql/exception_row_capture.sql (2)
116-118: Missing sync after DDL cleanup could cause test ordering issues.

The DROP TABLE IF EXISTS public.drl_dummy DDL at line 118 has no sync before the reset section begins. While the subsequent reset block (lines 124-131) performs its own DDL that will serialize properly, adding a sync here would make the test more robust against potential timing issues.

♻️ Optional: Add sync after cleanup DDL

```diff
 -- Cleanup the dummy table
 \c :provider_dsn
 SELECT spock.replicate_ddl('DROP TABLE IF EXISTS public.drl_dummy');
+SELECT spock.sync_event() AS sync_lsn \gset
+\c :subscriber_dsn
+CALL spock.wait_for_sync_event(NULL, 'test_provider', :'sync_lsn', 30);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/exception_row_capture.sql` around lines 116 - 118, After issuing the cleanup DDL via spock.replicate_ddl('DROP TABLE IF EXISTS public.drl_dummy'), add an explicit replication sync call immediately after that statement so the DROP is fully applied before the test reset block starts; locate the spock.replicate_ddl call in the file and follow it with the test suite's existing sync mechanism (e.g., the same sync helper used elsewhere in these tests) to serialize the DDL and avoid ordering/timing issues.
78-81: Consider using error_message = '' for clearer intent, or check explicit values.

The condition error_message <> '' relies on non-error records storing an empty string. Per the code, NULL error messages become 'unavailable', which would make this check return true for both actual errors and the fallback case.

If the implementation passes the empty string '' for non-error records (as stated in the PR description), this works correctly. However, for test robustness, consider checking for the actual expected values or adding a comment clarifying the expected error_message values for each row.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/regress/sql/exception_row_capture.sql` around lines 78 - 81, The test's boolean expression (error_message <> '') is ambiguous given the code may set NULLs to 'unavailable'; update the SELECT against spock.exception_log to explicitly check the expected values—either replace (error_message <> '') AS has_error with a clear equality (error_message = '') AS has_no_error or, better, use an explicit membership check like (error_message NOT IN ('', 'unavailable')) AS has_error to only flag real errors; also add a brief inline comment explaining which error_message sentinel values the test expects.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f80b64f2-08ae-4a57-bae7-b4edbf23a1f0
⛔ Files ignored due to path filters (2)
tests/regress/expected/exception_row_capture.out is excluded by !**/*.out
tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (4)
Makefile, src/spock_apply.c, src/spock_exception_handler.c, tests/regress/sql/exception_row_capture.sql
✅ Files skipped from review due to trivial changes (2)
- src/spock_exception_handler.c
- Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spock_apply.c
- Each DML handler logs a row to spock.exception_log instead of attempting the operation. Only the record that originally caused the error carries the real error message; other records get an empty string.
- The failed record is identified by xact_action_counter (saved as failed_action in SpockExceptionLog), which is unique per DML within a transaction.
- Add an exception_row_capture regression test covering all three exception modes with different breakage methods (absent table, truncated table, deleted row) and an INSERT_EXISTS conflict to verify that dry-run modes leave spock.resolutions empty while DISCARD populates it.