Spoc 388: Z0DAN should fail quickly instead of waiting for the timeout (plpgsql only) #330
Conversation
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@samples/Z0DAN/zodan.sql`:
- Around line 1844-1895: Update the DROP PROCEDURE statement for
spock.present_final_cluster_state to match the new signature that includes
src_dsn as the first parameter: change the old DROP PROCEDURE IF EXISTS
spock.present_final_cluster_state(integer, boolean) to DROP PROCEDURE IF EXISTS
spock.present_final_cluster_state(text, integer, boolean) so the cleanup matches
the revised procedure signature.
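In SQL terms, assuming the revised signature described above:

```sql
-- Old cleanup, which no longer matches the procedure:
-- DROP PROCEDURE IF EXISTS spock.present_final_cluster_state(integer, boolean);

-- New cleanup, with src_dsn (text) as the first parameter:
DROP PROCEDURE IF EXISTS spock.present_final_cluster_state(text, integer, boolean);
```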
In `@tests/tap/t/016_zodan_3n_timeout.pl`:
- Around line 43-78: Capture and assert the add_node call actually failed
instead of only checking elapsed time: wrap the CALL spock.add_node invocation
(where scalar_query is used) so you record any error/exception (e.g. via eval {
scalar_query(...) } and check $@ or by using a scalar_query variant that returns
status), then add an assertion that the call failed (e.g. ok(!$result || $@,
"...add_node failed as expected") and optionally assert the error message
contains "spock.sync_event" or "function ... does not exist"); keep the existing
timing check (ok($elapsed_time < 600,...)). Apply the same pattern to the other
add_node tests referenced (the blocks calling scalar_query for spock.add_node at
the other locations) and leave the cleanup calls (CALL spock.remove_node)
unchanged.
- Around line 80-117: The timer currently starts before the successful add_node
for n2 so $elapsed_time includes that operation; move the start_time = time()
assignment to immediately before the failing CALL spock.add_node for n3 (i.e.
right before the scalar_query(3, qq{ CALL spock.add_node ... }) invocation) so
the measured duration only covers the failing attempt, then compute
$elapsed_time and assert it is < 600 as before; remove or relocate the earlier
$start_time assignment placed just after renaming pg_replication_slot_advance to
avoid double-timing.
unfortunately needs a file rename and rebase
Procedure present_final_cluster_state should execute a COMMIT command to let the newly created subscriptions (from the working nodes to the new node) update their state. Also, the final cluster state should check all subscriptions across the cluster, not only on the new node.
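A hedged sketch of that idea: the procedure name is illustrative, the COMMIT placement is the point, and the single-node check stands in for the cluster-wide check (which would iterate nodes via dblink, as zodan.sql does elsewhere):

```sql
CREATE OR REPLACE PROCEDURE demo_present_final_cluster_state()
LANGUAGE plpgsql AS $$
DECLARE
    disabled_count int;
BEGIN
    -- Procedures (unlike functions) may COMMIT; this releases the current
    -- snapshot so state updates from newly created subscriptions are visible.
    COMMIT;
    SELECT count(*) INTO disabled_count
      FROM spock.subscription
     WHERE NOT sub_enabled;
    IF disabled_count > 0 THEN
        RAISE EXCEPTION '% subscription(s) not yet enabled', disabled_count;
    END IF;
END;
$$;
```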
- Change RAISE NOTICE to RAISE EXCEPTION in error handlers to properly propagate errors instead of silently continuing
- Move sub_name assignment to the correct location inside the subscription creation block where it is actually used
- Add new test 016_zodan_2n_timeout for 2-node timeout scenario

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change RAISE NOTICE to RAISE EXCEPTION in zodan.sql error handlers to ensure add_node fails immediately rather than continuing silently and eventually timing out. Affected operations:
- Replication lag monitoring
- Creating replication slots on other nodes
- Enabling subscriptions
- Creating subscriptions on other nodes
- Triggering and waiting for sync events
- Checking commit LSN for slot advancement
- Advancing replication slots
- Triggering sync_event on source node

Also remove unreachable code (CONTINUE/RAISE statements after RAISE EXCEPTION).

Add a comprehensive 3-node cluster test that verifies quick failure when:
- spock.sync_event() is missing
- pg_replication_slot_advance() is missing
- spock.sub_create() is missing

The test also verifies successful cluster formation with 3 nodes and 2 enabled subscriptions per node.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
scalar_query() uses backticks, which don't throw Perl exceptions on error - they just return empty output and set $? with the exit code. Now properly check the psql exit code ($? >> 8) after each add_node call that is expected to fail, and assert a non-zero exit code to verify the operation actually failed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
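A minimal Perl sketch of that pattern; scalar_query and the connection index are the test's existing helpers, and the add_node arguments are elided placeholders:

```perl
# Time the call that is expected to fail fast, then check psql's exit
# status explicitly: backticks don't die on error, they only set $?.
my $start_time = time();
scalar_query(3, qq{ CALL spock.add_node(...) });   # placeholder arguments
my $exit_code = $? >> 8;                           # psql exit status
my $elapsed_time = time() - $start_time;
ok($exit_code != 0, "add_node failed as expected");
ok($elapsed_time < 600, "add_node failed before the timeout");
```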
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
samples/Z0DAN/zodan.sql (2)
300-311: ⚠️ Potential issue | 🔴 Critical

create_sub silently swallows exceptions when verb=false — fails to propagate errors.

The exception handler at lines 303–309 only raises when verb is true. When verb=false (which is how the test and likely production calls use it), a failure in spock.sub_create via dblink is silently caught and the procedure returns as if it succeeded. This directly undermines the PR's goal of fast failure: errors from subscription creation on remote nodes won't propagate, and the caller (e.g., create_sub_on_new_node_to_src_node) will continue as though the subscription was created successfully.

Suggested fix — always re-raise

```diff
 EXCEPTION WHEN OTHERS THEN
-    IF verb THEN
-        RAISE EXCEPTION E'
-    [STEP 3] Subscription "%" creation failed on remote node! Error: %
-    ', subscription_name, SQLERRM;
-    END IF;
+    RAISE EXCEPTION E'
+    [STEP 3] Subscription "%" creation failed on remote node! Error: %
+    ', subscription_name, SQLERRM;
 END;
```
547-568: ⚠️ Potential issue | 🔴 Critical

Same verb-gated exception issue in create_node.

Both the NULL-joinid check (line 556) and the WHEN OTHERS handler (line 563) only raise exceptions when verb=true. When verb=false, node creation failures are silently ignored.

Suggested fix — always re-raise

```diff
 IF joinid IS NOT NULL THEN
     IF verb THEN
         RAISE NOTICE 'Node % created remotely with id % and DSN: %', node_name, joinid, dsn;
     END IF;
 ELSE
-    IF verb THEN
-        RAISE EXCEPTION E'
-    [STEP 3] Node "%" creation failed remotely!
-    ', node_name;
-    END IF;
+    RAISE EXCEPTION E'
+    [STEP 3] Node "%" creation failed remotely!
+    ', node_name;
 END IF;
 EXCEPTION WHEN OTHERS THEN
-    IF verb THEN
-        RAISE EXCEPTION E'
-    [STEP 3] Node "%" creation failed remotely! Error: %
-    ', node_name, SQLERRM;
-    END IF;
+    RAISE EXCEPTION E'
+    [STEP 3] Node "%" creation failed remotely! Error: %
+    ', node_name, SQLERRM;
 END;
```
🤖 Fix all issues with AI agents
In `@samples/Z0DAN/zodan.sql`:
- Around line 1284-1288: The EXCEPTION block raises an exception and then has an unreachable CONTINUE. To allow the loop to continue on error, replace the RAISE EXCEPTION with a non-fatal logger (e.g., RAISE WARNING or RAISE NOTICE) that includes SQLERRM and rec.node_name; alternatively, remove the CONTINUE if you intend to abort. Update the EXCEPTION handler around the block that references SQLERRM and rec.node_name so the error is logged with the same message but without terminating execution (see the plpgsql sketch after this list).
- Around line 1141-1145: The EXCEPTION block contains an unreachable "RAISE;"
after a terminating "RAISE EXCEPTION" call; remove the redundant "RAISE;"
statement so the handler only logs/raises once. Locate the EXCEPTION section
that includes the RAISE EXCEPTION ' ✗ %', rpad('Creating new node ' ||
new_node_name || ' (error: ' || SQLERRM || ')', 120, ' ') and delete the
subsequent standalone "RAISE;" line (inside the same block) to eliminate dead
code.
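A minimal plpgsql sketch of the log-and-continue option from the first item above; the DO block, the node query, and PERFORM 1 are illustrative stand-ins for the real loop body in zodan.sql:

```sql
DO $$
DECLARE
    rec record;
BEGIN
    FOR rec IN SELECT node_name FROM spock.node LOOP
        BEGIN
            PERFORM 1;  -- stand-in for the per-node work that may fail
        EXCEPTION WHEN OTHERS THEN
            -- Non-fatal: log the failure and keep iterating, so the
            -- CONTINUE is reachable again.
            RAISE WARNING 'Failed on node %: %', rec.node_name, SQLERRM;
            CONTINUE;
        END;
    END LOOP;
END;
$$;
```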
In `@tests/tap/t/017_zodan_3n_timeout.pl`:
- Around line 75-80: The remove_node calls are building DSN strings that include
user=$db_user but omit password=$db_password, so scalar_query("CALL
spock.remove_node(...)") will fail under non-trust auth; update each remove_node
invocation (the scalar_query calls invoking spock.remove_node in this file) to
include password=$db_password in the target_node_dsn argument (matching the
add_node DSN format used elsewhere) so the cleanup executes with proper
credentials; ensure all occurrences (the remove_node calls near the scalar_query
invocations) are updated consistently.
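A hedged sketch of one such cleanup call; remove_node's full argument list is not shown in this thread, and the host/port variables are assumed to exist in the test:

```perl
# Match the add_node DSN format, including the password, so cleanup also
# works under non-trust authentication.
scalar_query(3, qq{
    CALL spock.remove_node(
        target_node_dsn := 'host=$host port=$port dbname=$db_name user=$db_user password=$db_password'
    );
});
```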
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
samples/Z0DAN/zodan.sql (2)
300-311: ⚠️ Potential issue | 🔴 Critical

create_sub silently swallows errors when verb is false — breaks the "fail fast" objective.

The EXCEPTION handler at lines 304-309 only raises when verb is true. When verb is false (as in the TAP test), a dblink failure in sub_create is silently discarded and create_sub returns normally. This means the outer caller (create_sub_on_new_node_to_src_node, etc.) never sees the error, and add_node continues instead of aborting.

This directly undermines the PR's goal and will cause the third TAP test scenario (renamed sub_create on n1) to silently succeed (exit 0) instead of failing fast.

Proposed fix — always propagate the error

```diff
 BEGIN
     SELECT * FROM dblink(node_dsn, remotesql) AS t(sid oid) INTO sid;
 EXCEPTION WHEN OTHERS THEN
-    IF verb THEN
-        RAISE EXCEPTION E'
-    [STEP 3] Subscription "%" creation failed on remote node! Error: %
-    ', subscription_name, SQLERRM;
-    END IF;
+    RAISE EXCEPTION E'
+    [STEP 3] Subscription "%" creation failed on remote node! Error: %
+    ', subscription_name, SQLERRM;
 END;
```
547-568: ⚠️ Potential issue | 🟠 Major

create_node has the same conditional-raise pattern — errors swallowed when verb is false.

Both the joinid IS NULL branch (line 556) and the WHEN OTHERS handler (line 563) guard RAISE EXCEPTION with IF verb. When verb is false, a remote node-creation failure is silently ignored and the procedure returns normally, allowing add_node to proceed on a phantom node.

Proposed fix — raise unconditionally

```diff
 IF joinid IS NOT NULL THEN
     IF verb THEN
         RAISE NOTICE 'Node % created remotely with id % and DSN: %', node_name, joinid, dsn;
     END IF;
 ELSE
-    IF verb THEN
-        RAISE EXCEPTION E'
-    [STEP 3] Node "%" creation failed remotely!
-    ', node_name;
-    END IF;
+    RAISE EXCEPTION E'
+    [STEP 3] Node "%" creation failed remotely!
+    ', node_name;
 END IF;
 EXCEPTION WHEN OTHERS THEN
-    IF verb THEN
-        RAISE EXCEPTION E'
-    [STEP 3] Node "%" creation failed remotely! Error: %
-    ', node_name, SQLERRM;
-    END IF;
+    RAISE EXCEPTION E'
+    [STEP 3] Node "%" creation failed remotely! Error: %
+    ', node_name, SQLERRM;
 END;
```
🤖 Fix all issues with AI agents
In `@tests/tap/t/017_zodan_3n_timeout.pl`:
- Around line 124-143: create_sub currently swallows exceptions from the dblink
call to spock.sub_create when the parameter verb is false, causing add_node to
succeed silently; update the exception handler inside the create_sub function
(in zodan.sql) to re-raise the caught exception (including its message and
detail) rather than suppressing it so that failures in spock.sub_create
propagate back to add_node and cause a non-zero exit; ensure the handler
references the original exception (e.g., using RAISE with SQLERRM or the stored
exception variable) so tests asserting add_node failure (calling add_node with
verb := false) now observe a failure.
ibrarahmad left a comment
We also need to update the Python script.
```sql
RAISE NOTICE ' ✓ Total nodes in cluster: %', node_count;
END IF;

-- Validate that we're not trying to remove the last node
```
Why have we removed that?
As we discussed earlier, there is a problem: if add_node fails to add n3 in earlier steps, the remove script thinks that n3 is the last node (there is not enough data to detect other nodes). So we need something like that, or some kind of 'forced removal' flag.
Yeah. That means enrolling in a Python course for me. Let's try to shift this responsibility to Claude ;).
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
samples/Z0DAN/zodan.py (2)
494-526: ⚠️ Potential issue | 🟡 Minor

Misleading log message in the 3+ node path.

Line 496 prints "3+ node scenario: sync event stored, skipping disabled subscriptions" but the code immediately below (lines 520–526) does create disabled subscriptions. This appears to be a copy-paste leftover from the 2-node message on line 491.

Suggested fix

```diff
-        self.notice("  - 3+ node scenario: sync event stored, skipping disabled subscriptions")
+        self.notice(f"  - 3+ node scenario: triggering sync event on node {rec['node_name']}")
```
968-1030: ⚠️ Potential issue | 🟡 Minor

Duplicate method show_all_subscription_status — this definition is dead code.

There are two definitions of show_all_subscription_status in this class: one here (line 968) and another at line 1346. In Python, the second definition silently shadows the first, making this block unreachable dead code. The version at line 1346 includes the new status labels ([DISABLED], [INITIALIZING], [SYNCING]) from this PR.

Remove this dead definition to avoid confusion.
🤖 Fix all issues with AI agents
In `@samples/Z0DAN/zodan.py`:
- Around line 528-531: The log phase numbers are out of order because add_node
calls trigger_sync_on_other_nodes_and_wait_on_source before
create_source_to_new_subscription; update the notice messages so they match
execution order: change trigger_sync_on_other_nodes_and_wait_on_source's notice
from "Phase 5: Triggering sync events on other nodes..." to "Phase 4: ..." and
change create_source_to_new_subscription's notice from "Phase 4: Creating source
to new node subscription" to "Phase 5: ...". Ensure both functions
(trigger_sync_on_other_nodes_and_wait_on_source and
create_source_to_new_subscription) use the corrected phase numbers so output is
monotonically increasing.
- Around line 810-816: The code calls self.run_psql(new_node_dsn, sql) for the
spock.wait_for_sync_event call but does not check the result so failures are
ignored; replace the direct run_psql call with the existing wait_for_sync_event
helper (or at minimum check run_psql's return value and raise on non-zero) so
errors in wait_for_sync_event are detected and propagated; update both the
2-node path (where sync_lsn, new_node_name, src_node_name, timeout_ms are used)
and the multi-node path (the similar block) to call
wait_for_sync_event(new_node_dsn, src_node_name, sync_lsn, timeout_ms) or assert
the returned result and raise an exception on failure.
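A standalone sketch of the check-the-return option; run_psql_checked is a hypothetical name, and the psql flags mirror the invocation style used elsewhere in zodan.py:

```python
import subprocess

def run_psql_checked(dsn: str, sql: str) -> str:
    """Run one statement via psql and raise instead of ignoring a failure."""
    result = subprocess.run(["psql", dsn, "-X", "-A", "-t", "-c", sql],
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"psql failed on {dsn}: {result.stderr.strip()}")
    return result.stdout.strip()
```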
🧹 Nitpick comments (3)
samples/Z0DAN/zodan.py (3)
229-266: Fragile exception discrimination and inconsistent subprocess usage.

Several concerns in this validation block:

Fragile string matching (lines 262-264): if "disabled subscription" in str(e) is brittle — if the error message at line 258 is ever reworded, this silently swallows the exception and falls through to re-raise as a generic failure. Consider using a custom exception class instead.

Inconsistent subprocess usage (lines 237-249): This block shells out with subprocess.run directly, duplicating logic already in self.run_psql and self.get_spock_nodes. Using the existing helpers would reduce duplication and ensure consistent error handling.

Ambiguous variable name (line 240): l is flagged by Ruff (E741). Rename to something descriptive like line.

Proposed fix for the fragile exception pattern

Define a specific exception (at class or module level):

```python
class SubscriptionDisabledException(Exception):
    pass
```

Then use it:

```diff
-                        if sub_enabled != 't':
-                            raise Exception(f"Node {node_nm} has disabled subscription {sub_nm}")
+                        if sub_enabled != 't':
+                            raise SubscriptionDisabledException(f"Node {node_nm} has disabled subscription {sub_nm}")
 ...
-        except Exception as e:
-            if "disabled subscription" in str(e):
-                raise
-            self.format_notice("✗", f"Failed to check subscription statuses: {str(e)}")
-            raise
+        except SubscriptionDisabledException:
+            raise
+        except Exception as e:
+            self.format_notice("✗", f"Failed to check subscription statuses: {e!s}")
+            raise
```

Fix ambiguous variable name on line 240

```diff
-            lines = [l.strip() for l in result.stdout.strip().split('\n') if l.strip()]
+            lines = [line.strip() for line in result.stdout.strip().split('\n') if line.strip()]
```
828-830: Use raise ... from e to preserve the exception chain.

Both lines 829 and 864 create a new Exception wrapping the caught one but discard the traceback chain. This makes debugging harder since the original stack trace is lost.

Proposed fix

```diff
 except Exception as e:
-    raise Exception(f"Enabling subscription {sub_name} (error: {e})")
+    raise Exception(f"Enabling subscription {sub_name} (error: {e})") from e
```

Apply the same change at both line 829 and line 864.

Also applies to: 863-864
936-944: Subscription status display only queries the source node.

Lines 940–944 run spock.sub_show_status() against self.src_dsn only, while the replication-readiness loop above (lines 900–909) correctly iterates all nodes. This means the "Subscription Status" section may miss subscriptions on non-source nodes. Phase 13 (show_all_subscription_status) covers all nodes, but the output here could mislead an operator checking just this section.

Consider querying all nodes here as well for consistency, or adding a note that this is a source-only view.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@samples/Z0DAN/zodan.py`:
- Around line 494-496: The log message inside the loop over other_nodes
incorrectly says "skipping disabled subscriptions" while the code actually
creates disabled subscriptions and will repeat per node; update the notice()
call to accurately reflect the action (e.g., "3+ node scenario: sync event
stored; creating disabled subscriptions" or move it outside the loop so it logs
once), or remove/adjust it to avoid per-node repetition — locate the loop that
iterates over other_nodes and the notice(...) invocation and make the message
and placement consistent with the subsequent disabled-subscription creation
logic.
- Around line 229-266: The DSN query in dsns_sql currently excludes the local
node so the source node's subscriptions aren't checked; modify dsns_sql (used
where cmd is constructed) to include local node(s) (remove the WHERE node_id NOT
IN (...) filter or adjust it to select all nodes) so the loop that runs sub_sql
against each node covers the source as well, and also fix the l variable in the
list comprehension (lines = [l.strip() ...]) to a non-ambiguous name like
line_str or line to satisfy linters (ruff E741) and update subsequent references
(lines, parts, node_dsn_val, node_nm) accordingly. Ensure existing error
handling around the subscription check (the try/except that inspects "disabled
subscription") remains unchanged.
🧹 Nitpick comments (2)
samples/Z0DAN/zodan.py (2)
571-572: new_node_dsn parameter is unused.

Static analysis flags this (ruff ARG002). All other phase methods accept this parameter for interface consistency, so if that's intentional, consider prefixing with _ to signal it's unused: _new_node_dsn.
827-828: Re-raised exceptions lose the original traceback — use raise ... from e.

Both exception handlers wrap and re-raise but drop the original cause, making debugging harder. This is flagged by ruff B904.

Suggested fix

```diff
 except Exception as e:
-    raise Exception(f"Enabling subscription {sub_name} (error: {e})")
+    raise Exception(f"Enabling subscription {sub_name} (error: {e})") from e
```

Apply the same pattern at line 860.

Also applies to: 859-860
samples/Z0DAN/zodan.py (Outdated)
```python
# Check: all nodes in the cluster have only enabled subscriptions
try:
    # Get DSNs of all remote nodes from source
    dsns_sql = """
        SELECT if_dsn, node_name
        FROM spock.node JOIN spock.node_interface ON (if_nodeid = node_id)
        WHERE node_id NOT IN (SELECT node_id FROM spock.local_node)
    """
    cmd = ["psql", src_dsn, "-X", "-A", "-F", "|", "-t", "-c", dsns_sql]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        lines = [l.strip() for l in result.stdout.strip().split('\n') if l.strip()]
        for line in lines:
            parts = line.split('|')
            if len(parts) >= 2:
                node_dsn_val = parts[0].strip()
                node_nm = parts[1].strip()
                # Check subscriptions on this node
                sub_sql = "SELECT sub_name, sub_enabled FROM spock.subscription"
                sub_cmd = ["psql", node_dsn_val, "-X", "-A", "-F", "|", "-t", "-c", sub_sql]
                sub_result = subprocess.run(sub_cmd, capture_output=True, text=True)
                if sub_result.returncode == 0:
                    sub_lines = [sl.strip() for sl in sub_result.stdout.strip().split('\n') if sl.strip()]
                    for sub_line in sub_lines:
                        sub_parts = sub_line.split('|')
                        if len(sub_parts) >= 2:
                            sub_nm = sub_parts[0].strip()
                            sub_enabled = sub_parts[1].strip()
                            if sub_enabled != 't':
                                raise Exception(f"Node {node_nm} has disabled subscription {sub_nm}")
                            elif self.verbose:
                                self.format_notice("OK:", f"Node with DSN {node_dsn_val} has enabled subscription {sub_nm}")
    self.format_notice("OK:", "Checking each Spock node has only active subscriptions")
except Exception as e:
    if "disabled subscription" in str(e):
        raise
    self.format_notice("✗", f"Failed to check subscription statuses: {str(e)}")
    raise
```
Pre-check only inspects remote nodes — the source node's own subscriptions are never verified.
The query at line 234 excludes the local node (WHERE node_id NOT IN (SELECT node_id FROM spock.local_node)), so if the source node itself has a disabled subscription, this check will pass silently. If the PR goal is to verify all nodes in the cluster have only enabled subscriptions, the source node should be included.
Also, line 240 uses l as a variable name, which is flagged by linters (ruff E741) as ambiguous — easily confused with 1.
Suggested fix for variable name

```diff
-            lines = [l.strip() for l in result.stdout.strip().split('\n') if l.strip()]
+            lines = [line.strip() for line in result.stdout.strip().split('\n') if line.strip()]
```

🧰 Tools
🪛 Ruff (0.14.14)
[error] 238-238: subprocess call: check for execution of untrusted input (S603)
[error] 240-240: Ambiguous variable name: l (E741)
[error] 249-249: subprocess call: check for execution of untrusted input (S603)
[warning] 258-258: Abstract raise to an inner function (TRY301)
[warning] 258-258: Create your own exception (TRY002)
[warning] 258-258: Avoid specifying long messages outside the exception class (TRY003)
[warning] 265-265: Use explicit conversion flag; replace with conversion flag (RUF010)
samples/Z0DAN/zodan.py (Outdated)
```python
for rec in other_nodes:
    # Trigger sync event on origin node and store LSN
    self.notice("  - 3+ node scenario: sync event stored, skipping disabled subscriptions")
```
Misleading log message — says "skipping disabled subscriptions" but then creates them.
Line 496 prints "3+ node scenario: sync event stored, skipping disabled subscriptions", but the code immediately below (lines 519–526) creates disabled subscriptions. This appears to be copy-pasted from the 2-node branch at line 491. The message will also repeat once per node in the loop, which is likely unintended.
Suggested fix

```diff
-        for rec in other_nodes:
-            # Trigger sync event on origin node and store LSN
-            self.notice("  - 3+ node scenario: sync event stored, skipping disabled subscriptions")
-            sync_lsn = self.sync_event(rec['dsn'])
+        self.notice("  - 3+ node scenario")
+        for rec in other_nodes:
+            # Trigger sync event on origin node and store LSN
+            sync_lsn = self.sync_event(rec['dsn'])
```
Fix Z0DAN error handling: Change RAISE NOTICE to RAISE EXCEPTION across all error handlers in zodan.sql so that add_node fails immediately on error instead of silently logging and eventually timing out. Affected operations: node creation, sync event triggering/waiting, replication slot creation/advancement, subscription creation/enabling, and replication lag monitoring.
Fix subscription name assignment: Move sub_name assignment to the correct location inside the subscription creation block where it is actually used.
Fix Z0DAN initialization: Add COMMIT in present_final_cluster_state to let newly created subscriptions update their state; check all subscriptions across the cluster, not only the new node.
Fix status label: Correct [INITIALIZING] to [SYNCING] for subscriptions in syncing state.
Allow removing the last Spock node via remove_node.
Remove unreachable code: Drop CONTINUE/RAISE statements after RAISE EXCEPTION.
Add 3-node cluster TAP test (016_zodan_3n_timeout).