Spoc 387: Prevent adding ignored-schema tables to replication sets#319
Spoc 387: Prevent adding ignored-schema tables to replication sets#319mason-sharp merged 5 commits intomainfrom
Conversation
mason-sharp
left a comment
There was a problem hiding this comment.
LGTM, just was not sure about a TODO comment.
- Add new excluded_schema regression test - Reformat multi-parameter SQL function definitions for readability (sub_create, sub_show_status, repset_create, repset_drop) This test demonstrates two issues: 1. table from an 'excluded' schema disables subscription. 2. Spock tests are unstable.
Previous commits exposed test instability due to non-deterministic ordering in query results. Changes made to stabilize output: - Replace SELECT * FROM spock.sub_create() with SELECT 1 FROM spock.sub_create() to avoid returning implementation-dependent OIDs. - Queries on spock.local_sync_status now join with spock.subscription to display subscription names instead of internal IDs, with proper ORDER BY on sub_name (using COLLATE "C") and additional columns (sync_kind, sync_nspname, sync_relname) where needed. - Add ORDER BY subscription_name COLLATE "C" to spock.sub_show_status() calls.
Centralize extension and schema skipping logic into helper functions build_exclude_extension_string() and build_exclude_schema_string(). Use dynamic list instead of fixed-size array for argument construction. Global skip lists for schemas and extensions (spock, lolor, snowflake) are now defined in spock_node.c and exported via spock_node.h.
Previously, tables in ignored schemas (spock, lolor, snowflake) could be added to replication sets, which would then cause replication failures at runtime. This change adds proactive validation that rejects such tables at the time they are added to a replication set, providing clear error messages with hints for resolution. Add EnsureRelationNotIgnored() function that checks both the global skip_schema and skip_extension lists. Call it from both replication_set_add_table() and replication_set_add_seq(). Also fix PG17 compatibility by using CheckRelationOidLockedByMe() instead of LockHeldByMe() with LOCKTAG, and move the test helper function pg_current_xlog_location() to the spock schema to avoid potential conflicts with user functions in public schema.
📝 WalkthroughWalkthroughAdd schema/extension skip lists and EnsureRelationNotIgnored; integrate exclusions into pg_dump argument construction; add PG-version-gated relation lock checks; reformat several SQL function declarations (and change one OUT type int→bigint); add regression tests and include new Changes
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sql/spock--6.0.0-devel.sql (1)
669-675:⚠️ Potential issue | 🔴 CriticalAdd
CREATE OR REPLACE FUNCTIONforget_apply_worker_statusto the upgrade script.The function signature change from
worker_pid inttoworker_pid bigintis not reflected in the upgrade scriptsql/spock--5.0.4--6.0.0-devel.sql. Users upgrading from 5.0.4 to 6.0.0-devel will retain the old function signature unless this definition is included in the upgrade path. Add the updated function definition to the upgrade script to ensure the signature change is applied during extension upgrade.
🤖 Fix all issues with AI agents
In `@tests/regress/sql/excluded_schema.sql`:
- Around line 6-8: Replace the incorrect header comment "Test resynchronization"
with a short, accurate description for this test, e.g. "Test excluded schema
behavior (SPOC-387)"; locate the top comment block that currently contains "Test
resynchronization" and update it to reference excluded schema behavior and the
ticket number so the test purpose matches its contents.
- Around line 42-53: The loop polling spock.sub_show_status(subscription_name :=
'sub_387') busy-waits with no delay and only exits when status = 'down'; add a
short sleep (e.g., PERFORM pg_sleep(0.1); or 0.5) inside the LOOP to avoid 100%
CPU spinning, and change the exit logic to handle both terminal states: EXIT
WHEN cnt > 0 OR EXISTS(SELECT 1 FROM spock.sub_show_status(subscription_name :=
'sub_387') WHERE status = 'replicating'); alternatively, replace this custom
loop with the existing test helper (e.g., sub_wait_for_sync) if available and
assert explicitly if the subscription reaches 'replicating' instead of timing
out.
In `@tests/regress/sql/parallel.sql`:
- Line 10: Fix the typo in the SQL test comment: change "statment" to
"statement" in the commented line that reads "-- FIXME: The statment below is
commented out temporarily." so the comment becomes "-- FIXME: The statement
below is commented out temporarily."
🧹 Nitpick comments (6)
src/spock_node.c (1)
1266-1310: Error messages say "table" but this function is also called for sequences.
EnsureRelationNotIgnoredis invoked from bothreplication_set_add_tableandreplication_set_add_seq(inspock_repset.c). Theerrmsgtext always says"table %s cannot be added to any replication set", which is misleading when the caller is adding a sequence.Consider using
"relation"instead of"table"for accuracy, or include therelkindin the message.Suggested fix
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("table %s cannot be added to any replication set", + errmsg("relation %s cannot be added to any replication set", RelationGetRelationName(rel)),(Apply similarly to the extension-check
ereportblock on line 1303.)src/spock_sync.c (1)
242-270: Verify safety of storing stack-localpg_dumppointer in the list.
pg_dumpis a stack-allocatedchar[MAXPGPATH](line 228). Its address is stored inargs(line 242) and later passed toexec_cmdviacmdargv. This works because the entire execution completes withindump_structure's stack frame. However, this is fragile — if someone later refactors to returnargsor defer execution, the dangling pointer would cause UB.The comment at lines 278–283 already notes that the list contains string literals (which can't be freed), but doesn't mention the stack pointer. Consider adding a note or using
pstrdup(pg_dump)for defensive safety.tests/regress/sql/apply_delay.sql (1)
34-37: Minor ORDER BY inconsistency with other test files.Other files in this PR (e.g.,
multiple_upstreams.sql,sync_table.sql) order bysub_name,sync_kind,sync_nspname,sync_relname COLLATE "C", whereas this query orders only bysub_name COLLATE "C". For consistency and to ensure deterministic output even if additional tables are added to this test later, consider adding the extra sort columns.Suggested fix
SELECT sync_kind, sub_name, sync_nspname, sync_relname, sync_status IN ('y', 'r') FROM spock.local_sync_status l JOIN spock.subscription s ON (l.sync_subid = s.sub_id) -ORDER BY sub_name COLLATE "C"; +ORDER BY sub_name,sync_kind,sync_nspname,sync_relname COLLATE "C";tests/regress/sql/excluded_schema.sql (1)
1-1: TODO: sequence-in-excluded-schema coverage is noted but not yet implemented.The TODO on line 23 flags an uncovered scenario. Would you like me to open a tracking issue for it?
Also applies to: 22-24
tests/regress/sql/init.sql (1)
86-86: Inconsistent with the join-based pattern used inparallel.sql.This query still exposes
sync_subid(an implementation-dependent OID) directly. Other test files in this PR (e.g.,parallel.sqllines 34–37) joinspock.local_sync_statuswithspock.subscriptionto displaysub_nameinstead, and addORDER BY ... COLLATE "C"for deterministic ordering. Consider applying the same pattern here for consistency and stability.♻️ Suggested refactor to match the parallel.sql pattern
-SELECT sync_kind, sync_subid, sync_nspname, sync_relname, sync_status IN ('y', 'r') FROM spock.local_sync_status ORDER BY 2,3,4; +SELECT sync_kind, sub_name, sync_nspname, sync_relname, sync_status IN ('y', 'r') +FROM spock.local_sync_status l JOIN spock.subscription s + ON (l.sync_subid = s.sub_id) +ORDER BY sub_name COLLATE "C", sync_nspname, sync_relname;tests/regress/sql/parallel.sql (1)
34-37: Good improvement; consider extendingORDER BYfor full determinism.The join on
spock.subscriptionto displaysub_nameinstead of the raw OID is a solid stability improvement. However, theORDER BYonly coverssub_name. If multiple sync entries exist per subscription (e.g., fromtest_subscriptioncreated ininit.sql), rows within the same subscription will be non-deterministic. Consider addingsync_nspname, sync_relnameto the sort key, as done in the suggested refactor forinit.sql.♻️ Suggested extended ORDER BY
SELECT sync_kind, sub_name, sync_nspname, sync_relname, sync_status IN ('y', 'r') FROM spock.local_sync_status l JOIN spock.subscription s ON (l.sync_subid = s.sub_id) -ORDER BY sub_name COLLATE "C"; +ORDER BY sub_name COLLATE "C", sync_nspname, sync_relname;
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/spock_node.c`:
- Around line 1267-1312: In EnsureRelationNotIgnored, guard against nspname
being NULL after calling get_namespace_name(RelationGetNamespace(rel)) before
calling strcmp against skip_schema entries (return or treat as non-match if
NULL); then perform the skip_schema loop only if nspname is non-NULL. Also
harmonize the errmsg wording between the schema and extension branches (use the
same term, e.g., "relation %s") so RelationGetRelationName(rel) is described
consistently in both error messages; keep other errdetail/errhint text the same.
In `@tests/regress/sql/excluded_schema.sql`:
- Around line 8-12: Remove the redundant INSERT statement "INSERT INTO
spock.test_387 (x) VALUES (1);" that follows the CREATE TABLE for spock.test_387
in the test SQL; the test only needs the table definition and the calls to
spock.repset_create and spock.repset_add_table to validate the schema-exclusion
error, so delete that INSERT line and keep the CREATE TABLE and the two
spock.repset_* calls unchanged (no changes needed to the expected output which
already contains the ERROR/DETAIL/HINT).
replication_set_add_table()/replication_set_add_seq()time with clear error messages, instead of failing at replication runtime.dump_structure()to use helpers for building--exclude-extension/--exclude-schemaarguments, centralizing the global skip lists inspock_node.c.ORDER BY ... COLLATE "C", useSELECT 1 FROMto hide implementation-dependent OIDs, and join on subscription names instead of internal IDs.excluded_schemaregression test demonstrating the fix.