maintainer, route: add target table registry to detect route confliction#5098
maintainer, route: add target table registry to detect route confliction#50983AceShowHand wants to merge 16 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughExports router registry types and refactors TargetTableRegistry to dual maps with Remove/ApplyTransition; adds routeConflictDetector to validate/apply DDL-driven route transitions; integrates detector into Barrier and Controller bootstrap; updates dispatch blocking and adds unit and integration tests. ChangesTable Route Conflict Detection Infrastructure
Sequence Diagram(s)sequenceDiagram
participant DDLEvent
participant Dispatcher as BasicDispatcher
participant Barrier
participant Detector as routeConflictDetector
participant Registry as TargetTableRegistry
DDLEvent->>Dispatcher: shouldBlock(event)?
alt routing enabled and route-affecting
Dispatcher-->>Barrier: block event
end
Barrier->>Detector: precheck(routeDDLInfo)
Detector->>Detector: buildTransition(drops, adds, updates)
Detector->>Registry: ApplyTransition(validates)
alt validation passes
Registry-->>Detector: nil
Detector-->>Barrier: ready
else validation fails
Registry-->>Detector: ErrTableRouteConflict
Detector-->>Barrier: error (reportError)
end
Barrier->>Detector: apply(routeDDLInfo)
Detector->>Registry: ApplyTransition(removes, adds)
Registry-->>Detector: updated
Detector-->>Barrier: success
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to detect and prevent table routing conflicts where multiple logical upstream tables map to the same downstream target. It implements a TargetTableRegistry to track these mappings and integrates conflict validation into the OpenAPI changefeed management and the maintainer's DDL processing workflow. Feedback was provided regarding an optimization for counting replicas per target in the registry to improve performance from O(N) to O(1).
| func (r *TargetTableRegistry) countReplicasForTarget(excludeReplica int64, target TargetKey) int { | ||
| count := 0 | ||
| for id, b := range r.bindings { | ||
| if id != excludeReplica && b.Target == target { | ||
| count++ | ||
| } | ||
| } | ||
| // Add 1 for the binding we haven't inserted yet (the one being upserted). | ||
| return count | ||
| } |
There was a problem hiding this comment.
The countReplicasForTarget method iterates over all bindings to count replicas for a target, which is O(N). This can be optimized to O(1) by maintaining a targetReplicaCount map[TargetKey]int in the TargetTableRegistry struct to track the number of replicas per target. This would also require updating Upsert, RemoveByReplicaID, and RemoveBySchemaID to maintain this map.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (5)
maintainer/barrier_event.go (1)
232-257: 💤 Low valueRepeated
Snapshot()calls inside loop may degrade performance for large registries.
be.targetRegistry.Snapshot()is called inside the loop for each blocked table ID (line 240). If there are many blocked tables and a large registry, this creates O(n × m) snapshot overhead. Consider callingSnapshot()once before the loop.♻️ Proposed optimization
+ // Handle rename (blocked tables may change name -> new route result) + if be.blockedDispatchers != nil && be.blockedDispatchers.InfluenceType == heartbeatpb.InfluenceType_Normal { + existingBindings := be.targetRegistry.Snapshot() for _, tableID := range be.blockedDispatchers.TableIDs { binding, err := be.buildRouteBindingForTable(schemaStore, tableID, 0) if err != nil { return nil, err } if binding != nil { // Check if the route actually changed. - existingBindings := be.targetRegistry.Snapshot() changed := true for _, existing := range existingBindings {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/barrier_event.go` around lines 232 - 257, The code calls be.targetRegistry.Snapshot() inside the loop over be.blockedDispatchers.TableIDs, causing repeated snapshots; move the snapshot call outside the loop by obtaining existingBindings := be.targetRegistry.Snapshot() once before iterating, then reuse existingBindings when comparing binding against existing.ReplicaTableID/Target/Source in the loop (while keeping the existing buildRouteBindingForTable, transition.Removes and transition.Adds logic unchanged).maintainer/barrier_test.go (1)
17-18: 💤 Low valueImport ordering: internal package import should be grouped after standard library imports.
Same as in
maintainer_test.go, theroutingimport is placed beforetesting(standard library).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/barrier_test.go` around lines 17 - 18, The import order in maintainer/barrier_test.go is wrong: move the standard library import testing before the internal package import "github.com/pingcap/ticdc/downstreamadapter/routing" so imports are grouped as standard library first, then external/internal packages; update the import block around the testing and routing entries accordingly.maintainer/maintainer_controller_bootstrap.go (1)
522-530: 💤 Low valueSilently skipping tables on
GetTableInfoerror may cause incomplete registry initialization.Similar to the concern in
buildRouteBindingForTable, whenGetTableInfofails here (lines 523-530), the table is skipped with a warning log. During bootstrap, this could result in an incomplete registry. If a skipped table later routes to a target that conflicts with another table, the conflict won't be detected.Consider whether bootstrap should fail if any table's info cannot be retrieved, or at least document that transient schema store errors during bootstrap may cause incomplete conflict detection.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/maintainer_controller_bootstrap.go` around lines 522 - 530, The loop over tables currently skips table entries when schemaStore.GetTableInfo(c.keyspaceMeta, t.TableID, startTs) returns an error, which can silently leave the registry incomplete; change this behavior in the bootstrap path (the loop that calls GetTableInfo) to surface errors instead of continuing—either propagate/return the error up so bootstrap fails (preferred) or aggregate and return a consolidated error that includes c.changefeedID.Name() and the failing t.TableID; ensure the same semantics are applied consistently with buildRouteBindingForTable so route/conflict detection isn't bypassed.maintainer/maintainer_test.go (1)
17-18: 💤 Low valueImport ordering: internal package import should be grouped after standard library imports.
The
routingimport is placed before thecontextimport (standard library). Go convention groups standard library imports first, then third-party, then internal packages. However, since no static analysis tool has flagged this, consider fixing it for consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/maintainer_test.go` around lines 17 - 18, The import block in maintainer_test.go has routing ("github.com/pingcap/ticdc/downstreamadapter/routing") placed before the standard library import context; reorder the imports so standard library packages (e.g., context) appear first, then third-party/internal packages (e.g., github.com/pingcap/ticdc/downstreamadapter/routing) to match Go import grouping and conventions used around the test functions in this file.downstreamadapter/routing/registry_test.go (1)
115-147: ⚡ Quick winAdd an Upsert regression case for multi-replica target ownership.
Please add a focused test where two replicas from the same source share one target, then one replica is upserted to a new target; ownership of the old target should remain and a different source adding to that old target must still fail with
ErrTableRouteConflict.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@downstreamadapter/routing/registry_test.go` around lines 115 - 147, Add a new subtest in TestTargetTableRegistryUpsert that creates a registry via NewTargetTableRegistry seeded with two RouteBinding entries from the same source (same SourceID but different ReplicaIDs) pointing to the same target (use makeBinding to construct them), then call r.Upsert to move one of those replicas to a new target; assert with r.Snapshot() that the remaining replica still owns the original target (table name unchanged) and that attempting to Upsert a binding from a different source to that original target returns an error equal to errors.ErrTableRouteConflict; use the same helpers (NewTargetTableRegistry, Upsert, makeBinding, Snapshot) and equality check for ErrTableRouteConflict as in existing tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@downstreamadapter/routing/registry.go`:
- Around line 146-148: The ownerSources map is being cleared prematurely because
the condition uses "<= 1" when checking
r.countReplicasForTarget(binding.ReplicaTableID, oldBinding.Target); change this
to check for zero remaining replicas (== 0) so ownership is only removed when no
replica remains; update the same logic in the other block referenced (the second
occurrence around the 253-261 region) that also calls r.countReplicasForTarget
and deletes from r.ownerSources for oldBinding.Target.
In `@maintainer/barrier_event_test.go`:
- Around line 17-18: Update every call to NewBlockEvent in this test to pass the
three new routing arguments: routeRouter (type routing.Router), targetRegistry
(type *routing.TargetTableRegistry), and routeEnabled (bool). Specifically,
locate all existing NewBlockEvent(...) usages (e.g., the calls around the
previously noted lines) and append the same routing arguments pattern used where
NewBlockEvent is called from barrier.go: pass the Router instance, the
TargetTableRegistry pointer, and the appropriate routeEnabled boolean. Ensure
the final call signature matches NewBlockEvent(..., routeRouter, targetRegistry,
routeEnabled) so compilation succeeds.
In `@maintainer/barrier_event.go`:
- Around line 404-426: The precheck error path in the block using
buildRouteTransition and ValidateReplace repeatedly returns nil,"" without
setting be.routePrecheckDone, causing endless retries; update the error paths so
that on any error from buildRouteTransition or ValidateReplace you set
be.routePrecheckDone = true to avoid log spam and then surface the error to the
changefeed failure/monitoring path (e.g., record the error on the changefeed
status or invoke the existing changefeed-failure API) or implement a bounded
retry/backoff before marking the precheck done; ensure you still set
be.routeTransition when a valid transition exists and preserve the
needsRoutePrecheck() gating logic.
---
Nitpick comments:
In `@downstreamadapter/routing/registry_test.go`:
- Around line 115-147: Add a new subtest in TestTargetTableRegistryUpsert that
creates a registry via NewTargetTableRegistry seeded with two RouteBinding
entries from the same source (same SourceID but different ReplicaIDs) pointing
to the same target (use makeBinding to construct them), then call r.Upsert to
move one of those replicas to a new target; assert with r.Snapshot() that the
remaining replica still owns the original target (table name unchanged) and that
attempting to Upsert a binding from a different source to that original target
returns an error equal to errors.ErrTableRouteConflict; use the same helpers
(NewTargetTableRegistry, Upsert, makeBinding, Snapshot) and equality check for
ErrTableRouteConflict as in existing tests.
In `@maintainer/barrier_event.go`:
- Around line 232-257: The code calls be.targetRegistry.Snapshot() inside the
loop over be.blockedDispatchers.TableIDs, causing repeated snapshots; move the
snapshot call outside the loop by obtaining existingBindings :=
be.targetRegistry.Snapshot() once before iterating, then reuse existingBindings
when comparing binding against existing.ReplicaTableID/Target/Source in the loop
(while keeping the existing buildRouteBindingForTable, transition.Removes and
transition.Adds logic unchanged).
In `@maintainer/barrier_test.go`:
- Around line 17-18: The import order in maintainer/barrier_test.go is wrong:
move the standard library import testing before the internal package import
"github.com/pingcap/ticdc/downstreamadapter/routing" so imports are grouped as
standard library first, then external/internal packages; update the import block
around the testing and routing entries accordingly.
In `@maintainer/maintainer_controller_bootstrap.go`:
- Around line 522-530: The loop over tables currently skips table entries when
schemaStore.GetTableInfo(c.keyspaceMeta, t.TableID, startTs) returns an error,
which can silently leave the registry incomplete; change this behavior in the
bootstrap path (the loop that calls GetTableInfo) to surface errors instead of
continuing—either propagate/return the error up so bootstrap fails (preferred)
or aggregate and return a consolidated error that includes c.changefeedID.Name()
and the failing t.TableID; ensure the same semantics are applied consistently
with buildRouteBindingForTable so route/conflict detection isn't bypassed.
In `@maintainer/maintainer_test.go`:
- Around line 17-18: The import block in maintainer_test.go has routing
("github.com/pingcap/ticdc/downstreamadapter/routing") placed before the
standard library import context; reorder the imports so standard library
packages (e.g., context) appear first, then third-party/internal packages (e.g.,
github.com/pingcap/ticdc/downstreamadapter/routing) to match Go import grouping
and conventions used around the test functions in this file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4f215250-3fd0-48a0-8d5a-31819f3eab5b
📒 Files selected for processing (13)
api/v2/changefeed.godownstreamadapter/routing/registry.godownstreamadapter/routing/registry_test.godownstreamadapter/routing/router.gomaintainer/barrier.gomaintainer/barrier_event.gomaintainer/barrier_event_test.gomaintainer/barrier_test.gomaintainer/maintainer_controller.gomaintainer/maintainer_controller_bootstrap.gomaintainer/maintainer_test.gopkg/errors/error.gopkg/errors/helper.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
maintainer/barrier_event.go (1)
392-414:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRoute precheck error causes infinite retry loop without recovery.
When
buildRouteTransition()(line 394) orValidateReplace()(line 403) returns an error, the method returnsnil, ""without settingroutePrecheckDone = true. On subsequent resend cycles,needsRoutePrecheck() && !be.routePrecheckDonewill remain true, causing the same error to be logged repeatedly with no progress.For transient errors, this may eventually recover. For permanent conflicts (true schema conflicts), the DDL is blocked indefinitely with log spam but no actionable feedback to the user.
Consider setting
routePrecheckDone = trueeven on error to prevent log spam, and surfacing the conflict to the changefeed error state so operators can take action.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/barrier_event.go` around lines 392 - 414, The route precheck currently returns early on errors from buildRouteTransition() or targetRegistry.ValidateReplace(), leaving be.routePrecheckDone false and causing infinite retries; update the error paths in the precheck block (the code around needsRoutePrecheck(), buildRouteTransition(), ValidateReplace(), and where routeTransition is set) to always set be.routePrecheckDone = true before returning on error and also record/surface the failure to the changefeed error state (e.g., invoke the changefeed error reporting method used elsewhere in this file or set the changefeed status field) so permanent conflicts stop retrying and are visible to operators.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@maintainer/barrier_event.go`:
- Around line 392-414: The route precheck currently returns early on errors from
buildRouteTransition() or targetRegistry.ValidateReplace(), leaving
be.routePrecheckDone false and causing infinite retries; update the error paths
in the precheck block (the code around needsRoutePrecheck(),
buildRouteTransition(), ValidateReplace(), and where routeTransition is set) to
always set be.routePrecheckDone = true before returning on error and also
record/surface the failure to the changefeed error state (e.g., invoke the
changefeed error reporting method used elsewhere in this file or set the
changefeed status field) so permanent conflicts stop retrying and are visible to
operators.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f238fc2a-ccdd-4081-a086-1284aba6344d
📒 Files selected for processing (10)
downstreamadapter/routing/ddl_query_rewriter.godownstreamadapter/routing/registry.godownstreamadapter/routing/router.godownstreamadapter/routing/router_test.gomaintainer/barrier.gomaintainer/barrier_event.gomaintainer/barrier_event_test.gomaintainer/barrier_test.gomaintainer/maintainer_controller_bootstrap.gomaintainer/maintainer_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- maintainer/maintainer_test.go
- maintainer/maintainer_controller_bootstrap.go
- maintainer/barrier_test.go
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (5)
downstreamadapter/dispatcher/basic_dispatcher.go (2)
935-937: ⚡ Quick winConsider adding logging when blocking route-affecting DDLs.
The new blocking path will prevent DDL execution when routing is configured, but there's no log statement explaining why the DDL was blocked. This could make troubleshooting difficult in production environments.
📝 Suggested improvement
ddlEvent := event.(*commonEvent.DDLEvent) if d.sharedInfo.GetRouter().HasTableRoute() && routeAffectingDDL(ddlEvent) { + log.Info("blocking route-affecting DDL for conflict detection", + zap.Stringer("dispatcher", d.id), + zap.Uint64("commitTs", ddlEvent.FinishedTs), + zap.String("query", ddlEvent.Query)) return true }As per coding guidelines, logs are operational signals; see docs/agents/logging.md before adding, removing, or rewriting logs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@downstreamadapter/dispatcher/basic_dispatcher.go` around lines 935 - 937, Add an operational log when the dispatcher blocks route-affecting DDLs: in the branch that checks d.sharedInfo.GetRouter().HasTableRoute() && routeAffectingDDL(ddlEvent) (just before the return true), emit a concise logger message (e.g., d.logger.Warn or Info) that includes the DDL event identifier/details (ddlEvent) and that routing is enabled (from d.sharedInfo.GetRouter()) so operators can see why the DDL was blocked; keep the message short and non-sensitive and then return true as before.
962-970: ⚡ Quick winSimplify boolean return and add documentation.
The function uses an if-then-return-true/return-false pattern that can be simplified to a direct boolean expression. Additionally, the function lacks documentation explaining what constitutes a "route-affecting" DDL.
♻️ Proposed refactoring
+// routeAffectingDDL returns true if the DDL event requires route conflict detection. +// This includes DDLs that drop tables, add tables, update schemas, or change table names, +// all of which can alter table routing mappings. func routeAffectingDDL(event *commonEvent.DDLEvent) bool { - if event.GetNeedDroppedTables() != nil || + return event.GetNeedDroppedTables() != nil || len(event.GetNeedAddedTables()) > 0 || len(event.GetUpdatedSchemas()) > 0 || - event.TableNameChange != nil { - return true - } - return false + event.TableNameChange != nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@downstreamadapter/dispatcher/basic_dispatcher.go` around lines 962 - 970, Replace the verbose if/return pattern in routeAffectingDDL with a single boolean return expression that directly checks the four conditions (event.GetNeedDroppedTables() != nil, len(event.GetNeedAddedTables()) > 0, len(event.GetUpdatedSchemas()) > 0, event.TableNameChange != nil) and add a short documentation comment above routeAffectingDDL describing that a "route-affecting" DDL is any DDL that drops tables, adds tables, updates schemas, or renames a table so callers understand the criteria.downstreamadapter/dispatcher/event_dispatcher_test.go (1)
71-108: ⚡ Quick winConsider adding test cases for all route-affecting DDL conditions.
The test covers
NeedAddedTablesandTableNameChangescenarios but doesn't test theNeedDroppedTablesandUpdatedSchemascode paths in therouteAffectingDDL()helper. While the current coverage validates the core blocking logic, testing all four conditions would ensure complete coverage of the helper function.🧪 Suggested additional test cases
require.False(t, newDispatcher(router).shouldBlock(regularSingleTableDDL)) + + dropTable := &commonEvent.DDLEvent{ + NeedDroppedTables: &commonEvent.InfluencedTables{ + InfluenceType: commonEvent.InfluenceTypeNormal, + TableIDs: []int64{1}, + }, + } + require.True(t, newDispatcher(router).shouldBlock(dropTable)) + + schemaChange := &commonEvent.DDLEvent{ + UpdatedSchemas: []*commonEvent.SchemaIDChange{ + {TableID: 1, OldSchemaID: 1, NewSchemaID: 2}, + }, + } + require.True(t, newDispatcher(router).shouldBlock(schemaChange)) }As per coding guidelines, prefer focused deterministic tests; see docs/agents/testing.md before adding or changing tests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@downstreamadapter/dispatcher/event_dispatcher_test.go` around lines 71 - 108, Add two focused assertions in TestShouldBlockRouteAffectingDDL to cover the missing route-affecting DDL paths: construct a DDLEvent with NeedDroppedTables populated (e.g., NeedDroppedTables: []commonEvent.Table{{SchemaID:1,TableID:2}}) and assert newDispatcher(router).shouldBlock(...) returns true (and false for newDispatcher(routing.Router{}) as with NeedAddedTables); then construct a DDLEvent exercising UpdatedSchemas (populate UpdatedSchemas with the same kind of entries or schema IDs used by routeAffectingDDL) and assert shouldBlock returns true with router and false with an empty router. Reference TestShouldBlockRouteAffectingDDL, routeAffectingDDL, NeedDroppedTables, and UpdatedSchemas when adding these cases.maintainer/barrier.go (1)
541-546: 💤 Low valueConsistent nil-guard returns but asymmetric semantics.
Both
precheckRouteEvent(returns(true, nil)) andapplyRouteEvent(returnsnil) short-circuit whenrouteDetectoris nil. While the nil guards are correct, the return-value semantics differ: precheck signals "ready" by returningtrue, apply signals "success" by returningnil. This is internally consistent with Go conventions (bool for readiness, error for failure), but consider adding a brief comment on line 542 noting thattrue, nilmeans "no conflict detector configured, proceed" to improve readability for future maintainers unfamiliar with the routing subsystem.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/barrier.go` around lines 541 - 546, Add a short clarifying comment in the precheckRouteEvent function noting that returning true, nil is intentional and means "no routeDetector configured — proceed" to mirror applyRouteEvent's nil short-circuit semantics; e.g., place the comment above the nil check in precheckRouteEvent and reference that it intentionally signals readiness when routeDetector is nil (consistent with applyRouteEvent returning nil on the same condition).maintainer/route_conflict_detector.go (1)
287-290: ⚡ Quick winUse repository errors instead of raw
fmt.Errorf.These branches bypass the repo's error taxonomy, so upstream code can't classify or wrap detector failures consistently the way it can for
errors.ErrTableRouteConflictabove.As per coding guidelines, "Use predefined repository errors; see docs/agents/error-handling.md before changing error creation, wrapping, or propagation."
Also applies to: 305-307, 330-334
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@maintainer/route_conflict_detector.go` around lines 287 - 290, Replace raw fmt.Errorf usages in the route_conflict_detector loop and the other noted branches with the repository's predefined error types (e.g., errors.ErrTableRouteConflict) so callers can classify failures; for each place that currently returns fmt.Errorf("route registry binding not found for table %d", change.TableID) (and the similar cases at the later branches around the code handling d.tables and route lookups), return the appropriate repository error from the errors package and include the table id as context by wrapping or annotating the repository error using the project's error-wrapping helper (maintain the original context but do not introduce new fmt.Errorf-based errors). Ensure you update the return statements in the loop over info.updatedSchema (reference variable names change.TableID, existing, d.tables) and the similar branches mentioned so they consistently use errors.ErrTableRouteConflict (or the exact repo error that matches "route registry binding not found") with contextual wrapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@maintainer/barrier.go`:
- Around line 427-432: The code path where event.allDispatcherReported() is true
calls b.precheckRouteEvent(event) and on err || !ready returns (event, nil, "",
false) but does not remove the event from b.blockedEvents; make this behavior
symmetric with the non-blocked path by deleting the event from b.blockedEvents
when precheckRouteEvent fails or returns not-ready (and likewise ensure you
delete on applyRouteEvent failures if applicable), or explicitly document/handle
why blocked events should persist; update the branch that calls
precheckRouteEvent (and any references to blockedEvents, needACK,
precheckRouteEvent, applyRouteEvent) so failed prechecks either remove the event
from blockedEvents before returning or intentionally keep it with a clear
comment and state-machine assertion.
In `@maintainer/maintainer_controller_bootstrap.go`:
- Around line 470-474: The comment "// findHoles returns an array of Span that
are not covered in the range" is misplaced above initializeRouteRegistry; move
that doc comment so it immediately precedes the findHoles function (the function
named findHoles) or remove it entirely, ensuring initializeRouteRegistry's
godoc/comment block is directly above initializeRouteRegistry and reads cleanly;
verify no other stray comments referencing findHoles remain.
In `@maintainer/route_conflict_detector.go`:
- Around line 213-220: needsCheck currently ignores blockTables when TableIDs is
empty, causing DB-wide or global (All) block events to be skipped; update
routeConflictDetector.needsCheck to also return true when info.blockTables is
non-nil and its scope indicates DB or All (even if TableIDs is empty) by
checking the blockTables scope/level field. Also update buildTransition to not
only reevaluate heartbeatpb.InfluenceType_Normal: ensure buildTransition
handles/schema-wide and global influences emitted by blockTables (e.g., treat
DB/All scope as impacting other InfluenceType values and trigger the registry
reevaluation path), so schema/global route-affecting DDLs cause registry
updates.
In `@maintainer/route_registry_test.go`:
- Around line 27-101: newRegistryTestTable currently only sets SchemaTableName,
but rebuildTargetTableRegistry and router.Route use the top-level
SchemaName/TableName fields, so tests exercise routing with empty names; update
newRegistryTestTable to also set Table.SchemaName and Table.TableName to the
provided schema and table values (in addition to SchemaTableName) so the "builds
current target owners" and "fails on route conflict" cases exercise the actual
matcher logic.
---
Nitpick comments:
In `@downstreamadapter/dispatcher/basic_dispatcher.go`:
- Around line 935-937: Add an operational log when the dispatcher blocks
route-affecting DDLs: in the branch that checks
d.sharedInfo.GetRouter().HasTableRoute() && routeAffectingDDL(ddlEvent) (just
before the return true), emit a concise logger message (e.g., d.logger.Warn or
Info) that includes the DDL event identifier/details (ddlEvent) and that routing
is enabled (from d.sharedInfo.GetRouter()) so operators can see why the DDL was
blocked; keep the message short and non-sensitive and then return true as
before.
- Around line 962-970: Replace the verbose if/return pattern in
routeAffectingDDL with a single boolean return expression that directly checks
the four conditions (event.GetNeedDroppedTables() != nil,
len(event.GetNeedAddedTables()) > 0, len(event.GetUpdatedSchemas()) > 0,
event.TableNameChange != nil) and add a short documentation comment above
routeAffectingDDL describing that a "route-affecting" DDL is any DDL that drops
tables, adds tables, updates schemas, or renames a table so callers understand
the criteria.
In `@downstreamadapter/dispatcher/event_dispatcher_test.go`:
- Around line 71-108: Add two focused assertions in
TestShouldBlockRouteAffectingDDL to cover the missing route-affecting DDL paths:
construct a DDLEvent with NeedDroppedTables populated (e.g., NeedDroppedTables:
[]commonEvent.Table{{SchemaID:1,TableID:2}}) and assert
newDispatcher(router).shouldBlock(...) returns true (and false for
newDispatcher(routing.Router{}) as with NeedAddedTables); then construct a
DDLEvent exercising UpdatedSchemas (populate UpdatedSchemas with the same kind
of entries or schema IDs used by routeAffectingDDL) and assert shouldBlock
returns true with router and false with an empty router. Reference
TestShouldBlockRouteAffectingDDL, routeAffectingDDL, NeedDroppedTables, and
UpdatedSchemas when adding these cases.
In `@maintainer/barrier.go`:
- Around line 541-546: Add a short clarifying comment in the precheckRouteEvent
function noting that returning true, nil is intentional and means "no
routeDetector configured — proceed" to mirror applyRouteEvent's nil
short-circuit semantics; e.g., place the comment above the nil check in
precheckRouteEvent and reference that it intentionally signals readiness when
routeDetector is nil (consistent with applyRouteEvent returning nil on the same
condition).
In `@maintainer/route_conflict_detector.go`:
- Around line 287-290: Replace raw fmt.Errorf usages in the
route_conflict_detector loop and the other noted branches with the repository's
predefined error types (e.g., errors.ErrTableRouteConflict) so callers can
classify failures; for each place that currently returns fmt.Errorf("route
registry binding not found for table %d", change.TableID) (and the similar cases
at the later branches around the code handling d.tables and route lookups),
return the appropriate repository error from the errors package and include the
table id as context by wrapping or annotating the repository error using the
project's error-wrapping helper (maintain the original context but do not
introduce new fmt.Errorf-based errors). Ensure you update the return statements
in the loop over info.updatedSchema (reference variable names change.TableID,
existing, d.tables) and the similar branches mentioned so they consistently use
errors.ErrTableRouteConflict (or the exact repo error that matches "route
registry binding not found") with contextual wrapping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ec0a5288-fa9e-498b-8cb8-cb82ad284642
📒 Files selected for processing (18)
downstreamadapter/dispatcher/basic_dispatcher.godownstreamadapter/dispatcher/event_dispatcher_test.godownstreamadapter/routing/ddl_query_rewriter.godownstreamadapter/routing/registry.godownstreamadapter/routing/registry_test.godownstreamadapter/routing/router.godownstreamadapter/routing/router_test.gomaintainer/barrier.gomaintainer/barrier_event.gomaintainer/barrier_test.gomaintainer/maintainer.gomaintainer/maintainer_controller.gomaintainer/maintainer_controller_bootstrap.gomaintainer/maintainer_test.gomaintainer/route_conflict_detector.gomaintainer/route_conflict_detector_test.gomaintainer/route_registry.gomaintainer/route_registry_test.go
| if event.allDispatcherReported() { | ||
| ready, err := b.precheckRouteEvent(event) | ||
| if err != nil || !ready { | ||
| return event, nil, "", false | ||
| } | ||
| } |
There was a problem hiding this comment.
Asymmetric error handling: precheck failures suppress ack but leave event in blockedEvents.
When precheckRouteEvent fails or returns not-ready at lines 428-430, the function returns false for needACK but does not remove the event from blockedEvents. In contrast, the non-blocked path at lines 451-456 deletes the event on precheck/apply failure. This asymmetry means blocked events that fail precheck will remain in the map and will be retried on the next dispatcher resend, while non-blocked events are immediately discarded. Confirm this difference is intentional and reflects the correct state-machine semantics for the two paths.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@maintainer/barrier.go` around lines 427 - 432, The code path where
event.allDispatcherReported() is true calls b.precheckRouteEvent(event) and on
err || !ready returns (event, nil, "", false) but does not remove the event from
b.blockedEvents; make this behavior symmetric with the non-blocked path by
deleting the event from b.blockedEvents when precheckRouteEvent fails or returns
not-ready (and likewise ensure you delete on applyRouteEvent failures if
applicable), or explicitly document/handle why blocked events should persist;
update the branch that calls precheckRouteEvent (and any references to
blockedEvents, needACK, precheckRouteEvent, applyRouteEvent) so failed prechecks
either remove the event from blockedEvents before returning or intentionally
keep it with a clear comment and state-machine assertion.
| // findHoles returns an array of Span that are not covered in the range | ||
| // initializeRouteRegistry constructs the maintainer-owned route conflict detector | ||
| // from the current set of tables. If no dispatch rules contain routing targets, | ||
| // routeDetector stays nil. | ||
| func (c *Controller) initializeRouteRegistry(tables []commonEvent.Table, startTs uint64) error { |
There was a problem hiding this comment.
Misplaced comment for findHoles now precedes initializeRouteRegistry.
The line 470 comment // findHoles returns an array of Span that are not covered in the range describes findHoles (defined at line 489) but is now stranded above initializeRouteRegistry. Move the orphaned findHoles doc back next to its function (or delete it) so the initializeRouteRegistry godoc reads cleanly.
📝 Proposed fix
-// findHoles returns an array of Span that are not covered in the range
// initializeRouteRegistry constructs the maintainer-owned route conflict detector
// from the current set of tables. If no dispatch rules contain routing targets,
// routeDetector stays nil.
func (c *Controller) initializeRouteRegistry(tables []commonEvent.Table, startTs uint64) error {
@@
}
+// findHoles returns an array of Span that are not covered in the range
func findHoles(currentSpan utils.Map[*heartbeatpb.TableSpan, *replica.SpanReplication], totalSpan *heartbeatpb.TableSpan) []*heartbeatpb.TableSpan {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@maintainer/maintainer_controller_bootstrap.go` around lines 470 - 474, The
comment "// findHoles returns an array of Span that are not covered in the
range" is misplaced above initializeRouteRegistry; move that doc comment so it
immediately precedes the findHoles function (the function named findHoles) or
remove it entirely, ensuring initializeRouteRegistry's godoc/comment block is
directly above initializeRouteRegistry and reads cleanly; verify no other stray
comments referencing findHoles remain.
| func (d *routeConflictDetector) needsCheck(info routeDDLInfo) bool { | ||
| if d == nil || d.registry == nil || info.isSyncPoint { | ||
| return false | ||
| } | ||
| if info.droppedTables != nil || len(info.addedTables) > 0 || len(info.updatedSchema) > 0 { | ||
| return true | ||
| } | ||
| return info.blockTables != nil && len(info.blockTables.TableIDs) > 0 |
There was a problem hiding this comment.
blockTables events with DB/All scope are skipped.
needsCheck only admits blockTables when TableIDs is non-empty, and buildTransition only reevaluates heartbeatpb.InfluenceType_Normal. A schema-wide or global route-affecting DDL therefore becomes a no-op here, so the registry never gets updated and conflicts can be missed.
Also applies to: 300-320
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@maintainer/route_conflict_detector.go` around lines 213 - 220, needsCheck
currently ignores blockTables when TableIDs is empty, causing DB-wide or global
(All) block events to be skipped; update routeConflictDetector.needsCheck to
also return true when info.blockTables is non-nil and its scope indicates DB or
All (even if TableIDs is empty) by checking the blockTables scope/level field.
Also update buildTransition to not only reevaluate
heartbeatpb.InfluenceType_Normal: ensure buildTransition handles/schema-wide and
global influences emitted by blockTables (e.g., treat DB/All scope as impacting
other InfluenceType values and trigger the registry reevaluation path), so
schema/global route-affecting DDLs cause registry updates.
|
/test all |
|
/test all |
|
/test next-gen |
|
@3AceShowHand: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Bug Fixes
New Features
Tests