ddl, tici: support add/drop partition for TiCI indexes | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts#66816
Conversation
---

Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds TiCI partition management: new proto RPCs, TiCI manager client AddPartition/DropPartition APIs and test hooks, DDL hooks to call TiCI when TiCI-backed global indexes gain/lose partitions, a persisted reorg flag, and unit tests with failpoints and in-process mocks.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Admin
    participant DDL as DDL Handler
    participant Store as Storage (KV/Store)
    participant TiCI as TiCI Manager
    participant Meta as TiCI Meta Service
    Admin->>DDL: Request Add Partition (table w/ TiCI-backed indexes)
    activate DDL
    DDL->>DDL: getTiCIIndexIDs(tblInfo)
    alt TiCI index IDs present
        DDL->>Store: derive KeyspaceID
        DDL->>TiCI: AddPartition(ctx, store, tblInfo, schema, indexIDs, parserInfo)
        activate TiCI
        TiCI->>TiCI: buildAddPartitionRequest(...)
        TiCI->>Meta: RPC AddPartition(request)
        activate Meta
        Meta-->>TiCI: AddPartitionResponse
        deactivate Meta
        TiCI-->>DDL: result / error
        deactivate TiCI
        DDL->>DDL: set TiCIPartitionAdded and persist reorg meta
    end
    DDL-->>Admin: partition created
    deactivate DDL
    Admin->>DDL: Request Drop Partition (table w/ TiCI-backed indexes)
    activate DDL
    DDL->>DDL: getTiCIIndexIDs(tblInfo)
    alt TiCI index IDs present
        loop physical partition IDs
            DDL->>TiCI: DropPartition(ctx, store, physicalTableID, indexIDs)
            activate TiCI
            TiCI->>Meta: RPC DropPartition(request)
            activate Meta
            Meta-->>TiCI: DropPartitionResponse
            deactivate Meta
            TiCI-->>DDL: result / error
            deactivate TiCI
        end
        DDL->>DDL: clear TiCIPartitionAdded and persist reorg meta
    end
    DDL-->>Admin: partition dropped
    deactivate DDL
```
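As a companion to the diagram, here is a minimal, self-contained Go sketch of the client-side behavior the walkthrough describes: a no-op when no TiCI-backed indexes exist, and an error that preserves the server's status and message. The `addPartitionResponse` type and the `send` callback are hypothetical stand-ins, not the PR's actual API.

```go
package main

import (
	"context"
	"fmt"
)

// addPartitionResponse is a hypothetical stand-in for the PR's
// AddPartitionResponse message.
type addPartitionResponse struct {
	Status       int32
	ErrorMessage string
}

// addPartition mirrors the flow in the diagram: skip the RPC entirely when
// there are no TiCI index IDs, otherwise send the request and surface the
// server's status and error message with identifying context.
func addPartition(ctx context.Context, tableID int64, indexIDs []int64,
	send func(context.Context) (*addPartitionResponse, error)) error {
	if len(indexIDs) == 0 {
		return nil // no TiCI-backed indexes: nothing to manage
	}
	resp, err := send(ctx)
	if err != nil {
		return err
	}
	if resp.Status != 0 {
		return fmt.Errorf("AddPartition failed (tableID=%d, indexIDs=%v, status=%d): %s",
			tableID, indexIDs, resp.Status, resp.ErrorMessage)
	}
	return nil
}

func main() {
	ok := func(context.Context) (*addPartitionResponse, error) {
		return &addPartitionResponse{}, nil
	}
	fmt.Println(addPartition(context.Background(), 1, nil, ok))        // <nil>, RPC skipped
	fmt.Println(addPartition(context.Background(), 1, []int64{7}, ok)) // <nil>
}
```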
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@            Coverage Diff             @@
##           feature/fts    #66816      +/-   ##
===================================================
+ Coverage   76.8610%     76.9918%    +0.1308%
===================================================
  Files          1960         1964          +4
  Lines        555677       558332       +2655
===================================================
+ Hits         427099       429870       +2771
+ Misses       127116       126982        -134
- Partials       1462         1480         +18
```

Flags with carried forward coverage won't be shown.
---
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tici/tici_manager_client_test.go`:
- Around line 107-197: The test subtests CreateFulltextIndex, AddPartition, and
DropPartition set expectations on MockMetaServiceClient with Once() but never
assert them; after each subtest's last assertion (after calls to
ctx.CreateFulltextIndex, ctx.AddPartition, ctx.DropPartition and their
require/assert checks), add mockClient.AssertExpectations(t) to verify the
mocked RPC calls actually occurred; apply the same addition to the other
subtests that use mockClient with Once() (the analogous blocks around the later
tests referenced) so all mocked expectations are asserted.
In `@pkg/tici/tici_manager_client.go`:
- Around line 354-359: ManagerCtx methods AddPartition and DropPartition
currently proceed to build and send RPCs even when indexIDs is empty, causing
inconsistency with package-level helpers that no-op; update
ManagerCtx.AddPartition and ManagerCtx.DropPartition to check if len(indexIDs)
== 0 and immediately return nil (no-op) before calling
buildAddPartitionRequest/buildDropPartitionRequest or
t.addPartition/t.dropPartition so direct manager API callers behave the same as
package helpers.
- Around line 879-887: The retry terminal error in GetShardLocalCacheInfo drops
TiCI's error_message and identifying context; update the final fmt.Errorf return
so it includes resp.ErrorMessage plus the numeric/status code and any available
IDs (e.g., req.TableID, req.IndexID, shardID or fields on resp) along with
code.String(); for example build a message combining code.String(), resp.Status,
resp.ErrorMessage and the request/response table/index/shard identifiers so
callers get actionable context when attempts give up.
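The first comment above asks for `mockClient.AssertExpectations(t)` after each subtest. A minimal sketch of why that matters, using a hypothetical `MockMetaService` in place of the PR's `MockMetaServiceClient`:

```go
package tici_test

import (
	"testing"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
)

// MockMetaService is a hypothetical stand-in for MockMetaServiceClient.
type MockMetaService struct{ mock.Mock }

func (m *MockMetaService) AddPartition(tableID int64) error {
	args := m.Called(tableID)
	return args.Error(0)
}

func TestAddPartitionCallsRPC(t *testing.T) {
	m := new(MockMetaService)
	m.On("AddPartition", int64(42)).Return(nil).Once()

	require.NoError(t, m.AddPartition(42))

	// Without this, an expectation registered with Once() that is never
	// satisfied would not fail the test; AssertExpectations verifies the
	// mocked RPC was actually invoked.
	m.AssertExpectations(t)
}
```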
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2a586f5b-4aea-4738-bae6-bfa66ca8b4f1
📒 Files selected for processing (2)
- pkg/tici/tici_manager_client.go
- pkg/tici/tici_manager_client_test.go
---
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/ddl/partition.go`:
- Around line 224-232: Guard the tici.AddPartition call so it is not retried
when the job is in StateReplicaOnly: detect the job state (jobCtx or job
object/state check in the current function) and skip calling tici.AddPartition
if the job is already in StateReplicaOnly (or otherwise ensure idempotency by
checking for an existing partition-add marker) and return success; add a short
comment immediately above the tici.AddPartition invocation stating this call
must remain before updatePartitionInfo because it relies on AddingDefinitions
being present and must not be reissued during StateReplicaOnly retries. Ensure
the guard uses the same job/state identifiers present in this function and
preserves the early-return/error behavior when appropriate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 76c3f7dd-4f08-4f09-a187-e10abf5a04da
📒 Files selected for processing (2)
- pkg/ddl/partition.go
- pkg/ddl/partition_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/ddl/partition_test.go
---

Addressed CodeRabbit threads + CI flake:

---
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/ddl/partition.go`:
- Around line 224-235: The TiCI AddPartition call is dropping per-index parser
configuration by passing nil for ParserInfo; update the add-partition path in
the block around getTiCIIndexIDs and the tici.AddPartition call so it does not
pass nil: collect each index's ParserInfo from tblInfo (from the index
definitions), then either (a) if the RPC supports only a single ParserInfo,
validate that all involved indexes share the same parser config and pass that
ParserInfo to tici.AddPartition, or (b) if configs differ, group indexes by
ParserInfo and invoke tici.AddPartition once per group with the corresponding
ParserInfo; ensure this logic lives before the TiCIPartitionAdded guard
(job.ReorgMeta.TiCIPartitionAdded) and uses jobCtx.stepCtx/jobCtx.ctx as before.
- Around line 230-240: The rollback path must undo the TiCI side-effect created
when job.ReorgMeta.TiCIPartitionAdded is set after tici.AddPartition succeeds;
modify rollbackLikeDropPartition to check job.ReorgMeta.TiCIPartitionAdded (and
use the same physicalTableIDs/schema/indexIDs context) and call
tici.DropPartition(ctx, store, tblInfo, job.SchemaName, physicalTableIDs)
(handle and log errors, and clear TiCIPartitionAdded via w.updateDDLJob) before
returning so TiCI metadata isn’t leaked; also add a regression test that forces
a job into rollback after tici.AddPartition has succeeded to verify
DropPartition is invoked and the flag is cleared.
In `@pkg/tici/tici_manager_client.go`:
- Around line 333-338: The AddPartition and DropPartition handling currently
returns dbterror.ErrInvalidDDLJob.FastGenByArgs(resp.ErrorMessage) without
including resp.Status or identifying context; update the error returned from the
RPC failure branches (the block referencing resp.Status, resp.ErrorMessage, and
req.IndexIds) to include the status code and relevant table/index identifiers
(e.g., req.IndexIds and any table ID/Name available in the request) alongside
resp.ErrorMessage so callers can see which partition RPC failed; construct a
descriptive error string combining resp.Status, resp.ErrorMessage, and the
request identifiers and pass that into FastGenByArgs (or wrap/create a
contextual error) for both AddPartition and DropPartition failure paths.
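For the ParserInfo finding above, a sketch of option (b): grouping indexes by parser configuration so `tici.AddPartition` can be invoked once per group. `indexDef` and the string-typed parser config are hypothetical simplifications; in TiDB the real settings live on the index definitions in `tblInfo`.

```go
package main

import "fmt"

// indexDef is a hypothetical, simplified index descriptor.
type indexDef struct {
	ID         int64
	ParserInfo string
}

// groupByParser buckets index IDs by parser configuration, so the caller
// can issue one AddPartition request per distinct ParserInfo.
func groupByParser(indexes []indexDef) map[string][]int64 {
	groups := make(map[string][]int64)
	for _, idx := range indexes {
		groups[idx.ParserInfo] = append(groups[idx.ParserInfo], idx.ID)
	}
	return groups
}

func main() {
	idx := []indexDef{{1, "standard"}, {2, "ngram"}, {3, "standard"}}
	for parser, ids := range groupByParser(idx) {
		fmt.Printf("AddPartition(parser=%q, indexIDs=%v)\n", parser, ids)
	}
}
```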
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fb18f6d7-435c-4469-8bcf-cdf3121d8c49
📒 Files selected for processing (4)
- pkg/ddl/partition.go
- pkg/meta/model/reorg.go
- pkg/tici/tici_manager_client.go
- pkg/tici/tici_manager_client_test.go
---
Actionable comments posted: 3
♻️ Duplicate comments (1)
pkg/tici/tici_manager_client.go (1)
916-927: ⚠️ Potential issue | 🟡 Minor

Include TiCI `error_message` in the terminal `ScanRanges` failure. The final returned error still excludes `resp.ErrorMessage`, which makes root-cause diagnosis harder after retries are exhausted.

Suggested fix:

```diff
 if !isRetryableGetShardLocalCacheStatus(resp.Status) || time.Since(start) >= maxWait {
 	return nil, fmt.Errorf(
-		"GetShardLocalCacheInfo failed after %s (attempts=%d, keyspaceID=%d, tableID=%d, indexID=%d, ranges=%d, limit=%d): %s(%d)",
+		"GetShardLocalCacheInfo failed after %s (attempts=%d, keyspaceID=%d, tableID=%d, indexID=%d, ranges=%d, limit=%d): %s(%d): %s",
 		time.Since(start).Round(time.Millisecond), attempt+1, request.KeyspaceId,
 		tableID, indexID, len(keyRanges), limit, code.String(), resp.Status,
+		resp.ErrorMessage,
 	)
 }
```

As per coding guidelines, "Keep error handling actionable and contextual; avoid silently swallowing errors".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tici/tici_manager_client.go` around lines 916 - 927, The returned error from GetShardLocalCacheInfo (in the ScanRanges retry/failure path) omits resp.ErrorMessage; update the fmt.Errorf call that builds the final error (the block shown) to include resp.ErrorMessage alongside resp.Status and code.String() so the error message surfaces the TiCI error_message for diagnosing root causes—modify the format string and arguments in the failing return expression inside GetShardLocalCacheInfo to append or interpolate resp.ErrorMessage (e.g., include "error_message=%q" or similar) so the response's ErrorMessage is logged in the final error.
🧹 Nitpick comments (1)
pkg/ddl/partition_test.go (1)
332-334: Rollback cleanup path is still untested due to `t.Skip`. This PR changes rollback behavior for TiCI partition cleanup, but the dedicated rollback test is disabled. Please make it deterministic (failpoint + state hook) and enable it in CI.
Based on learnings, "Applies to pkg/ddl/** : For DDL schema changes, perform DDL-focused unit/integration tests and compatibility impact checks."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/ddl/partition_test.go` around lines 332 - 334, The test TestRollbackAddPartitionDropsTiCIPartition is skipped via t.Skip, leaving the rollback cleanup path untested; remove t.Skip and make the test deterministic by injecting a failpoint (e.g., enable a rollback scheduling failpoint) and using a state hook or callback to force/observe the exact timing of the add-partition rollback path, then assert that the TiCI partition is dropped; locate the test function TestRollbackAddPartitionDropsTiCIPartition and replace the skip with setup code that activates the failpoint, registers the state hook to drive the scheduler to the rollback branch, runs the operation, verifies the TiCI cleanup, and then disables the failpoint so the test can run reliably in CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/ddl/partition.go`:
- Around line 2246-2251: The code currently unconditionally clears
job.ReorgMeta.TiCIPartitionAdded and persists it via w.updateDDLJob even if
DropPartition RPCs failed; change the logic so that TiCIPartitionAdded is only
set to false and updateDDLJob called after all DropPartition RPCs complete
successfully. Locate the rollback path handling TiCI partition drops (look for
the block that runs DropPartition RPCs and the lines setting
job.ReorgMeta.TiCIPartitionAdded = false and calling w.updateDDLJob) and modify
it to: 1) check the results/errors from each DropPartition, 2) if any fail,
return the error without mutating TiCIPartitionAdded, and 3) only when all drops
succeed set TiCIPartitionAdded = false and call w.updateDDLJob(jobCtx, job,
true).
In `@pkg/tici/tici_test_export_intest.go`:
- Around line 127-129: Replace the mock etcd client return in getEtcdClientFunc
so it returns (nil, nil) instead of &clientv3.Client{}; update the mock
installer to return nil for the etcd client because newManagerCtxFunc ignores
the client parameter and DataWriterGroup.Close() checks for nil before calling
Close, preventing a panic from a zero-value client. Ensure the signature of
getEtcdClientFunc remains func() (*clientv3.Client, error) and that callers
relying on newManagerCtxFunc continue to operate unchanged.
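For the first finding above, a sketch of the required ordering: the persisted flag is cleared only after every `DropPartition` call succeeds, so a failed drop leaves it set for retry. Names other than `TiCIPartitionAdded` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type reorgMeta struct{ TiCIPartitionAdded bool }

// dropAllPartitions returns on the first DropPartition failure without
// touching the flag; only when every drop succeeds is TiCIPartitionAdded
// cleared (and, in the real code, persisted via updateDDLJob).
func dropAllPartitions(meta *reorgMeta, partitionIDs []int64, drop func(int64) error) error {
	for _, pid := range partitionIDs {
		if err := drop(pid); err != nil {
			return fmt.Errorf("DropPartition(physicalTableID=%d): %w", pid, err)
		}
	}
	meta.TiCIPartitionAdded = false // safe: all drops completed
	return nil
}

func main() {
	meta := &reorgMeta{TiCIPartitionAdded: true}
	err := dropAllPartitions(meta, []int64{10, 11}, func(pid int64) error {
		if pid == 11 {
			return errors.New("tici unavailable")
		}
		return nil
	})
	fmt.Println(err, meta.TiCIPartitionAdded) // error; flag stays true for retry
}
```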
---
Duplicate comments:
In `@pkg/tici/tici_manager_client.go`:
- Around line 916-927: The returned error from GetShardLocalCacheInfo (in the
ScanRanges retry/failure path) omits resp.ErrorMessage; update the fmt.Errorf
call that builds the final error (the block shown) to include resp.ErrorMessage
alongside resp.Status and code.String() so the error message surfaces the TiCI
error_message for diagnosing root causes—modify the format string and arguments
in the failing return expression inside GetShardLocalCacheInfo to append or
interpolate resp.ErrorMessage (e.g., include "error_message=%q" or similar) so
the response's ErrorMessage is logged in the final error.
---
Nitpick comments:
In `@pkg/ddl/partition_test.go`:
- Around line 332-334: The test TestRollbackAddPartitionDropsTiCIPartition is
skipped via t.Skip, leaving the rollback cleanup path untested; remove t.Skip
and make the test deterministic by injecting a failpoint (e.g., enable a
rollback scheduling failpoint) and using a state hook or callback to
force/observe the exact timing of the add-partition rollback path, then assert
that the TiCI partition is dropped; locate the test function
TestRollbackAddPartitionDropsTiCIPartition and replace the skip with setup code
that activates the failpoint, registers the state hook to drive the scheduler to
the rollback branch, runs the operation, verifies the TiCI cleanup, and then
disables the failpoint so the test can run reliably in CI.
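For the nitpick above, a sketch of the deterministic test shape it asks for, using the standard `pingcap/failpoint` enable/disable pattern that this PR's test commands already exercise. The failpoint path and the elided test body are hypothetical.

```go
package ddl_test

import (
	"testing"

	"github.com/pingcap/failpoint"
	"github.com/stretchr/testify/require"
)

// TestRollbackAddPartitionDropsTiCIPartition sketches the deterministic
// rollback test: force the add-partition job into rollback via a failpoint,
// then assert the mock TiCI service observed the compensating DropPartition.
func TestRollbackAddPartitionDropsTiCIPartition(t *testing.T) {
	fp := "github.com/pingcap/tidb/pkg/ddl/mockRollbackAfterTiCIAddPartition" // hypothetical path
	require.NoError(t, failpoint.Enable(fp, "return(true)"))
	defer func() { require.NoError(t, failpoint.Disable(fp)) }()

	// ... run ADD PARTITION against a table with a TiCI-backed index,
	// expect the job to roll back, then assert DropPartition was called
	// and ReorgMeta.TiCIPartitionAdded was cleared.
}
```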
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a615a7e9-e353-45dc-bd4b-85c19d58a067
📒 Files selected for processing (9)
- pkg/ddl/cancel_test.go
- pkg/ddl/index_modify_test.go
- pkg/ddl/partition.go
- pkg/ddl/partition_test.go
- pkg/ddl/tici_test_helper_test.go
- pkg/lightning/backend/local/local.go
- pkg/tici/tici_manager_client.go
- pkg/tici/tici_test_export_intest.go
- pkg/tici/tici_write.go
---

/retest

---

/retest

---

/retest

---
```
# Conflicts:
#	pkg/ddl/cancel_test.go
#	pkg/ddl/index_modify_test.go
```
---

/retest

---

/test unit-test

---
@wjhuang2016: The specified target(s) for `/test unit-test` were not found.

---
/test pull-br-integration-test

---

@wjhuang2016: The specified target(s) for `/test pull-br-integration-test` were not found.

---
/test pull-br-integration-test

---

@wjhuang2016: The specified target(s) for `/test pull-br-integration-test` were not found.

---
/test pull-br-integration-test

---

@wjhuang2016: The specified target(s) for `/test pull-br-integration-test` were not found.

---
/test pull-br-integration-test

---

@wjhuang2016: The specified target(s) for `/test pull-br-integration-test` were not found. The following commands are available to trigger optional jobs: …

---

@wjhuang2016: The specified target(s) for `/test pull-br-integration-test` were not found.

---
/test pull-br-integration-test

---

@yinshuangfei: The specified target(s) for `/test pull-br-integration-test` were not found.

---
/retest

---

/retest

---
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fzzf678, OliverS929, yinshuangfei. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
---

/retest

---

/retest

---
What problem does this PR solve?
Issue Number: ref #1793
Problem Summary:
When a partitioned table has TiCI global indexes, `ADD PARTITION`/`DROP PARTITION` should also create/drop the corresponding TiCI index metadata for the physical partition tables. Previously TiDB only managed TiCI indexes at the logical-table level.

What changed and how does it work?
- Add `AddPartition`/`DropPartition` RPCs and request/response messages to `pkg/tici/tici.proto` (index IDs as `repeated int64`).
- Implement the client APIs in `pkg/tici/tici_manager_client.go` and wire them to DDL:
  - Call `tici.AddPartition` during `ADD PARTITION` in `StateReplicaOnly` before `updatePartitionInfo`, so `AddingDefinitions` are included in the request `table_info`.
  - Call `tici.DropPartition` during `DROP PARTITION` in `StateDeleteReorganization` for each dropped physical partition ID (best-effort; log warnings on failure).

Check List
Tests
Test results:
```
make failpoint-enable && (rc=0; go test ./pkg/tici -run TestCreateFulltextIndex -tags=intest,deadlock || rc=$?; if [ $rc -eq 0 ]; then go test ./pkg/ddl -run TestAddDropPartitionWithTiCIIndex -tags=intest,deadlock || rc=$?; fi; make failpoint-disable; exit $rc)
make bazel_lint_changed
```

Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Bug Fixes
Tests