dxf: make max concurrent task configurable#69645
Conversation
📝 WalkthroughWalkthroughReplaces the exported ChangesMax Concurrent Task Configuration
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTPServer
participant DXFTaskMaxConcurrentHandler
participant proto
Client->>HTTPServer: GET/POST /dxf/task/max_concurrent
HTTPServer->>DXFTaskMaxConcurrentHandler: ServeHTTP(req)
alt POST with value
DXFTaskMaxConcurrentHandler->>proto: SetMaxConcurrentTask(value)
proto-->>DXFTaskMaxConcurrentHandler: ok or range error
else GET
DXFTaskMaxConcurrentHandler->>proto: GetMaxConcurrentTask()
proto-->>DXFTaskMaxConcurrentHandler: current value
end
DXFTaskMaxConcurrentHandler-->>Client: JSON {max_concurrent_task}
Suggested labels: Suggested reviewers: Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/dxf/framework/proto/task.go (1)
97-104: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueDocument the validation-bypass and parallel-test caveat.
SetMaxConcurrentTaskForTestintentionally skips the[Min, Max]range checkSetMaxConcurrentTaskenforces (callers pass values like1,3,4,6belowMinMaxConcurrentTask). It also mutates process-global state, so liket.Setenv/t.Chdirit isn't safe to use from parallel tests. Neither point is captured in the doc comment.📝 Suggested doc comment
-// SetMaxConcurrentTaskForTest updates the max concurrency of task and returns a restore function. +// SetMaxConcurrentTaskForTest updates the max concurrency of task for test purposes, +// bypassing the [MinMaxConcurrentTask, MaxMaxConcurrentTask] validation done by +// SetMaxConcurrentTask, and returns a function that restores the previous value. +// Since this mutates process-global state, it must not be used from parallel tests. func SetMaxConcurrentTaskForTest(value int) func() {As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dxf/framework/proto/task.go` around lines 97 - 104, Update the doc comment on SetMaxConcurrentTaskForTest to mention that it intentionally bypasses the MinMaxConcurrentTask validation enforced by SetMaxConcurrentTask and that it mutates process-global state, so it must not be used from parallel tests. Keep the note close to the function name so callers understand the validation-bypass and concurrency caveat when using SetMaxConcurrentTaskForTest.Source: Coding guidelines
pkg/dxf/framework/scheduler/scheduler_manager.go (1)
158-158: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider a brief comment explaining why
finishChis sized toMaxMaxConcurrentTask.Since the channel capacity is fixed at creation time but
GetMaxConcurrentTask()can later be raised at runtime (up toMaxMaxConcurrentTask), sizing to the static upper bound rather than the current/default value is the correct choice to avoid blocking on future increases. This rationale isn't obvious from the code alone.📝 Suggested comment
- finishCh: make(chan struct{}, proto.MaxMaxConcurrentTask), + // sized to the upper bound since GetMaxConcurrentTask() can be raised + // at runtime after this channel is created. + finishCh: make(chan struct{}, proto.MaxMaxConcurrentTask),As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees... SHOULD NOT restate what the code already makes clear."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dxf/framework/scheduler/scheduler_manager.go` at line 158, Add a brief explanatory comment at the finishCh initialization in scheduler_manager.go to capture the non-obvious intent: it is sized to proto.MaxMaxConcurrentTask because GetMaxConcurrentTask() may be increased later at runtime, so the channel must be preallocated to the highest possible concurrency to avoid blocking. Keep the comment near the schedulerManager construction / finishCh field assignment so the rationale is tied to the finishCh capacity choice.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/dxf/framework/proto/task.go`:
- Around line 97-104: Update the doc comment on SetMaxConcurrentTaskForTest to
mention that it intentionally bypasses the MinMaxConcurrentTask validation
enforced by SetMaxConcurrentTask and that it mutates process-global state, so it
must not be used from parallel tests. Keep the note close to the function name
so callers understand the validation-bypass and concurrency caveat when using
SetMaxConcurrentTaskForTest.
In `@pkg/dxf/framework/scheduler/scheduler_manager.go`:
- Line 158: Add a brief explanatory comment at the finishCh initialization in
scheduler_manager.go to capture the non-obvious intent: it is sized to
proto.MaxMaxConcurrentTask because GetMaxConcurrentTask() may be increased later
at runtime, so the channel must be preallocated to the highest possible
concurrency to avoid blocking. Keep the comment near the schedulerManager
construction / finishCh field assignment so the rationale is tied to the
finishCh capacity choice.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 43b68bc5-91bc-430a-874c-a824f0c18867
📒 Files selected for processing (14)
pkg/dxf/framework/integrationtests/bench_test.gopkg/dxf/framework/proto/BUILD.bazelpkg/dxf/framework/proto/task.gopkg/dxf/framework/proto/task_test.gopkg/dxf/framework/scheduler/interface.gopkg/dxf/framework/scheduler/scheduler_manager.gopkg/dxf/framework/scheduler/scheduler_manager_nokit_test.gopkg/dxf/framework/scheduler/scheduler_test.gopkg/dxf/framework/storage/table_test.gopkg/dxf/framework/storage/task_table.gopkg/server/handler/tests/dxf_test.gopkg/server/handler/tikvhandler/dxf.gopkg/server/http_status.gotests/realtikvtest/addindextest1/disttask_test.go
|
@D3Hunter: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
8650f6f to
ab0568e
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #69645 +/- ##
================================================
- Coverage 76.3268% 74.0981% -2.2288%
================================================
Files 2041 2050 +9
Lines 560589 576634 +16045
================================================
- Hits 427880 427275 -605
- Misses 131808 149151 +17343
+ Partials 901 208 -693
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
🔍 Starting code review for this PR... |
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 8
- Inline comments: 8
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🟡 [Minor] (4)
DefaultMaxConcurrentTaskandMinMaxConcurrentTaskare co-equal independent literals with no explicit link, creating silent drift when either changes (pkg/dxf/framework/proto/task.go:89, pkg/dxf/framework/proto/task.go:91, pkg/dxf/framework/proto/task.go:98)- MaxMaxConcurrentTask and MinMaxConcurrentTask use double-compound naming (pkg/dxf/framework/proto/task.go:91, pkg/dxf/framework/proto/task.go:93, pkg/dxf/framework/scheduler/scheduler_manager.go:158)
SetMaxConcurrentTaskForTestis exported from a non-test file with no compile-time restriction to test code (pkg/dxf/framework/proto/task.go:118)- max_concurrent knob is process-local and ephemeral, unlike the persisted sibling DXF tune endpoint (pkg/server/handler/tikvhandler/dxf.go:368, pkg/server/http_status.go:258, pkg/dxf/framework/proto/task.go:120)
🧹 [Nit] (4)
- finishCh buffer size change to MaxMaxConcurrentTask lacks an invariant comment (pkg/dxf/framework/scheduler/scheduler_manager.go:158)
- Three mid-test calls to
SetMaxConcurrentTaskForTestsilently discard the restore function, using it as a plain setter (pkg/dxf/framework/storage/table_test.go:553, pkg/dxf/framework/storage/table_test.go:559, pkg/dxf/framework/storage/table_test.go:564) - DXFTaskMaxConcurrentHandler.ServeHTTP missing doc comment and persistence contract (pkg/server/handler/tikvhandler/dxf.go:355, pkg/server/http_status.go:419)
- Original TODO 'remove this limit later' was silently dropped without noting disposition (pkg/dxf/framework/proto/task.go:87)
|
/cherry-pick release-nextgen-202603 |
|
@D3Hunter: once the present PR merges, I will cherry-pick it on top of release-nextgen-202603 in the new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
What problem does this PR solve?
Issue Number: ref #61702
Problem Summary:
DXF service runs in the SYSTEM keyspace for nextgen clusters and shares a resource pool across user keyspaces. The framework-wide max concurrent task limit was a fixed in-memory package variable, which made the limit hard to tune when many small tasks were blocked by the default scheduler concurrency.
What changed and how does it work?
This PR makes the DXF max concurrent task limit atomically configurable in memory:
proto.MaxConcurrentTaskglobal with atomic-backed helpers.16, caps runtime updates at1000, and keeps the upper cap available for scheduler buffer sizing.GET /dxf/schedule/max_concurrent_taskandPOST /dxf/schedule/max_concurrent_task?value=<n>under the existing SYSTEM-keyspace DXF HTTP route block.persistence: "memory_only"so callers know the update is not persisted.finishChcapacity invariant.The value is intentionally not persisted in this first version; it only changes the in-memory value of the TiDB process receiving the HTTP request. Operators should send the request to the current DXF owner when tuning scheduler concurrency.
Check List
Tests
Manual test details:
go test -run TestMaxConcurrentTask -tags=intest,deadlock -count=1 ./pkg/dxf/framework/proto./tools/check/failpoint-go-test.sh pkg/server/handler/tests -run 'TestDXFAPI/max.*concurrent.*task.*api' -count=1 -tags=intest,deadlock,nextgen./tools/check/failpoint-go-test.sh pkg/dxf/framework/storage -run TestGetTopUnfinishedTasks -count=1./tools/check/failpoint-go-test.sh pkg/dxf/framework/scheduler -run '^$' -count=1make lintgit diff --checkgit diff --cached --checkmake bazel_preparewas not required: no Go files were added, removed, renamed, or moved; no Go import sections changed; no Bazel or module files changed; and no new top-level Go test function was added.Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Bug Fixes
Tests