*: Optimize the underlying SegmentReader concurrency for TableScan under disagg arch by JaySon-Huang · Pull Request #10522 · pingcap/tiflash

JaySon-Huang · 2025-11-03T09:03:01Z

What problem does this PR solve?

Issue Number: ref #10356

Problem Summary: Queries under the disagg arch are slow when the compute node's local cache misses.

The main reason is that the SegmentReaderPool default size is vcore * dt_read_thread_count_scale, i.e. vcore * 2, and StorageDisaggregated creates the SegmentReadTaskPool with max_active_segment = num_stream. On a cache miss, SegmentReader performs blocking IO by calling the S3 API. As a result, TableScan (which reads from the SegmentReaderPool) cannot produce data fast enough to keep the other computation in the Pipeline model busy.

The best fix would be to refine the storage layer's read logic so that a SegmentReaderTask yields its slot in the SegmentReaderPool whenever it requires network IO from the remote storage service, giving another SegmentReaderTask a chance to read data from the local cache. But that requires a lot of effort.

**For now, we increase the underlying SegmentReader concurrency to speed up TableScan on cache misses under the disagg arch.**
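
To make the reasoning concrete, here is a rough back-of-envelope model, not TiFlash code. It assumes every segment costs exactly one blocking remote read of equal latency and that nothing else limits throughput. The function names and the 0.1 s latency are hypothetical; 2610 is the `order_line` segment count reported in the test setup.

```python
import math


def read_waves(num_segments: int, active_readers: int) -> int:
    """Number of sequential 'waves' when each reader blocks on one segment at a time."""
    if active_readers <= 0:
        raise ValueError("active_readers must be positive")
    return math.ceil(num_segments / active_readers)


def estimated_scan_seconds(num_segments: int, active_readers: int, io_latency_s: float) -> float:
    """Idealized wall time if every segment costs one blocking remote read."""
    return read_waves(num_segments, active_readers) * io_latency_s


if __name__ == "__main__":
    segments = 2610  # order_line segment count from the test setup
    latency = 0.1    # hypothetical per-segment remote-read latency (seconds)
    for readers in (8, 40, 80):
        waves = read_waves(segments, readers)
        seconds = estimated_scan_seconds(segments, readers, latency)
        print(f"readers={readers} waves={waves} est_seconds={seconds:.1f}")
```

Under these assumptions the estimated time shrinks roughly in proportion to the number of active readers, which is why raising concurrency helps while the readers are blocked on IO rather than on CPU.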

What is changed and how it works?

*: Optimize the underlying SegmentReader concurrency for TableScan under disagg arch
  - Adjust the concurrency under disagg
    * `SegmentReaderPoolManager` initializes the SegmentReaderPool with size = vcore * dt_read_thread_count_scale (2.0) * 10 on disagg compute nodes
    * `StorageDisaggregated` creates the SegmentReadTaskPool with max_active_segment = num_stream * 10 for disagg read tasks
    * `initThreadPool` creates thread pools with at most 6*vcore threads for `BuildReadTaskForWNPool`/`BuildReadTaskForWNTablePool`/`BuildReadTaskPool`/`RNWritePageCachePool`
  - ScanDetails changes under disagg
    * Add rows_per_sec and bytes_per_sec for TableScan, summed across all concurrency
    * Fix num_columns and read_mode in scan_details
    * Fix `SegmentReadTaskPool` logging not showing mpp_task_id correctly
    * Add logging when tasks finish being built from the write node response
  - Add an HTTP API /tiflash/remote/cache/evict for evicting the local cache on a compute node, for testing
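
The sizing rules above can be sketched as plain arithmetic. This is an illustrative Python sketch, not the TiFlash implementation: `DisaggConcurrency`, `disagg_concurrency`, and `DISAGG_MULTIPLIER` are hypothetical names, and only the formulas come from the change description.

```python
from dataclasses import dataclass

DISAGG_MULTIPLIER = 10  # the "* 10" factor described for disagg compute nodes


@dataclass(frozen=True)
class DisaggConcurrency:
    segment_reader_threads: int       # SegmentReaderPool size
    max_active_segments: int          # SegmentReadTaskPool max_active_segment
    build_task_pool_max_threads: int  # upper bound for the build/write-cache pools


def disagg_concurrency(vcore: int, num_stream: int,
                       read_thread_count_scale: float = 2.0) -> DisaggConcurrency:
    """Apply the sizing formulas from the change description (illustrative only)."""
    return DisaggConcurrency(
        segment_reader_threads=int(vcore * read_thread_count_scale * DISAGG_MULTIPLIER),
        max_active_segments=num_stream * DISAGG_MULTIPLIER,
        build_task_pool_max_threads=6 * vcore,
    )


if __name__ == "__main__":
    # Hypothetical node: 8 vcores, and a query that happens to use 8 streams.
    print(disagg_concurrency(vcore=8, num_stream=8))
```

For an 8-vcore node this gives 160 reader threads instead of the previous 16, and the build-task pools are capped at 48 threads.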

Tested with chbenchmark 8000

| Configuration | First query after CN restarted: q1 latency (seconds) | First query: q1 TableScan (seconds) | First query: TableScan bytes_per_sec | First query: TableScan rows_per_sec | Following 4 queries avg: q1 latency (seconds) | Following 4 queries avg: q1 TableScan (seconds) | Following 4 queries avg: TableScan bytes_per_sec | Following 4 queries avg: TableScan rows_per_sec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Optimal baseline (fully hit local cache on CN) | 6.44 | 3.67 | | | 6.42 | 3.27 | 13178 MiB/s | 600,678,349 |
| vcore=8 max_active_seg=vcore | 95.70 | 92.60 | 614 MiB/s | 28,020,696 | 50.68 | 47.50 | 1182 MiB/s | 53,901,421 |
| vcore=8 max_active_seg=5*vcore | 19.15 | 15.80 | 3973 MiB/s | 181,027,573 | 8.66 | 5.36 | 12564 MiB/s | 572,858,663 |
| vcore=8 max_active_seg=10*vcore | 10.96 | 7.54 | 10516 MiB/s | 479,218,168 | 8.42 | 4.78 | 19440 MiB/s | 886,323,146 |
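
As a quick reading aid, the q1 latencies above can be turned into speedup ratios. The snippet below only re-computes ratios from the reported numbers; the dictionary keys and function name are made up for illustration.

```python
# (first-query latency, following-4-queries average latency) in seconds, copied from the results
Q1_LATENCY_S = {
    "max_active_seg=vcore": (95.70, 50.68),
    "max_active_seg=5*vcore": (19.15, 8.66),
    "max_active_seg=10*vcore": (10.96, 8.42),
}
CACHED_BASELINE_S = (6.44, 6.42)  # fully hit local cache on CN


def summarize() -> dict:
    """Speedup over max_active_seg=vcore and slowdown relative to the cached baseline."""
    old_first, old_following = Q1_LATENCY_S["max_active_seg=vcore"]
    summary = {}
    for name, (first, following) in Q1_LATENCY_S.items():
        summary[name] = {
            "first_speedup": round(old_first / first, 2),
            "following_speedup": round(old_following / following, 2),
            "first_vs_cached": round(first / CACHED_BASELINE_S[0], 2),
            "following_vs_cached": round(following / CACHED_BASELINE_S[1], 2),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

With max_active_seg=10*vcore, the first query after a restart is roughly 8.7x faster than with max_active_seg=vcore, and the following queries are about 6x faster, while still being about 1.3x to 1.7x slower than the fully cached baseline.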

Deploy a next-gen cluster
- 2 compute nodes with vcore limited to 8, local cache disabled

    - host: 172.31.10.1
...
      config:
        flash.disaggregated_mode: tiflash_compute
        storage.main.dir:
            - /tidb-deploy/tiflash-9000/data
        storage.remote.cache.capacity: 200000000000
        storage.remote.cache.dir: /tidb-deploy/tiflash-9000/remote_cache
        storage.remote.cache.dtfile_level: 0
        tcp_port: 9000
      resource_control:
        cpu_quota: 800%
    - host: 172.31.10.2
...
      config:
        flash.disaggregated_mode: tiflash_compute
        storage.main.dir:
            - /tidb-deploy/tiflash-9000/data
        storage.remote.cache.capacity: 200000000000
        storage.remote.cache.dir: /tidb-deploy/tiflash-9000/remote_cache
        storage.remote.cache.dtfile_level: 0
        tcp_port: 9000
      resource_control:
        cpu_quota: 800%

Restore the chbenchmark 8000 dataset to the cluster with BR
- chbenchmark8k.order_line involves about 2610 segments

Run chbenchmark AP query 1 on the static dataset

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Optimize the TableScan performance under disagg arch

Signed-off-by: JaySon-Huang <tshent@qq.com>

JaySon-Huang · 2025-11-03T14:48:39Z

/test pull-unit-test

JaySon-Huang · 2025-11-03T14:50:19Z

@JinheLin @Lloyd-Pottiger @CalvinNeo PTAL

Signed-off-by: JaySon-Huang <tshent@qq.com>

CalvinNeo

lgtm

ti-chi-bot · 2025-11-04T02:00:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [CalvinNeo,JinheLin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-11-04T02:01:01Z

[LGTM Timeline notifier]

Timeline:

2025-11-04 01:43:32.737987773 +0000 UTC m=+148062.181017642: ☑️ agreed by JinheLin.
2025-11-04 02:01:00.514681815 +0000 UTC m=+149109.957711694: ☑️ agreed by CalvinNeo.

JaySon-Huang · 2025-11-04T02:06:51Z

/cherry-pick release-nextgen-20251011

ti-chi-bot · 2025-11-04T02:07:54Z

@JaySon-Huang: new pull request created to branch release-nextgen-20251011: #10523.


…der disagg arch (#10522) (#10523)

ref #10356

*: Optimize the underlying SegmentReader concurrency for TableScan under disagg arch
  - Adjust the concurrency under disagg
    * `SegmentReaderPoolManager` initializes the SegmentReaderPool with size = vcore * dt_read_thread_count_scale (2.0) * 10 on disagg compute nodes
    * `StorageDisaggregated` creates the SegmentReadTaskPool with max_active_segment = num_stream * 10 for disagg read tasks
    * `initThreadPool` creates thread pools with at most 6*vcore threads for `BuildReadTaskForWNPool`/`BuildReadTaskForWNTablePool`/`BuildReadTaskPool`/`RNWritePageCachePool`
  - ScanDetails changes under disagg
    * Add rows_per_sec and bytes_per_sec for TableScan, summed across all concurrency
    * Fix num_columns and read_mode in scan_details
    * Fix `SegmentReadTaskPool` logging not showing mpp_task_id correctly
    * Add logging when tasks finish being built from the write node response
  - Add an HTTP API /tiflash/remote/cache/evict for evicting the local cache on a compute node, for testing

Signed-off-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>

Add evict http api

d352b48

Signed-off-by: JaySon-Huang <tshent@qq.com>

JaySon-Huang added 2 commits November 3, 2025 17:04

Add more logging to scan_details and fix fields

753d36d

Signed-off-by: JaySon-Huang <tshent@qq.com>

Increase the default scale for disagg cn

59eae76

Signed-off-by: JaySon-Huang <tshent@qq.com>

JaySon-Huang force-pushed the opt_disagg_concurrency branch from 0586ebe to 59eae76 November 3, 2025 11:19

Fix lint

a3eba82

Signed-off-by: JaySon-Huang <tshent@qq.com>

JaySon-Huang changed the title ~~[WIP] *:Opt disagg concurrency~~ *: Optimize the underlying SegmentReader concurrency for TableScan under disagg arch Nov 3, 2025

ti-chi-bot removed the do-not-merge/work-in-progress label Nov 3, 2025

Increase scale to 10

ea06e57

Signed-off-by: JaySon-Huang <tshent@qq.com>

ti-chi-bot removed the do-not-merge/needs-linked-issue label Nov 3, 2025

JaySon-Huang requested review from CalvinNeo, JinheLin and Lloyd-Pottiger November 3, 2025 14:30

ti-chi-bot added the release-note label and removed the release-note-none label Nov 3, 2025

Add metrics about running MergedTask

5bd0e5c

Signed-off-by: JaySon-Huang <tshent@qq.com>

JinheLin approved these changes Nov 4, 2025


ti-chi-bot added the needs-1-more-lgtm and approved labels Nov 4, 2025

CalvinNeo approved these changes Nov 4, 2025


ti-chi-bot added the lgtm label and removed the needs-1-more-lgtm label Nov 4, 2025

ti-chi-bot merged commit 09ca448 into pingcap:master Nov 4, 2025
7 checks passed

JaySon-Huang deleted the opt_disagg_concurrency branch November 4, 2025 02:04

ti-chi-bot mentioned this pull request Nov 4, 2025

*: Optimize the underlying SegmentReader concurrency for TableScan under disagg arch (#10522) #10523

Merged

12 tasks

JaySon-Huang mentioned this pull request Dec 1, 2025

[next gen on clould] tiflash query is slow when cache missed #10356

Closed

JaySon-Huang mentioned this pull request Dec 12, 2025

[next gen on cloud] tiflash query failed with error ”1105 (other error for mpp stream: Code: 49, e.displayText() = DB::Exception: Check status.ok() failed: Failed to fetch all pages“ during tiflash wn rolling restart #10513

Closed

JaySon-Huang mentioned this pull request Dec 23, 2025

disagg: Support cache manipulating http API under disaggregated arch #10625

Merged

12 tasks

ti-chi-bot mentioned this pull request Dec 23, 2025

disagg: Support cache manipulating http API under disaggregated arch (#10625) #10628

Merged

12 tasks

JaySon-Huang mentioned this pull request Mar 18, 2026

Avoid MPP queries exhaust tiflash-compute network under disagg arch #10752

Open

ti-chi-bot[bot] merged 6 commits into pingcap:master from JaySon-Huang:opt_disagg_concurrency