Storage: Fix TableScan performance regression under wide-sparse table (#10379) #10384

ti-chi-bot · 2025-08-29T08:56:47Z

This is an automated cherry-pick of #10379

What problem does this PR solve?

Issue Number: close #10361

Problem Summary:

What is changed and how it works?

Storage: Fix TableScan performance regression under wide-sparse table
* Use `merged_file_info.size` as the buffer size when reading data (mark, min-max index, column data) from a merged file, to minimize read amplification
* Use `merged_file_info.size` as the buffer size when parsing data as `ChecksumReadBuffer`, to minimize memory allocation overhead
* Introduce class `MinMaxIndexLoader` to tidy up the code that reads the min-max index
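
The idea behind the first two items can be sketched in shell. This is only an illustration and not TiFlash code: the file layout, the offsets, and the `read_segment` helper are assumptions made up for the example. It packs three small sub-files into one merged file and then reads each one back using exactly its recorded offset and size, instead of pulling in a larger fixed-size chunk that would also cover neighbouring sub-files.

```shell
# Hypothetical layout: three sub-files packed back to back into one merged file.
tmpdir=$(mktemp -d)
merged="$tmpdir/merged.bin"
printf 'MARKMARK'      >  "$merged"   # sub-file 0: offset 0,  size 8
printf 'MINMAX'        >> "$merged"   # sub-file 1: offset 8,  size 6
printf 'COLUMN-DATA!!' >> "$merged"   # sub-file 2: offset 14, size 13

# read_segment FILE OFFSET SIZE
# Emit exactly SIZE bytes starting at byte OFFSET (0-based), nothing more.
read_segment() {
  tail -c "+$(( $2 + 1 ))" "$1" | head -c "$3"
}

mark=$(read_segment "$merged" 0 8)
index=$(read_segment "$merged" 8 6)
data=$(read_segment "$merged" 14 13)

echo "$index"   # prints MINMAX

rm -rf "$tmpdir"
```

Sizing the read to the recorded sub-file size means that a small index entry costs a small read, which matters most when a wide table has many small sub-files per merged file.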

test logging output: test.log.zip

Tested on x86_64, 8 cores, 32 GB memory (m7i.2xlarge), with a gp3 EBS volume at 16000 IOPS and 625 MB/s throughput (amd_rockylinux9).
The uncompressed read/write throughput is compared using dttool bench.

Before this PR:

Write throughput increases by more than 60% compared to the v2 format.
Read throughput decreases by about 10%~20% after switching to the v3 DMFile format (up to 27.6% at sparse_ratio 0.99), especially on wide-sparse tables.

sparse_ratio	v2 write throughput	v2 read throughput	v3 write throughput	v3 read throughput
0	132.409	1814.432	220.047 (+66.2%)	1671.053 (-7.9%)
0.05	132.893	1695.434	216.352 (+62.8%)	1486.356 (-12.3%)
0.1	129.537	1599.987	212.952 (+64.4%)	1375.033 (-14.1%)
0.5	130.932	1318.756	211.084 (+61.2%)	1120.210 (-15.1%)
0.8	137.630	1497.458	227.778 (+65.5%)	1216.922 (-18.7%)
0.9	149.608	1737.411	245.560 (+64.1%)	1361.528 (-21.6%)
0.99	162.110	2160.127	295.130 (+82.1%)	1564.444 (-27.6%)

After this PR:

Write throughput still increases by more than 60% compared to the v2 format.
Read throughput does not change significantly; the regression is no larger than 7%.

sparse_ratio	v2 write throughput	v2 read throughput	v3 write throughput	v3 read throughput
0	131.757	1771.513	221.029 (+67.8%)	1839.603 (+3.8%)
0.05	129.977	1683.676	220.039 (+69.3%)	1691.280 (+0.5%)
0.1	130.816	1580.936	211.728 (+61.9%)	1559.970 (-1.3%)
0.5	130.103	1337.525	211.439 (+62.5%)	1292.864 (-3.3%)
0.8	140.769	1479.410	227.881 (+61.9%)	1386.370 (-6.3%)
0.9	146.884	1664.910	244.082 (+66.2%)	1603.719 (-3.7%)
0.99	161.374	2098.286	291.528 (+80.7%)	2116.858 (+0.9%)

Compare the read/write throughput of v3 DMFile format before and after this PR:

Write throughput is almost unchanged.
Read throughput increases by about 10% to 35%, with the largest gain in the sparse-table scenario.

sparse_ratio	v3 (before) write throughput	v3 (before) read throughput	v3 (after) write throughput	v3 (after) read throughput
0	220.047	1671.053	221.029 (+0.4%)	1839.603 (+10.1%)
0.05	216.352	1486.356	220.039 (+1.7%)	1691.280 (+13.8%)
0.1	212.952	1375.033	211.728 (-0.6%)	1559.970 (+13.4%)
0.5	211.084	1120.210	211.439 (+0.2%)	1292.864 (+15.4%)
0.8	227.778	1216.922	227.881 (+0.0%)	1386.370 (+13.9%)
0.9	245.560	1361.528	244.082 (-0.6%)	1603.719 (+17.8%)
0.99	295.130	1564.444	291.528 (-1.2%)	2116.858 (+35.3%)
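
The percentages in parentheses are relative changes against the baseline column. As a quick way to double-check them, here is a small sketch; the `pct_change` helper is a name introduced for this example and is not part of dttool.

```shell
# pct_change BASELINE NEW
# Print the relative change of NEW against BASELINE as a signed percentage.
pct_change() {
  LC_ALL=C awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}

# v3 read throughput at sparse_ratio 0.99, before vs. after this PR
pct_change 1564.444 2116.858   # prints +35.3%

# v3 write throughput at sparse_ratio 0.99, before vs. after this PR
pct_change 295.130 291.528     # prints -1.2%
```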

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

#!/bin/bash

# Define arrays for sparse-ratio and version
sparse_ratios=(0 0.05 0.1 0.5 0.8 0.9 0.99)
#sparse_ratios=(0 0.1 0.9 0.99)
versions=(2 3)

# Define the base command and fixed arguments
base_cmd="beforefix/tiflash/tiflash"
sub_cmd="dttool bench"
fixed_args="--columns 600 --rows 131000 --field 12 --write-repeat 5 --repeat 20 --random 2021268696"

# Iterate through all combinations of sparse-ratio and version
for version in "${versions[@]}"; do
  for sparse_ratio in "${sparse_ratios[@]}"; do
    # Define a unique workdir for each run
    workdir="./tmp_v${version}_sr${sparse_ratio}"

    # Construct the full command
    cmd="${base_cmd} ${sub_cmd} ${fixed_args} --sparse-ratio ${sparse_ratio} --workdir ${workdir} --version ${version}"

    # Print the command being executed (optional, for logging/debugging)
    echo "Executing: $cmd"

    # Execute the command and report a failure without aborting the remaining runs
    if ! $cmd; then
      echo "Error: Command failed for version ${version}, sparse-ratio ${sparse_ratio}"
      # Optional: Add 'exit 1' here to stop the script on failure
      # exit 1
    fi

    echo "----------------------------------------"
  done
done

echo "All benchmarks completed."
echo "=================================="

base_cmd="afterfix/tiflash/tiflash"

# Iterate through all combinations of sparse-ratio and version
for version in "${versions[@]}"; do
  for sparse_ratio in "${sparse_ratios[@]}"; do
    # Define a unique workdir for each run
    workdir="./tmp_v${version}_sr${sparse_ratio}"

    # Construct the full command
    cmd="${base_cmd} ${sub_cmd} ${fixed_args} --sparse-ratio ${sparse_ratio} --workdir ${workdir} --version ${version}"

    # Print the command being executed (optional, for logging/debugging)
    echo "Executing: $cmd"

    # Execute the command and report a failure without aborting the remaining runs
    if ! $cmd; then
      echo "Error: Command failed for version ${version}, sparse-ratio ${sparse_ratio}"
      # Optional: Add 'exit 1' here to stop the script on failure
      # exit 1
    fi

    echo "----------------------------------------"
  done
done

echo "All benchmarks completed."

No code
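
The two loops in the manual test script differ only in the binary path, so they could be folded into one function. A possible sketch follows. The `run_benchmarks` function and the `DRY_RUN` switch are additions for illustration and do not exist in the original script; the dttool arguments are copied from it unchanged. With `DRY_RUN=1` the commands are only printed, which makes it easy to review them before a long run.

```shell
sparse_ratios="0 0.05 0.1 0.5 0.8 0.9 0.99"
versions="2 3"
fixed_args="--columns 600 --rows 131000 --field 12 --write-repeat 5 --repeat 20 --random 2021268696"

# Hypothetical switch: 1 = only print the commands, 0 = actually run them.
DRY_RUN=1

# run_benchmarks BINARY
# Run `dttool bench` with BINARY for every version / sparse-ratio combination.
run_benchmarks() {
  bin="$1"
  for version in $versions; do
    for sparse_ratio in $sparse_ratios; do
      workdir="./tmp_v${version}_sr${sparse_ratio}"
      cmd="${bin} dttool bench ${fixed_args} --sparse-ratio ${sparse_ratio} --workdir ${workdir} --version ${version}"
      echo "Executing: $cmd"
      if [ "${DRY_RUN:-0}" = "1" ]; then
        continue
      fi
      # Report a failure but keep going with the remaining combinations
      if ! $cmd; then
        echo "Error: Command failed for version ${version}, sparse-ratio ${sparse_ratio}"
      fi
    done
  done
}

run_benchmarks beforefix/tiflash/tiflash
run_benchmarks afterfix/tiflash/tiflash
```

Setting `DRY_RUN=0` runs the benchmarks for real, in the same order as the original script.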

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Fix a bug that causes a TableScan performance regression on wide-sparse tables

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot · 2025-08-29T08:56:51Z

@JaySon-Huang This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot · 2025-08-29T08:56:53Z

@ti-chi-bot: If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Signed-off-by: JaySon-Huang <tshent@qq.com>

JaySon-Huang

lgtm

JaySon-Huang · 2025-09-01T01:22:57Z

/unhold
conflicts are resolved

ti-chi-bot · 2025-09-01T02:31:08Z

[LGTM Timeline notifier]

Timeline:

2025-09-01 01:22:23.578644313 +0000 UTC m=+748835.456475257: ☑️ agreed by JaySon-Huang.
2025-09-01 02:31:07.679523344 +0000 UTC m=+752959.557354298: ☑️ agreed by Lloyd-Pottiger.

ti-chi-bot · 2025-09-01T02:47:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, JinheLin, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [JaySon-Huang,JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-09-01T03:57:40Z

@ti-chi-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-unit-test	`661ec84`	link	unknown	`/test pull-unit-test`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

JaySon-Huang · 2025-09-01T04:07:53Z

/test pull-unit-test

This is an automated cherry-pick of pingcap#10379

1938adc

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

ti-chi-bot mentioned this pull request Aug 29, 2025

Storage: Fix TableScan performance regression under wide-sparse table #10379

Merged

12 tasks

ti-chi-bot assigned JaySon-Huang Aug 29, 2025

ti-chi-bot bot added the do-not-merge/cherry-pick-not-approved label Aug 29, 2025

JaySon-Huang added 2 commits August 29, 2025 18:26

Resolve conflict

8b3ab25

Signed-off-by: JaySon-Huang <tshent@qq.com>

Backport the DTTool changes

661ec84

Signed-off-by: JaySon-Huang <tshent@qq.com>

JaySon-Huang approved these changes Sep 1, 2025

ti-chi-bot bot added the needs-1-more-lgtm and approved labels Sep 1, 2025

JaySon-Huang requested review from JinheLin and Lloyd-Pottiger September 1, 2025 01:22

ti-chi-bot bot removed the do-not-merge/hold label Sep 1, 2025

Lloyd-Pottiger approved these changes Sep 1, 2025

ti-chi-bot bot added the lgtm label and removed the needs-1-more-lgtm label Sep 1, 2025

JinheLin approved these changes Sep 1, 2025

ti-chi-bot bot merged commit 7f54532 into pingcap:release-8.5 Sep 1, 2025
3 of 4 checks passed

ti-chi-bot bot deleted the cherry-pick-10379-to-release-8.5 branch September 1, 2025 04:15
