lightning: reuse props scan across batches for global sort (#66738) | tidb-test=release-8.1.2 #67275

Merged
ti-chi-bot[bot] merged 2 commits into pingcap:feature/release-8.1-gsort-test from GMHDBJD:backport-seekprops-limit-8.1
Mar 25, 2026

Conversation

Collaborator

@GMHDBJD GMHDBJD commented Mar 25, 2026

What problem does this PR solve?

Issue Number: ref #66812 #66738

Problem Summary:

  • LoadIngestData scans stats files once per batch through seekPropsOffsets.
  • With many batches this repeatedly opens the same stats files and amplifies object-store pressure.

What changed and how does it work?

  • Reuse props offsets across batches in LoadIngestData instead of re-scanning stats files per batch.
  • Cap props-file scan concurrency with getReadRangeFromPropsConcurrency = 64, matching #66738.
  • Add regression tests for offset reuse and the scan concurrency limit.
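
A minimal sketch of the reuse idea described above, not the actual TiDB code: the `scanner` type, `seekOffsets`, `loadPerBatch`, and `loadWithReuse` are hypothetical names, and the offsets are made up. It only illustrates why scanning once for all batch boundaries opens each stats file once instead of once per batch.

```go
package main

import "fmt"

// scanner is a hypothetical stand-in that counts stats-file scans.
type scanner struct {
	scans int // number of stats files opened so far
}

// seekOffsets pretends to scan every stats file once and returns, for each
// key, one offset per file. The offset values are invented for illustration.
func (s *scanner) seekOffsets(statFiles []string, keys []int) [][]uint64 {
	s.scans += len(statFiles)
	out := make([][]uint64, len(keys))
	for i, k := range keys {
		out[i] = make([]uint64, len(statFiles))
		for j := range statFiles {
			out[i][j] = uint64(k * (j + 1))
		}
	}
	return out
}

// loadPerBatch models the old behavior: every batch re-scans the stats files
// for its own start and end key. It returns the number of batches processed.
func loadPerBatch(s *scanner, statFiles []string, boundaries []int) int {
	batches := 0
	for i := 0; i+1 < len(boundaries); i++ {
		offs := s.seekOffsets(statFiles, boundaries[i:i+2])
		_ = offs // a real loader would read data between offs[0] and offs[1]
		batches++
	}
	return batches
}

// loadWithReuse models the new behavior: one scan resolves all boundaries,
// and each batch picks its start and end offsets from the shared result.
func loadWithReuse(s *scanner, statFiles []string, boundaries []int) int {
	offs := s.seekOffsets(statFiles, boundaries)
	batches := 0
	for i := 0; i+1 < len(offs); i++ {
		start, end := offs[i], offs[i+1]
		_, _ = start, end // a real loader would read data in [start, end)
		batches++
	}
	return batches
}

func main() {
	statFiles := []string{"a.stat", "b.stat", "c.stat"}
	boundaries := []int{0, 10, 20, 30, 40} // 5 boundaries, 4 batches

	old := &scanner{}
	loadPerBatch(old, statFiles, boundaries)

	reused := &scanner{}
	loadWithReuse(reused, statFiles, boundaries)

	fmt.Println(old.scans, reused.scans) // prints "12 3"
}
```

With 3 stats files and 4 batches, the per-batch variant performs 12 scans and the reuse variant performs 3, which is the object-store pressure reduction the problem summary refers to.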

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Refactor

    • Optimized data ingestion by pre-computing read offset ranges and reusing them across batches, reducing redundant file operations.
    • Improved concurrency handling for file operations with configurable concurrency limits to prevent system overload.
  • Tests

    • Added tests to verify stat file ranges are efficiently reused across data batch processing.
    • Added test to validate concurrency limits are properly enforced during file scanning.

@GMHDBJD GMHDBJD added the release-note-none Denotes a PR that doesn't merit a release note. label Mar 25, 2026
@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 25, 2026
tiprow Bot commented Mar 25, 2026

Hi @GMHDBJD. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai Bot commented Mar 25, 2026

📝 Walkthrough

This PR refactors the Lightning external backend to precompute per-key read ranges once via getReadRangeFromProps and pass them through the read pipeline, rather than computing concurrency and offsets inline. New helper functions (readAllDataWithOffsets, getReadConcurrencyFromOffsets, readAllDataWithConcurrency) centralize the parallel reading logic, and a concurrency limit is applied to stats file scanning.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Control Flow in Engine: `pkg/lightning/backend/external/engine.go` | LoadIngestData now precomputes read ranges via getReadRangeFromProps and passes startOffsets/estimatedEndOffsets to loadRangeBatch, which delegates to readAllDataWithOffsets instead of inline computation. getFilesReadConcurrency simplified to delegate offset estimation. |
| Read Path Refactoring: `pkg/lightning/backend/external/reader.go` | Introduced readAllDataWithOffsets and getReadConcurrencyFromOffsets helpers to accept pre-computed offsets. Created readAllDataWithConcurrency to centralize parallel reading logic. readAllData simplified to call new helpers. |
| Range Computation Utility: `pkg/lightning/backend/external/util.go` | Added getReadRangeFromProps to convert job keys to read ranges and getReadRangeFromPropsConcurrency limit (default 64) to cap parallel stats file scanning in seekPropsOffsets. |
| Test Coverage: `pkg/lightning/backend/external/engine_test.go`, `pkg/lightning/backend/external/util_test.go` | Added TestLoadIngestDataReusesStatRangesAcrossBatches to verify stat ranges are computed once and reused. Added TestSeekPropsOffsetsConcurrencyLimit with instrumented storage wrapper to validate concurrent file open limit enforcement. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

size/XL, ok-to-test

Suggested reviewers

  • D3Hunter

Poem

🐰 Offsets computed before we race,
No more inline within the place,
Ranges reused, concurrency capped,
The reading path now tightly wrapped!
Hop forward, batch, with grace and speed!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: reusing props scan across batches to improve performance, with reference to the related issue. |
| Description check | ✅ Passed | PR description includes required sections with specific issue references, problem summary, technical changes, test checklist, and release notes. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/lightning/backend/external/reader.go (1)

94-126: Consider updating the log message to match the function name.

The log at line 114 says "found hotspot file in getFilesReadConcurrency" but this function is getReadConcurrencyFromOffsets. Since this function can be called from both getFilesReadConcurrency and readAllDataWithOffsets, a more generic message would be clearer.

♻️ Suggested log message update
 		if expectedConc > 1 {
-			logutil.Logger(ctx).Info("found hotspot file in getFilesReadConcurrency",
+			logutil.Logger(ctx).Info("found hotspot file when computing read concurrency",
 				zap.String("filename", statsFiles[i]),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/lightning/backend/external/reader.go` around lines 94 - 126, Update the
log message emitted inside getReadConcurrencyFromOffsets so it no longer
references getFilesReadConcurrency; change the string passed to
logutil.Logger(...).Info (currently "found hotspot file in
getFilesReadConcurrency") to a generic message such as "found hotspot file while
computing read concurrency" or "found hotspot file in
getReadConcurrencyFromOffsets" so logs accurately reflect the function (this
affects the log call that also includes
filename/startOffset/endOffset/expectedConc/concurrency).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3097041b-c951-4476-a59e-a13ec3c4130e

📥 Commits

Reviewing files that changed from the base of the PR and between b7f2487 and c029ae3.

📒 Files selected for processing (5)
  • pkg/lightning/backend/external/engine.go
  • pkg/lightning/backend/external/engine_test.go
  • pkg/lightning/backend/external/reader.go
  • pkg/lightning/backend/external/util.go
  • pkg/lightning/backend/external/util_test.go

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 90.21739% with 9 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (feature/release-8.1-gsort-test@b7f2487). Learn more about missing BASE report.

Additional details and impacted files
@@                         Coverage Diff                         @@
##             feature/release-8.1-gsort-test     #67275   +/-   ##
===================================================================
  Coverage                                  ?   71.0178%           
===================================================================
  Files                                     ?       1479           
  Lines                                     ?     427559           
  Branches                                  ?          0           
===================================================================
  Hits                                      ?     303643           
  Misses                                    ?     103289           
  Partials                                  ?      20627           
| Flag | Coverage Δ |
|---|---|
| unit | 71.0178% <90.2173%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
|---|---|
| dumpling | 52.9656% <0.0000%> (?) |
| parser | ∅ <0.0000%> (?) |
| br | 41.6092% <0.0000%> (?) |

Comment on lines +477 to +481
for i := 0; i < 2; i++ {
	data := <-outCh
	data.Data.IncRef()
	data.Data.DecRef()
}

Contributor

Is the above check on openCount enough?

Collaborator Author

I think the current check is sufficient, because this test forces two batches while "statOpenCountStorage" only counts stat-file opens, so "openCount == len(statFiles)" directly verifies that each stat file is scanned once rather than once per batch.
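
To illustrate the reasoning in this reply, here is a rough sketch of a counting wrapper, not the test's real code: `opener`, `nopStorage`, `statOpenCounter`, `loadTwoBatches`, and the `_stat/` path convention are all assumptions made for the example. It shows why counting only stat-file opens across two batches is enough to detect a per-batch re-scan.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// opener is a hypothetical, minimal stand-in for a storage interface.
type opener interface {
	Open(path string) error
}

// nopStorage accepts every open and does nothing.
type nopStorage struct{}

func (nopStorage) Open(string) error { return nil }

// statOpenCounter wraps an opener and counts only opens of stat files,
// recognized here by an assumed "_stat/" path segment.
type statOpenCounter struct {
	opener
	opens atomic.Int64
}

func (s *statOpenCounter) Open(path string) error {
	if strings.Contains(path, "_stat/") {
		s.opens.Add(1)
	}
	return s.opener.Open(path)
}

// loadTwoBatches models a loader that scans stat files once up front and
// then reads the data files for two batches.
func loadTwoBatches(st opener, statFiles, dataFiles []string) error {
	for _, f := range statFiles {
		if err := st.Open(f); err != nil {
			return err
		}
	}
	for batch := 0; batch < 2; batch++ {
		for _, f := range dataFiles {
			if err := st.Open(f); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	statFiles := []string{"1/_stat/0", "1/_stat/1"}
	dataFiles := []string{"1/0", "1/1"}

	st := &statOpenCounter{opener: nopStorage{}}
	if err := loadTwoBatches(st, statFiles, dataFiles); err != nil {
		panic(err)
	}
	// Data-file opens are ignored, so the count equals len(statFiles)
	// only if no stat file was re-opened for the second batch.
	fmt.Println(st.opens.Load() == int64(len(statFiles))) // prints "true"
}
```

If the loader re-scanned per batch, the counter would read 4 rather than 2 in this setup, so the equality check alone would fail.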

require.Equal(t, [][]uint64{{30, 20, 0, 30}, {50, 40, 0, 50}}, got)
}

func TestSeekPropsOffsetsConcurrencyLimit(t *testing.T) {
Contributor

Maybe we can remove this test if all we want to ensure is that eg.SetLimit(getReadRangeFromPropsConcurrency) works. (Keeping it is OK too.)
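
For context, a sketch of what a concurrency-limit check of this kind can look like. The comment above indicates the PR relies on an errgroup-style `SetLimit`; to stay dependency-free, this example swaps in a buffered-channel semaphore from the standard library instead. `scanLimit`, `scanAll`, and the peak-tracking logic are hypothetical and only mirror the cap of 64 mentioned in the PR description.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// scanLimit mirrors the cap of 64 named in the PR; the constant name is
// made up for this sketch.
const scanLimit = 64

// scanAll runs scan for every file with at most limit scans in flight and
// returns the peak number of concurrent scans that was observed.
func scanAll(files []string, limit int, scan func(string)) int64 {
	var (
		wg      sync.WaitGroup
		current atomic.Int64
		peak    atomic.Int64
	)
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	for _, f := range files {
		sem <- struct{}{} // blocks while limit scans are already running
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot after the scan
			n := current.Add(1)
			// Record a new peak if this goroutine raised the in-flight count.
			for {
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			scan(name)
			current.Add(-1)
		}(f)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	files := make([]string, 200)
	peak := scanAll(files, scanLimit, func(string) {
		time.Sleep(time.Millisecond) // simulate a slow object-store read
	})
	fmt.Println(peak <= scanLimit) // prints "true"
}
```

Because the in-flight counter is decremented before the semaphore slot is released, the observed peak can never exceed the limit, which is the property an instrumented storage wrapper would assert.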

ti-chi-bot Bot commented Mar 25, 2026

@joechenrh: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Collaborator

@Benjamin2037 Benjamin2037 left a comment

LGTM

@ti-chi-bot ti-chi-bot Bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 25, 2026
ti-chi-bot Bot commented Mar 25, 2026

@OliverS929: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the lgtm label Mar 25, 2026
ti-chi-bot Bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Benjamin2037, joechenrh, OliverS929, wjhuang2016

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 25, 2026
ti-chi-bot Bot commented Mar 25, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-25 09:30:18.690300235 +0000 UTC m=+347014.726370495: ☑️ agreed by Benjamin2037.
  • 2026-03-25 11:48:12.197854186 +0000 UTC m=+355288.233924436: ☑️ agreed by wjhuang2016.

@ti-chi-bot ti-chi-bot Bot merged commit 8e72973 into pingcap:feature/release-8.1-gsort-test Mar 25, 2026
19 of 20 checks passed

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

5 participants