lightning: reuse props scan across batches for global sort (#66738) | tidb-test=release-8.1.2#67275
Conversation
|
Hi @GMHDBJD. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughThis PR refactors the Lightning external backend to precompute per-key read ranges once via Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/lightning/backend/external/reader.go (1)
94-126: Consider updating the log message to match the function name.The log at line 114 says
"found hotspot file in getFilesReadConcurrency"but this function isgetReadConcurrencyFromOffsets. Since this function can be called from bothgetFilesReadConcurrencyandreadAllDataWithOffsets, a more generic message would be clearer.♻️ Suggested log message update
if expectedConc > 1 { - logutil.Logger(ctx).Info("found hotspot file in getFilesReadConcurrency", + logutil.Logger(ctx).Info("found hotspot file when computing read concurrency", zap.String("filename", statsFiles[i]),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/lightning/backend/external/reader.go` around lines 94 - 126, Update the log message emitted inside getReadConcurrencyFromOffsets so it no longer references getFilesReadConcurrency; change the string passed to logutil.Logger(...).Info (currently "found hotspot file in getFilesReadConcurrency") to a generic message such as "found hotspot file while computing read concurrency" or "found hotspot file in getReadConcurrencyFromOffsets" so logs accurately reflect the function (this affects the log call that also includes filename/startOffset/endOffset/expectedConc/concurrency).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/lightning/backend/external/reader.go`:
- Around line 94-126: Update the log message emitted inside
getReadConcurrencyFromOffsets so it no longer references
getFilesReadConcurrency; change the string passed to logutil.Logger(...).Info
(currently "found hotspot file in getFilesReadConcurrency") to a generic message
such as "found hotspot file while computing read concurrency" or "found hotspot
file in getReadConcurrencyFromOffsets" so logs accurately reflect the function
(this affects the log call that also includes
filename/startOffset/endOffset/expectedConc/concurrency).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3097041b-c951-4476-a59e-a13ec3c4130e
📒 Files selected for processing (5)
pkg/lightning/backend/external/engine.gopkg/lightning/backend/external/engine_test.gopkg/lightning/backend/external/reader.gopkg/lightning/backend/external/util.gopkg/lightning/backend/external/util_test.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## feature/release-8.1-gsort-test #67275 +/- ##
===================================================================
Coverage ? 71.0178%
===================================================================
Files ? 1479
Lines ? 427559
Branches ? 0
===================================================================
Hits ? 303643
Misses ? 103289
Partials ? 20627
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| for i := 0; i < 2; i++ { | ||
| data := <-outCh | ||
| data.Data.IncRef() | ||
| data.Data.DecRef() | ||
| } |
There was a problem hiding this comment.
The above check for openCount is enough?
There was a problem hiding this comment.
I think the current check is sufficient, because this test forces two batches while "statOpenCountStorage" only counts stat-file opens, so "openCount == len(statFiles)" directly verifies that each stat file is scanned once rather than once per batch.
| require.Equal(t, [][]uint64{{30, 20, 0, 30}, {50, 40, 0, 50}}, got) | ||
| } | ||
|
|
||
| func TestSeekPropsOffsetsConcurrencyLimit(t *testing.T) { |
There was a problem hiding this comment.
Maybe we can remove this test if we only want to ensure that eg.SetLimit(getReadRangeFromPropsConcurrency) would work. (Keep this is ok too)
|
@joechenrh: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@OliverS929: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Benjamin2037, joechenrh, OliverS929, wjhuang2016 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
8e72973
into
pingcap:feature/release-8.1-gsort-test
What problem does this PR solve?
Issue Number: ref #66812 #66738
Problem Summary:
LoadIngestDatascans stats files once per batch throughseekPropsOffsets.What changed and how does it work?
LoadIngestDatainstead of re-scanning stats files per batch.getReadRangeFromPropsConcurrency = 64, matching#66738.Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
Refactor
Tests