executor: fix slow log push down bug#68765
Conversation
Signed-off-by: lance6716 <lance6716@gmail.com>
|
@lance6716 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
Signed-off-by: lance6716 <lance6716@gmail.com>
|
Hi @lance6716. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a 1s internal tolerance for slow-log time comparisons, applies it to reverse-scanner initialization and file-pruning overlap checks, and extracts line-splitting logic; tests updated with cross-chunk and millisecond-jitter scenarios. ChangesSlow-log reverse-scan and file-filtering tolerance
Sequence DiagramsequenceDiagram
participant Client
participant ReverseScanner
participant FileFilter
participant TimeHelper
Client->>ReverseScanner: request slow-log for time range
ReverseScanner->>TimeHelper: slowLogTimeWithTolerance(minStart, -1s)
TimeHelper-->>ReverseScanner: adjusted minStartTime
ReverseScanner->>FileFilter: candidate files request (range)
FileFilter->>TimeHelper: slowLogMayOverlapTimeRangeWithTolerance(file.start, file.end, range, tz)
TimeHelper-->>FileFilter: overlap true/false (±1s)
FileFilter-->>ReverseScanner: filtered candidate files
ReverseScanner-->>Client: return rows (no gaps)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Tools execution failed with the following error: Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error) Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR fixes slow-log reverse scan behavior so Dashboard-style descending queries with LIMIT do not miss rows when log blocks cross read chunks or when adjacent slow-log timestamps drift slightly out of order.
Changes:
- Adds a 1s internal tolerance for slow-log file pruning and reverse-scan stop checks while preserving exact row-level time filtering.
- Avoids synthesizing a trailing empty line when splitting reverse-read slow-log chunks.
- Adds/updates unit tests for reverse scan chunk-boundary handling and timestamp jitter.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
pkg/executor/slow_query.go |
Applies internal slow-log time tolerance and fixes reverse chunk line splitting. |
pkg/executor/slow_query_test.go |
Adds regression coverage for cross-chunk reverse scan and time jitter behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
introduced in #65740 |
|
/cherry-pick release-nextgen-202603 |
|
@D3Hunter: once the present PR merges, I will cherry-pick it on top of release-nextgen-202603 in the new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #68765 +/- ##
================================================
- Coverage 76.3105% 75.3551% -0.9554%
================================================
Files 2041 2023 -18
Lines 563465 567225 +3760
================================================
- Hits 429983 427433 -2550
- Misses 132566 139758 +7192
+ Partials 916 34 -882
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 0
- Inline comments: 0
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
No findings.
D3Hunter
left a comment
There was a problem hiding this comment.
Summary
- Total findings: 3
- Inline comments: 3
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🟡 [Minor] (3)
- Tolerant overlap helper reads like an exact time-range predicate (
pkg/executor/slow_query.go:1381) - Cross-chunk regression is folded into the limit test (
pkg/executor/slow_query_test.go:874) - Reverse jitter test can pass with the out-of-window row (
pkg/executor/slow_query_test.go:966)
| return ret, err | ||
| } | ||
|
|
||
| func slowLogMayOverlapTimeRange(start, end types.Time, tr *timeRange, tz *time.Location) bool { |
There was a problem hiding this comment.
🟡 [Minor] Tolerant overlap helper reads like an exact time-range predicate
Why
The new helper name communicates a generic overlap check, but the implementation widens both sides of the range by slowLogTimeRangeInternalTolerance. That widened behavior is intentionally only for pruning and reverse-scan stop checks, not for final row filtering.
Scope
pkg/executor/slow_query.go:1381
Risk if unchanged
A future caller can reasonably reuse slowLogMayOverlapTimeRange where an exact time-range predicate is needed, which would allow rows outside the requested range to look eligible before slowLogChecker filters them. The intended boundary between tolerant pruning and exact filtering becomes easy to miss.
Evidence
slowLogMayOverlapTimeRange computes rangeStart and rangeEnd by applying slowLogTimeRangeInternalTolerance, while the nearby constant comment says rows must still be filtered by the original time ranges in slowLogChecker.
Change request
Prefer a name that exposes the widened semantics, for example slowLogMayOverlapTimeRangeWithTolerance or slowLogMayOverlapPruningWindow; current name is confusing. Please also add a short why-comment on the helper saying it must only be used for file-pruning or reverse-scan candidate checks.
| require.Equal(t, "2020-02-15 21:00:05.000000", t1) | ||
| require.NoError(t, retriever.close()) | ||
|
|
||
| // The long DB line makes the middle slow-log block cross readLastLines chunks. |
There was a problem hiding this comment.
🟡 [Minor] Cross-chunk regression is folded into the limit test
Why
The added block makes TestSlowQueryRetrieverReversedScanWithLimit cover both reverse-limit ordering and a separate readLastLines cross-chunk blank-line regression.
Scope
pkg/executor/slow_query_test.go:874
Risk if unchanged
Future failures or edits in the original limit scenario can obscure or skip the independent cross-chunk regression, and the test name no longer describes everything it protects.
Evidence
After the original limit assertions finish at pkg/executor/slow_query_test.go:870, the test creates a second slow-log file and performs forward/reverse scan comparison at pkg/executor/slow_query_test.go:874 through pkg/executor/slow_query_test.go:912.
Change request
Can we split this into a focused test or subtest, for example TestSlowQueryRetrieverReversedScanCrossChunk, sharing setup helpers only if needed?
There was a problem hiding this comment.
I still want to reuse this test. Because LIMIT is necessary to enter the execution path. Will add comment
| } | ||
| reverseRows = append(reverseRows, rows...) | ||
| } | ||
| require.Len(t, reverseRows, 3) |
There was a problem hiding this comment.
🟡 [Minor] Reverse jitter test can pass with the out-of-window row
Why
The test is meant to prove that the new tolerance only widens internal pruning and reverse-scan stop checks, while rows are still filtered by the original time range. The reverse branch only asserts the row count, though the fixture has three in-window rows plus one just-before-window row and the reverse retriever also uses limit = 3.
Scope
pkg/executor/slow_query_test.go:966
Risk if unchanged
A reverse-scan regression that returns just-before-window and drops one valid in-window row would still produce three rows, so this regression test would not catch a compatibility break in SLOW_QUERY time predicate semantics.
Evidence
The fixture at pkg/executor/slow_query_test.go:917-924 defines three in-window rows and one just-before-window row. The reverse path at pkg/executor/slow_query_test.go:955-966 sets limit = 3 and only checks require.Len(t, reverseRows, 3).
Change request
Please add assertions for the returned Query or Time values, including absence of just-before-window, in the reverse path; alternatively set the reverse limit above the fixture size and assert the exact result set and order.
There was a problem hiding this comment.
I have changed the code in d8e0a3e to address 3 comments. PTAL
Signed-off-by: lance6716 <lance6716@gmail.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: crazycs520, D3Hunter The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
@D3Hunter: new pull request created to branch DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
What problem does this PR solve?
Issue Number: close #68758
bug 1: if the new line symbol is at the ending of internal chunk, the file scan will be finsihed too early because empty string will break the loop
bug 2: slow log has some time drift because record time and write time is different. We should tolerate it
Problem Summary:
What changed and how does it work?
Check List
Tests
checked with real slow log files collected from user.
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
Bug Fixes
Tests