Skip to content

executor: fix slow log push down bug#68765

Merged
ti-chi-bot[bot] merged 4 commits into
pingcap:masterfrom
lance6716:fix-68758
Jun 1, 2026
Merged

executor: fix slow log push down bug#68765
ti-chi-bot[bot] merged 4 commits into
pingcap:masterfrom
lance6716:fix-68758

Conversation

@lance6716
Copy link
Copy Markdown
Contributor

@lance6716 lance6716 commented May 29, 2026

What problem does this PR solve?

Issue Number: close #68758

bug 1: if the new line symbol is at the ending of internal chunk, the file scan will be finsihed too early because empty string will break the loop

bug 2: slow log has some time drift because record time and write time is different. We should tolerate it

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

checked with real slow log files collected from user.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes

    • Improved slow query log scanning and pruning with tolerance for minor time-order variations, improving reverse scans and file selection
    • Normalized slow-log tail parsing to handle mixed line endings and avoid dropped entries
  • Tests

    • Added tests for time-boundary jitter and cross-chunk scenarios (including long lines); renamed an existing parsing test for clarity

lance6716 added 2 commits May 29, 2026 16:31
Signed-off-by: lance6716 <lance6716@gmail.com>
Signed-off-by: lance6716 <lance6716@gmail.com>
Copilot AI review requested due to automatic review settings May 29, 2026 09:07
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. labels May 29, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 29, 2026

@lance6716 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

Signed-off-by: lance6716 <lance6716@gmail.com>
@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 29, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 29, 2026

Hi @lance6716. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 56b8f9c7-ef82-476b-b88d-acb64f144b92

📥 Commits

Reviewing files that changed from the base of the PR and between 6e46ba3 and d8e0a3e.

📒 Files selected for processing (2)
  • pkg/executor/slow_query.go
  • pkg/executor/slow_query_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/executor/slow_query.go
  • pkg/executor/slow_query_test.go

📝 Walkthrough

Walkthrough

Adds a 1s internal tolerance for slow-log time comparisons, applies it to reverse-scanner initialization and file-pruning overlap checks, and extracts line-splitting logic; tests updated with cross-chunk and millisecond-jitter scenarios.

Changes

Slow-log reverse-scan and file-filtering tolerance

Layer / File(s) Summary
Tolerance constant and helper functions
pkg/executor/slow_query.go
Defines slowLogTimeRangeInternalTolerance (1s) and adds slowLogTimeWithTolerance and slowLogMayOverlapTimeRangeWithTolerance helpers.
Reverse-scanner minStartTime with tolerance
pkg/executor/slow_query.go
Compute scanner.minStartTime via slowLogTimeWithTolerance (negative tolerance) in newSlowLogReverseScanner.
getAllFiles timezone capture and non-compressed pruning
pkg/executor/slow_query.go
Capture session tz and apply tolerance-adjusted time when pruning non-compressed slow-log files.
File overlap detection with tolerance (non-compressed & compressed)
pkg/executor/slow_query.go
Replace direct overlap checks with slowLogMayOverlapTimeRangeWithTolerance for both non-compressed and compressed file pruning.
Line-splitting utility extraction
pkg/executor/slow_query.go
Extract splitSlowLogLines to normalize \r\n, split into lines, and drop trailing empty lines; readLastLines uses it.
Test updates and new edge-case tests
pkg/executor/slow_query_test.go
Add model import, rename TestSplitbyColonTestSplitByColon, add getTimeColIdxFromRetrieverOutput, extend cross-chunk reverse-scan test, and add TestSlowQueryRetrieverReversedScanWithTimeJitter.

Sequence Diagram

sequenceDiagram
  participant Client
  participant ReverseScanner
  participant FileFilter
  participant TimeHelper

  Client->>ReverseScanner: request slow-log for time range
  ReverseScanner->>TimeHelper: slowLogTimeWithTolerance(minStart, -1s)
  TimeHelper-->>ReverseScanner: adjusted minStartTime
  ReverseScanner->>FileFilter: candidate files request (range)
  FileFilter->>TimeHelper: slowLogMayOverlapTimeRangeWithTolerance(file.start, file.end, range, tz)
  TimeHelper-->>FileFilter: overlap true/false (±1s)
  FileFilter-->>ReverseScanner: filtered candidate files
  ReverseScanner-->>Client: return rows (no gaps)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm

Suggested reviewers

  • gengliqi
  • wshwsh12

Poem

🐰 A rabbit taps the log with gentle paws and time,
Adds one small second to bridge a jittered line.
It trims the tails, splits returns so clean and bright,
No rows fall through the cracks in moonlit query-night.
Hooray—slow logs return in orderly delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'executor: fix slow log push down bug' accurately reflects the main objective of the PR, which is to fix bugs in slow log scanning and retrieval.
Linked Issues check ✅ Passed The PR addresses both bugs mentioned in #68758: it fixes the newline handling issue and adds time tolerance for slow log scanning, with comprehensive test coverage for time boundary scenarios.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing slow log scanning issues: tolerance handling, time-aware file pruning, and test coverage for edge cases are all directly related to the #68758 objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes slow-log reverse scan behavior so Dashboard-style descending queries with LIMIT do not miss rows when log blocks cross read chunks or when adjacent slow-log timestamps drift slightly out of order.

Changes:

  • Adds a 1s internal tolerance for slow-log file pruning and reverse-scan stop checks while preserving exact row-level time filtering.
  • Avoids synthesizing a trailing empty line when splitting reverse-read slow-log chunks.
  • Adds/updates unit tests for reverse scan chunk-boundary handling and timestamp jitter.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
pkg/executor/slow_query.go Applies internal slow-log time tolerance and fixes reverse chunk line splitting.
pkg/executor/slow_query_test.go Adds regression coverage for cross-chunk reverse scan and time jitter behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@D3Hunter
Copy link
Copy Markdown
Contributor

introduced in #65740

@D3Hunter
Copy link
Copy Markdown
Contributor

/cherry-pick release-nextgen-202603

@ti-chi-bot
Copy link
Copy Markdown
Member

@D3Hunter: once the present PR merges, I will cherry-pick it on top of release-nextgen-202603 in the new PR and assign it to you.

Details

In response to this:

/cherry-pick release-nextgen-202603

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 39.13043% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.3551%. Comparing base (1c3f9eb) to head (d8e0a3e).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68765        +/-   ##
================================================
- Coverage   76.3105%   75.3551%   -0.9554%     
================================================
  Files          2041       2023        -18     
  Lines        563465     567225      +3760     
================================================
- Hits         429983     427433      -2550     
- Misses       132566     139758      +7192     
+ Partials        916         34       -882     
Flag Coverage Δ
integration 41.2892% <39.1304%> (+1.5106%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4610% <ø> (ø)
parser ∅ <ø> (∅)
br 49.8023% <ø> (-13.0287%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 29, 2026
Copy link
Copy Markdown
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Total findings: 0
  • Inline comments: 0
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

No findings.

Copy link
Copy Markdown
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Total findings: 3
  • Inline comments: 3
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

🟡 [Minor] (3)

  1. Tolerant overlap helper reads like an exact time-range predicate (pkg/executor/slow_query.go:1381)
  2. Cross-chunk regression is folded into the limit test (pkg/executor/slow_query_test.go:874)
  3. Reverse jitter test can pass with the out-of-window row (pkg/executor/slow_query_test.go:966)

Comment thread pkg/executor/slow_query.go Outdated
return ret, err
}

func slowLogMayOverlapTimeRange(start, end types.Time, tr *timeRange, tz *time.Location) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] Tolerant overlap helper reads like an exact time-range predicate

Why
The new helper name communicates a generic overlap check, but the implementation widens both sides of the range by slowLogTimeRangeInternalTolerance. That widened behavior is intentionally only for pruning and reverse-scan stop checks, not for final row filtering.

Scope
pkg/executor/slow_query.go:1381

Risk if unchanged
A future caller can reasonably reuse slowLogMayOverlapTimeRange where an exact time-range predicate is needed, which would allow rows outside the requested range to look eligible before slowLogChecker filters them. The intended boundary between tolerant pruning and exact filtering becomes easy to miss.

Evidence
slowLogMayOverlapTimeRange computes rangeStart and rangeEnd by applying slowLogTimeRangeInternalTolerance, while the nearby constant comment says rows must still be filtered by the original time ranges in slowLogChecker.

Change request
Prefer a name that exposes the widened semantics, for example slowLogMayOverlapTimeRangeWithTolerance or slowLogMayOverlapPruningWindow; current name is confusing. Please also add a short why-comment on the helper saying it must only be used for file-pruning or reverse-scan candidate checks.

require.Equal(t, "2020-02-15 21:00:05.000000", t1)
require.NoError(t, retriever.close())

// The long DB line makes the middle slow-log block cross readLastLines chunks.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] Cross-chunk regression is folded into the limit test

Why
The added block makes TestSlowQueryRetrieverReversedScanWithLimit cover both reverse-limit ordering and a separate readLastLines cross-chunk blank-line regression.

Scope
pkg/executor/slow_query_test.go:874

Risk if unchanged
Future failures or edits in the original limit scenario can obscure or skip the independent cross-chunk regression, and the test name no longer describes everything it protects.

Evidence
After the original limit assertions finish at pkg/executor/slow_query_test.go:870, the test creates a second slow-log file and performs forward/reverse scan comparison at pkg/executor/slow_query_test.go:874 through pkg/executor/slow_query_test.go:912.

Change request
Can we split this into a focused test or subtest, for example TestSlowQueryRetrieverReversedScanCrossChunk, sharing setup helpers only if needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to reuse this test. Because LIMIT is necessary to enter the execution path. Will add comment

}
reverseRows = append(reverseRows, rows...)
}
require.Len(t, reverseRows, 3)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] Reverse jitter test can pass with the out-of-window row

Why
The test is meant to prove that the new tolerance only widens internal pruning and reverse-scan stop checks, while rows are still filtered by the original time range. The reverse branch only asserts the row count, though the fixture has three in-window rows plus one just-before-window row and the reverse retriever also uses limit = 3.

Scope
pkg/executor/slow_query_test.go:966

Risk if unchanged
A reverse-scan regression that returns just-before-window and drops one valid in-window row would still produce three rows, so this regression test would not catch a compatibility break in SLOW_QUERY time predicate semantics.

Evidence
The fixture at pkg/executor/slow_query_test.go:917-924 defines three in-window rows and one just-before-window row. The reverse path at pkg/executor/slow_query_test.go:955-966 sets limit = 3 and only checks require.Len(t, reverseRows, 3).

Change request
Please add assertions for the returned Query or Time values, including absence of just-before-window, in the reverse path; alternatively set the reverse limit above the fixture size and assert the exact result set and order.

Copy link
Copy Markdown
Contributor Author

@lance6716 lance6716 Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the code in d8e0a3e to address 3 comments. PTAL

Signed-off-by: lance6716 <lance6716@gmail.com>
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Jun 1, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crazycs520, D3Hunter

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [D3Hunter,crazycs520]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 1, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Jun 1, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-29 09:26:28.595912385 +0000 UTC m=+11328.970722070: ☑️ agreed by crazycs520.
  • 2026-06-01 02:07:44.106587984 +0000 UTC m=+148165.176905374: ☑️ agreed by D3Hunter.

@ti-chi-bot ti-chi-bot Bot merged commit 99e1c67 into pingcap:master Jun 1, 2026
35 of 36 checks passed
@ti-chi-bot
Copy link
Copy Markdown
Member

@D3Hunter: new pull request created to branch release-nextgen-202603: #68822.

Details

In response to this:

/cherry-pick release-nextgen-202603

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

after #65740, query slow log may return less rows than expected

5 participants