
planner: cache mpp info to avoid frequent grpc calling #65694

Open
xzhangxian1008 wants to merge 9 commits into pingcap:master from xzhangxian1008:fix-tiflash-9663

xzhangxian1008 (Contributor) commented Jan 21, 2026

What problem does this PR solve?

Issue Number: close #65701

Problem Summary:

What changed and how does it work?

We define a new struct, MPPInfoManager, and a global variable, GlobalMPPInfoManager, of that type; all MPP info is cached in this variable. In getTiFlashServerMinLogicalCores, we check the cache first and call gRPC only for uncached TiFlash stores. When filterAliveStoresHelper detects a failed TiFlash store, we remove its cache entry.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Refactor

    • TiFlash query paths now use a cached per-node CPU metric to reduce repeated hardware queries and improve planner efficiency.
    • Cluster server info retrieval now aggregates responses into a single result set while skipping failed nodes for more stable diagnostics and fewer partial failures.
  • Tests

    • Added tests covering the node info caching and manager behavior to ensure correct refresh and failure handling.

ti-chi-bot added the labels do-not-merge/needs-linked-issue, release-note-none, sig/planner, and size/L on Jan 21, 2026.
tiprow (bot) commented Jan 21, 2026

Hi @xzhangxian1008. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

xzhangxian1008 (Contributor, Author):

/run-check-issue-triage-complete

codecov (bot) commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 85.13514% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.3365%. Comparing base (319a683) to head (3199903).
⚠️ Report is 286 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65694        +/-   ##
================================================
- Coverage   77.6914%   77.3365%   -0.3549%     
================================================
  Files          2016       1945        -71     
  Lines        551900     540986     -10914     
================================================
- Hits         428779     418380     -10399     
- Misses       121375     122199       +824     
+ Partials       1746        407      -1339     
Coverage by flag (Δ vs. base):
  • integration: 47.4002% <ø> (-0.7288%) ⬇️
  • unit: 76.2414% <85.1351%> (+0.0073%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Coverage by component (Δ vs. base):
  • dumpling: 57.0098% <ø> (+0.2124%) ⬆️
  • parser: ∅ <ø> (∅)
  • br: 47.9705% <ø> (-12.8869%) ⬇️

solotzg (Contributor) commented Jan 21, 2026

/cc @copilot

xzhangxian1008 (Contributor, Author):

/cc @windtalker @solotzg

ti-chi-bot requested review from solotzg and windtalker on January 21, 2026 09:43.
Comment thread pkg/planner/core/optimizer.go Outdated
}

if len(uncachedServersInfo) > 0 {
ch := make(chan [][]types.Datum, len(uncachedServersInfo))
Contributor:

Why use chan?

Contributor Author:

deleted

Comment thread pkg/planner/core/optimizer.go Outdated
waitWg := &sync.WaitGroup{}
waitWg.Add(len(uncachedServersInfo))
for i := range uncachedServersInfo {
go func(info []infoschema.ServerInfo) {
Contributor:

Consider fetching once for the full slice, or pass the single-server slice into the call and use the row’s address only when the fetch is truly per-server.

The extra goroutine layer may be unnecessary since FetchClusterServerInfoWithoutPrivilegeCheck already fans out internally.

Contributor Author:

fixed

xzhangxian1008 (Contributor, Author):

/hold

ti-chi-bot added the do-not-merge/hold label (a /hold command was issued, so the PR should not merge) on Jan 26, 2026.
ti-chi-bot (bot) commented Jan 26, 2026

@solotzg: adding LGTM is restricted to approvers and reviewers in OWNERS files.


In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/executor/memtable_reader.go Outdated
infos := infoschema.FetchClusterServerInfoWithoutPrivilegeCheck(ctx, sctx.GetSessionVars(), serversInfo, e.serverInfoType, true)
rowCount := 0
for _, info := range infos {
if info.Err != nil {
Contributor:

if info.Err == nil { ?

Contributor Author:

fixed

Comment thread pkg/executor/memtable_reader.go Outdated

results := make([][]types.Datum, 0, rowCount)
for _, info := range infos {
if info.Err != nil {
Contributor:

ditto

Contributor Author:

fixed

Comment thread pkg/planner/core/optimizer.go Outdated
var uncachedServersInfo []infoschema.ServerInfo
var minLogicalCores = initialMaxCores
for _, info := range serversInfo {
mppInfo := copr.GlobalMPPInfoManager.Get(info.Address)
Contributor:

Consider monitoring the StartTimestamp for any change in the event that a TiFlash instance's CPU configuration is altered.

Contributor Author:

fixed
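One way to act on the StartTimestamp suggestion is sketched below under stated assumptions. Each cache entry records the StartTimestamp observed when it was filled. A store whose current StartTimestamp differs has restarted, possibly with a different CPU allotment, so it is treated as needing a refresh.

The names splitByFreshness and ServerInfo, and the field layout, are hypothetical. The PR's actual helper (referred to in the review as splitTiFlashLogicalCoreCache) is not shown on this page.

```go
package main

import "fmt"

// ServerInfo is a hypothetical, simplified view of a TiFlash server as
// reported by the cluster.
type ServerInfo struct {
	Address        string
	StartTimestamp int64
}

// MPPInfo is a cached entry; StartTimestamp records which process incarnation
// the LogicalCPUCount was read from.
type MPPInfo struct {
	Address         string
	StartTimestamp  int64
	LogicalCPUCount uint64
}

// splitByFreshness returns the minimum core count over still-valid cached
// entries and the servers whose entry is missing or stale (process restarted).
func splitByFreshness(servers []ServerInfo, cache map[string]*MPPInfo) (uint64, []ServerInfo) {
	minCores := ^uint64(0)
	var needRefresh []ServerInfo
	for _, s := range servers {
		info, ok := cache[s.Address]
		if !ok || info.StartTimestamp != s.StartTimestamp {
			needRefresh = append(needRefresh, s)
			continue
		}
		if info.LogicalCPUCount < minCores {
			minCores = info.LogicalCPUCount
		}
	}
	return minCores, needRefresh
}

func main() {
	cache := map[string]*MPPInfo{
		"tiflash-0:3930": {Address: "tiflash-0:3930", StartTimestamp: 100, LogicalCPUCount: 16},
		"tiflash-1:3930": {Address: "tiflash-1:3930", StartTimestamp: 100, LogicalCPUCount: 16},
	}
	servers := []ServerInfo{
		{Address: "tiflash-0:3930", StartTimestamp: 100}, // unchanged
		{Address: "tiflash-1:3930", StartTimestamp: 250}, // restarted since caching
	}
	minCores, stale := splitByFreshness(servers, cache)
	fmt.Println(minCores, len(stale), stale[0].Address) // 16 1 tiflash-1:3930
}
```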

ti-chi-bot (bot) commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: solotzg
Once this PR has been reviewed and has the lgtm label, please assign gmhdbjd, time-and-fate for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai (bot) commented Mar 17, 2026

📝 Walkthrough


Refactors cluster server info fetching to return per-server results with errors preserved, adds a global MPP info manager to cache per-store CPU/start-timestamp, and updates the optimizer and executor to use/cache that MPP info when computing TiFlash min logical cores.

Changes

  • Cache Infrastructure (pkg/store/copr/mpp_probe.go, pkg/store/copr/mpp_probe_test.go): Add MPPInfo, MppInfoManager, and global GlobalMPPInfoManager with thread-safe Add/Delete/Get. Integrate cache invalidation on failed-store reporting and add unit test for the manager.
  • Cluster Info Retrieval Refactor (pkg/infoschema/tables.go): Introduce exported ServerInfoResult type and change FetchClusterServerInfoWithoutPrivilegeCheck to return []ServerInfoResult (per-server Rows/Err/Idx) instead of combined [][]types.Datum, error.
  • Executor Adaptation (pkg/executor/memtable_reader.go): Update clusterServerInfoRetriever.retrieve to accept []ServerInfoResult, skip errored entries, and flatten remaining Rows into a single [][]types.Datum.
  • Optimizer Caching Integration (pkg/planner/core/optimizer.go, pkg/planner/core/optimizer_test.go): Add TiFlash min-core caching logic: split servers into cached/uncached, batch-fetch uncached hardware info, update GlobalMPPInfoManager, and use cached+fresh data to compute min logical cores. Add test verifying refresh when TiFlash restarts.
  • Build/Test Config (pkg/planner/core/BUILD.bazel, pkg/store/copr/BUILD.bazel): Add //pkg/store/copr dependency to planner core; adjust copr test shard count from 40→41.

Sequence Diagram(s)

sequenceDiagram
    participant Optimizer
    participant Cache as GlobalMPPInfoManager
    participant Fetcher as FetchClusterServerInfoWithoutPrivilegeCheck
    participant Executor as clusterServerInfoRetriever

    Optimizer->>Cache: Get(address) for each TiFlash
    Cache-->>Optimizer: MPPInfo or nil
    Optimizer->>Fetcher: Request info for uncached addresses
    Fetcher->>Executor: Gather ServerInfoResult[] (per-server Rows, Err, Idx)
    Executor-->>Fetcher: Return flattened [][]types.Datum (skipping Err)
    Fetcher-->>Optimizer: []ServerInfoResult
    Optimizer->>Cache: Add(address, LogicalCPUCount) for fresh entries
    Cache-->>Optimizer: Ack (cache updated)
    Optimizer->>Optimizer: Compute min logical cores from cached + fresh data

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • windtalker
  • guo-shaoge


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 30.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check (✅ Passed): The title clearly and specifically describes the main change: implementing MPP info caching to reduce frequent gRPC calls.
  • Description check (✅ Passed): The description includes the required Issue Number, Problem Summary, What changed sections, and checklist; however, it lacks specific details about the problem and how caching solves it.
  • Linked Issues check (✅ Passed): The PR implements MPP info caching and failure-triggered eviction to reduce gRPC calls, directly addressing the goal of mitigating socket leaks in #65701.
  • Out of Scope Changes check (✅ Passed): All changes are focused on implementing MPP caching logic and supporting infrastructure; no unrelated or out-of-scope modifications detected.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Command failed



hawkingrei (Member):

/ok-to-test

ti-chi-bot added the ok-to-test label (the PR is ready to be tested) on Mar 17, 2026.
coderabbitai (bot) left a comment:

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/store/copr/mpp_probe_test.go (1)

194-210: Cover the failed-store eviction path, not only CRUD.

This exercises the map wrapper, but the behavior this change relies on is that MPPFailedStoreProber.Add() evicts an existing cache entry for the same address. A regression test that seeds GlobalMPPInfoManager, calls GlobalMPPFailedStoreProber.Add(), and asserts Get(address) == nil would protect the actual integration path.
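Below is a sketch of the regression test CodeRabbit asks for. It is written against hypothetical stand-ins so it runs on its own. FailedStoreProber.Add is assumed to evict the cached entry, matching the PR description's statement that failed stores lose their cache entry. The stand-in's signature may differ from the real prober's.

In the real test, seeding and the failure report would go through copr.GlobalMPPInfoManager and copr.GlobalMPPFailedStoreProber, inside a Go test function.

```go
package main

import (
	"fmt"
	"sync"
)

// MPPInfo and MPPInfoManager are hypothetical stand-ins for the copr types.
type MPPInfo struct {
	Address         string
	LogicalCPUCount uint64
}

type MPPInfoManager struct {
	mu    sync.RWMutex
	cache map[string]*MPPInfo
}

func (m *MPPInfoManager) Add(info *MPPInfo) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cache[info.Address] = info
}

func (m *MPPInfoManager) Delete(addr string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.cache, addr)
}

func (m *MPPInfoManager) Get(addr string) *MPPInfo {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.cache[addr]
}

// FailedStoreProber is a hypothetical stand-in for MPPFailedStoreProber:
// recording a failed store also evicts its cached MPP info.
type FailedStoreProber struct {
	mu     sync.Mutex
	failed map[string]struct{}
	infos  *MPPInfoManager
}

func (p *FailedStoreProber) Add(addr string) {
	p.mu.Lock()
	p.failed[addr] = struct{}{}
	p.mu.Unlock()
	p.infos.Delete(addr)
}

// checkFailedStoreEvictsCache seeds the cache, reports the store as failed,
// and verifies that the cache entry is gone.
func checkFailedStoreEvictsCache() error {
	infos := &MPPInfoManager{cache: map[string]*MPPInfo{}}
	prober := &FailedStoreProber{failed: map[string]struct{}{}, infos: infos}
	const addr = "tiflash-0:3930"
	infos.Add(&MPPInfo{Address: addr, LogicalCPUCount: 8})
	if infos.Get(addr) == nil {
		return fmt.Errorf("seed failed: no entry for %s", addr)
	}
	prober.Add(addr)
	if got := infos.Get(addr); got != nil {
		return fmt.Errorf("expected %s to be evicted, still cached: %+v", addr, *got)
	}
	return nil
}

func main() {
	fmt.Println(checkFailedStoreEvictsCache()) // <nil>
}
```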

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/store/copr/mpp_probe_test.go` around lines 194 - 210, Add a test that
exercises the failed-store eviction path by seeding GlobalMPPInfoManager with an
MPPInfo for the target address, then calling
GlobalMPPFailedStoreProber.Add(address) and asserting that
GlobalMPPInfoManager.Get(address) returns nil; specifically, create an MPPInfo
and insert it into GlobalMPPInfoManager (use GlobalMPPInfoManager.Add or assign
into its cachedStores), call GlobalMPPFailedStoreProber.Add with the same
Address, and assert GlobalMPPInfoManager.Get(Address) == nil to ensure
MPPFailedStoreProber.Add evicts the cache entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/infoschema/tables.go`:
- Around line 2788-2790: The partial-fetch error returned from
getServerInfoByGRPC(ctx, remote, infoTp) is sent to the channel without node
context; wrap or annotate it with the target node (remote and/or
infoTp/serverTP) before sending so logs include which node failed. Update the
code that sends ServerInfoResult (the ch <- ServerInfoResult{Idx: index, Err:
err} lines) to either wrap err with context (e.g., fmt.Errorf("node %s(%s): %w",
remote, infoTp, err)) or add node identity fields to ServerInfoResult and
populate them (e.g., Address: remote, ServerTP: infoTp) so downstream logging
can show the failing node. Ensure the same change is applied at the other
occurrence around lines 2803-2807.

In `@pkg/planner/core/optimizer.go`:
- Around line 643-665: The current Get → RPC → Add is non-atomic and can revive
a store that was marked failed in between; change the cache-fill to perform a
conditional write: after the RPC returns but before calling
copr.GlobalMPPInfoManager.Add, re-check the cache and failure state (e.g., call
copr.GlobalMPPInfoManager.Get(address) and/or
copr.GlobalMPPFailedStoreProber.IsFailed(address)) and only call Add if the
cache entry is still absent and the store is not marked failed; alternatively
implement/use an AddIfNotFailed/AddWithGeneration API on GlobalMPPInfoManager
that compares a generation/tombstone from the initial Get and only stores when
unchanged so failure invalidation wins. Ensure you reference the same address
used when creating the copr.MPPInfo and avoid unconditional Add in the loop that
processes infos.Rows.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 71bdce6f-f92a-461d-ab57-6573cb21d63d

📥 Commits

Reviewing files that changed from the base of the PR and between 319a683 and 264297d.

📒 Files selected for processing (7)
  • pkg/executor/memtable_reader.go
  • pkg/infoschema/tables.go
  • pkg/planner/core/BUILD.bazel
  • pkg/planner/core/optimizer.go
  • pkg/store/copr/BUILD.bazel
  • pkg/store/copr/mpp_probe.go
  • pkg/store/copr/mpp_probe_test.go

Comment thread pkg/infoschema/tables.go
Comment on lines 2788 to +2790
 items, err := getServerInfoByGRPC(ctx, remote, infoTp)
 if err != nil {
-	ch <- result{idx: index, err: err}
+	ch <- ServerInfoResult{Idx: index, Err: err}

⚠️ Potential issue | 🟠 Major

Wrap partial-fetch errors with the target node.

These errors are later appended/logged without address or serverTP, so warnings degrade to generic transport failures with no clue which node failed. Please annotate the error before sending it to the channel, or carry the node identity on ServerInfoResult.

Suggested change
-					ch <- ServerInfoResult{Idx: index, Err: err}
+					ch <- ServerInfoResult{
+						Idx: index,
+						Err: errors.Annotatef(err, "fetch cluster server info from %s(%s)", serverTP, address),
+					}

As per coding guidelines, "Keep error handling actionable and contextual; avoid silently swallowing errors".

Also applies to: 2803-2807


Comment thread pkg/planner/core/optimizer.go Outdated
Comment on lines +643 to +665
	var uncachedServersInfo []infoschema.ServerInfo
	var minLogicalCores = initialMaxCores
	for _, info := range serversInfo {
		mppInfo := copr.GlobalMPPInfoManager.Get(info.Address)
		if mppInfo == nil {
			uncachedServersInfo = append(uncachedServersInfo, info)
			continue
		}
		minLogicalCores = min(minLogicalCores, mppInfo.LogicalCPUCount)
	}

	if len(uncachedServersInfo) > 0 {
		infos := infoschema.FetchClusterServerInfoWithoutPrivilegeCheck(ctx, sctx.GetSessionVars(), uncachedServersInfo, diagnosticspb.ServerInfoType_HardwareInfo, false)
		for _, info := range infos {
			for _, row := range info.Rows {
				if row[4].GetString() == "cpu-logical-cores" {
					logicalCpus, err := strconv.Atoi(row[5].GetString())
					if err == nil && logicalCpus > 0 {
						copr.GlobalMPPInfoManager.Add(&copr.MPPInfo{
							Address:         uncachedServersInfo[info.Idx].Address,
							LogicalCPUCount: uint64(logicalCpus),
						})
						minLogicalCores = min(minLogicalCores, uint64(logicalCpus))

⚠️ Potential issue | 🟠 Major

Don't revive a failed store between cache miss and cache fill.

This is a non-atomic Get → RPC → Add sequence. If another goroutine marks the store failed while the RPC is in flight, GlobalMPPFailedStoreProber.Add() deletes the cache entry and this unconditional Add puts stale CPU info straight back. That breaks the new eviction guarantee and can keep planning off a store already marked failed. The cache fill needs a generation/tombstone or other conditional write so failure invalidation wins.
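The following is a minimal sketch of a conditional cache fill where invalidation wins. It assumes a per-address generation counter:

- The caller records the generation at its cache miss.
- Delete, called from the failed-store path, bumps the generation.
- AddIfGeneration rejects any fill that arrives after a concurrent eviction.

The names Snapshot and AddIfGeneration, and the counter scheme, are hypothetical; they are not an existing API in the PR. This follows the generation/tombstone option from the review. The other option, re-checking the failed-store prober before Add, still leaves a window between the check and the write unless both happen under the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

type MPPInfo struct {
	Address         string
	LogicalCPUCount uint64
}

// MPPInfoManager pairs the cache with per-address generations. Delete bumps
// the generation so in-flight fills that started earlier become stale.
type MPPInfoManager struct {
	mu    sync.Mutex
	cache map[string]*MPPInfo
	gen   map[string]uint64
}

func NewMPPInfoManager() *MPPInfoManager {
	return &MPPInfoManager{cache: map[string]*MPPInfo{}, gen: map[string]uint64{}}
}

// Snapshot returns the cached entry (nil on miss) and the current generation.
func (m *MPPInfoManager) Snapshot(addr string) (*MPPInfo, uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.cache[addr], m.gen[addr]
}

// Delete evicts the entry and invalidates any fill based on an older snapshot.
func (m *MPPInfoManager) Delete(addr string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.cache, addr)
	m.gen[addr]++
}

// AddIfGeneration stores info only if no Delete happened since the snapshot
// that produced gen. It reports whether the write was applied.
func (m *MPPInfoManager) AddIfGeneration(info *MPPInfo, gen uint64) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.gen[info.Address] != gen {
		return false
	}
	m.cache[info.Address] = info
	return true
}

func main() {
	m := NewMPPInfoManager()
	const addr = "tiflash-0:3930"
	_, gen := m.Snapshot(addr) // cache miss: remember the generation
	m.Delete(addr)             // store marked failed while the RPC is in flight
	applied := m.AddIfGeneration(&MPPInfo{Address: addr, LogicalCPUCount: 8}, gen)
	info, _ := m.Snapshot(addr)
	fmt.Println(applied, info == nil) // false true
}
```

A single mutex guards both maps, so the compare and the store are atomic. The cost is one retained counter per evicted address, which is bounded by the number of TiFlash stores.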


coderabbitai (bot) left a comment:

♻️ Duplicate comments (1)
pkg/planner/core/optimizer.go (1)

658-673: ⚠️ Potential issue | 🟠 Major

Avoid reviving failed-store cache entries in the refresh write path.

The refresh flow is still non-atomic (split/cache-check → remote fetch → unconditional Add). If a store is marked failed and evicted during fetch, this Add can repopulate stale data and undo the invalidation guarantee. Please make cache fill conditional (e.g., generation/tombstone-based write where invalidation wins).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/optimizer.go` around lines 658 - 673, The refresh path
currently unconditionally repopulates MPPInfo via copr.GlobalMPPInfoManager.Add
using serversNeedingRefresh[info.Idx], which can resurrect stores that were
concurrently marked failed/evicted; before calling Add in the loop inside
splitTiFlashLogicalCoreCache's refresh logic, verify the target server is still
valid by checking a stable generation/tombstone or matching
StartTimestamp/address against the current cache/store list (e.g., compare
serversNeedingRefresh[info.Idx].StartTimestamp and Address or a generation
field), and only call Add when they still match; if your cache supports a
generation/tombstone, use that to make the write conditional so invalidation
wins.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3838bf54-8b2e-407f-a842-76a96b47c29b

📥 Commits

Reviewing files that changed from the base of the PR and between 264297d and 3199903.

📒 Files selected for processing (4)
  • pkg/planner/core/optimizer.go
  • pkg/planner/core/optimizer_test.go
  • pkg/store/copr/mpp_probe.go
  • pkg/store/copr/mpp_probe_test.go

ti-chi-bot (bot) commented Mar 18, 2026

@xzhangxian1008: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • idc-jenkins-ci-tidb/check_dev (commit 3199903, required): rerun with /test check-dev
  • pull-unit-test-next-gen (commit 3199903, required): rerun with /test pull-unit-test-next-gen
  • idc-jenkins-ci-tidb/unit-test (commit 3199903, required): rerun with /test unit-test


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

do-not-merge/hold, do-not-merge/needs-triage-completed, ok-to-test, release-note-none, sig/planner, size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiFlash panics with Too many open files due to grpc connection socket leak in the cloud GCP env

3 participants