pkg/planner, pkg/sessionctx: add delete-by-key support for instance plan cache #67495

Closed
winoros wants to merge 2 commits into pingcap:master from winoros:instance-plan-cache-evictkey

Conversation

winoros (Member) commented Apr 1, 2026

What problem does this PR solve?

Issue Number: close #67493

Problem Summary:

The instance plan cache only exposed Get, Put, All, and Evict(evictAll bool), so callers had no exact-key deletion capability. When an old exact cache key was known, stale instance-cache entries could only be left to logical invalidation or background eviction instead of being reclaimed eagerly.

What changed and how does it work?

  • add Delete(key string) (numDeleted int) to sessionctx.InstancePlanCache
  • implement exact-key bucket deletion in instancePlanCache
  • mark deleted heads and use a deletion sentinel so Get / Put / iteration stay consistent with concurrent deletion
  • update totCost and totPlan immediately when a key is deleted
  • extend the existing instance plan cache tests to cover exact-key deletion and deleting all variants under the same exact key
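
As a rough illustration of the changes listed above, here is a minimal, self-contained sketch of exact-key deletion with a deleted flag and a sentinel. It is not the code from this PR: the `cache`, `head`, `node`, and `deletedSentinel` names, the `sync.Map` bucket index, and the `mu` mutex are assumptions made for the example. Only the `Delete(key string) (numDeleted int)` signature and the `totCost` / `totPlan` counters come from the description.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is one cached entry in a bucket's singly linked list (hypothetical type).
type node struct {
	cost int64
	next atomic.Pointer[node]
}

// deletedSentinel marks a detached bucket. It is only compared by pointer identity.
var deletedSentinel = &node{}

// head is the per-key bucket head (hypothetical type).
type head struct {
	deleted atomic.Bool
	next    atomic.Pointer[node]
}

type cache struct {
	mu      sync.Mutex // serializes Delete calls (assumption)
	heads   sync.Map   // string -> *head
	totCost atomic.Int64
	totPlan atomic.Int64
}

// Put links a new entry at the front of the key's bucket, retrying on a deleted head.
func (c *cache) Put(key string, cost int64) {
	for {
		v, _ := c.heads.LoadOrStore(key, &head{})
		h := v.(*head)
		first := h.next.Load()
		if h.deleted.Load() || first == deletedSentinel {
			// Stale head: make sure it is gone from the index, then retry on a fresh one.
			c.heads.CompareAndDelete(key, h)
			continue
		}
		n := &node{cost: cost}
		n.next.Store(first)
		if h.next.CompareAndSwap(first, n) {
			c.totCost.Add(cost)
			c.totPlan.Add(1)
			return
		}
	}
}

// Delete detaches every entry stored under key and updates the counters immediately.
func (c *cache) Delete(key string) (numDeleted int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.heads.Load(key)
	if !ok {
		return 0
	}
	h := v.(*head)
	h.deleted.Store(true)                 // mark the head first ...
	first := h.next.Swap(deletedSentinel) // ... then swap in the sentinel
	c.heads.CompareAndDelete(key, h)
	for n := first; n != nil && n != deletedSentinel; n = n.next.Load() {
		c.totCost.Add(-n.cost)
		c.totPlan.Add(-1)
		numDeleted++
	}
	return numDeleted
}

func main() {
	c := &cache{}
	c.Put("select-by-id", 10)
	c.Put("select-by-id", 20) // a second variant under the same exact key
	c.Put("other", 5)
	fmt.Println(c.Delete("select-by-id"), c.totCost.Load(), c.totPlan.Load()) // prints "2 5 1"
}
```

In this sketch a concurrent `Delete` can subtract a node's cost before the `Put` that linked it has added it, so the counters may dip briefly even though they converge once both calls return.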

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Unit test commands:

./tools/check/failpoint-go-test.sh pkg/planner/core -run 'TestInstancePlanCache(Basic|WithMatchOpts|EvictAll|ConcurrentRead|ConcurrentWriteRead)$'
go test ./pkg/sessionctx ./pkg/domain ./pkg/planner/core -run '^$' -tags=intest,deadlock
make lint

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • Added an API to delete all cached query plans under a given key; deletion is atomic and thread-safe and updates memory/size accounting.
  • Tests

    • Added tests covering concurrent delete vs. put races and deleting entries that share a key but differ by parameter typing, ensuring exact deletion counts and correct accounting.


ti-chi-bot Bot commented Apr 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot Bot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) and release-note-none (Denotes a PR that doesn't merit a release note.) labels Apr 1, 2026

coderabbitai Bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2607fab2-d538-49f1-b161-b4b213ba0ef5

📥 Commits

Reviewing files that changed from the base of the PR and between 486250f and 702d502.

📒 Files selected for processing (2)
  • pkg/planner/core/plan_cache_instance.go
  • pkg/planner/core/plan_cache_instance_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/planner/core/plan_cache_instance.go
  • pkg/planner/core/plan_cache_instance_test.go

Walkthrough

Adds an exact-key eviction API to the instance plan cache: Delete(key string) which atomically detaches and removes a head bucket, marks nodes deleted via a sentinel, updates accounting, and adjusts cache operations to treat deleted heads as absent for concurrent safety.

Changes

  • Core Deletion Implementation (pkg/planner/core/plan_cache_instance.go): Introduced Delete(key) on instancePlanCache. Added deleted flag on instancePCNode and a shared deletedInstancePCNode sentinel. Updated getHead, Get, Put, getPlanFromList, and foreach to ignore deleted/sentinel nodes and retry/create heads as needed. Delete serializes with evictMutex, swaps in the sentinel, removes the key from pc.heads, and reconciles accounting (totCost/totPlan) exactly once per removed node (includes a test failpoint path).
  • Cache Interface Extension (pkg/sessionctx/context.go): Added Delete(key string) (numDeleted int) to the InstancePlanCache interface signature to expose per-key eviction.
  • Test Coverage (pkg/planner/core/plan_cache_instance_test.go): Added helper _putWithParamTypes and new tests: concurrent Put-pause vs Delete with failpoint, deterministic single-key deletion, and deletion of same string key with different ParamTypes variants. Asserts correct NumDeleted, MemUsage(), Size(), and Get results.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I dug a tunnel through the plan-cache loam,

Swapped in a sentinel, sent old heads home.
Counters balanced, races softly shushed,
Exact-key vanished—no memory crushed.
Hooray, the cache can now make room! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The PR title clearly and concisely describes the main change: adding delete-by-key support to the instance plan cache, and specifies the affected packages (pkg/planner, pkg/sessionctx).
  • Description check: ✅ Passed. The PR description follows the template structure, includes the issue number (close #67493), provides a clear problem summary and change explanation, marks unit tests as included, and identifies no side effects or documentation needs.
  • Linked Issues check: ✅ Passed. The PR fully implements the requirements from issue #67493: adds Delete(key) to InstancePlanCache interface, implements atomic bucket deletion with sentinel node, handles concurrent deletion, updates totCost/totPlan immediately, and includes comprehensive unit tests covering all specified scenarios.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to exact-key deletion for the instance plan cache as specified in issue #67493. The changes to three files (plan_cache_instance.go, plan_cache_instance_test.go, context.go) are all aligned with implementing the Delete(key) API and its supporting infrastructure.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: context deadline exceeded"
level=error msg="Timeout exceeded: try increasing it by passing --timeout option"


ti-chi-bot Bot added the size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) and sig/planner (SIG: Planner) labels Apr 1, 2026

tiprow Bot commented Apr 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@winoros winoros marked this pull request as ready for review April 1, 2026 09:39
ti-chi-bot Bot removed the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label Apr 1, 2026

pantheon-ai Bot commented Apr 1, 2026

@winoros I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/planner/core/plan_cache_instance.go (1)

42-50: Document the sentinel and deleted-head invariant.

deletedInstancePCNode and instancePCNode.deleted carry most of the concurrency contract here, but the file never explains why a deleted bucket is different from an empty bucket or why traversals must stop on this exact pointer. A short comment on those invariants would make the Delete/Get/Put interaction much easier to maintain.

As per coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear."
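
To make the "deleted bucket vs. empty bucket" distinction concrete, here is a small sketch of the kind of documented invariant the nitpick asks for. It is an illustration only: `pcNode`, `deletedMark`, `bucketState`, and `classify` are hypothetical names standing in for the PR's `instancePCNode` and `deletedInstancePCNode`, and the comments are not the ones added in the PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pcNode is a simplified list node (hypothetical stand-in for instancePCNode).
type pcNode struct {
	next atomic.Pointer[pcNode]
}

// deletedMark is a stand-in for the deletedInstancePCNode sentinel.
// Invariant assumed by this sketch: it is never linked as a real entry and is
// recognized only by pointer identity, so a traversal that reaches this exact
// pointer knows the bucket was detached and must stop.
var deletedMark = &pcNode{}

type bucketState int

const (
	bucketEmpty   bucketState = iota // head exists, nothing linked yet: safe to insert
	bucketLive                       // at least one real entry is linked
	bucketDeleted                    // head was detached: callers must retry on a fresh head
)

// classify inspects the first pointer of a bucket head and reports its state.
func classify(head *pcNode) bucketState {
	first := head.next.Load()
	if first == nil {
		return bucketEmpty
	}
	if first == deletedMark {
		return bucketDeleted
	}
	return bucketLive
}

func main() {
	h := &pcNode{}
	fmt.Println(classify(h) == bucketEmpty) // prints "true"
	h.next.Store(deletedMark)
	fmt.Println(classify(h) == bucketDeleted) // prints "true"
}
```

The point of the three-way split is that an empty bucket can still accept an insert, whereas inserting into a deleted bucket would link an entry that no reader can reach.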

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cache_instance.go` around lines 42 - 50, Add a concise
comment above the instancePCNode type and the deletedInstancePCNode sentinel
that documents the concurrency invariant: explain that deletedInstancePCNode is
a unique sentinel (not a nil/empty bucket) used to mark a removed bucket, that
instancePCNode.deleted is set to true for nodes that have been logically
removed, and that traversals in Delete, Get, and Put must stop when they
encounter the exact deletedInstancePCNode pointer (not merely a node with
deleted==true) to avoid ABA/race conditions; also note any ordering/visibility
expectations (e.g., deleted is set before pointer swaps) and the rationale why
deleted vs empty buckets are treated differently for correctness of concurrent
traversal and insertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/plan_cache_instance.go`:
- Around line 153-170: Delete currently detaches nodes and immediately
decrements pc.totCost/pc.totPlan which races with Put's publish/accounting;
instead, detach the list as you do with
headNode.next.Swap(deletedInstancePCNode) but defer subtracting each node's
MemoryUsage()/count until that node has completed the Put publish/accounting
boundary (detectable via the node's published/ready flag used by Put or by
waiting until the same atomic the Put sets when it finishes accounting).
Concretely: in instancePlanCache.Delete, after firstNode :=
headNode.next.Swap(...), iterate the list but only call pc.totCost.Sub(...) and
pc.totPlan.Sub(1) for nodes whose published flag/atomic is observed true (or
otherwise block/queue the subtraction until the node's publish flag is set);
reference deletedInstancePCNode, headNode.next, pc.totCost and pc.totPlan and
the publish/accounting flag used by Put to synchronize the decrement.

---

Nitpick comments:
In `@pkg/planner/core/plan_cache_instance.go`:
- Around line 42-50: Add a concise comment above the instancePCNode type and the
deletedInstancePCNode sentinel that documents the concurrency invariant: explain
that deletedInstancePCNode is a unique sentinel (not a nil/empty bucket) used to
mark a removed bucket, that instancePCNode.deleted is set to true for nodes that
have been logically removed, and that traversals in Delete, Get, and Put must
stop when they encounter the exact deletedInstancePCNode pointer (not merely a
node with deleted==true) to avoid ABA/race conditions; also note any
ordering/visibility expectations (e.g., deleted is set before pointer swaps) and
the rationale why deleted vs empty buckets are treated differently for
correctness of concurrent traversal and insertion.
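
The accounting race described in the inline comment can be sketched with a per-node state flag. This is one possible handshake, not the PR's implementation: the `state` field, the three state constants, `totals`, `finishPut`, and `releaseDeleted` are all hypothetical, and the review only refers loosely to a "published/ready flag". Under these assumptions the counters never go below zero, because a subtraction happens only after the matching addition.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	statePending   int32 = iota // linked into the bucket, accounting not finished
	stateAccounted              // Put finished its accounting
	stateDetached               // Delete detached the node before accounting finished
)

// pcNode carries only what the sketch needs (hypothetical type).
type pcNode struct {
	cost  int64
	state atomic.Int32
}

// totals mirrors the totCost / totPlan counters named in the PR.
type totals struct {
	cost atomic.Int64
	plan atomic.Int64
}

// finishPut is what a writer would call after linking the node.
func (t *totals) finishPut(n *pcNode) {
	t.cost.Add(n.cost)
	t.plan.Add(1)
	if !n.state.CompareAndSwap(statePending, stateAccounted) {
		// Delete detached the node first, so the writer undoes its own addition.
		t.cost.Add(-n.cost)
		t.plan.Add(-1)
	}
}

// releaseDeleted is what Delete would call for each detached node.
func (t *totals) releaseDeleted(n *pcNode) {
	if n.state.CompareAndSwap(statePending, stateDetached) {
		return // not accounted yet; the writer will roll back by itself
	}
	// The node was already accounted, so subtracting now is safe.
	t.cost.Add(-n.cost)
	t.plan.Add(-1)
}

func main() {
	var tot totals
	n := &pcNode{cost: 8}
	tot.releaseDeleted(n) // Delete wins the race
	tot.finishPut(n)      // the writer notices and rolls back
	fmt.Println(tot.cost.Load(), tot.plan.Load()) // prints "0 0"
}
```

Whichever side wins the compare-and-swap decides who is responsible for the node's cost, so each node is added and removed at most once.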
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 71398b3f-4160-45ba-b7ae-3a322cbe5f16

📥 Commits

Reviewing files that changed from the base of the PR and between dcf6f03 and 486250f.

📒 Files selected for processing (3)
  • pkg/planner/core/plan_cache_instance.go
  • pkg/planner/core/plan_cache_instance_test.go
  • pkg/sessionctx/context.go

Comment thread pkg/planner/core/plan_cache_instance.go

codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 0% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.2192%. Comparing base (dcf6f03) to head (702d502).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67495        +/-   ##
================================================
- Coverage   77.7401%   77.2192%   -0.5209%     
================================================
  Files          1959       1942        -17     
  Lines        543377     543437        +60     
================================================
- Hits         422422     419638      -2784     
- Misses       120114     123797      +3683     
+ Partials        841          2       -839     
Flag Coverage Δ
integration 40.9280% <0.0000%> (+4.7533%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 61.4164% <ø> (ø)
parser ∅ <ø> (∅)
br 48.8241% <ø> (-12.1560%) ⬇️

ti-chi-bot Bot commented Apr 1, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hawkingrei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot Bot added the approved and needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) labels Apr 1, 2026

ti-chi-bot Bot commented Apr 1, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-01 10:23:49.360610672 +0000 UTC m=+347034.565970729: ☑️ agreed by hawkingrei.

winoros (Member, Author) commented Apr 1, 2026

@coderabbitai The nitpick is addressed in 702d502 as well. I added concise comments documenting the deleted-head / sentinel invariant, the ordering expectation (deleted before swapping the sentinel), and why traversals must stop on the exact deletedInstancePCNode pointer instead of treating it like an empty bucket.


coderabbitai Bot commented Apr 1, 2026

@winoros Thanks for the update! The added comments in 702d5023f2 clearly document the key invariants — the deleted-before-sentinel ordering requirement and the identity-check semantics for stopping traversal at deletedInstancePCNode — making the concurrent deletion logic much easier to reason about. The nitpick is well resolved. ✅


🧠 Learnings used
Learnt from: mjonss
Repo: pingcap/tidb PR: 66303
File: pkg/statistics/handle/storage/gc.go:170-172
Timestamp: 2026-03-05T16:38:28.156Z
Learning: In `pkg/statistics/handle/storage/gc.go` (pingcap/tidb), the `mysql.stats_table_data` table introduced in PR `#66303` does NOT store per-histogram (hist_id) granularity rows. Therefore, `deleteHistStatsFromKV` does NOT need to delete from `mysql.stats_table_data`. The table-level delete added in `DeleteTableStatsFromKV` is purely defensive programming for downgrade safety, since no data is currently written to the table.

Learnt from: D3Hunter
Repo: pingcap/tidb PR: 66560
File: pkg/server/handler/tikvhandler/tikv_handler.go:2066-2074
Timestamp: 2026-02-28T04:00:47.001Z
Learning: The DeleteKeyHandler in pkg/server/handler/tikvhandler/tikv_handler.go is a test-only API (under /test/delete/rowkey and /test/delete/indexkey endpoints) used for mocking GC and lock resolving scenarios. It intentionally supports only non-unique key deletion, which is sufficient for its testing purposes.



ti-chi-bot Bot commented Apr 1, 2026

@winoros: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • idc-jenkins-ci-tidb/unit-test: commit 702d502, required: true, rerun command: /test unit-test
  • pull-integration-realcluster-test-next-gen: commit 702d502, required: true, rerun command: /test pull-integration-realcluster-test-next-gen

winoros (Member, Author) commented Apr 1, 2026

Closed, since #67411 uses another way to implement this.

Labels

  • approved
  • needs-1-more-lgtm: Indicates a PR needs 1 more LGTM.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • sig/planner: SIG: Planner
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

planner: add evict-by-key capability to instance plan cache

2 participants