fix(plan): prevent OOM from non-selective secondary index range scans #24309
Conversation
Force-pushed from 1509aeb to add2a82
…s to prevent stampede OOM (matrixorigin#24307)

When 100 concurrent INSERT IGNORE scans miss the same cache entry simultaneously, every goroutine performs the same redundant I/O, loading identical bloom filters and object metadata into the Go heap 100 times over. This balloons heap_sys to 39 GiB (well beyond GOMEMLIMIT=24G) and triggers an OOM kill under the 55 GiB cgroup limit.

Add dedupLoad using a sync.Map + channel pattern (a singleflight equivalent) so that only one goroutine performs the actual ReadExtent/ReadBloomFilter for a given key while the others wait and share the result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LeftHandCold left a comment:
Found one blocker in the group/merge_group mpool cleanup change.
Group.Prepare and MergeGroup.Prepare now delete ctr.mp before freeing or recreating the container state that may still own allocations from that pool. For example, Group.Prepare deletes the old pool at pkg/sql/colexec/group/exec2.go:51, but prepareGroupAndAggArg can then reuse the existing ctr.aggList when the aggregate count matches and calls ag.Free() / ag.GroupGrow(1) on aggregators that were created with the old ctr.mp in container.makeAggList. MergeGroup.Prepare has the same pattern, and buildOneBatch can reuse ctr.aggList when lengths match.
This means allocations/frees can continue through a deleted/deregistered mpool, while ctr.memUsed() and spill decisions read the newly created pool. It can hide memory from the spill accounting and also risks freeing old groupByBatches / spill buffers with the new pool during later cleanup.
Please free/reset the full container state with the old pool before deleting/replacing ctr.mp, or force all pool-owning state (aggList, hash state, group batches, spill buffers/evaluators, etc.) to be released and recreated before the old pool is deleted.
…tor reuse (matrixorigin#24307)

When a Group/MergeGroup operator is reused (e.g. via StarCountMergeGroup cached on Scope), Prepare() creates a new mpool without freeing the old one, orphaning potentially gigabytes of allocated memory indefinitely.

Use ctr.free() instead of a bare mpool.DeleteMPool() to properly release all pool-owning state (aggList, groupByBatches, spill buffers, hash state, evaluators) with the old pool before deleting it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
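The ordering the fix enforces can be shown with a toy pool that counts live allocations. This is a minimal sketch, not MatrixOne's real mpool API: `pool`, `container`, and `batches` here are stand-ins for the actual mpool, operator container, and pool-owning state (aggList, spill buffers, etc.).

```go
package main

import "fmt"

// pool is a toy allocator that tracks how many allocations are live,
// so we can observe whether anything is orphaned when it is dropped.
type pool struct{ live int }

func (p *pool) Alloc(n int) []byte { p.live++; return make([]byte, n) }
func (p *pool) Free(b []byte)      { p.live-- }

// container owns a pool and state allocated from it.
type container struct {
	mp      *pool
	batches [][]byte // stand-in for aggList, groupByBatches, spill buffers, ...
}

// free returns every allocation to the pool that made it, then drops
// the pool. A Prepare() that swaps in a fresh pool must call this
// first: deleting c.mp while c.batches still holds its allocations
// orphans that memory, and freeing those batches later through the
// NEW pool would corrupt its accounting.
func (c *container) free() {
	for _, b := range c.batches {
		c.mp.Free(b)
	}
	c.batches = nil
	c.mp = nil
}

func main() {
	c := &container{mp: &pool{}}
	c.batches = append(c.batches, c.mp.Alloc(1024), c.mp.Alloc(2048))
	old := c.mp
	c.free() // release all state with the old pool before replacing it
	fmt.Println(old.live) // prints 0: nothing orphaned
}
```

This also mirrors the reviewer's blocker above: the spill accounting reads the new pool, so any allocation still owned by the deleted pool is invisible to it.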
Merge Queue Status
This pull request spent 1 hour 51 minutes 44 seconds in the queue, including 58 minutes 40 seconds running CI.
What type of PR is this?
Which issue(s) does this PR fix?
Fixes #24307
What this PR does / why we need it:
Fixes the OOM regression in the sysbench 1000w insert-ignore scenario introduced by #24275.
Two root causes were identified:

1. Missing selectivity guard in tryIndexOnlyScan: the function runs before the general selectivity check in applyIndicesForFiltersRegularIndex, so non-selective range queries (>= / <) recognized by checkIndexFilter from feat(plan): support range queries on secondary indexes #24275 could trigger index-only scans without any cost check on large tables. This causes full index table scans → excessive metadata/bloom filter re-reads → unbounded mpool growth → OOM. Fixed by adding a guard that skips the index-only scan for non-equality leading conditions on tables with >= 50000 rows when selectivity > 0.3 or output cardinality > 10000.

2. Buffer overread in ConstructBasePKFilter's in_range case: types.DecodeInt64(vals[2]) reads 8 bytes from a 1-byte uint8 flag slice, causing undefined behavior. Fixed by reading the flag byte directly as vals[2][0].
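The second root cause can be illustrated with a small sketch. The `decodeInt64` below mimics the common unsafe-cast style of such fixed-width decoders (the real types.DecodeInt64 may differ in detail): it reinterprets 8 bytes starting at the slice's first element regardless of the slice's length.

```go
package main

import (
	"fmt"
	"unsafe"
)

// decodeInt64 reinterprets the memory at &v[0] as an int64. It reads
// 8 bytes unconditionally, even when len(v) < 8 — the overread the
// fix removes. (Illustrative; the real decoder may differ.)
func decodeInt64(v []byte) int64 {
	return *(*int64)(unsafe.Pointer(&v[0]))
}

func main() {
	// A 1-byte flag slice, as in the in_range case: len(vals[2]) == 1.
	// Backing padding keeps this demo itself in-bounds; real callers
	// have no such guarantee, so the read is undefined behavior.
	backing := make([]byte, 8)
	backing[0] = 1
	flag := backing[:1]

	// Buggy: 7 of the 8 bytes read lie past the slice's length.
	_ = decodeInt64(flag)

	// Fixed: read the single flag byte directly, as vals[2][0].
	fmt.Println(flag[0]) // prints 1
}
```

Reading `vals[2][0]` touches exactly the one byte the slice actually owns, so the fix is both correct and cheaper than the 8-byte decode.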