fix(plan): prevent OOM from non-selective secondary index range scans#24309

Merged
mergify[bot] merged 6 commits into matrixorigin:main from jiangxinmeng1:fix/secondary-index-range-oom-24307
May 8, 2026

jiangxinmeng1 commented May 8, 2026

What type of PR is this?

  • Bug fix

Which issue(s) does this PR fix?

Fixes #24307

What this PR does / why we need it:

Fixes an OOM regression in the sysbench 1000w (10 million rows) insert-ignore scenario, introduced by #24275.

Two root causes identified:

  1. Missing selectivity guard in tryIndexOnlyScan: The function runs before the general selectivity check in applyIndicesForFiltersRegularIndex, so non-selective range queries (>= / <) recognized by the checkIndexFilter from #24275 ("feat(plan): support range queries on secondary indexes") could trigger index-only scans on large tables without any cost check. This causes full index table scans → excessive metadata/bloom filter re-reads → unbounded mpool growth → OOM. Fixed by adding a guard that skips the index-only scan for non-equality leading conditions on tables with >= 50000 rows when selectivity > 0.3 or output cardinality > 10000.

  2. Buffer overread in ConstructBasePKFilter's in_range case: types.DecodeInt64(vals[2]) reads 8 bytes from a 1-byte uint8 flag slice, causing undefined behavior. Fixed by reading the flag byte directly as vals[2][0].
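
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the thresholds (50000 rows, 0.3 selectivity, 10000 output rows) come from the description, while `scanStats`, `skipIndexOnlyScan`, `rangeFlag`, and the constant names are hypothetical stand-ins for the planner statistics and the `in_range` decoding in ConstructBasePKFilter.

```go
package main

import (
	"errors"
	"fmt"
)

// Thresholds quoted in the PR description; the constant names are hypothetical.
const (
	guardMinTableRows   = 50000
	guardMaxSelectivity = 0.3
	guardMaxOutCnt      = 10000
)

// scanStats is a hypothetical stand-in for the planner's statistics.
type scanStats struct {
	TableRows   float64 // estimated rows in the table
	Selectivity float64 // fraction of rows the filter keeps
	OutCnt      float64 // estimated output cardinality
}

// skipIndexOnlyScan reports whether an index-only scan should be skipped:
// only for non-equality leading conditions on large tables, and only when
// the filter is non-selective or the estimated output is large.
func skipIndexOnlyScan(leadingIsEquality bool, s scanStats) bool {
	if leadingIsEquality || s.TableRows < guardMinTableRows {
		return false
	}
	return s.Selectivity > guardMaxSelectivity || s.OutCnt > guardMaxOutCnt
}

// rangeFlag reads the one-byte flag in vals[2] directly instead of decoding
// it as an 8-byte integer, and validates the lengths before indexing.
func rangeFlag(vals [][]byte) (uint8, error) {
	if len(vals) < 3 || len(vals[2]) != 1 {
		return 0, errors.New("in_range: expected a 1-byte flag in vals[2]")
	}
	return vals[2][0], nil
}

func main() {
	big := scanStats{TableRows: 1e7, Selectivity: 0.5, OutCnt: 5e6}
	fmt.Println(skipIndexOnlyScan(false, big)) // range condition: true
	fmt.Println(skipIndexOnlyScan(true, big))  // equality condition: false

	flag, err := rangeFlag([][]byte{{0}, {9}, {3}})
	fmt.Println(flag, err) // 3 <nil>
}
```

The length check in `rangeFlag` is an addition for the sketch; the description only states that the flag is now read as vals[2][0].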

…s to prevent stampede OOM (matrixorigin#24307)

When 100 concurrent INSERT IGNORE scans miss the same cache entry simultaneously,
all goroutines perform redundant I/O reads, loading identical bloom filters and
object metadata into Go heap 100x. This causes heap_sys to balloon to 39 GiB
(well beyond GOMEMLIMIT=24G) and triggers OOM under 55 GiB cgroup limit.

Add dedupLoad using sync.Map + channel pattern (singleflight equivalent) so that
only one goroutine performs the actual ReadExtent/ReadBloomFilter for a given key
while others wait and share the result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
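
The sync.Map + channel pattern described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: `loadDeduper`, `inflight`, and `simulate` are hypothetical names, the key is a plain string, and the payload is a byte slice standing in for a bloom filter or object metadata.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// inflight records one in-progress load; done is closed after val and err are set.
type inflight struct {
	done chan struct{}
	val  []byte
	err  error
}

// loadDeduper coalesces overlapping loads of the same key (hypothetical type).
type loadDeduper struct {
	calls sync.Map // string -> *inflight
}

// dedupLoad runs load once per key among callers that overlap in time.
// Followers block on the leader's channel and share its result, so the
// returned slice must be treated as read-only.
func (d *loadDeduper) dedupLoad(key string, load func() ([]byte, error)) ([]byte, error) {
	mine := &inflight{done: make(chan struct{})}
	if prev, loaded := d.calls.LoadOrStore(key, mine); loaded {
		other := prev.(*inflight)
		<-other.done // wait for the leader instead of re-reading
		return other.val, other.err
	}
	// Leader: publish the result, then drop the entry so later misses reload.
	defer func() {
		d.calls.Delete(key)
		close(mine.done)
	}()
	mine.val, mine.err = load()
	return mine.val, mine.err
}

// simulate starts n goroutines that miss the same key and reports how many
// real loads ran and whether every caller received the expected payload.
func simulate(n int) (loads int64, allOK bool) {
	var d loadDeduper
	var count, bad int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, err := d.dedupLoad("obj-1/bloom", func() ([]byte, error) {
				atomic.AddInt64(&count, 1)
				return []byte("bloom-filter-bytes"), nil
			})
			if err != nil || string(v) != "bloom-filter-bytes" {
				atomic.AddInt64(&bad, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&count), atomic.LoadInt64(&bad) == 0
}

func main() {
	loads, ok := simulate(100)
	fmt.Printf("real loads: %d of 100 callers, all results ok: %v\n", loads, ok)
}
```

The number of real loads depends on how much the callers overlap, so the sketch reports it instead of assuming a fixed value. A production version would also need to decide what followers see if the leader's load panics; here they would receive a nil result and a nil error.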

LeftHandCold left a comment

Found one blocker in the group/merge_group mpool cleanup change.

Group.Prepare and MergeGroup.Prepare now delete ctr.mp before freeing or recreating the container state that may still own allocations from that pool. For example, Group.Prepare deletes the old pool at pkg/sql/colexec/group/exec2.go:51, but prepareGroupAndAggArg can then reuse the existing ctr.aggList when the aggregate count matches and calls ag.Free() / ag.GroupGrow(1) on aggregators that were created with the old ctr.mp in container.makeAggList. MergeGroup.Prepare has the same pattern, and buildOneBatch can reuse ctr.aggList when lengths match.

This means allocations/frees can continue through a deleted/deregistered mpool, while ctr.memUsed() and spill decisions read the newly created pool. It can hide memory from the spill accounting and also risks freeing old groupByBatches / spill buffers with the new pool during later cleanup.

Please free/reset the full container state with the old pool before deleting/replacing ctr.mp, or force all pool-owning state (aggList, hash state, group batches, spill buffers/evaluators, etc.) to be released and recreated before the old pool is deleted.
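
The ordering requested here can be illustrated with a small model. Everything in it is hypothetical (`pool`, `agg`, `container`, and the 64-byte allocation size are stand-ins, not the real mpool or colexec types); it only shows the idea that every pool-owning object is released through its own pool before that pool is deleted and replaced.

```go
package main

import "fmt"

const aggSize = 64 // hypothetical per-aggregator allocation size

// pool is a hypothetical accounting pool; using it after deletion panics.
type pool struct {
	inUse   int64
	deleted bool
}

func (p *pool) alloc(n int64) {
	if p.deleted {
		panic("alloc on deleted pool")
	}
	p.inUse += n
}

func (p *pool) free(n int64) {
	if p.deleted {
		panic("free on deleted pool")
	}
	p.inUse -= n
}

// agg is a hypothetical aggregator that remembers the pool it allocated from.
type agg struct {
	mp   *pool
	size int64
}

func newAgg(mp *pool) *agg {
	mp.alloc(aggSize)
	return &agg{mp: mp, size: aggSize}
}

func (a *agg) Free() {
	a.mp.free(a.size)
	a.size = 0
}

// container models operator state that owns allocations from mp.
type container struct {
	mp      *pool
	aggList []*agg
}

// free releases all pool-owning state first, then deletes the pool.
func (c *container) free() {
	for _, a := range c.aggList {
		a.Free()
	}
	c.aggList = nil
	if c.mp != nil {
		c.mp.deleted = true
		c.mp = nil
	}
}

// prepare resets the container for reuse; nothing created with the old
// pool survives into the new one.
func (c *container) prepare(nAggs int) {
	c.free()
	c.mp = &pool{}
	for i := 0; i < nAggs; i++ {
		c.aggList = append(c.aggList, newAgg(c.mp))
	}
}

// memUsed reads the current pool, as spill decisions would.
func (c *container) memUsed() int64 {
	if c.mp == nil {
		return 0
	}
	return c.mp.inUse
}

func main() {
	c := &container{}
	c.prepare(2)
	old := c.mp
	c.prepare(2) // same aggregate count, but aggregators are recreated
	fmt.Println(old.inUse, old.deleted, c.memUsed())
}
```

In this model, deleting the pool before calling Free on the reused aggregators would hit the "free on deleted pool" panic, which is the hazard the review describes.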

jiangxinmeng1 and others added 2 commits May 8, 2026 19:46
…tor reuse (matrixorigin#24307)

When a Group/MergeGroup operator is reused (e.g. via StarCountMergeGroup
cached on Scope), Prepare() creates a new mpool without freeing the old
one, orphaning potentially gigabytes of allocated memory indefinitely.

Use ctr.free() instead of bare mpool.DeleteMPool() to properly release
all pool-owning state (aggList, groupByBatches, spill buffers, hash state,
evaluators) with the old pool before deleting it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mergify Bot commented May 8, 2026

Merge Queue Status

  • Entered queue 2026-05-08 13:42 UTC · Rule: main
  • Checks passed · in-place
  • Merged 2026-05-08 15:33 UTC · at 67785d4230060308fb64835fef164f3a51fefa3b · squash

This pull request spent 1 hour 51 minutes 44 seconds in the queue, including 58 minutes 40 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Labels

kind/bug: Something isn't working
size/M: Denotes a PR that changes [100,499] lines

Development

Successfully merging this pull request may close these issues.

[Bug]: [main tke regression] sysbench 1000w insert ignore oom.

6 participants