Skip to content

fix(mem_wal): build secondary indexes when flushing active memtable#6901

Merged
jackye1995 merged 1 commit into
lance-format:mainfrom
hamersaw:bug/wal-flush-indexes
May 21, 2026
Merged

fix(mem_wal): build secondary indexes when flushing active memtable#6901
jackye1995 merged 1 commit into
lance-format:mainfrom
hamersaw:bug/wal-flush-indexes

Conversation

@hamersaw
Copy link
Copy Markdown
Contributor

Summary

The background memtable flush handler (MemTableFlushHandler::flush_memtable) called flush, which persists the data file and bloom filter but builds no secondary indexes. The handler never even received the shard's index configs.

Two query-side consequences over flushed generations:

  • Point lookups (point_lookup.rs) run a filter_expr scan. Lance can route that through a scalar index — but none existed, so lookups fell back to a full scan (perf regression).
  • Vector search (vector_search.rs) uses index-only fast_search(). Its doc comment assumes "each flushed memtable has its own vector index built during flush", which was false — so flushed vector rows were invisible to KNN. This is a correctness bug, not just perf.

Changes

  • Thread the shard's index_configs into MemTableFlushHandler.
  • Call flush_with_indexes when any indexes are configured so each flushed generation carries the same secondary indexes as the active memtable; keep plain flush when none are configured to avoid a needless dataset open.
  • Box the flush_with_indexes future to keep the flush async block under the type-layout recursion limit.

Testing

  • New test_flushed_generation_is_indexed: writes through the real ShardWriter path, forces a flush, and asserts the flushed generation (a) carries the BTree index and (b) resolves id = 5 via ScalarIndexQuery rather than a scan. HNSW/FTS flush and fast_search over an indexed flushed generation are already covered by existing tests.
  • cargo test -p lance --lib dataset::mem_wal:: — 316 passed, 0 failed.
  • cargo fmt --all and cargo clippy -p lance --tests -- -D warnings — clean.

🤖 Generated with Claude Code

The background memtable flush handler called `flush`, which writes data
and the bloom filter but builds no secondary indexes. As a result point
lookups over flushed generations fell back to full scans, and vector
search — which uses index-only `fast_search` over flushed generations —
could not see the flushed rows at all (a correctness bug, not just a
perf regression).

Thread the shard's index configs into `MemTableFlushHandler` and call
`flush_with_indexes` when any are configured, so each flushed generation
carries the same secondary indexes as the active memtable. Plain `flush`
is still used when no indexes are configured to avoid a needless dataset
open. The indexed path's future is boxed to keep the flush async block
under the type-layout recursion limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label May 21, 2026
@jackye1995
Copy link
Copy Markdown
Contributor

cc @touch-of-grey you probably want to rerun the backpressure benchmark after this

@jackye1995 jackye1995 merged commit cee7d32 into lance-format:main May 21, 2026
27 checks passed
touch-of-grey added a commit to touch-of-grey/lance that referenced this pull request May 22, 2026
…ance-format#6901)

lance lance-format#6901 makes the memtable flush handler build the shard's maintained
secondary indexes on each flushed generation, so the FTS index now exists
on every flushed gen without the bench creating it. Remove the manual
create_index loop; both scoring modes still use the fast indexed path, and
the rescore planner's flat fallback remains for the no-maintained-index
case (covered by unit tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants