refactor(mem_wal): redesign FTS mem index for single-writer multi-reader #6726
Merged: jackye1995 merged 37 commits into lance-format:main from touch-of-grey:MemTableFTSBetter (May 19, 2026)
Commits
b118529 refactor(mem_wal): redesign FTS mem index for single-writer multi-reader (touch-of-grey)
e604d84 fix(mem_wal): thread snapshot through compound FTS queries (touch-of-grey)
9678130 bench: add fineweb FTS end-to-end benchmark (touch-of-grey)
d70daeb fix(bench): replace post-ingest index-catchup spin with writer.close() (touch-of-grey)
af62c36 bench: rework fineweb FTS bench to mirror the ShardWriter backpressur… (touch-of-grey)
c6e1e22 fix(bench): wait for FTS index catchup before the read-phase queries (touch-of-grey)
6bfc14a fix(bench): force sync_indexed_write in the FTS read phase (touch-of-grey)
802d5d4 fix(bench): tiny read-phase seed + id prefilter for consistency (touch-of-grey)
c900c7b fix(bench): query MemTable FTS via MemTableScanner, compare to refere… (touch-of-grey)
bd295cd fix(bench): force durable_write in the FTS read phase (touch-of-grey)
d99cf7d bench: add --target-rows-per-sec pacing + backlog metrics to FTS writ… (touch-of-grey)
efbf8e0 bench: add backpressure sweep driver for async FineWeb FTS writes (touch-of-grey)
e1a1c09 fix(bench): bound max_memtable_rows default to avoid petabyte preallo… (touch-of-grey)
3a23eec bench: add --index-type vector|fts to the ShardWriter backpressure bench (touch-of-grey)
95d2896 bench: add Lance FtsMemIndex vs Apache Lucene FTS comparison (touch-of-grey)
5cac729 bench: chunk the Lucene MT QPS loop to match the Lance rayon harness (touch-of-grey)
a92d034 fix(bench): make Lucene MT thread count effectively final for the lambda (touch-of-grey)
69aba10 feat(inverted): expose in-memory wand_search entry point (touch-of-grey)
beab662 feat(mem_wal): add immutable FTS Partition type (redesign stage 2) (touch-of-grey)
50be021 perf(mem_wal): partition-structured FTS mem index (redesign stages 3-7) (touch-of-grey)
0809e9d fix(mem_wal): run per-partition FTS WAND unbounded to keep top-k exact (touch-of-grey)
46b3b17 fix(mem_wal): score FTS partitions by direct posting scan, not WAND (touch-of-grey)
e096383 perf(mem_wal): compact FTS partition posting lists, drop Arrow builde… (touch-of-grey)
bde61dd perf(mem_wal): prune FTS partition top-k queries with block-max WAND (touch-of-grey)
9e7e2ba perf(mem_wal): byte-compress FTS partition postings, dedup term dicti… (touch-of-grey)
d6bea03 perf(mem_wal): bit-pack FTS posting doc ids and freqs instead of VByte (touch-of-grey)
ee88e8c perf(mem_wal): cross-partition shared-threshold WAND for FTS top-k (touch-of-grey)
fd8edaf perf(mem_wal): block-max WAND for FTS top-k pruning (touch-of-grey)
2094b7a Revert "perf(mem_wal): block-max WAND for FTS top-k pruning" (touch-of-grey)
75c512b perf(mem_wal): SIMD-decode FTS posting blocks with BitPacker4x (touch-of-grey)
8810a77 test(mem_wal): split FTS bench latency by query type (touch-of-grey)
6014e03 perf(mem_wal): random-access compressed FTS positions for fast phrase (touch-of-grey)
eb88122 fix(bench): pick the freshest FTS bench binary by mtime (touch-of-grey)
d7da5e5 chore(bench): adapt fineweb FTS bench to the initialize_mem_wal build… (touch-of-grey)
a0c357a refactor(mem_wal): drop unused lance-index WAND additions (touch-of-grey)
d182f3c chore: sync python/Cargo.lock with the new bitpacking dependency (touch-of-grey)
9f62f87 chore(bench): move mem_wal benches under benches/mem_wal and scrub in… (touch-of-grey)
We should not create a new bench folder; these should be moved under rust/lance/benches. At this point we should probably create a mem_wal folder with specific subfolders to organize the different benchmarks.
Done. Moved all the FTS bench files out of the top-level `bench/` folder into `rust/lance/benches/mem_wal/`, with a `fts/` subfolder for FTS-specific work.

`Cargo.toml`'s `[[bench]]` entries now point at the new paths via `path = ...`. I kept this scoped to the bench files this PR introduces; the pre-existing flat `mem_wal_*.rs` benches are left in place to avoid an unrelated churn diff. Happy to do a follow-up that consolidates those into `mem_wal/` too.

Also scrubbed the driver scripts and bench doc comments of infra-specific references (S3 bucket names, instance paths, AWS region exports). They now default to tmpdir-based paths and resolve the repo root via `git rev-parse`.
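
For readers unfamiliar with the pattern, here is a minimal Rust sketch of resolving a repository root through `git rev-parse --show-toplevel` and falling back to a tmpdir-based data path. It is an illustration only, not the PR's actual driver code: the function names `repo_root` and `bench_data_dir` and the directory name `mem_wal_fts_bench` are hypothetical, and the real scripts may be written in another language.

```rust
use std::env;
use std::path::PathBuf;
use std::process::Command;

/// Ask git for the top-level directory of the current working tree.
/// Returns None if git is not installed, the command fails, or the
/// current directory is not inside a repository.
fn repo_root() -> Option<PathBuf> {
    let output = Command::new("git")
        .args(&["rev-parse", "--show-toplevel"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let stdout = String::from_utf8(output.stdout).ok()?;
    let trimmed = stdout.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(PathBuf::from(trimmed))
    }
}

/// Choose where bench data lives: an explicit override wins, otherwise
/// a subfolder of the system temp directory is used.
/// "mem_wal_fts_bench" is a hypothetical folder name for this sketch.
fn bench_data_dir(override_dir: Option<&str>) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => env::temp_dir().join("mem_wal_fts_bench"),
    }
}
```

A caller would typically combine the two, for example `repo_root().map(|r| r.join("rust/lance/benches/mem_wal"))` to locate the bench sources and `bench_data_dir(None)` for scratch data, so that no machine-specific path needs to be hard-coded.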