Merged
37 commits
b118529
refactor(mem_wal): redesign FTS mem index for single-writer multi-reader
touch-of-grey May 10, 2026
e604d84
fix(mem_wal): thread snapshot through compound FTS queries
touch-of-grey May 10, 2026
9678130
bench: add fineweb FTS end-to-end benchmark
touch-of-grey May 10, 2026
d70daeb
fix(bench): replace post-ingest index-catchup spin with writer.close()
touch-of-grey May 11, 2026
af62c36
bench: rework fineweb FTS bench to mirror the ShardWriter backpressur…
touch-of-grey May 16, 2026
c6e1e22
fix(bench): wait for FTS index catchup before the read-phase queries
touch-of-grey May 16, 2026
6bfc14a
fix(bench): force sync_indexed_write in the FTS read phase
touch-of-grey May 16, 2026
802d5d4
fix(bench): tiny read-phase seed + id prefilter for consistency
touch-of-grey May 16, 2026
c900c7b
fix(bench): query MemTable FTS via MemTableScanner, compare to refere…
touch-of-grey May 16, 2026
bd295cd
fix(bench): force durable_write in the FTS read phase
touch-of-grey May 16, 2026
d99cf7d
bench: add --target-rows-per-sec pacing + backlog metrics to FTS writ…
touch-of-grey May 16, 2026
efbf8e0
bench: add backpressure sweep driver for async FineWeb FTS writes
touch-of-grey May 16, 2026
e1a1c09
fix(bench): bound max_memtable_rows default to avoid petabyte preallo…
touch-of-grey May 16, 2026
3a23eec
bench: add --index-type vector|fts to the ShardWriter backpressure bench
touch-of-grey May 16, 2026
95d2896
bench: add Lance FtsMemIndex vs Apache Lucene FTS comparison
touch-of-grey May 16, 2026
5cac729
bench: chunk the Lucene MT QPS loop to match the Lance rayon harness
touch-of-grey May 16, 2026
a92d034
fix(bench): make Lucene MT thread count effectively final for the lambda
touch-of-grey May 16, 2026
69aba10
feat(inverted): expose in-memory wand_search entry point
touch-of-grey May 16, 2026
beab662
feat(mem_wal): add immutable FTS Partition type (redesign stage 2)
touch-of-grey May 16, 2026
50be021
perf(mem_wal): partition-structured FTS mem index (redesign stages 3-7)
touch-of-grey May 16, 2026
0809e9d
fix(mem_wal): run per-partition FTS WAND unbounded to keep top-k exact
touch-of-grey May 16, 2026
46b3b17
fix(mem_wal): score FTS partitions by direct posting scan, not WAND
touch-of-grey May 16, 2026
e096383
perf(mem_wal): compact FTS partition posting lists, drop Arrow builde…
touch-of-grey May 17, 2026
bde61dd
perf(mem_wal): prune FTS partition top-k queries with block-max WAND
touch-of-grey May 17, 2026
9e7e2ba
perf(mem_wal): byte-compress FTS partition postings, dedup term dicti…
touch-of-grey May 17, 2026
d6bea03
perf(mem_wal): bit-pack FTS posting doc ids and freqs instead of VByte
touch-of-grey May 17, 2026
ee88e8c
perf(mem_wal): cross-partition shared-threshold WAND for FTS top-k
touch-of-grey May 18, 2026
fd8edaf
perf(mem_wal): block-max WAND for FTS top-k pruning
touch-of-grey May 18, 2026
2094b7a
Revert "perf(mem_wal): block-max WAND for FTS top-k pruning"
touch-of-grey May 18, 2026
75c512b
perf(mem_wal): SIMD-decode FTS posting blocks with BitPacker4x
touch-of-grey May 18, 2026
8810a77
test(mem_wal): split FTS bench latency by query type
touch-of-grey May 18, 2026
6014e03
perf(mem_wal): random-access compressed FTS positions for fast phrase
touch-of-grey May 18, 2026
eb88122
fix(bench): pick the freshest FTS bench binary by mtime
touch-of-grey May 18, 2026
d7da5e5
chore(bench): adapt fineweb FTS bench to the initialize_mem_wal build…
touch-of-grey May 19, 2026
a0c357a
refactor(mem_wal): drop unused lance-index WAND additions
touch-of-grey May 19, 2026
d182f3c
chore: sync python/Cargo.lock with the new bitpacking dependency
touch-of-grey May 19, 2026
9f62f87
chore(bench): move mem_wal benches under benches/mem_wal and scrub in…
touch-of-grey May 19, 2026
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -100,6 +100,7 @@ half = { "version" = "2.1", default-features = false, features = [
"std",
] }
lance-bitpacking = { version = "=7.0.0-beta.14", path = "./rust/compression/bitpacking" }
+bitpacking = "0.9"
bitvec = "1"
bytes = "1.11.1"
byteorder = "1.5"
1 change: 1 addition & 0 deletions python/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion rust/lance-index/src/scalar/inverted.rs
@@ -23,7 +23,7 @@ use datafusion::execution::SendableRecordBatchStream;
pub use index::*;
use lance_core::{Result, cache::LanceCache};
pub use lance_tokenizer::Language;
-pub use scorer::MemBM25Scorer;
+pub use scorer::{MemBM25Scorer, Scorer};
pub use tokenizer::*;

use crate::scalar::inverted::query::{FtsSearchParams, Tokens};
11 changes: 11 additions & 0 deletions rust/lance/Cargo.toml
@@ -42,6 +42,7 @@ arrow-schema = { workspace = true }
arrow-select = { workspace = true }
async-recursion.workspace = true
async-trait.workspace = true
+bitpacking = { workspace = true }
byteorder.workspace = true
bytes.workspace = true
chrono.workspace = true
@@ -236,5 +237,15 @@ harness = false
name = "mem_wal_shard_writer_backpressure"
harness = false

+[[bench]]
+name = "mem_wal_fineweb_fts"
+path = "benches/mem_wal/fts/mem_wal_fineweb_fts.rs"
+harness = false
+
+[[bench]]
+name = "mem_wal_fts_bench"
+path = "benches/mem_wal/fts/mem_wal_fts_bench.rs"
+harness = false
+
[lints]
workspace = true
Contributor

We should not create a new bench folder; these should be moved under rust/lance/benches. At this point we should probably create a mem_wal folder with specific subfolders to organize the different benchmarks.

Contributor Author

Done. Moved all the FTS bench files out of the top-level bench/ folder into rust/lance/benches/mem_wal/, with a fts/ subfolder for FTS-specific work:

rust/lance/benches/mem_wal/
  run_shard_writer_backpressure.sh
  fts/
    mem_wal_fts_bench.rs
    mem_wal_fineweb_fts.rs
    LuceneFtsBench.java
    run_fts_compare.sh
    run_fineweb_fts.sh

Cargo.toml's [[bench]] entries now point at the new locations via explicit path = ... keys. I kept this scoped to the bench files this PR introduces: the pre-existing flat mem_wal_*.rs benches are left in place to keep unrelated churn out of this diff. Happy to do a follow-up that consolidates those into mem_wal/ too.

Also scrubbed the driver scripts and bench doc comments of infra-specific references (S3 bucket names, instance paths, AWS region exports). The scripts now default to tmpdir-based paths and resolve the repo root via git rev-parse.
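
For readers unfamiliar with the pattern, here is a minimal sketch of how a driver script can avoid hardcoded infrastructure paths. It is an illustration only, not the contents of the PR's scripts: the `BENCH_DATA_DIR` variable, the temp-directory name, and the fallback to the current directory outside a git checkout are all assumptions made for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Resolve the repository root from git rather than a machine-specific path.
# Assumption for this sketch: fall back to the current directory when the
# script is run outside a git work tree (or git is unavailable).
REPO_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"

# BENCH_DATA_DIR is a hypothetical override; when it is unset, create a
# fresh scratch directory under the system temp dir instead of using a
# fixed instance path or bucket.
BENCH_DATA_DIR="${BENCH_DATA_DIR:-$(mktemp -d "${TMPDIR:-/tmp}/mem_wal_fts.XXXXXX")}"
mkdir -p "$BENCH_DATA_DIR"

echo "repo root: $REPO_ROOT"
echo "data dir:  $BENCH_DATA_DIR"
```

With this shape, a caller can still point the benchmark at a specific location by exporting the override before running the script, while the default run touches only a temporary directory.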

Binary file not shown.