Round 3 of the laurus perf push surfaces five data-structure rewrites that cut across the lexical + vector boundary. Splitting them into per-area umbrellas creates merge conflicts and obscures the "shared infra" theme; this umbrella tracks them together.
Related per-area umbrellas: #533 (lexical/index), #534 (lexical/search), #535 (vector/index), and the vector/search umbrella filed alongside this one.
Scope
- DeletionBitmap → Roaring/FixedBitSet (cross-cutting with lexical + vector index umbrellas): the current `RwLock<AHashSet>` is shared by both the lexical and vector indexes, so a single migration covers both; the sparse/dense choice is made via a density heuristic.
- FieldId(u16) registry (cross-cutting with lexical/index + vector/index + vector/search): replace `HashMap<String, _>` field lookup throughout the lexical writer, vector storage pools, and search hot paths. `String` field names retained only at the public API boundary.
- Internal-id u32 migration (cross-cutting with vector index + search): doc IDs are carried as `u64` end-to-end inside the vector index even though they are segment-local ordinals; the same opportunity exists in the lexical search packed top-K. Use `InternalId = u32` everywhere downstream of segment open.
- Packed-u64 top-K collector & heap (cross-cutting with lexical/search + vector/search): both the lexical `TopDocsCollector` and the HNSW `Candidate / ResultCandidate` heap could use `(score_bits << 32) | doc_id` packed u64 entries for single-integer compares (no NaN branch).
- Columnar / packed-bits DocValues column format (cross-cutting with lexical/index + lexical/search): per-segment Numeric / Sorted / Bytes columns with `bitpacking` (already a dep) for the lexical side; reused by vector index for ordinal stores.
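To make the FieldId(u16) item concrete, here is a minimal sketch of what such a registry could look like. Only the name `FieldId(u16)` comes from this issue; `FieldRegistry`, `intern`, and `name` are hypothetical and not the laurus API. The idea is that field names are interned once at the public API boundary, and hot paths then index dense per-field storage by the `u16` instead of hashing a `String`.

```rust
use std::collections::HashMap;

/// Compact field handle. The issue names `FieldId(u16)`; everything else
/// in this sketch is a hypothetical illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FieldId(pub u16);

/// Hypothetical registry: string names are resolved once, at the API boundary.
#[derive(Default)]
pub struct FieldRegistry {
    by_name: HashMap<String, FieldId>,
    names: Vec<String>,
}

impl FieldRegistry {
    /// Returns the existing id for `name`, or assigns the next free one.
    /// Returns `None` once the u16 id space is exhausted.
    pub fn intern(&mut self, name: &str) -> Option<FieldId> {
        if let Some(&id) = self.by_name.get(name) {
            return Some(id);
        }
        let id = FieldId(u16::try_from(self.names.len()).ok()?);
        self.names.push(name.to_owned());
        self.by_name.insert(name.to_owned(), id);
        Some(id)
    }

    /// Reverse lookup, needed only when a name crosses back over the API boundary.
    pub fn name(&self, id: FieldId) -> Option<&str> {
        self.names.get(usize::from(id.0)).map(String::as_str)
    }
}
```

With ids assigned densely from zero, per-field state in the writer or storage pools can live in a `Vec<T>` indexed by `usize::from(id.0)`, which is the part that removes hashing from the hot path.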
Why this is a separate umbrella
Each item touches both lexical and vector code paths. They are not additive perf improvements — they unlock several sub-issues each in the per-area umbrellas (e.g. `TopFieldCollector` ordinals need the columnar DV; `filterable HNSW` benefits from the Roaring liveDocs; packed candidate heaps need u32 internal IDs).
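The dependency between packed heaps and u32 internal IDs can be illustrated with a short sketch: an `f32` score occupies 32 bits, so the doc ID must also fit in 32 bits for the pair to become one `u64` key. The code below is an illustration under assumptions, not the laurus collector: `orderable_bits`, `pack`, `unpack`, and `top_k` are hypothetical names. It also adds a sign-flip mapping that this issue does not mention, because raw `f32::to_bits` values only sort correctly as integers when scores are non-negative.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

pub type InternalId = u32;

/// Maps an f32 to a u32 whose unsigned order matches the float's total order.
/// (Assumption: the issue only shows `score_bits`; this mapping also handles
/// negative scores.)
pub fn orderable_bits(score: f32) -> u32 {
    let bits = score.to_bits();
    if bits & 0x8000_0000 != 0 { !bits } else { bits | 0x8000_0000 }
}

/// Score in the high 32 bits, doc ID in the low 32 bits.
pub fn pack(score: f32, doc: InternalId) -> u64 {
    (u64::from(orderable_bits(score)) << 32) | u64::from(doc)
}

/// Inverse of `pack`.
pub fn unpack(key: u64) -> (f32, InternalId) {
    let ob = (key >> 32) as u32;
    let bits = if ob & 0x8000_0000 != 0 { ob & 0x7FFF_FFFF } else { !ob };
    (f32::from_bits(bits), key as u32)
}

/// Keeps the k best hits using a min-heap of packed keys, so every
/// comparison is a single u64 compare.
pub fn top_k(
    hits: impl IntoIterator<Item = (f32, InternalId)>,
    k: usize,
) -> Vec<(f32, InternalId)> {
    let mut heap: BinaryHeap<Reverse<u64>> = BinaryHeap::new();
    for (score, doc) in hits {
        heap.push(Reverse(pack(score, doc)));
        if heap.len() > k {
            heap.pop(); // evicts the current minimum key
        }
    }
    let mut keys: Vec<u64> = heap.into_iter().map(|Reverse(key)| key).collect();
    keys.sort_unstable_by(|a, b| b.cmp(a)); // best first
    keys.into_iter().map(unpack).collect()
}
```

In this layout, equal scores tie-break toward the larger doc ID. If the collector should prefer the smaller doc ID instead, one option is to store `u32::MAX - doc` in the low half and undo that in `unpack`.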
Exit criteria
- All cross-cutting sub-issues below closed or explicitly deferred.
- Per-umbrella benchmarks show the dependent issues unlocked (e.g. `TopFieldCollector` via ordinal column delivers 2x sort throughput on numeric fields).
Sub-issues
| ID | Issue | Size | Title |
| --- | --- | --- | --- |
| X-01 | #684 | X | perf(maintenance): migrate DeletionBitmap to RoaringBitmap (lexical + vector) |
| X-02 | #685 | X | perf: FieldId(u16) registry across crates (drop HashMap<String, _> from hot paths) |
| X-03 | #686 | X | perf(vector): migrate to InternalId = u32 throughout vector index + search |
| X-04 | #687 | X | perf: packed-u64 top-K candidate / heap entries (lexical + vector) |
| X-05 | #688 | X | perf(lexical): columnar / packed-bits DocValues column format |
Round-3 investigation report: ~/.claude/tasks/laurus/20260523_perf_round3_audit/.