Summary
This PR adds distributed build support for bitmap scalar indexes.
Bitmap index build now supports two modes:
- per-fragment: each worker builds one bitmap shard from a single fragment
- shard-group: each worker builds one bitmap shard from an explicit fragment group, identified by shard_id
Each worker writes an intermediate shard file under the shared index UUID. A later merge step scans all shard files, performs a streaming k-way merge on sorted keys, unions bitmaps for the same key, and writes the final bitmap_page_lookup.lance.
This keeps the distributed path memory-bounded and avoids materializing all shard state in memory at once.
Build Flow
Per-fragment mode
fragment 1 -> build -> part_(frag1<<32|1)_bitmap_page_lookup.lance
fragment 2 -> build -> part_(frag2<<32|1)_bitmap_page_lookup.lance
fragment 3 -> build -> part_(frag3<<32|1)_bitmap_page_lookup.lance
Shard-group mode
fragments [1,2] + shard_id=0 -> build -> part_(0<<32|0)_bitmap_page_lookup.lance
fragments [3,4] + shard_id=1 -> build -> part_(1<<32|0)_bitmap_page_lookup.lance
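The shard file names above pack two numbers into one 64-bit ID: a fragment ID or shard_id in the upper 32 bits and a small tag in the lower 32 bits. Below is a minimal sketch of that packing. It assumes the ID is rendered in decimal inside the file name, which is an assumption, and the helper names `shard_file_name`, `per_fragment_file` and `shard_group_file` are hypothetical.

```rust
// Hypothetical helpers: the packing follows the `part_(x<<32|y)` pattern
// listed above; rendering the ID in decimal is an assumption.
fn shard_file_name(high: u32, low: u32) -> String {
    // Upper 32 bits: fragment ID or shard_id. Lower 32 bits: mode tag.
    let id: u64 = ((high as u64) << 32) | (low as u64);
    format!("part_{}_bitmap_page_lookup.lance", id)
}

// Per-fragment mode: one shard per fragment, low bits set to 1.
fn per_fragment_file(fragment_id: u32) -> String {
    shard_file_name(fragment_id, 1)
}

// Shard-group mode: one shard per explicit shard_id, low bits set to 0.
fn shard_group_file(shard_id: u32) -> String {
    shard_file_name(shard_id, 0)
}

fn main() {
    for frag in [1u32, 2, 3] {
        println!("{}", per_fragment_file(frag));
    }
    for shard in [0u32, 1] {
        println!("{}", shard_group_file(shard));
    }
}
```

Because per-fragment IDs always carry low bits 1 and shard-group IDs carry low bits 0, the two modes cannot produce the same ID, even when a fragment ID happens to equal a shard_id.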
Merge Flow
distributed bitmap build

+----------------+  +----------------+  +----------------+
| worker / frag1 |  | worker / frag2 |  | worker / shard |
| sorted values  |  | sorted values  |  | sorted values  |
+-------+--------+  +-------+--------+  +-------+--------+
        |                   |                   |
        v                   v                   v
 part_*_bitmap...    part_*_bitmap...    part_*_bitmap...
         \                  |                  /
          \                 |                 /
           +----------------+----------------+
                            |
                            v
                  streaming k-way merge
                same key => bitmap union
                            |
                            v
                bitmap_page_lookup.lance
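The merge step can be sketched as a heap-based k-way merge. This is an illustrative sketch, not the PR's implementation. Shards are modeled as iterators of `(key, bitmap)` pairs already sorted by key, `BTreeSet<u64>` stands in for the real bitmap type, and `kway_merge` is a hypothetical name. The sketch holds only one pending entry per shard plus the bitmap being accumulated for the current key, so memory grows with the number of shards rather than with the number of distinct keys.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeSet, BinaryHeap};

// Stand-in for the real bitmap type: a sorted set of row IDs.
type Bitmap = BTreeSet<u64>;

// Merge shards whose entries are sorted by key, unioning bitmaps that
// share a key. `emit` receives each merged (key, bitmap) in key order.
fn kway_merge<K, I, F>(mut shards: Vec<I>, mut emit: F)
where
    K: Ord,
    I: Iterator<Item = (K, Bitmap)>,
    F: FnMut(K, Bitmap),
{
    // Min-heap of (head key, shard index); Reverse flips BinaryHeap's max order.
    let mut heap = BinaryHeap::new();
    // Bitmap belonging to each shard's current head key.
    let mut heads: Vec<Option<Bitmap>> = Vec::with_capacity(shards.len());
    for (i, shard) in shards.iter_mut().enumerate() {
        match shard.next() {
            Some((key, bitmap)) => {
                heap.push(Reverse((key, i)));
                heads.push(Some(bitmap));
            }
            None => heads.push(None),
        }
    }

    // The key currently being accumulated, with the union of its bitmaps.
    let mut current: Option<(K, Bitmap)> = None;
    while let Some(Reverse((key, i))) = heap.pop() {
        let bitmap = heads[i].take().expect("heap entry without a pending bitmap");
        // Advance shard i so its next key competes in the heap.
        if let Some((next_key, next_bitmap)) = shards[i].next() {
            heap.push(Reverse((next_key, i)));
            heads[i] = Some(next_bitmap);
        }
        let same_key = matches!(&current, Some((cur_key, _)) if *cur_key == key);
        if same_key {
            if let Some((_, acc)) = current.as_mut() {
                acc.extend(bitmap); // same key => bitmap union
            }
        } else if let Some((done_key, done_bitmap)) = current.replace((key, bitmap)) {
            emit(done_key, done_bitmap); // previous key is complete
        }
    }
    if let Some((key, bitmap)) = current {
        emit(key, bitmap);
    }
}

fn main() {
    let bm = |ids: &[u64]| ids.iter().copied().collect::<Bitmap>();
    let shard_a = vec![("a", bm(&[1, 2])), ("c", bm(&[5]))];
    let shard_b = vec![("a", bm(&[3])), ("b", bm(&[4]))];
    kway_merge(vec![shard_a.into_iter(), shard_b.into_iter()], |key, bitmap| {
        println!("{} -> {:?}", key, bitmap);
    });
}
```

Emitting through a callback rather than collecting into a `Vec` keeps the output side streaming too; in a real build the callback would presumably append to the final bitmap_page_lookup.lance writer.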
Key Changes
- Add bitmap distributed training parameters with optional shard_id
- Support both explicit shard-group build and implicit per-fragment build
- Merge shard files with a streaming k-way merge
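One way to read "optional shard_id" is that its presence selects the build mode: set means an explicit shard-group build, unset means an implicit per-fragment build. The sketch below assumes that reading. `BitmapDistributedParams`, `BuildMode`, `resolve_mode`, the `fragment_ids` field and the validation rules are all hypothetical, not the PR's actual API.

```rust
// Hypothetical parameter struct; only the optional shard_id comes from the PR.
#[derive(Debug)]
struct BitmapDistributedParams {
    fragment_ids: Vec<u32>, // fragments this worker should index
    shard_id: Option<u32>,  // set => shard-group mode, unset => per-fragment
}

#[derive(Debug, PartialEq)]
enum BuildMode {
    PerFragment { fragment_id: u32 },
    ShardGroup { shard_id: u32, fragment_ids: Vec<u32> },
}

// Pick the build mode from the parameters (validation rules are assumed).
fn resolve_mode(params: &BitmapDistributedParams) -> Result<BuildMode, String> {
    match (params.shard_id, params.fragment_ids.as_slice()) {
        (Some(shard_id), frags) if !frags.is_empty() => Ok(BuildMode::ShardGroup {
            shard_id,
            fragment_ids: frags.to_vec(),
        }),
        (Some(_), _) => Err("shard-group mode needs at least one fragment".to_string()),
        (None, [fragment_id]) => Ok(BuildMode::PerFragment { fragment_id: *fragment_id }),
        (None, _) => Err("per-fragment mode needs exactly one fragment".to_string()),
    }
}

fn main() {
    let implicit = BitmapDistributedParams { fragment_ids: vec![3], shard_id: None };
    let explicit = BitmapDistributedParams { fragment_ids: vec![1, 2], shard_id: Some(0) };
    println!("{:?}", resolve_mode(&implicit));
    println!("{:?}", resolve_mode(&explicit));
}
```

Resolving the mode into an enum up front means the build code can match on one value instead of re-checking `shard_id` and the fragment list at each step.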