Support distributed bitmap index build #6599

@zhangyue19921010

Description

Summary

This PR adds distributed build support for bitmap scalar indexes.

Bitmap index build now supports two modes:

  • per-fragment: each worker builds one bitmap shard from a single fragment
  • shard-group: each worker builds one bitmap shard from an explicit fragment group using shard_id

Each worker writes an intermediate shard file under the shared index UUID. A later merge step scans all shard files, performs a streaming k-way merge on sorted keys, unions bitmaps for the same key, and writes the final bitmap_page_lookup.lance.

This keeps the distributed path memory-bounded and avoids materializing all shard state in memory at once.

Build Flow

Per-fragment mode

fragment 1  -> build -> part_(frag1<<32|1)_bitmap_page_lookup.lance
fragment 2  -> build -> part_(frag2<<32|1)_bitmap_page_lookup.lance
fragment 3  -> build -> part_(frag3<<32|1)_bitmap_page_lookup.lance

Shard-group mode

fragments [1,2] + shard_id=0 -> build -> part_(0<<32|0)_bitmap_page_lookup.lance
fragments [3,4] + shard_id=1 -> build -> part_(1<<32|0)_bitmap_page_lookup.lance
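The part names above follow an `(id << 32 | low_bits)` pattern. A minimal sketch of that arithmetic follows. It assumes the parenthesized expression is evaluated to a decimal integer in the file name, which this description does not confirm. The helper `shard_file_name` and its parameter names are hypothetical, and the meaning of the low 32 bits is not stated here, so the values are simply copied from the examples.

```python
SUFFIX = "bitmap_page_lookup.lance"


def shard_file_name(part_id: int, low_bits: int) -> str:
    """Hypothetical helper: pack part_id into the high 32 bits of a 64-bit id.

    Assumption: the packed id is rendered as a decimal integer in the name.
    """
    if not 0 <= part_id < 2**32 or not 0 <= low_bits < 2**32:
        raise ValueError("both fields must fit in 32 bits")
    packed = (part_id << 32) | low_bits
    return f"part_{packed}_{SUFFIX}"


# Per-fragment example from above: fragment 1 with low bits 1.
per_fragment = shard_file_name(1, 1)

# Shard-group example from above: shard_id=1 with low bits 0.
shard_group = shard_file_name(1, 0)
```

Packing the id into the high bits keeps names unique per fragment or shard as long as each id is used by one worker only.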

Merge flow

                distributed bitmap build
     +----------------+   +----------------+   +----------------+
     | worker / frag1 |   | worker / frag2 |   | worker / shard |
     | sorted values  |   | sorted values  |   | sorted values  |
     +-------+--------+   +-------+--------+   +-------+--------+
             |                    |                    |
             v                    v                    v
      part_*_bitmap...     part_*_bitmap...     part_*_bitmap...
               \               |               /
                \              |              /
                 +-------------+-------------+
                               |
                               v
                    streaming k-way merge
                  same key => bitmap union
                               |
                               v
                    bitmap_page_lookup.lance
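The merge step in the diagram can be sketched in a few lines. This is an illustration of the technique only, not the code in this PR: each shard is modeled as an iterator of `(key, bitmap)` pairs already sorted by key, and a bitmap is modeled as a Python `int` whose set bits are row positions. The names `merge_shards`, `shard_a`, and `shard_b` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

Entry = Tuple[str, int]  # (key, bitmap as an int bitset)


def merge_shards(shards: Iterable[Iterable[Entry]]) -> Iterator[Entry]:
    """Streaming k-way merge over key-sorted shards.

    heapq.merge pulls lazily from each input, so only the current head
    entry of every shard is held at a time. Entries that share a key
    arrive adjacent to each other and are unioned with bitwise OR.
    """
    merged = heapq.merge(*shards, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        bitmap = 0
        for _, bits in group:
            bitmap |= bits  # same key => bitmap union
        yield key, bitmap


shard_a = [("a", 0b0011), ("c", 0b0100)]
shard_b = [("a", 0b1000), ("b", 0b0001)]

result = list(merge_shards([iter(shard_a), iter(shard_b)]))
```

Here `result` contains one entry per distinct key, in sorted key order, with key `"a"` carrying the union of the bitmaps from both shards.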

Key Changes

  • Add bitmap distributed training parameters with optional shard_id
  • Support both explicit shard-group build and implicit per-fragment build
  • Merge shard files with a streaming k-way merge
