Expose build_index_metadata_from_segments (or commit_existing_index_segments) for two-phase vector-index commits outside the lance crate #6666

@ragnorc

Companion to #6658 (two-phase delete).

Request

Make lance::index::build_index_metadata_from_segments (currently pub(crate) at src/index.rs:111) public, OR expose a higher-level commit_existing_index_segments(...) Rust API that mirrors the Python pattern documented at lance.org/guide/distributed_indexing.

Context

CreateIndexBuilder::execute_uncommitted (src/index/create.rs:135) is pub and returns IndexMetadata — great. But for index types that take the segment-commit path (uses_segment_commit_path returning true: Vector, IvfPq, IvfSq, IvfFlat, IvfRq, IvfHnswFlat, IvfHnswPq, IvfHnswSq), the Operation::CreateIndex transaction must be constructed via:

```rust
let segments = ds.create_index_segment_builder()
    .with_segments(vec![new_idx.clone()])
    .build_all().await?;
let new_indices = build_index_metadata_from_segments(ds, &name, field_id, segments).await?;
TransactionBuilder::new(version, Operation::CreateIndex { new_indices, removed_indices }).build()
```

build_index_metadata_from_segments is pub(crate), so external callers cannot construct the transaction. This forces vector index builds to go through CreateIndexBuilder::execute(), which commits inline. That is incompatible with projects like our omnigraph, which want a strict stage+commit separation across all writers.

The IndexSegmentBuilder (src/index/create.rs:568) and with_segments / build_all ARE pub, which suggests the intent was to expose the segment commit path. The pub(crate) on build_index_metadata_from_segments looks like an oversight.

Use case

omnigraph (a graph DB built on Lance) sits in front of Lance and wants to enforce by-construction at its trait boundary that no engine writer can call an inline-commit Lance API. Scalar indices (BTree, Inverted, Bitmap, NGram) work today via the simple branch of CreateIndexBuilder::execute. Vector indices are blocked.

A similar hard dependency exists on #6658 (two-phase delete). Both unblock the same architectural pattern: hoist Lance's stage+commit two-phase write to a load-bearing trait invariant in upper layers.
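To make the invariant concrete, here is a minimal sketch of the shape we are after. Every name in it (StagedTxn, StagedWriter, Committer, the two writer structs) is hypothetical and exists only for illustration; none of them are Lance or omnigraph APIs, and the strict "read version must equal HEAD" check is a simplification, not a description of how Lance resolves conflicts.

```rust
// Hypothetical types for illustration only; none of these are Lance APIs.

/// A staged, uncommitted change: the version it was read at plus a label.
#[derive(Debug, Clone, PartialEq)]
pub struct StagedTxn {
    pub read_version: u64,
    pub description: String,
}

/// Writers can only stage. Nothing on this trait can advance HEAD,
/// so an inline commit is unrepresentable at the trait boundary.
pub trait StagedWriter {
    fn stage(&self, read_version: u64) -> StagedTxn;
}

pub struct ScalarIndexWriter;
pub struct VectorIndexWriter;

impl StagedWriter for ScalarIndexWriter {
    fn stage(&self, read_version: u64) -> StagedTxn {
        StagedTxn { read_version, description: "create scalar index".to_string() }
    }
}

impl StagedWriter for VectorIndexWriter {
    fn stage(&self, read_version: u64) -> StagedTxn {
        StagedTxn { read_version, description: "create vector index".to_string() }
    }
}

/// The single owner of HEAD. Only this type can commit.
pub struct Committer {
    head: u64,
}

impl Committer {
    pub fn new() -> Self {
        Self { head: 0 }
    }

    pub fn head(&self) -> u64 {
        self.head
    }

    /// Accept a staged transaction only if it was read at the current HEAD
    /// (simplified rule: anything staged against an older version is rejected).
    pub fn commit(&mut self, txn: StagedTxn) -> Result<u64, String> {
        if txn.read_version != self.head {
            return Err(format!(
                "stale transaction: read at {}, HEAD is {}",
                txn.read_version, self.head
            ));
        }
        self.head += 1;
        Ok(self.head)
    }
}

/// Stage and commit one scalar and one vector index build in sequence,
/// returning the final HEAD version.
pub fn demo() -> Result<u64, String> {
    let mut committer = Committer::new();
    let writers: Vec<Box<dyn StagedWriter>> =
        vec![Box::new(ScalarIndexWriter), Box::new(VectorIndexWriter)];
    for writer in &writers {
        let txn = writer.stage(committer.head());
        committer.commit(txn)?;
    }
    Ok(committer.head())
}
```

The point of the sketch is that the vector writer must be able to return a staged transaction just like the scalar one. Today that is the step that cannot be implemented from outside the lance crate.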

Suggested fix

Either:

  1. Mark build_index_metadata_from_segments pub (smallest change).
  2. Add a higher-level commit_existing_index_segments(...) Rust API per the Python distributed-indexing docs.

Option 1 is the surgical fix; option 2 mirrors the documented Python pattern.

Proposed signature (option 2)

```rust
impl<'a> IndexSegmentBuilder<'a> {
    /// Convert a segment plan into a fully-formed Operation::CreateIndex
    /// transaction without committing. Caller hands the returned
    /// transaction to CommitBuilder::execute to advance HEAD.
    pub async fn into_uncommitted_transaction(
        self,
        index_name: &str,
        field_id: i32,
        removed_indices: Vec<IndexMetadata>,
    ) -> Result<Transaction>;
}
```

Happy to send a PR if either approach is preferred.
