Distributed IVF_RQ builds have no way of sharing the RaBitQ rotation across workers in Python #7012

@gstamatakis95

Summary

The engine-level capability for distributed IVF_RQ segment builds can't be driven from Python today. A distributed build requires every per-fragment segment to share one RaBitQ rotation so the segments are mergeable, and there is currently no Python-facing way to pin that rotation. As a result, a distributed IVF_RQ build driven from pylance produces segments with mutually incompatible rotations, which merge into a silently corrupt index (degraded recall, no error).

What already works

At the engine level the IVF family, including RQ, has distributed build, commit, merge, and segment-pruned query. On the Python side, create_index_uncommitted already lets callers pin the two other shared artifacts a distributed build needs:

  1. The IVF model, via ivf_centroids / precomputed_partitions_file
  2. The PQ codebook, via pq_codebook

Note: The one missing shared artifact for RQ is the rotation.

The gap

The RaBitQ rotation is generated at quantizer construction time and is not exposed as a build input. Consequently, every per-fragment create_index_uncommitted call for IVF_RQ constructs its own quantizer with a fresh random rotation. The codes in different segments therefore do not live in a comparable space, so merge_existing_index_segments (or a cross-segment query) yields incorrect results.
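
To make the incompatibility concrete, here is a toy sketch in plain Python. It is not Lance code: make_rotation, encode and hamming are hypothetical helpers, and a random signed permutation stands in for the real RaBitQ rotation (it is about the simplest orthogonal transform there is). The same vector is encoded to 1-bit sign codes under a shared rotation and under two independently drawn rotations.

```python
import random


def make_rotation(dim, seed):
    """Toy orthogonal transform: a random signed permutation.

    Assumption: this only stands in for the RaBitQ rotation, which is a
    dense random rotation in the real engine.
    """
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return perm, signs


def encode(vec, rotation):
    """Apply the rotation, then keep one sign bit per dimension."""
    perm, signs = rotation
    return [1 if signs[i] * vec[perm[i]] > 0 else 0 for i in range(len(vec))]


def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


dim = 64
rng = random.Random(0)
vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]

shared = make_rotation(dim, seed=1)  # what both workers should use
other = make_rotation(dim, seed=2)   # what a second worker draws on its own

# Same vector, same rotation: the codes are identical.
same_space = hamming(encode(vec, shared), encode(vec, shared))
# Same vector, different rotations: the codes disagree on many bits.
mixed_space = hamming(encode(vec, shared), encode(vec, other))
print(same_space, mixed_space)
```

With a shared rotation the distance is zero by construction. With independent rotations the two codes of the very same vector look unrelated, which is what a merged index ends up comparing when each segment was built with its own rotation.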

Proposed fix (minimal, mirrors pq_codebook)

Thread one shared, serialized rotation into the build params, exactly the way pq_codebook is threaded today:

rotation = build_rq_rotation(dimension=768, num_bits=1, dtype="float16")  # once, on the driver
seg_a = ds.create_index_uncommitted("vec", "IVF_RQ", name="idx",
                                    ivf_centroids=c, rq_rotation=rotation, fragment_ids=frags_a)
seg_b = ds.create_index_uncommitted("vec", "IVF_RQ", name="idx",
                                    ivf_centroids=c, rq_rotation=rotation, fragment_ids=frags_b)
merged = ds.merge_existing_index_segments([seg_a, seg_b])
ds = ds.commit_existing_index_segments("idx", "vec", [merged])

Note: Single-process create_index(index_type="IVF_RQ") is unaffected since the rotation never needs sharing there.
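
For illustration only, the "build once on the driver, ship serialized bytes to every worker" part could look roughly like the sketch below. None of these names exist in Lance: make_rotation, serialize_rotation, deserialize_rotation and fingerprint are hypothetical, the Gram-Schmidt construction is just one way to get a random orthogonal matrix, and the byte layout is made up for the example.

```python
import hashlib
import random
import struct


def make_rotation(dim, seed):
    """Hypothetical: random orthogonal matrix via modified Gram-Schmidt."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def serialize_rotation(rot):
    """Pack as little-endian: uint32 dim, then dim*dim float64 values."""
    dim = len(rot)
    flat = [x for row in rot for x in row]
    return struct.pack(f"<I{dim * dim}d", dim, *flat)


def deserialize_rotation(blob):
    """Inverse of serialize_rotation."""
    (dim,) = struct.unpack_from("<I", blob, 0)
    flat = struct.unpack_from(f"<{dim * dim}d", blob, 4)
    return [list(flat[i * dim:(i + 1) * dim]) for i in range(dim)]


def fingerprint(blob):
    """Digest a worker can log to show which rotation it used."""
    return hashlib.sha256(blob).hexdigest()


# Driver: build once, serialize once.
rotation = make_rotation(dim=8, seed=0)
blob = serialize_rotation(rotation)

# Workers: each one decodes the same bytes instead of drawing its own rotation.
worker_a = deserialize_rotation(blob)
worker_b = deserialize_rotation(blob)
assert worker_a == worker_b == rotation
print(fingerprint(blob)[:16])
```

The point of passing bytes rather than a seed is that the workers never regenerate anything, so differences in RNG or library versions across machines cannot reintroduce a mismatch. A fingerprint like the one above would also let a merge step refuse segments built with different rotations instead of failing silently.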
