Summary
The engine-level capability for distributed IVF_RQ segment builds can't be driven from Python today. A distributed build requires every per-fragment segment to share one RaBitQ rotation so the segments are mergeable, and there is currently no Python-facing way to pin that rotation. As a result, a distributed IVF_RQ build driven from pylance produces segments with mutually incompatible rotations, which merge into a silently corrupt index (degraded recall, no error).
What already works
At the engine level the IVF family, including RQ, has distributed build, commit, merge, and segment-pruned query. On the Python side, create_index_uncommitted already lets callers pin the two other shared artifacts a distributed build needs:
- The IVF model, via ivf_centroids / precomputed_partitions_file
- The PQ codebook, via pq_codebook
Note: The one missing shared artifact for RQ is the rotation.
The gap
The RaBitQ rotation is generated at quantizer construction time and is not exposed as a build input. Consequently, every per-fragment create_index_uncommitted call for IVF_RQ constructs its own quantizer with a fresh random rotation. Codes from different segments are therefore not in a comparable space, so merge_existing_index_segments (or a cross-segment query) yields incorrect results.
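To make the incompatibility concrete, here is a minimal NumPy sketch. It is not the engine's implementation: it assumes a dense random orthogonal matrix as the rotation and a plain 1-bit sign code, and random_rotation and encode are hypothetical helpers written only for this illustration. The same vector is encoded once per "segment", first with a shared rotation and then with a freshly generated one.

```python
import numpy as np


def random_rotation(dim, seed):
    """Random orthogonal matrix via QR of a Gaussian matrix.

    Illustrative stand-in for a RaBitQ rotation; the engine's real
    construction may differ.
    """
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs using diag(r); the result is still orthogonal.
    return q * np.sign(np.diag(r))


def encode(vectors, rotation):
    """1-bit code per dimension: the sign pattern of the rotated vector."""
    return (vectors @ rotation) > 0


dim = 256
x = np.random.default_rng(0).standard_normal((1, dim))

shared = random_rotation(dim, seed=1)
fresh = random_rotation(dim, seed=2)

code_a = encode(x, shared)         # "segment A"
code_b_shared = encode(x, shared)  # "segment B" built with the pinned rotation
code_b_fresh = encode(x, fresh)    # "segment B" built with its own rotation

# Fraction of bits on which the two segments agree for the same vector.
same_rotation_agreement = float(np.mean(code_a == code_b_shared))
fresh_rotation_agreement = float(np.mean(code_a == code_b_fresh))
print(same_rotation_agreement, fresh_rotation_agreement)
```

With a shared rotation the two codes are bit-for-bit identical. With independent rotations they agree on only about half of the bits, which is what unrelated codes would do, so distances computed across such segments carry no signal even though nothing raises an error.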
Proposed fix (minimal, mirrors pq_codebook)
Thread one shared, serialized rotation into the build params, exactly the way pq_codebook is threaded today:
    # Proposed API: build the rotation once, on the driver
    rotation = build_rq_rotation(dimension=768, num_bits=1, dtype="float16")

    # Every per-fragment build pins the same centroids and the same rotation
    seg_a = ds.create_index_uncommitted("vec", "IVF_RQ", name="idx",
                                        ivf_centroids=c, rq_rotation=rotation,
                                        fragment_ids=frags_a)
    seg_b = ds.create_index_uncommitted("vec", "IVF_RQ", name="idx",
                                        ivf_centroids=c, rq_rotation=rotation,
                                        fragment_ids=frags_b)

    merged = ds.merge_existing_index_segments([seg_a, seg_b])
    ds = ds.commit_existing_index_segments("idx", "vec", [merged])
Note: Single-process create_index(index_type="IVF_RQ") is unaffected since the rotation never needs sharing there.