Cross-cutting data-structure rewrite tracked under #537.
Current state
Doc IDs flow through the vector index as u64:
HnswGraph::id_to_index: AHashMap<u64, usize> and nodes: Vec<Vec<Vec<u64>>> (vector/index/hnsw/graph.rs:21,27).
ConcurrentHnswGraph::nodes: HashMap<u64, Vec<RwLock<Vec<u64>>>> (vector/index/hnsw/writer.rs:40-41).
Candidate / ResultCandidate carry id: u64.
IVF inverted lists hold Vec<(u64, String, Vector)>.
A single vector segment is already capped at a u32 entry count elsewhere (IVF / Flat headers use vector_count: u32), so the u64 width is unnecessary inside the index.
Proposed direction
Define InternalId(u32) with TryFrom<u64> at the segment boundary.
HNSW neighbour lists: Vec<u32> (50% saving vs Vec<u64>).
IVF inverted list ids: Vec<u32>.
Pair with the CSR HNSW migration (perf(vector/index): Round-3 roadmap — Qdrant/LanceDB parity #535 children) so the neighbour arena is a single contiguous Vec<u32>.
Keep external u64 doc id at the searcher result boundary.
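A minimal sketch of the boundary conversion described above, under stated assumptions. Only the InternalId(u32) name and the TryFrom<u64> impl come from this proposal. IdOverflow, IdTable and their methods are hypothetical names for illustration, and the sketch assumes internal ids are dense per-segment ordinals, with a side table mapping them back to the external u64 doc id at the searcher result boundary.

```rust
use std::convert::TryFrom;

/// Segment-local id. The name follows the proposal; everything else here is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct InternalId(u32);

/// Hypothetical error type: the value that did not fit in 32 bits.
#[derive(Debug, PartialEq, Eq)]
pub struct IdOverflow(pub u64);

impl TryFrom<u64> for InternalId {
    type Error = IdOverflow;

    /// Checked narrowing at the segment boundary; never truncates silently.
    fn try_from(v: u64) -> Result<Self, Self::Error> {
        u32::try_from(v).map(InternalId).map_err(|_| IdOverflow(v))
    }
}

impl InternalId {
    /// Position of this id in per-segment arrays.
    pub fn index(self) -> usize {
        self.0 as usize
    }
}

/// Hypothetical per-segment table: internal ordinal -> external u64 doc id.
pub struct IdTable {
    external: Vec<u64>,
}

impl IdTable {
    pub fn new() -> Self {
        Self { external: Vec::new() }
    }

    /// Assigns the next dense ordinal to `doc_id`.
    /// Fails once the segment would exceed the u32 range.
    pub fn insert(&mut self, doc_id: u64) -> Result<InternalId, IdOverflow> {
        let id = InternalId::try_from(self.external.len() as u64)?;
        self.external.push(doc_id);
        Ok(id)
    }

    /// Translates back to the external doc id for search results.
    pub fn external(&self, id: InternalId) -> Option<u64> {
        self.external.get(id.index()).copied()
    }
}
```

With this shape, graph and inverted-list code would only see InternalId, and the u64 reappears only when a result is handed back to the searcher. If the index instead narrows the external doc id directly, the same TryFrom impl applies and the table is not needed.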
Acceptance
HNSW graph RAM drops by ~50% on the neighbour arrays.
Candidate heap packed format (X-04) becomes natural.
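The ~50% figure on the neighbour arrays follows from element width alone (4 bytes per u32 versus 8 per u64). The sketch below shows one way a single-layer CSR neighbour arena could look and how its payload size could be checked. CsrLayer and its methods are hypothetical names, not the layout planned under #535, and the offsets are kept as usize here purely to keep the example simple.

```rust
use std::mem::size_of;

/// Hypothetical CSR layout for one HNSW layer: the neighbours of node `i`
/// live in `neighbours[offsets[i]..offsets[i + 1]]`.
pub struct CsrLayer {
    offsets: Vec<usize>,
    neighbours: Vec<u32>,
}

impl CsrLayer {
    /// Flattens per-node neighbour lists into one contiguous arena.
    pub fn from_lists(lists: &[Vec<u32>]) -> Self {
        let mut offsets = Vec::with_capacity(lists.len() + 1);
        let mut neighbours = Vec::new();
        offsets.push(0);
        for list in lists {
            neighbours.extend_from_slice(list);
            offsets.push(neighbours.len());
        }
        Self { offsets, neighbours }
    }

    /// Neighbour slice of `node`; panics if `node` is out of range.
    pub fn neighbours(&self, node: u32) -> &[u32] {
        let start = self.offsets[node as usize];
        let end = self.offsets[node as usize + 1];
        &self.neighbours[start..end]
    }

    /// Payload bytes of the neighbour arena (excludes offsets and Vec headers).
    pub fn neighbour_bytes(&self) -> usize {
        self.neighbours.len() * size_of::<u32>()
    }
}
```

For a layer holding N neighbour entries in total, neighbour_bytes() returns 4 * N, against 8 * N for the same entries stored as u64. The nested Vec<Vec<Vec<u64>>> layout also pays a per-list Vec header and allocation, which the contiguous arena avoids, so the acceptance check could measure the arena payload and total heap usage separately.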
References
Lucene uses int ordinals throughout its vector index.
Qdrant uses u32 PointOffset.