Remove GraphDataType from diskann-providers#950
Conversation
Replace GraphDataType bounds with direct type parameters in the three diskann-providers files that used it as a generic bound: - MemoryVectorProviderAsync<Data: GraphDataType> → <T: VectorRepr> - FastMemoryVectorProviderAsync<Data: GraphDataType> → <T: VectorRepr> - VectorDataIterator<SP, Data: GraphDataType> → <SP, T: VectorRepr, A = ()> This eliminates the AdHoc<T> wrapper from all internal usage sites and inmem provider type aliases (FullPrecisionStore<T>), reducing redundant monomorphizations and making generic bounds self-documenting. GraphDataType trait definition and re-export remain in diskann-providers for downstream consumers (diskann-disk, diskann-tools).
…tive_contrast - diskann-tools: replace 4 concrete GraphDataType impls with type aliases to AdHoc<T> - diskann-benchmark: replace GraphData<T> with AdHoc<T>, delete graph_data_type.rs - diskann-tools/relative_contrast.rs: use T: VectorRepr instead of Data: GraphDataType since only VectorDataType is accessed
- Move trait definition to diskann-disk/src/data_model/graph_data_types.rs - Move test concrete types (3 used) to diskann-disk/src/test_utils.rs, remove 6 unused types - Delete graph_data_types.rs, graph_data_type_utils.rs, and empty traits/ module from diskann-providers - Update all imports across diskann-disk, diskann-tools, diskann-benchmark - Clean up stale AdjacencyListType doc comments - Fix benchmark import brace structure
a709cc1 to
ab0bf34
Compare
|
@microsoft-github-policy-service agree company="Microsoft" |
Replace GraphDataF32VectorUnitData and GraphDataMinMaxVectorUnitData with type aliases to AdHoc<f32> and AdHoc<MinMax8>. Keep GraphDataF32VectorU32Data as a concrete type since it needs AssociatedDataType = u32 which AdHoc cannot express.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors the codebase to relocate the GraphDataType/AdHoc disk-index data model into diskann-disk and removes GraphDataType coupling from diskann-providers internals by switching to direct vector/associated-data type parameters.
Changes:
- Moved
GraphDataType/AdHocusage fromdiskann-providerstodiskann-disk, updating imports across tools/benchmark/disk code. - Reworked
VectorDataIteratorand async in-memory vector providers to useT: VectorRepr(andA) directly instead ofGraphDataType. - Cleaned up tool/benchmark helpers by replacing concrete
GraphDataTypeimplementations withAdHoc<T>or rawTtype parameters.
Reviewed changes
Copilot reviewed 44 out of 45 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-tools/src/utils/search_disk_index.rs | Updates GraphDataType import to come from diskann-disk. |
| diskann-tools/src/utils/relative_contrast.rs | Removes GraphDataType dependency; makes computations generic on T: VectorRepr; updates tests to use AdHoc<Half>/Half. |
| diskann-tools/src/utils/range_search_disk_index.rs | Swaps GraphDataType import to diskann-disk. |
| diskann-tools/src/utils/ground_truth.rs | Updates VectorDataIterator instantiation to use direct VectorDataType/AssociatedDataType parameters; simplifies generics. |
| diskann-tools/src/utils/graph_data_types.rs | Replaces concrete GraphDataType impl structs with AdHoc<T> type aliases. |
| diskann-tools/src/utils/build_pq.rs | Updates GraphDataType import to diskann-disk. |
| diskann-tools/src/utils/build_disk_index.rs | Updates GraphDataType import to diskann-disk and removes providers trait import. |
| diskann-tools/src/bin/relative_contrast.rs | Switches CLI dispatch to call compute_relative_contrast::<T> directly. |
| diskann-providers/src/utils/vector_data_iterator.rs | Replaces GraphDataType generic with (T: VectorRepr, A) generics; updates iteration and tests accordingly. |
| diskann-providers/src/utils/medoid.rs | Drops AdHoc<T> usage and uses VectorDataIterator<_, T> directly. |
| diskann-providers/src/test_utils/mod.rs | Removes exports of graph_data_type_utils types. |
| diskann-providers/src/test_utils/graph_data_type_utils.rs | Deletes old providers-side concrete GraphDataType test types. |
| diskann-providers/src/storage/bin.rs | Drops AdHoc<T> in VectorDataIterator usage. |
| diskann-providers/src/model/graph/traits/mod.rs | Deletes the providers traits module entry point. |
| diskann-providers/src/model/graph/traits/graph_data_types.rs | Leaves behind the providers-side GraphDataType/AdHoc definitions (now apparently orphaned). |
| diskann-providers/src/model/graph/provider/async_/memory_vector_provider.rs | Refactors provider to MemoryVectorProviderAsync<T: VectorRepr> API. |
| diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs | Removes AdHoc<T> usage from full-precision adapter signatures. |
| diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs | Removes AdHoc<T> usage from full-precision adapter signatures. |
| diskann-providers/src/model/graph/provider/async_/inmem/product.rs | Removes AdHoc<T> usage from full-precision adapter signatures. |
| diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs | Updates type aliases and traits to use FastMemoryVectorProviderAsync<T> directly. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_vector_provider.rs | Refactors provider to FastMemoryVectorProviderAsync<T: VectorRepr> API. |
| diskann-providers/src/model/graph/mod.rs | Removes pub mod traits; from providers graph module. |
| diskann-providers/src/index/diskann_async.rs | Updates internal tests to construct VectorDataIterator::<_, f32>. |
| diskann-providers/benches/benchmarks_iai/diskann_iai.rs | Updates bench iterator to VectorDataIterator::<_, f32>. |
| diskann-providers/benches/benchmarks/diskann_bench.rs | Updates bench iterator to VectorDataIterator::<_, f32>. |
| diskann-disk/src/test_utils.rs | Adds disk-side test-only GraphDataType implementations/types. |
| diskann-disk/src/storage/disk_index_writer.rs | Updates GraphDataType import to diskann-disk; repoints tests to disk-side test utils. |
| diskann-disk/src/search/traits/vertex_provider_factory.rs | Updates GraphDataType import to diskann-disk. |
| diskann-disk/src/search/traits/vertex_provider.rs | Updates GraphDataType import to diskann-disk. |
| diskann-disk/src/search/provider/disk_vertex_provider_factory.rs | Updates GraphDataType import; updates tests to disk-side test utils. |
| diskann-disk/src/search/provider/disk_vertex_provider.rs | Updates GraphDataType import; updates tests to disk-side test utils. |
| diskann-disk/src/search/provider/disk_provider.rs | Updates GraphDataType import; updates tests to disk-side test utils. |
| diskann-disk/src/search/provider/cached_disk_vertex_provider.rs | Updates GraphDataType import to diskann-disk. |
| diskann-disk/src/lib.rs | Exposes test_utils under cfg(test). |
| diskann-disk/src/data_model/mod.rs | Adds graph_data_types module and re-exports GraphDataType/AdHoc. |
| diskann-disk/src/data_model/cache.rs | Updates GraphDataType import; updates tests to disk-side test utils. |
| diskann-disk/src/build/configuration/filter_parameter.rs | Updates GraphDataType import; updates tests to disk-side test utils. |
| diskann-disk/src/build/builder/tests.rs | Points test type imports to disk-side test utils. |
| diskann-disk/src/build/builder/quantizer.rs | Updates GraphDataType import to diskann-disk. |
| diskann-disk/src/build/builder/core.rs | Updates GraphDataType import; points test type imports to disk-side test utils. |
| diskann-disk/src/build/builder/build.rs | Updates GraphDataType import; drops AdHoc<T> usage in iterators. |
| diskann-benchmark/src/backend/disk_index/search.rs | Removes benchmark-local GraphData wrapper in favor of AdHoc<T>. |
| diskann-benchmark/src/backend/disk_index/mod.rs | Removes the benchmark-local graph_data_type module. |
| diskann-benchmark/src/backend/disk_index/graph_data_type.rs | Deletes benchmark-local GraphDataType wrapper. |
| diskann-benchmark/src/backend/disk_index/build.rs | Removes benchmark-local GraphData wrapper in favor of AdHoc<T>. |
Comments suppressed due to low confidence (2)
diskann-providers/src/utils/vector_data_iterator.rs:1
- The
associated_data_streamformat description is ambiguous/inconsistent with how the iterator operates (it reads/seeksassociated_data_length * size_of::<A>()bytes per point). Please clarify in the doc comment whetherassociated_data_lengthis a count ofAelements or a byte length, and document that the iterator assumes a fixed-size per-point associated payload compatible with thesize_of::<A>()-based reading strategy.
diskann-providers/src/utils/vector_data_iterator.rs:1 - With
VectorDataIteratornow generic over(T, A), the tests in this file appear to cover only scalar associated data (u32) and the no-associated-data path. Add a test case exercising a non-scalar fixed-size associated type (e.g.[u8; 22]) to ensure thesize_of::<A>()-based read/seek logic and deserialization continue to work after the generic refactor.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #950 +/- ##
==========================================
- Coverage 89.33% 89.32% -0.01%
==========================================
Files 448 448
Lines 83627 83590 -37
==========================================
- Hits 74704 74670 -34
+ Misses 8923 8920 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
## Summary Addresses the `graph_data_types.rs` cleanup item from #899. - **Decouple diskann-providers internal code**: Replace `GraphDataType` bounds with direct type params (`T: VectorRepr`, `A`) in `VectorDataIterator`, `MemoryVectorProviderAsync`, `FastMemoryVectorProviderAsync`. Eliminates the `AdHoc<T>` wrapper from all internal usage and inmem provider type aliases. - **Move trait to diskann-disk**: `GraphDataType` + `AdHoc` now live in `diskann-disk::data_model`. Delete trait definition, test types, and empty `traits/` module from diskann-providers. - **Replace boilerplate**: Concrete `GraphDataType` impls in diskann-tools replaced with `type Alias = AdHoc<T>`. Benchmark's `GraphData<T>` replaced with `AdHoc<T>` directly. - **Decouple relative_contrast.rs**: Use `T: VectorRepr` instead of `Data: GraphDataType` since only `VectorDataType` is accessed. After this PR, `GraphDataType` exists only in `diskann-disk` (where it belongs as a disk-index concept) and `diskann-tools` (which calls disk-index APIs).
Bumping to 0.50.1 to propagate changes to consumers. Changes since previous bump: ## What's Changed * Add more agentic guard rails by @hildebrandmw in #871 * Cleanup `diskann-benchmark-runner` and friends. by @hildebrandmw in #865 * Use `--all-targets` for the no-default-features CI run. by @hildebrandmw in #874 * Remove unused `normalizing_util.rs` from `diskann-providers` by @Copilot in #902 * Benchmark Support for A/B Tests by @hildebrandmw in #900 * [diskann-garnet] Bump diskann-garnet to 1.0.26 by @tiagonapoli in #925 * Remove the `AdjacencyList` from `diskann-providers` by @hildebrandmw in #915 * [PQ cleanup] Part 1: Move pq_scratch, quantizer_preprocess and pq_dataset to `diskann-disk` by @arkrishn94 in #930 * Forbid Debug in diskann-benchmark by @arrayka in #914 * Remove DebugProvider by @JordanMaples in #923 * [diskann-garnet] Create workflow to publish to nuget by @tiagonapoli in #926 * Move k-means implementation from diskann-providers to diskann-disk by @Copilot in #933 * Inline minmax distance evaluations by @arkrishn94 in #935 * Use `rust-toolchain.toml` in CI by @hildebrandmw in #934 * Add a globally blocking CI gate. by @hildebrandmw in #932 * Remove `utils/math_util.rs` from `diskann-providers` by @Copilot in #921 * Bump rand from 0.9.2 to 0.9.3 by @dependabot[bot] in #945 * Remove OPQ and friends by @arkrishn94 in #947 * Migrate test_flaky_consolidate from diskann_providers to diskann by @JordanMaples in #942 * Remove GraphDataType from diskann-providers by @wuw92 in #950 * Remove unused method extract_best_l_candidates in NeighborPriorityQueue by @doliawu in #951 * Add `Debug` bounds to `VectorRepr`'s distance GATs. by @hildebrandmw in #948 * Add benchmark pipeline with Rust-native A/B validation by @YuanyuanTian-hh in #912 * Remove unnecessary `Default` bound from `Neighbor`'s `VectorIdType` by @doliawu in #956 * Replace `AlignedBoxWithSlice` with plain `Vec` / `Matrix` where alignment is unused by @wuw92 in #955 * [minmax] 8-bit benchmark by @arkrishn94 in #959 * Add `MultiInsertStrategy` implementations for `BfTreeProvider` by @hildebrandmw in #949 * Replace `AlignedBoxWithSlice` with `Vec` in PQScratch and disk fp vector caches by @wuw92 in #960 * Adding unit tests for paged_search by @JordanMaples in #962 * Remove AlignedBoxWithSlice wrapper and add alias to Poly<[T], AlignedAllocator> by @JordanMaples in #965 * Remove synthetic/structured data generation from diskann-providers by @JordanMaples in #963 * added tests and some baselines for range_search by @JordanMaples in #961 ## New Contributors * @JordanMaples made their first contribution in #923 * @wuw92 made their first contribution in #950 * @doliawu made their first contribution in #951 * @YuanyuanTian-hh made their first contribution in #912 **Full Changelog**: v0.50.0...v0.50.1
Summary
Addresses the
graph_data_types.rscleanup item from #899.GraphDataTypebounds with direct type params (T: VectorRepr,A) inVectorDataIterator,MemoryVectorProviderAsync,FastMemoryVectorProviderAsync. Eliminates theAdHoc<T>wrapper from all internal usage and inmem provider type aliases.GraphDataType+AdHocnow live indiskann-disk::data_model. Delete trait definition, test types, and emptytraits/module from diskann-providers.GraphDataTypeimpls in diskann-tools replaced withtype Alias = AdHoc<T>. Benchmark'sGraphData<T>replaced withAdHoc<T>directly.T: VectorReprinstead ofData: GraphDataTypesince onlyVectorDataTypeis accessed.After this PR,
GraphDataTypeexists only indiskann-disk(where it belongs as a disk-index concept) anddiskann-tools(which calls disk-index APIs).