Skip to content

Remove GraphDataType from diskann-providers#950

Merged
wuw92 merged 7 commits intomainfrom
wewu2/remove-graph-data-type-from-providers
Apr 16, 2026
Merged

Remove GraphDataType from diskann-providers#950
wuw92 merged 7 commits intomainfrom
wewu2/remove-graph-data-type-from-providers

Conversation

@wuw92
Copy link
Copy Markdown
Contributor

@wuw92 wuw92 commented Apr 15, 2026

Summary

Addresses the graph_data_types.rs cleanup item from #899.

  • Decouple diskann-providers internal code: Replace GraphDataType bounds with direct type params (T: VectorRepr, A) in VectorDataIterator, MemoryVectorProviderAsync, FastMemoryVectorProviderAsync. Eliminates the AdHoc<T> wrapper from all internal usage and inmem provider type aliases.
  • Move trait to diskann-disk: GraphDataType + AdHoc now live in diskann-disk::data_model. Delete trait definition, test types, and empty traits/ module from diskann-providers.
  • Replace boilerplate: Concrete GraphDataType impls in diskann-tools replaced with type Alias = AdHoc<T>. Benchmark's GraphData<T> replaced with AdHoc<T> directly.
  • Decouple relative_contrast.rs: Use T: VectorRepr instead of Data: GraphDataType since only VectorDataType is accessed.

After this PR, GraphDataType exists only in diskann-disk (where it belongs as a disk-index concept) and diskann-tools (which calls disk-index APIs).

Wei Wu (from Dev Box) added 4 commits April 15, 2026 17:59
Replace GraphDataType bounds with direct type parameters in the three
diskann-providers files that used it as a generic bound:

- MemoryVectorProviderAsync<Data: GraphDataType> → <T: VectorRepr>
- FastMemoryVectorProviderAsync<Data: GraphDataType> → <T: VectorRepr>
- VectorDataIterator<SP, Data: GraphDataType> → <SP, T: VectorRepr, A = ()>

This eliminates the AdHoc<T> wrapper from all internal usage sites and
inmem provider type aliases (FullPrecisionStore<T>), reducing redundant
monomorphizations and making generic bounds self-documenting.

GraphDataType trait definition and re-export remain in diskann-providers
for downstream consumers (diskann-disk, diskann-tools).
…tive_contrast

- diskann-tools: replace 4 concrete GraphDataType impls with type
  aliases to AdHoc<T>
- diskann-benchmark: replace GraphData<T> with AdHoc<T>, delete
  graph_data_type.rs
- diskann-tools/relative_contrast.rs: use T: VectorRepr instead of
  Data: GraphDataType since only VectorDataType is accessed
- Move trait definition to diskann-disk/src/data_model/graph_data_types.rs
- Move test concrete types (3 used) to diskann-disk/src/test_utils.rs,
  remove 6 unused types
- Delete graph_data_types.rs, graph_data_type_utils.rs, and empty
  traits/ module from diskann-providers
- Update all imports across diskann-disk, diskann-tools, diskann-benchmark
- Clean up stale AdjacencyListType doc comments
- Fix benchmark import brace structure
@wuw92 wuw92 force-pushed the wewu2/remove-graph-data-type-from-providers branch from a709cc1 to ab0bf34 Compare April 15, 2026 10:01
@wuw92 wuw92 closed this Apr 15, 2026
@wuw92 wuw92 reopened this Apr 15, 2026
@wuw92
Copy link
Copy Markdown
Contributor Author

wuw92 commented Apr 15, 2026

@microsoft-github-policy-service agree company="Microsoft"

Replace GraphDataF32VectorUnitData and GraphDataMinMaxVectorUnitData
with type aliases to AdHoc<f32> and AdHoc<MinMax8>. Keep
GraphDataF32VectorU32Data as a concrete type since it needs
AssociatedDataType = u32 which AdHoc cannot express.
@wuw92 wuw92 linked an issue Apr 15, 2026 that may be closed by this pull request
@wuw92 wuw92 marked this pull request as ready for review April 15, 2026 11:37
@wuw92 wuw92 requested review from a team and Copilot April 15, 2026 11:37
@wuw92 wuw92 self-assigned this Apr 15, 2026
@wuw92 wuw92 added the enhancement New feature or request label Apr 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors the codebase to relocate the GraphDataType/AdHoc disk-index data model into diskann-disk and removes GraphDataType coupling from diskann-providers internals by switching to direct vector/associated-data type parameters.

Changes:

  • Moved GraphDataType/AdHoc usage from diskann-providers to diskann-disk, updating imports across tools/benchmark/disk code.
  • Reworked VectorDataIterator and async in-memory vector providers to use T: VectorRepr (and A) directly instead of GraphDataType.
  • Cleaned up tool/benchmark helpers by replacing concrete GraphDataType implementations with AdHoc<T> or raw T type parameters.

Reviewed changes

Copilot reviewed 44 out of 45 changed files in this pull request and generated no comments.

Show a summary per file
File Description
diskann-tools/src/utils/search_disk_index.rs Updates GraphDataType import to come from diskann-disk.
diskann-tools/src/utils/relative_contrast.rs Removes GraphDataType dependency; makes computations generic on T: VectorRepr; updates tests to use AdHoc<Half>/Half.
diskann-tools/src/utils/range_search_disk_index.rs Swaps GraphDataType import to diskann-disk.
diskann-tools/src/utils/ground_truth.rs Updates VectorDataIterator instantiation to use direct VectorDataType/AssociatedDataType parameters; simplifies generics.
diskann-tools/src/utils/graph_data_types.rs Replaces concrete GraphDataType impl structs with AdHoc<T> type aliases.
diskann-tools/src/utils/build_pq.rs Updates GraphDataType import to diskann-disk.
diskann-tools/src/utils/build_disk_index.rs Updates GraphDataType import to diskann-disk and removes providers trait import.
diskann-tools/src/bin/relative_contrast.rs Switches CLI dispatch to call compute_relative_contrast::<T> directly.
diskann-providers/src/utils/vector_data_iterator.rs Replaces GraphDataType generic with (T: VectorRepr, A) generics; updates iteration and tests accordingly.
diskann-providers/src/utils/medoid.rs Drops AdHoc<T> usage and uses VectorDataIterator<_, T> directly.
diskann-providers/src/test_utils/mod.rs Removes exports of graph_data_type_utils types.
diskann-providers/src/test_utils/graph_data_type_utils.rs Deletes old providers-side concrete GraphDataType test types.
diskann-providers/src/storage/bin.rs Drops AdHoc<T> in VectorDataIterator usage.
diskann-providers/src/model/graph/traits/mod.rs Deletes the providers traits module entry point.
diskann-providers/src/model/graph/traits/graph_data_types.rs Leaves behind the providers-side GraphDataType/AdHoc definitions (now apparently orphaned).
diskann-providers/src/model/graph/provider/async_/memory_vector_provider.rs Refactors provider to MemoryVectorProviderAsync<T: VectorRepr> API.
diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs Removes AdHoc<T> usage from full-precision adapter signatures.
diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs Removes AdHoc<T> usage from full-precision adapter signatures.
diskann-providers/src/model/graph/provider/async_/inmem/product.rs Removes AdHoc<T> usage from full-precision adapter signatures.
diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs Updates type aliases and traits to use FastMemoryVectorProviderAsync<T> directly.
diskann-providers/src/model/graph/provider/async_/fast_memory_vector_provider.rs Refactors provider to FastMemoryVectorProviderAsync<T: VectorRepr> API.
diskann-providers/src/model/graph/mod.rs Removes pub mod traits; from providers graph module.
diskann-providers/src/index/diskann_async.rs Updates internal tests to construct VectorDataIterator::<_, f32>.
diskann-providers/benches/benchmarks_iai/diskann_iai.rs Updates bench iterator to VectorDataIterator::<_, f32>.
diskann-providers/benches/benchmarks/diskann_bench.rs Updates bench iterator to VectorDataIterator::<_, f32>.
diskann-disk/src/test_utils.rs Adds disk-side test-only GraphDataType implementations/types.
diskann-disk/src/storage/disk_index_writer.rs Updates GraphDataType import to diskann-disk; repoints tests to disk-side test utils.
diskann-disk/src/search/traits/vertex_provider_factory.rs Updates GraphDataType import to diskann-disk.
diskann-disk/src/search/traits/vertex_provider.rs Updates GraphDataType import to diskann-disk.
diskann-disk/src/search/provider/disk_vertex_provider_factory.rs Updates GraphDataType import; updates tests to disk-side test utils.
diskann-disk/src/search/provider/disk_vertex_provider.rs Updates GraphDataType import; updates tests to disk-side test utils.
diskann-disk/src/search/provider/disk_provider.rs Updates GraphDataType import; updates tests to disk-side test utils.
diskann-disk/src/search/provider/cached_disk_vertex_provider.rs Updates GraphDataType import to diskann-disk.
diskann-disk/src/lib.rs Exposes test_utils under cfg(test).
diskann-disk/src/data_model/mod.rs Adds graph_data_types module and re-exports GraphDataType/AdHoc.
diskann-disk/src/data_model/cache.rs Updates GraphDataType import; updates tests to disk-side test utils.
diskann-disk/src/build/configuration/filter_parameter.rs Updates GraphDataType import; updates tests to disk-side test utils.
diskann-disk/src/build/builder/tests.rs Points test type imports to disk-side test utils.
diskann-disk/src/build/builder/quantizer.rs Updates GraphDataType import to diskann-disk.
diskann-disk/src/build/builder/core.rs Updates GraphDataType import; points test type imports to disk-side test utils.
diskann-disk/src/build/builder/build.rs Updates GraphDataType import; drops AdHoc<T> usage in iterators.
diskann-benchmark/src/backend/disk_index/search.rs Removes benchmark-local GraphData wrapper in favor of AdHoc<T>.
diskann-benchmark/src/backend/disk_index/mod.rs Removes the benchmark-local graph_data_type module.
diskann-benchmark/src/backend/disk_index/graph_data_type.rs Deletes benchmark-local GraphDataType wrapper.
diskann-benchmark/src/backend/disk_index/build.rs Removes benchmark-local GraphData wrapper in favor of AdHoc<T>.
Comments suppressed due to low confidence (2)

diskann-providers/src/utils/vector_data_iterator.rs:1

  • The associated_data_stream format description is ambiguous/inconsistent with how the iterator operates (it reads/seeks associated_data_length * size_of::<A>() bytes per point). Please clarify in the doc comment whether associated_data_length is a count of A elements or a byte length, and document that the iterator assumes a fixed-size per-point associated payload compatible with the size_of::<A>()-based reading strategy.
    diskann-providers/src/utils/vector_data_iterator.rs:1
  • With VectorDataIterator now generic over (T, A), the tests in this file appear to cover only scalar associated data (u32) and the no-associated-data path. Add a test case exercising a non-scalar fixed-size associated type (e.g. [u8; 22]) to ensure the size_of::<A>()-based read/seek logic and deserialization continue to work after the generic refactor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 84.16667% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.32%. Comparing base (f8b80e4) to head (fe8c807).

Files with missing lines Patch % Lines
diskann-tools/src/utils/ground_truth.rs 42.30% 15 Missing ⚠️
diskann-tools/src/bin/relative_contrast.rs 0.00% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #950      +/-   ##
==========================================
- Coverage   89.33%   89.32%   -0.01%     
==========================================
  Files         448      448              
  Lines       83627    83590      -37     
==========================================
- Hits        74704    74670      -34     
+ Misses       8923     8920       -3     
Flag Coverage Δ
miri 89.32% <84.16%> (-0.01%) ⬇️
unittests 89.16% <84.16%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark/src/backend/disk_index/mod.rs 100.00% <ø> (ø)
diskann-disk/src/build/builder/build.rs 94.15% <100.00%> (-0.01%) ⬇️
diskann-disk/src/build/builder/core.rs 95.25% <ø> (ø)
diskann-disk/src/build/builder/quantizer.rs 90.52% <ø> (ø)
...n-disk/src/build/configuration/filter_parameter.rs 100.00% <ø> (ø)
diskann-disk/src/data_model/cache.rs 100.00% <ø> (ø)
...src/search/provider/cached_disk_vertex_provider.rs 94.11% <ø> (ø)
diskann-disk/src/search/provider/disk_provider.rs 91.02% <ø> (ø)
...n-disk/src/search/provider/disk_vertex_provider.rs 83.95% <ø> (ø)
...rc/search/provider/disk_vertex_provider_factory.rs 95.67% <ø> (ø)
... and 17 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@wuw92 wuw92 merged commit 12738e2 into main Apr 16, 2026
24 checks passed
@wuw92 wuw92 deleted the wewu2/remove-graph-data-type-from-providers branch April 16, 2026 07:56
hailangx pushed a commit that referenced this pull request Apr 16, 2026
## Summary

Addresses the `graph_data_types.rs` cleanup item from #899.

- **Decouple diskann-providers internal code**: Replace `GraphDataType`
bounds with direct type params (`T: VectorRepr`, `A`) in
`VectorDataIterator`, `MemoryVectorProviderAsync`,
`FastMemoryVectorProviderAsync`. Eliminates the `AdHoc<T>` wrapper from
all internal usage and inmem provider type aliases.
- **Move trait to diskann-disk**: `GraphDataType` + `AdHoc` now live in
`diskann-disk::data_model`. Delete trait definition, test types, and
empty `traits/` module from diskann-providers.
- **Replace boilerplate**: Concrete `GraphDataType` impls in
diskann-tools replaced with `type Alias = AdHoc<T>`. Benchmark's
`GraphData<T>` replaced with `AdHoc<T>` directly.
- **Decouple relative_contrast.rs**: Use `T: VectorRepr` instead of
`Data: GraphDataType` since only `VectorDataType` is accessed.

After this PR, `GraphDataType` exists only in `diskann-disk` (where it
belongs as a disk-index concept) and `diskann-tools` (which calls
disk-index APIs).
@arkrishn94 arkrishn94 mentioned this pull request Apr 22, 2026
arkrishn94 added a commit that referenced this pull request Apr 22, 2026
Bumping to 0.50.1 to propagate changes to consumers.

Changes since previous bump: 

## What's Changed
* Add more agentic guard rails by @hildebrandmw in
#871
* Cleanup `diskann-benchmark-runner` and friends. by @hildebrandmw in
#865
* Use `--all-targets` for the no-default-features CI run. by
@hildebrandmw in #874
* Remove unused `normalizing_util.rs` from `diskann-providers` by
@Copilot in #902
* Benchmark Support for A/B Tests by @hildebrandmw in
#900
* [diskann-garnet] Bump diskann-garnet to 1.0.26 by @tiagonapoli in
#925
* Remove the `AdjacencyList` from `diskann-providers` by @hildebrandmw
in #915
* [PQ cleanup] Part 1: Move pq_scratch, quantizer_preprocess and
pq_dataset to `diskann-disk` by @arkrishn94 in
#930
* Forbid Debug in diskann-benchmark by @arrayka in
#914
* Remove DebugProvider by @JordanMaples in
#923
* [diskann-garnet] Create workflow to publish to nuget by @tiagonapoli
in #926
* Move k-means implementation from diskann-providers to diskann-disk by
@Copilot in #933
* Inline minmax distance evaluations by @arkrishn94 in
#935
* Use `rust-toolchain.toml` in CI by @hildebrandmw in
#934
* Add a globally blocking CI gate. by @hildebrandmw in
#932
* Remove `utils/math_util.rs` from `diskann-providers` by @Copilot in
#921
* Bump rand from 0.9.2 to 0.9.3 by @dependabot[bot] in
#945
* Remove OPQ and friends by @arkrishn94 in
#947
* Migrate test_flaky_consolidate from diskann_providers to diskann by
@JordanMaples in #942
* Remove GraphDataType from diskann-providers by @wuw92 in
#950
* Remove unused method extract_best_l_candidates in
NeighborPriorityQueue by @doliawu in
#951
* Add `Debug` bounds to `VectorRepr`'s distance GATs. by @hildebrandmw
in #948
* Add benchmark pipeline with Rust-native A/B validation by
@YuanyuanTian-hh in #912
* Remove unnecessary `Default` bound from `Neighbor`'s `VectorIdType` by
@doliawu in #956
* Replace `AlignedBoxWithSlice` with plain `Vec` / `Matrix` where
alignment is unused by @wuw92 in
#955
* [minmax] 8-bit benchmark by @arkrishn94 in
#959
* Add `MultiInsertStrategy` implementations for `BfTreeProvider` by
@hildebrandmw in #949
* Replace `AlignedBoxWithSlice` with `Vec` in PQScratch and disk fp
vector caches by @wuw92 in #960
* Adding unit tests for paged_search by @JordanMaples in
#962
* Remove AlignedBoxWithSlice wrapper and add alias to Poly<[T],
AlignedAllocator> by @JordanMaples in
#965
* Remove synthetic/structured data generation from diskann-providers by
@JordanMaples in #963
* added tests and some baselines for range_search by @JordanMaples in
#961

## New Contributors
* @JordanMaples made their first contribution in
#923
* @wuw92 made their first contribution in
#950
* @doliawu made their first contribution in
#951
* @YuanyuanTian-hh made their first contribution in
#912

**Full Changelog**:
v0.50.0...v0.50.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove GraphDataType from diskann-providers and relocate to diskann-disk

5 participants