Skip to content

Cleanup diskann-benchmark-runner and friends.#865

Open
hildebrandmw wants to merge 2 commits intomainfrom
mhildebr/simplify-benchmark
Open

Cleanup diskann-benchmark-runner and friends.#865
hildebrandmw wants to merge 2 commits intomainfrom
mhildebr/simplify-benchmark

Conversation

@hildebrandmw
Copy link
Copy Markdown
Contributor

Simplify benchmark running and registration by:

  • Moving the Input away from a dyn-compatible trait to a simple trait with methods that don't receive by &self. That is, instead of

    trait Input {
        fn tag(&self) -> &'static str;
        fn try_deserialize(&self, ...);
        fn example(&self);
    }

    we get

    trait Input {
        fn tag() -> &'static str;
        fn try_deserialize(...);
        fn example();
    }

    The original pattern required an awkward proxy type for input registration (see the Input type in diskann-benchmark). This new pattern allows the actual input structs to be used directly.

  • Ditching the whole overly engineered dispatcher (which is cool, but inappropriate for this task) and instead introducing a Benchmark trait with the signature (see the code for doc-comments)

    pub trait Benchmark {
        type Input: Input + 'static;
        type Output: Serialize;
    
        fn try_match(input: &Self::Input) -> Result<MatchScore, FailureScore>;
    
        fn description(
            f: &mut std::fmt::Formatter<'_>,
            input: Option<&Self::Input>,
        ) -> std::fmt::Result;
    
        fn run(
            input: &Self::Input,
            checkpoint: Checkpoint<'_>,
            output: &mut dyn Output,
        ) -> anyhow::Result<Self::Output>;
    }

    The functions try_match and description match the original DispatchRule technique, but instead of relying on a convert method and a corresponding closure - we just run the benchmark directly.

    This almost entirely removes the need for the various register! macros as registration is significantly more straightforward. Additionally, the input and output types are concrete!

These changes remove a lot of boilerplate from diskann-benchmark. No need to match on the Any type anymore as this happens automatically. While updating diskann-benchmark, I was able to replace some of the macros for stamping out benchmarks with generics. Scalar and spherically-quantized index builds are still reliant on macros.

Finally, the backend of diskann-benchmark-runner (namely registry::Benchmarks) is cleaned up to not rely on the complicated dispatcher machinery for matching inputs and providing mismatch diagnostics. Instead, the logic is greatly simplified and just implemented inline.

Why

While reviewing #857, I was pondering what it would take to add A/B comparison logic directly to our benchmark framework. My thinking is that it would be nice to have something like

trait Compare: Benchmark {
    type Tolerance: Input;
    fn compare(
        tolerances: &Self::Tolerance,
        before: &Self::Output,
        after: &Self::Output,
    ) -> SomeResult;
}

Benchmarks that want A/B comparisons could implement Compare and use a registry::Benchmarks::register_compared method to register a benchmark that participates in A/B comparison. The backend of diskann-benchmark-runner could then use this to inspect the benchmarks that support comparison and this becomes an engine for A/B testing with concretely typed outputs!

The problem, of course, is the current way of benchmark registration is a little overly complicated and makes this kind of extension way too messy. So view this PR as serving two purposes. First, we clean up an overly complicated portion of benchmarking and second, we enable evolution of the benchmarking framework.

Suggested Reviewing Order

I would actually recommend looking at the changes in diskann-benchmark before diskann-benchmark-runner. The amount of code that got chopped is quite satisfying. The changes here aren't perfect (the use of the 'static lifetime for benchmarks is a remnant of the DispatchRule approach and fully cleaning that up would be a larger change).

Then, inside diskann-benchmark-runner, I suggest:

  • input.rs: New home of the Input trait plus the machinery for dynamically dispatching to inputs.
  • benchmark.rs: The new benchmark trait (plus associated dynamic dispatch machinery).
  • registry.rs: Updates to the registry to reflect the new traits. In particular, Benchmarks is updated so the matching logic is much clear (and not hidden behind a dispatcher). I expanded the tests a little and added a few more baselines to exercise the matching logic more.
  • dispatcher/: This directory has largely been made obsolete. The DispatchRule trait is preserved because it still has some utility in matching data types and composes somewhat with the Benchmark trait. We could probably get rid of this directory entirely in the future, but for now I'm preserving it for the sake of a less insane diff.

Feel free to skip diskann-benchmark-simd.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the benchmark framework to simplify input/benchmark registration and remove the previously generic multi-dispatch “dispatcher” machinery from the runner, replacing it with a Benchmark trait + type-erased wrappers and a simpler registry selection/mismatch diagnostic flow.

Changes:

  • Convert Input to a static-method trait and update input registration to be type-driven (register::<T>()).
  • Introduce Benchmark + DynBenchmark wrapper and simplify benchmark selection/debugging in registry::Benchmarks.
  • Migrate existing benchmarks (DiskANN benchmark crates + simd) and update/add golden tests for matching/mismatch behavior.

Reviewed changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
diskann-benchmark/src/utils/mod.rs Updates stub_impl! to register stub benchmarks via the new Benchmark trait API.
diskann-benchmark/src/inputs/mod.rs Removes proxy Input<T> helper and implements runner Input directly for input structs via macro.
diskann-benchmark/src/inputs/filters.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/inputs/exhaustive.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/inputs/disk.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/inputs/async_.rs Updates input registration to registry.register::<T>().
diskann-benchmark/src/backend/index/spherical.rs Migrates spherical-quant benchmarks to Benchmark trait and simplifies registration.
diskann-benchmark/src/backend/index/scalar.rs Migrates scalar-quant benchmarks to Benchmark trait and simplifies registration.
diskann-benchmark/src/backend/index/product.rs Migrates product-quant benchmarks to Benchmark trait and reduces macro usage.
diskann-benchmark/src/backend/index/benchmarks.rs Removes old registration macros and migrates full-precision/dynamic benchmarks to Benchmark.
diskann-benchmark/src/backend/filters/benchmark.rs Migrates metadata-index filter benchmark to Benchmark.
diskann-benchmark/src/backend/exhaustive/spherical.rs Migrates exhaustive spherical benchmark to Benchmark.
diskann-benchmark/src/backend/exhaustive/product.rs Migrates exhaustive PQ benchmark to Benchmark.
diskann-benchmark/src/backend/exhaustive/minmax.rs Migrates exhaustive minmax benchmark to Benchmark.
diskann-benchmark/src/backend/disk_index/benchmarks.rs Migrates disk-index benchmarks to Benchmark and simplifies registration.
diskann-benchmark-simd/src/lib.rs Migrates simd benchmarks and input to new Input/Benchmark APIs; removes old adapter types.
diskann-benchmark-simd/src/bin.rs Updates simd binary to register SimdOp input via register::<T>().
diskann-benchmark-runner/tests/test-overload-0/stdout.txt Adds golden output for overload resolution (best MatchScore wins).
diskann-benchmark-runner/tests/test-overload-0/stdin.txt Adds runner invocation fixture for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/output.json Adds expected JSON output for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/input.json Adds input file fixture for overload resolution test.
diskann-benchmark-runner/tests/test-overload-0/README.md Documents overload-resolution expectations for the new matching logic.
diskann-benchmark-runner/tests/test-mismatch-1/stdout.txt Adds golden output for mismatch diagnostics (new description paths).
diskann-benchmark-runner/tests/test-mismatch-1/stdin.txt Adds runner invocation fixture for mismatch test.
diskann-benchmark-runner/tests/test-mismatch-1/input.json Adds input file fixture for mismatch test.
diskann-benchmark-runner/tests/test-mismatch-1/README.md Documents mismatch diagnostic expectations.
diskann-benchmark-runner/tests/test-mismatch-0/stdout.txt Updates mismatch diagnostics ordering/content to match new implementation.
diskann-benchmark-runner/tests/test-4/stdout.txt Updates benchmark listing output to reflect new description-based listing.
diskann-benchmark-runner/src/utils/datatype.rs Removes dispatcher::Map usage; keeps dispatch-rule-based matching utilities.
diskann-benchmark-runner/src/test.rs Updates test inputs/benchmarks to new Input/Benchmark APIs; adds exact-match benchmark.
diskann-benchmark-runner/src/result.rs Removes Checkpoint::empty() and its test now that registry no longer needs it.
diskann-benchmark-runner/src/registry.rs Replaces dispatcher-backed benchmark registry with a simpler vector of type-erased benchmarks + scoring logic.
diskann-benchmark-runner/src/lib.rs Moves Input into its own module and re-exports Benchmark/Input.
diskann-benchmark-runner/src/jobs.rs Switches from &dyn Input to input::Registered wrapper for formatting/examples.
diskann-benchmark-runner/src/input.rs New static-method Input trait plus type-erased wrapper for dynamic registry storage.
diskann-benchmark-runner/src/dispatcher/mod.rs Shrinks dispatcher module to “dispatch rules” only; removes dynamic dispatcher API/docs/tests.
diskann-benchmark-runner/src/dispatcher/examples.rs Deletes now-obsolete examples used by the removed dispatcher docs.
diskann-benchmark-runner/src/dispatcher/dispatch.rs Deletes the generic multi-dispatch implementation.
diskann-benchmark-runner/src/dispatcher/api.rs Makes score types Eq/Ord and removes Map/dispatcher-related APIs.
diskann-benchmark-runner/src/benchmark.rs New Benchmark trait plus DynBenchmark wrapper for type-erased storage and JSON serialization.
diskann-benchmark-runner/src/app.rs Updates benchmark listing and mismatch reporting to use the new registry APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

2. "type-bench-i8": expected "int8" but found "float16"
3. "exact-type-bench-f32-1000": expected "float32" but found "float16"

could not find find a benchmark for all inputs No newline at end of file
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stdout fixture includes the message "could not find find a benchmark for all inputs" (duplicated "find"). If this is meant to be user-facing CLI output, consider fixing the typo in the application error message and updating this golden output accordingly.

Suggested change
could not find find a benchmark for all inputs
could not find a benchmark for all inputs

Copilot uses AI. Check for mistakes.
for (name, method) in benchmarks.methods() {
writeln!(output, " {}: {}", name, method.signatures()[0])?;
for (name, description) in benchmarks.names() {
writeln!(output, " {}: {}", name, description)?;
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

description can be multi-line (e.g., DynBenchmark::description writes a newline after the tag). Printing it inline with " {}: {}" causes subsequent lines to lose indentation and makes benchmarks output hard to read. Consider either making Benchmarks::names() return a single-line description, or splitting/indenting all lines when printing.

Suggested change
writeln!(output, " {}: {}", name, description)?;
let mut lines = description.lines();
if let Some(first) = lines.next() {
writeln!(output, " {}: {}", name, first)?;
for line in lines {
writeln!(output, " {}", line)?;
}
}

Copilot uses AI. Check for mistakes.
Type::<T>::try_match(&from.data_type)
fn try_match(input: &TypeInput) -> Result<MatchScore, FailureScore> {
// Try to match based on data type.
// Add a small penalty to `ExactTypeBench` can be more specific if it hits.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment reads "Add a small penalty to ExactTypeBench can be more specific if it hits." This looks like a grammatical mix-up and also seems to describe the opposite of what the code does (the penalty is applied to TypeBench so ExactTypeBench wins when both match). Consider rewording for clarity.

Suggested change
// Add a small penalty to `ExactTypeBench` can be more specific if it hits.
// Add a small penalty to `TypeBench` so `ExactTypeBench` is preferred when both match.

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +39
/// Register `input` in the registry.
///
/// Returns an error if any other input with the same [`Input::tag()`] has been registered
/// while leaving the underlying registry unchanged.
pub fn register<T>(&mut self, input: T) -> anyhow::Result<()>
pub fn register<T>(&mut self) -> anyhow::Result<()>
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inputs::register no longer takes an input value, but the doc comment still says "Register input in the registry". This is misleading now that registration is type-driven; update the wording to reflect register::<T>() (or reintroduce a parameter if that was intended).

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 75.36585% with 101 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.45%. Comparing base (b616aa3) to head (2e49c6e).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
diskann-benchmark/src/backend/index/benchmarks.rs 46.72% 65 Missing ⚠️
diskann-benchmark-runner/src/input.rs 78.37% 8 Missing ⚠️
diskann-benchmark-simd/src/lib.rs 73.07% 7 Missing ⚠️
diskann-benchmark/src/utils/mod.rs 45.45% 6 Missing ⚠️
diskann-benchmark-runner/src/dispatcher/mod.rs 33.33% 4 Missing ⚠️
diskann-benchmark-runner/src/registry.rs 93.54% 4 Missing ⚠️
diskann-benchmark-runner/src/test.rs 96.15% 3 Missing ⚠️
diskann-benchmark-runner/src/app.rs 88.88% 1 Missing ⚠️
diskann-benchmark-runner/src/benchmark.rs 97.22% 1 Missing ⚠️
diskann-benchmark/src/backend/filters/benchmark.rs 88.88% 1 Missing ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #865    +/-   ##
========================================
  Coverage   90.45%   90.45%            
========================================
  Files         442      442            
  Lines       83248    82919   -329     
========================================
- Hits        75301    75006   -295     
+ Misses       7947     7913    -34     
Flag Coverage Δ
miri 90.45% <75.36%> (+<0.01%) ⬆️
unittests 90.41% <75.36%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark-runner/src/dispatcher/api.rs 100.00% <ø> (ø)
diskann-benchmark-runner/src/jobs.rs 96.49% <100.00%> (ø)
diskann-benchmark-runner/src/result.rs 97.81% <ø> (-1.14%) ⬇️
diskann-benchmark-runner/src/utils/datatype.rs 98.98% <ø> (ø)
diskann-benchmark-simd/src/bin.rs 81.57% <100.00%> (ø)
diskann-benchmark/src/backend/exhaustive/minmax.rs 100.00% <ø> (ø)
...iskann-benchmark/src/backend/exhaustive/product.rs 100.00% <ø> (ø)
...kann-benchmark/src/backend/exhaustive/spherical.rs 100.00% <ø> (ø)
diskann-benchmark/src/backend/index/product.rs 100.00% <ø> (ø)
diskann-benchmark/src/backend/index/scalar.rs 100.00% <ø> (ø)
... and 16 more

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@arkrishn94 arkrishn94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark, I love the simplification this brings to defining inputs to benchmark and running them.

Only one documentation related question - do I recall a README in diskann-benchmark that needs updating? Or am I misremembering?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants