Skip to content

Conversation

ibhati
Copy link
Member

@ibhati ibhati commented Sep 18, 2025

  • Add support for IVF search with SQData
  • Add Python tests for IVF
  • Enable IVF tests in CI

@ibhati ibhati changed the title IVF follow-up to add compressed vector search Add support for compressed vector search to IVF Sep 23, 2025
@ibhati ibhati marked this pull request as ready for review September 23, 2025 17:21
@ibhati ibhati marked this pull request as draft September 23, 2025 21:59
@ibhati ibhati marked this pull request as ready for review September 23, 2025 23:29
@ahuber21 ahuber21 requested a review from Copilot September 25, 2025 08:22
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive support for compressed vector search to the IVF (Inverted File Index) system. The main purpose is to enable IVF search with scalar quantized data (SQData) and establish a complete testing infrastructure for IVF functionality.

Key changes:

  • Implemented compressed vector search support for IVF with scalar quantization
  • Added comprehensive Python test suite for IVF functionality
  • Enabled IVF tests in CI pipeline with matrix builds

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/integration/ivf/scalar_search.cpp New C++ integration test for IVF scalar quantization search
tests/integration/ivf/index_build.cpp Temporarily disabled problematic thread tests
tests/CMakeLists.txt Added scalar search test to build
include/svs/quantization/scalar/scalar.h Added setter methods and type alias for SQDataset
include/svs/extensions/ivf/scalar.h New extension implementing IVF support for scalar quantized data
data/test_dataset/reference/ivf_reference.toml Updated reference timestamps
bindings/python/tests/test_vamana.py Fixed semicolon style issues
bindings/python/tests/test_ivf.py New comprehensive Python test suite for IVF
bindings/python/tests/common.py Added IVF test configuration constants
bindings/python/src/ivf.cpp Changed default hierarchical parameter to true
bindings/python/setup.py Enabled IVF experimental features in build
bindings/python/include/svs/python/ivf.h Simplified specialization patterns
bindings/python/CMakeLists.txt Removed extra whitespace
benchmark/include/svs-benchmark/ivf/test.h Removed unused LeanVec-specific members
.github/workflows/build-linux.yml Added IVF matrix testing to CI

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@ahuber21 ahuber21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting as a comment without explicit approval or change request, as I will be offline for a few days soon.
Please address the points in my comments, then feel free to merge. Thanks!

# Build from file loader
loader = svs.VectorDataLoader(test_data_svs, svs.DataType.float32)
matcher = UncompressedMatcher("bfloat16")
self._test_build(loader, svs.DistanceType.L2, matcher)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have a dedicated test for each distance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, only L2 and MIP is supported in IVF right now. Check here the corresponding Vamana tests: https://github.com/intel/ScalableVectorSearch/blob/main/bindings/python/tests/test_vamana.py#L377

)

for expected in expected_results:
n_probes, k_reorder, k, nq, expected_recall = \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this auto-formatting with \? Is this configured in our formatter? That's ancient, we should update those rules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have formatting for Python code in the library

size_t new_size,
const Alloc& SVS_UNUSED(allocator)
) {
auto new_sqdata = SQDataset<typename Data::element_type, Data::extent>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding: We create an empty SQDataset of new_size x "old" dimensions with same scale and bias?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct

using value_type = const_value_type;

// Data wrapped in the library allocator.
using lib_alloc_data_type = SQDataset<T, Extent, lib::Allocator<T>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// With 4 inner threads
test_build<svs::BFloat16>(svs::DistanceL2(), 4);
test_build<svs::BFloat16>(svs::DistanceIP(), 4);
// TBD: CI is not happy with this, investigate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be fixed before merge? Or follow-up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep the comment for now, it passes on my local machine but not on the CI machines.

@ibhati ibhati merged commit 0c588d7 into main Sep 25, 2025
14 checks passed
@ibhati ibhati deleted the ib/ivf-2 branch September 25, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants