@ahuber21 ahuber21 commented Dec 9, 2025

The previous code always performed a full-width load on the provided data. In ragged-epilogue scenarios, where we request a masked load, this read past the end of the buffer and resulted in SEGV errors in certain runs with AddressSanitizer (ASan).

    if (i < count.size()) {
        auto mask = create_mask<simd_width>(count);
        s0 = op.accumulate(mask, s0, op.load_a(mask, a + i), op.load_b(mask, b + i));
    }

Why wasn't this caught sooner?

The OS only raises a segmentation fault when a read touches an unmapped memory page. Since memory protection (typically) operates at a 4 KiB page granularity, reading past the end of a buffer is "safe" from the OS's perspective unless the overflow happens to cross into an unmapped page.
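As a rough worked example of that arithmetic, the sketch below checks whether a read of a given width starting at a given address spills into the next page. The 4096-byte page size is an assumption (the typical value mentioned above), and `crosses_page` is a hypothetical helper, not part of this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed page size; real systems should query it (e.g. via sysconf).
constexpr std::uintptr_t kPageSize = 4096;

// True if a read of `width` bytes starting at `addr` extends into the next page.
inline bool crosses_page(std::uintptr_t addr, std::size_t width) {
    return (addr % kPageSize) + width > kPageSize;
}
```

A 32-byte read starting 16 bytes before a page boundary crosses it, while one starting 32 bytes before the boundary does not. Even when the read crosses, the OS only faults if the next page is unmapped, which is why the bug stayed hidden without ASan.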

Why is ASan catching it sporadically?

Since our underlying object storage is std::vector, ASan detection requires two specific conditions to align:

  • No Spare Capacity: The vector's size() must equal its capacity(). If there is spare capacity, the unsafe load simply reads valid (though uninitialized) memory owned by the vector.
  • Alignment & Redzones: The underlying heap allocation must be sized and aligned such that the full-width SIMD read (e.g., 32 bytes) actually crosses the allocation boundary into the ASan redzone. If the allocator adds padding for alignment, the read might land in that valid padding instead.
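To make the two conditions concrete, here is a small sketch that computes how far a full-width tail read overshoots a vector's `size()` and its `capacity()`. The `Overread` struct and `full_width_overread` function are hypothetical, and the sketch ignores allocator padding, so a nonzero `past_capacity` is necessary for an ASan report but, per the second bullet, may not be sufficient.

```cpp
#include <cstddef>

// Number of elements a full-width tail read touches beyond each limit.
struct Overread {
    std::size_t past_size;
    std::size_t past_capacity;
};

// `lanes` is the SIMD width in elements (e.g. 8 floats for a 32-byte register).
inline Overread full_width_overread(std::size_t size, std::size_t capacity, std::size_t lanes) {
    // First index not covered by the full-width main loop.
    const std::size_t tail_start = (size / lanes) * lanes;
    if (tail_start == size) {
        return {0, 0}; // no ragged epilogue, nothing is over-read
    }
    const std::size_t read_end = tail_start + lanes;
    return {
        read_end - size,
        read_end > capacity ? read_end - capacity : 0,
    };
}
```

For 11 elements and 8 lanes, the tail read covers indices 8 to 15. With a capacity of 16 it stays inside the allocation, so ASan has nothing to flag; with a capacity of 11 it runs 5 elements past the allocation.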

@ahuber21 ahuber21 requested a review from ibhati as a code owner December 9, 2025 10:14
@ahuber21 ahuber21 removed the request for review from homksei December 9, 2025 15:09
@ahuber21 ahuber21 requested a review from ibhati December 10, 2025 09:50
@ahuber21 (Contributor Author) commented:

@copilot summarize the changes in this PR

Copilot AI commented Dec 10, 2025

@ahuber21 I've opened a new pull request, #242, to work on those changes. Once the pull request is ready, I'll request review from you.

@ahuber21 ahuber21 requested a review from Copilot December 10, 2025 16:33

Copilot AI left a comment

Pull request overview

This PR fixes a critical SIMD memory safety bug in AVX2 masked load operations that could cause segmentation faults when reading beyond allocated memory boundaries. The fix ensures masked loads respect buffer boundaries in ragged-epilogue scenarios.

Key Changes:

  • Fixed AVX2 masked load implementation to prevent out-of-bounds reads
  • Added comprehensive ASan-detected regression tests for distance computations
  • Enhanced CI with dedicated AddressSanitizer build configuration

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/svs/lib/avx_detection.cpp | Added test for runtime AVX flag patching mechanism |
| tests/svs/core/distance.cpp | Added ASan regression tests for distance computation with ragged epilogues |
| tests/CMakeLists.txt | Updated Catch2 to v3.11.0 and improved test discovery with tag-based labels |
| .github/workflows/build-linux.yml | Added clang++-18 ASan build configuration with leak detection disabled |
| tests/svs/index/vamana/multi.cpp | Tagged test as long-running to exclude from ASan builds |
| tests/svs/index/vamana/index.cpp | Tagged test as long-running to exclude from ASan builds |
| tests/svs/index/inverted/memory_based.cpp | Tagged test as long-running to exclude from ASan builds |
| tests/svs/index/inverted/clustering.cpp | Tagged test as long-running to exclude from ASan builds |


ahuber21 and others added 4 commits December 10, 2025 09:01
This reverts commit f856a96.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@ibhati ibhati left a comment

The new workflow is added in the existing workflows, please make sure the earlier flow and options did not change.

 } // namespace

-CATCH_TEST_CASE("Random Clustering - End to End", "[inverted][random_clustering]") {
+CATCH_TEST_CASE("Random Clustering - End to End", "[long][inverted][random_clustering]") {

What is this [long]? Why is this required?

@ahuber21 (Contributor Author) replied:

Sorry, I had a comment that I didn't post. I added a new tag [long] that marks long-running tests. They are skipped in the debug ASan build.

# skip longer-running tests
ctest_args: "-LE long"

rfsaliev added a commit to RedisAI/VectorSimilarity that referenced this pull request Dec 10, 2025

@ethanglaser ethanglaser left a comment

Do we know how long the ASan run takes if completing successfully?

ahuber21 added a commit that referenced this pull request Dec 11, 2025
…kip (#241)

ASan will be added in #239. It flags `SearchBuffer::can_skip`. Here we
reorder the logic to check `full()` before accessing `back()`. Accessing
`back()` on an empty buffer caused an index underflow (`SIZE_MAX`).
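
The reordering described in that commit message can be sketched roughly as follows. `SketchBuffer` is a hypothetical stand-in, not the actual `SearchBuffer` from the codebase; it assumes a capacity of at least 1 and keeps distances sorted in ascending order so that `back()` is the worst retained candidate.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical bounded buffer of distances, sorted ascending.
class SketchBuffer {
  public:
    // Assumption: capacity >= 1, so a full buffer is never empty.
    explicit SketchBuffer(std::size_t capacity) : capacity_{capacity} {}

    bool full() const { return data_.size() >= capacity_; }

    // Unchecked access: on an empty buffer the index size() - 1 wraps
    // around to SIZE_MAX, which is the underflow ASan flagged.
    float back() const { return data_[data_.size() - 1]; }

    // Insert in sorted position and drop the worst entry if over capacity.
    void insert(float distance) {
        auto pos = std::upper_bound(data_.begin(), data_.end(), distance);
        data_.insert(pos, distance);
        if (data_.size() > capacity_) {
            data_.pop_back();
        }
    }

    // Check full() first: short-circuit evaluation guarantees back() is
    // only reached when the buffer holds at least one element.
    bool can_skip(float distance) const {
        return full() && distance >= back();
    }

  private:
    std::size_t capacity_;
    std::vector<float> data_;
};
```

Calling `can_skip` on a freshly constructed buffer returns `false` without ever touching `back()`; with the operands in the opposite order, the same call would index out of bounds.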
@ahuber21 (Contributor Author) commented:

> Do we know how long the ASan run takes if completing successfully?

@ethanglaser about 10 minutes total, < 5 mins of which is testing.

@ahuber21 (Contributor Author) commented:

> The new workflow is added in the existing workflows, please make sure the earlier flow and options did not change.

@ibhati the only change is that the existing steps are executing three more tests (with negligible runtime).

@ahuber21 ahuber21 merged commit 724ac33 into main Dec 11, 2025
15 checks passed
@ahuber21 ahuber21 deleted the dev/fix-unmasked-read branch December 11, 2025 10:58