Skip to content

Fix IDSelectorBitmap conversion to cuVS bitset (#5211)#5211

Closed
mnorris11 wants to merge 1 commit into
facebookresearch:mainfrom
mnorris11:export-D104935985
Closed

Fix IDSelectorBitmap conversion to cuVS bitset (#5211)#5211
mnorris11 wants to merge 1 commit into
facebookresearch:mainfrom
mnorris11:export-D104935985

Conversation

@mnorris11
Copy link
Copy Markdown
Contributor

@mnorris11 mnorris11 commented May 13, 2026

Summary:

convert_to_bitset_bitmap in CuvsFilterConvert.cu assumed that IDSelectorBitmap.n was the bit/vector count, but it can also be the byte count of the bitmap array (which is how the Python wrapper constructs it via faiss.IDSelectorBitmap(numpy_array)). This caused two bugs:

  1. The RAFT_EXPECTS(bitset.size() == n) check failed when n was the byte count (e.g., 6.25M bytes vs 50M bits), throwing a raft::logic_error that propagated up and caused filtered search to silently fall back to unfiltered results.

  2. n_elements = (selector.n + 7) / 8 divided the byte count by 8 again, copying only 1/8 of the bitmap data when n was already the byte count.

The fix handles both conventions by computing the actual number of bytes to copy as min(n, ceil(bitset_size / 8)). This works regardless of whether n represents bytes or bits:

  • Python wrapper (n = byte count): copies all n bytes
  • C++ convention (n = bit count): copies ceil(bit_count / 8) bytes (the minimum needed)

Reviewed By: pankajsingh88

Differential Revision: D104935985

@meta-cla meta-cla Bot added the CLA Signed label May 13, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 13, 2026

@mnorris11 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104935985.

@mnorris11 mnorris11 marked this pull request as draft May 13, 2026 16:35
@mnorris11 mnorris11 force-pushed the export-D104935985 branch from 307982d to fd9b1c4 Compare May 13, 2026 17:24
@meta-codesync meta-codesync Bot changed the title Fix IDSelectorBitmap conversion to cuVS bitset Fix IDSelectorBitmap conversion to cuVS bitset (#5211) May 14, 2026
@mnorris11 mnorris11 force-pushed the export-D104935985 branch from fd9b1c4 to 130b667 Compare May 14, 2026 17:23
mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request May 14, 2026
Summary:

`convert_to_bitset_bitmap` in `CuvsFilterConvert.cu` assumed that `IDSelectorBitmap.n` was the bit/vector count, but it can also be the byte count of the bitmap array (which is how the Python wrapper constructs it via `faiss.IDSelectorBitmap(numpy_array)`). This caused two bugs:

1. The `RAFT_EXPECTS(bitset.size() == n)` check failed when `n` was the byte count (e.g., 6.25M bytes vs 50M bits), throwing a `raft::logic_error` that propagated up and caused filtered search to silently fall back to unfiltered results.

2. `n_elements = (selector.n + 7) / 8` divided the byte count by 8 again, copying only 1/8 of the bitmap data when `n` was already the byte count.

The fix handles both conventions by computing the actual number of bytes to copy as `min(n, ceil(bitset_size / 8))`. This works regardless of whether `n` represents bytes or bits:
- Python wrapper (n = byte count): copies all n bytes
- C++ convention (n = bit count): copies ceil(bit_count / 8) bytes (the minimum needed)

Differential Revision: D104935985
@mnorris11 mnorris11 marked this pull request as ready for review May 14, 2026 18:00
mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request May 14, 2026
Summary:

`convert_to_bitset_bitmap` in `CuvsFilterConvert.cu` assumed that `IDSelectorBitmap.n` was the bit/vector count, but it can also be the byte count of the bitmap array (which is how the Python wrapper constructs it via `faiss.IDSelectorBitmap(numpy_array)`). This caused two bugs:

1. The `RAFT_EXPECTS(bitset.size() == n)` check failed when `n` was the byte count (e.g., 6.25M bytes vs 50M bits), throwing a `raft::logic_error` that propagated up and caused filtered search to silently fall back to unfiltered results.

2. `n_elements = (selector.n + 7) / 8` divided the byte count by 8 again, copying only 1/8 of the bitmap data when `n` was already the byte count.

The fix handles both conventions by computing the actual number of bytes to copy as `min(n, ceil(bitset_size / 8))`. This works regardless of whether `n` represents bytes or bits:
- Python wrapper (n = byte count): copies all n bytes
- C++ convention (n = bit count): copies ceil(bit_count / 8) bytes (the minimum needed)

Differential Revision: D104935985
@mnorris11 mnorris11 force-pushed the export-D104935985 branch from 130b667 to 20e1a9b Compare May 14, 2026 18:07
@mnorris11
Copy link
Copy Markdown
Contributor Author

@tarang-jain @lowener Does this change look right?

Summary:

`convert_to_bitset_bitmap` in `CuvsFilterConvert.cu` assumed that `IDSelectorBitmap.n` was the bit/vector count, but it can also be the byte count of the bitmap array (which is how the Python wrapper constructs it via `faiss.IDSelectorBitmap(numpy_array)`). This caused two bugs:

1. The `RAFT_EXPECTS(bitset.size() == n)` check failed when `n` was the byte count (e.g., 6.25M bytes vs 50M bits), throwing a `raft::logic_error` that propagated up and caused filtered search to silently fall back to unfiltered results.

2. `n_elements = (selector.n + 7) / 8` divided the byte count by 8 again, copying only 1/8 of the bitmap data when `n` was already the byte count.

The fix handles both conventions by computing the actual number of bytes to copy as `min(n, ceil(bitset_size / 8))`. This works regardless of whether `n` represents bytes or bits:
- Python wrapper (n = byte count): copies all n bytes
- C++ convention (n = bit count): copies ceil(bit_count / 8) bytes (the minimum needed)

Reviewed By: pankajsingh88

Differential Revision: D104935985
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 15, 2026

This pull request has been merged in 172324a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant