Add deserialization vector limit enforcement for ResidualCoarseQuantizer by scsiguy · Pull Request #4997 · facebookresearch/faiss

scsiguy · 2026-03-27T00:42:33Z

Summary:
Collectively validate ntotal, beam_factor, and max_beam_size during
deserialization to ensure allocation limits are honored, even when
those allocations occur in search context (ResidualCoarseQuantizer::search()).

Specifically, search() allocates O(beam_size * M * n) int32_t elements where
beam_size is derived from min(k * beam_factor, ntotal). With an adversarial
ntotal of 2^60 or a beam_factor of 1e10, these allocations exceed max_size()
and throw an uncaught std::length_error, terminating the process.

This diff adds three layers of protection:

read_ResidualQuantizer(): Validates max_beam_size > 0 and that
max_beam_size * M * sizeof(int32_t) fits within the configurable
deserialization vector byte limit.
ResidualCoarseQuantizer ("ImRQ") deserialization path:
Validates beam_factor <= 1000 to prevent int overflow in beam_size
computation, and validates ntotal * M * sizeof(int32_t) against the
byte limit since search() can allocate O(ntotal * M) when beam_size
is capped to ntotal.
Why 1000? The default beam_size is 4 (expand candidate serarch by 4x).
A value of even 1000 makes no practical sense as in most cases it will
be equivalent to an exhaustive search (beam_size = -1), but is sufficient
to prevent overflow.
ResidualCoarseQuantizer::search():
Replaces bare arithmetic with mul_no_overflow() for the codes and
beam_distances allocations, so any overflow throws a catchable
FaissException instead of silently wrapping and causing undefined
behavior.

Differential Revision: D98169828

Summary: Collectively validate ntotal, beam_factor, and max_beam_size during deserialization to ensure allocation limits are honored, even when those allocations occur in search context (ResidualCoarseQuantizer::search()). Specifically, search() allocates O(beam_size * M * n) int32_t elements where beam_size is derived from min(k * beam_factor, ntotal). With an adversarial ntotal of 2^60 or a beam_factor of 1e10, these allocations exceed max_size() and throw an uncaught std::length_error, terminating the process. This diff adds three layers of protection: 1. read_ResidualQuantizer(): Validates max_beam_size > 0 and that max_beam_size * M * sizeof(int32_t) fits within the configurable deserialization vector byte limit. 2. ResidualCoarseQuantizer ("ImRQ") deserialization path: Validates beam_factor <= 1000 to prevent int overflow in beam_size computation, and validates ntotal * M * sizeof(int32_t) against the byte limit since search() can allocate O(ntotal * M) when beam_size is capped to ntotal. Why 1000? The default beam_size is 4 (expand candidate serarch by 4x). A value of even 1000 makes no practical sense as in most cases it will be equivalent to an exhaustive search (beam_size = -1), but is sufficient to prevent overflow. 3. ResidualCoarseQuantizer::search(): Replaces bare arithmetic with mul_no_overflow() for the codes and beam_distances allocations, so any overflow throws a catchable FaissException instead of silently wrapping and causing undefined behavior. Differential Revision: D98169828

meta-codesync · 2026-03-27T00:42:40Z

@scsiguy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98169828.

meta-codesync · 2026-03-29T22:41:12Z

This pull request has been merged in 18c85e1.

meta-cla Bot added the CLA Signed label Mar 27, 2026

meta-codesync Bot added fb-exported meta-exported labels Mar 27, 2026

meta-codesync Bot closed this in 18c85e1 Mar 29, 2026

facebook-github-tools Bot added the Merged label Mar 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add deserialization vector limit enforcement for ResidualCoarseQuantizer#4997

Add deserialization vector limit enforcement for ResidualCoarseQuantizer#4997
scsiguy wants to merge 1 commit into
facebookresearch:mainfrom
scsiguy:export-D98169828

scsiguy commented Mar 27, 2026

Uh oh!

meta-codesync Bot commented Mar 27, 2026

Uh oh!

meta-codesync Bot commented Mar 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

scsiguy commented Mar 27, 2026

Uh oh!

meta-codesync Bot commented Mar 27, 2026

Uh oh!

meta-codesync Bot commented Mar 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant