Fix IDSelector memory leak with SWIG, add unit tests (#3704) by mnorris11 · Pull Request #3810 · facebookresearch/faiss

mnorris11 · 2024-08-29T20:59:21Z

Summary:

Background

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from #3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on #2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status

buck test faiss/tests:test_search_params -- test_ownership_2

Test prints:

without SWIG fix:
[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]

with SWIG fix:
[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:

def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)

The above test produces no ASAN error, even though the quantizer object leaks.

Differential Revision: D61992599

facebook-github-bot · 2024-08-29T20:59:47Z

This pull request was exported from Phabricator. Differential Revision: D61992599

…h#3810) Summary: Pull Request resolved: facebookresearch#3810 Pull Request resolved: facebookresearch#3704 Background -- Issue: facebookresearch#2996 Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42 Partially copied from facebookresearch#3139 with an additional unit test. It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996. Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2 Current status -- `buck test faiss/tests:test_search_params -- test_ownership_2` Test prints: without SWIG fix: `[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]` with SWIG fix: `[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]` Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example: ``` def test_ownership_3(self): d = 32 quantizer = faiss.IndexFlatL2(d) quantizer.this.own(False) ``` The above test produces no ASAN error, even though the `quantizer` object leaks. Differential Revision: D61992599

facebook-github-bot · 2024-08-29T22:51:08Z

This pull request was exported from Phabricator. Differential Revision: D61992599

mnorris11 · 2024-09-02T23:41:41Z

Looking into errors.

facebook-github-bot · 2024-09-12T15:49:09Z

This pull request was exported from Phabricator. Differential Revision: D61992599

…h#3810) Summary: Pull Request resolved: facebookresearch#3810 Pull Request resolved: facebookresearch#3704 Background -- Issue: facebookresearch#2996 Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42 Partially copied from facebookresearch#3139 with an additional unit test. It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996. Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2 Current status -- `buck test faiss/tests:test_search_params -- test_ownership_2` Test prints: without SWIG fix: `[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]` with SWIG fix: `[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]` Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example: ``` def test_ownership_3(self): d = 32 quantizer = faiss.IndexFlatL2(d) quantizer.this.own(False) ``` The above test produces no ASAN error, even though the `quantizer` object leaks. Why change HNSW test? -- This fix causes the HNSW test to fail with heap-use-after-free. This is because the index.storage.get_distance_computer() somehow gets freed during clone_index, but only when reassigning to the same variable like `index.storage = clone(index.storage)`. I checked in https://fburl.com/code/qw6fznjt, and it is non-null before returning on the CPP side. After adding the temp variable, I also had to set `index.own_fields = False`, otherwise we get a heap-use-after-free again due to it being deleted already. Differential Revision: D61992599

facebook-github-bot · 2024-09-12T21:46:10Z

This pull request was exported from Phabricator. Differential Revision: D61992599

Summary: A user-reported leak (facebookresearch#2996 follow-up) reproduces on faiss 1.14.1 when an `IDSelectorBatch` is assigned to a `SearchParameters` field after construction. Each iteration of the reporter's loop leaks ~160 MB; over a long-running serving process this grows unbounded. # Root cause The Python wrapper for `IDSelectorBatch` carries a SWIG `thisown` flag that controls who calls C++ `delete`. The SWIG-generated property setter for `SearchParameters.sel` flips `thisown` to 0 on the assumption that the enclosing C++ object will take responsibility, but `SearchParameters`'s C++ destructor does not free `sel` (the field is borrowed by design). Result: the C++ `IDSelectorBatch` (a 40 MB sorted `std::vector<idx_t>` for the reporter's case) plus the retained numpy buffer are orphaned forever. `handle_SearchParameters` in `class_wrappers.py` already protected the kwargs construction path (`SearchParameters(sel=x)`) by wrapping the assignment in `RememberSwigOwnership` and `add_to_referenced_objects`. The bare-attribute path was unprotected. # Fix Override `__setattr__` on the base `SearchParameters` class so the same ownership protection runs for both `SearchParameters(sel=x)` and `params.sel = x`. The override filters via `hasattr(v, "thisown")` so only SWIG-wrapped C++ objects go through the dance; SWIG internals like `self.this` (a `SwigPyObject` without `thisown`), bookkeeping lists, and plain Python values bypass it. `replacement_init` is slimmed to a pure delegator: the protection moves into `replacement_setattr` so a single source of truth handles both code paths. Keeping both layers would double-wrap on kwargs construction. The filter changes from a denylist (`type(v) not in (int, float, bool, str)`) to a duck-type check (`hasattr(v, "thisown")`). The new filter is stricter: numpy arrays, lists, and `None` no longer get appended. Safe in practice because `SearchParameters` SWIG fields are typed as C++ pointers and only legitimately accept SWIG wrappers; numpy buffers passed indirectly (e.g., to `IDSelectorBatch`) are kept alive by the inner wrapper's own `add_to_referenced_objects`. The duck-type filter is field-name-agnostic, so other `SearchParameters` subclass fields with the same C++ shape (`Foo* field = nullptr` with no destructor freeing `field`) benefit from the same protection automatically. Examples include `SearchParametersPreTransform.index_params` and `SearchParametersIVF.quantizer_params`. # Per-field ref tracking A naive implementation that uses the existing `add_to_referenced_objects` helper (which appends to a list) leaks under a different access pattern: `params = SearchParameters(); for ...: params.sel = ...` accumulates one entry per reassignment because nothing drops the prior ref. This is a common pattern in long-lived servers that initialize a `SearchParameters` once and rebind `params.sel` per request. Fixed by using a per-field dict `self._sp_field_refs` keyed by attribute name. Reassigning the same field replaces the entry in the dict, so the prior wrapper loses its Python ref and the C++ object is freed. The else branch of `replacement_setattr` also drops any prior ref when the field is set to None or to a non-SWIG value, so explicit clear (`params.sel = None`) releases the C++ object immediately. # Subclass guard `__setattr__` is installed only once per class hierarchy. SWIG generates per-class `__init__` slots but does NOT generate per-class `__setattr__`, so subclasses (`SearchParametersIVF`, `SearchParametersHNSW`, ...) inherit the override via MRO. Without the `_protected_setattr` sentinel, calling `handle_SearchParameters(SearchParametersIVF)` would capture the inherited `replacement_setattr` as `parent_setattr`, then install a second `replacement_setattr` on top - chained calls would fire field tracking twice. The guard uses `getattr` (which walks the MRO) rather than `__dict__` lookup (own-attribute only) so subclasses correctly see the base's marker. `replacement_init` does not need this guard because each SWIG class generates its own `__init__` in its own `__dict__`; wrapping always captures SWIG's fresh init, never an inherited replacement. # Prior attempts and why this approach is targeted This bug has been attempted twice before via the SWIG layer: - facebookresearch#3139 (Apr 2024, abandoned) - facebookresearch#3810 (Sept 2024, changes requested) Both added a `%typemap(in) SWIGTYPE *` in `swigfaiss.swig` that strips the `$disown` flag globally, so SWIG never transfers ownership for any pointer field assignment anywhere in the bindings. The minimal change is one typemap. The trade-off proved untenable. PR 3810 surfaced the smoking gun while testing: `test_graph_based.test_io_no_storage` started crashing with heap-use-after-free at `index.storage = faiss.clone_index(index.storage)` with `own_fields = True`. The author worked around it by introducing a temp variable and flipping `own_fields = False` - but that flip is itself a behavior change to a documented FAISS contract that downstream users build on. The right framing is not "SWIG's default ownership transfer is always wrong" (the typemap assumption) but "SWIG's default ownership transfer is wrong specifically for `SearchParameters` because that class lacks a destructor for its borrowed pointer fields." This diff scopes the change to that class hierarchy; every other SWIG-managed field keeps its existing semantics. No `swigfaiss.swig` change, no surprising downstream breakage. The memory regression test pattern (`np.arange(5_000_000)` × N iterations, measure `faiss.get_mem_usage_kb`) was independently validated by PR 3810's `test_ownership_2` and is reused here. ASAN cannot catch this leak because the C++ object remains reachable from a Python wrapper that simply has no Python-side references - it is a Python-level reference leak, not a C++-level dangling pointer. Differential Revision: D104750178

Summary: Pull Request resolved: #5208 A user-reported leak (#2996 follow-up) reproduces on faiss 1.14.1 when an `IDSelectorBatch` is assigned to a `SearchParameters` field after construction. Each iteration of the reporter's loop leaks ~160 MB; over a long-running serving process this grows unbounded. # Root cause The Python wrapper for `IDSelectorBatch` carries a SWIG `thisown` flag that controls who calls C++ `delete`. The SWIG-generated property setter for `SearchParameters.sel` flips `thisown` to 0 on the assumption that the enclosing C++ object will take responsibility, but `SearchParameters`'s C++ destructor does not free `sel` (the field is borrowed by design). Result: the C++ `IDSelectorBatch` (a 40 MB sorted `std::vector<idx_t>` for the reporter's case) plus the retained numpy buffer are orphaned forever. `handle_SearchParameters` in `class_wrappers.py` already protected the kwargs construction path (`SearchParameters(sel=x)`) by wrapping the assignment in `RememberSwigOwnership` and `add_to_referenced_objects`. The bare-attribute path was unprotected. # Fix Override `__setattr__` on the base `SearchParameters` class so the same ownership protection runs for both `SearchParameters(sel=x)` and `params.sel = x`. The override filters via `hasattr(v, "thisown")` so only SWIG-wrapped C++ objects go through the dance; SWIG internals like `self.this` (a `SwigPyObject` without `thisown`), bookkeeping lists, and plain Python values bypass it. `replacement_init` is slimmed to a pure delegator: the protection moves into `replacement_setattr` so a single source of truth handles both code paths. Keeping both layers would double-wrap on kwargs construction. The filter changes from a denylist (`type(v) not in (int, float, bool, str)`) to a duck-type check (`hasattr(v, "thisown")`). The new filter is stricter: numpy arrays, lists, and `None` no longer get appended. Safe in practice because `SearchParameters` SWIG fields are typed as C++ pointers and only legitimately accept SWIG wrappers; numpy buffers passed indirectly (e.g., to `IDSelectorBatch`) are kept alive by the inner wrapper's own `add_to_referenced_objects`. The duck-type filter is field-name-agnostic, so other `SearchParameters` subclass fields with the same C++ shape (`Foo* field = nullptr` with no destructor freeing `field`) benefit from the same protection automatically. Examples include `SearchParametersPreTransform.index_params` and `SearchParametersIVF.quantizer_params`. # Per-field ref tracking A naive implementation that uses the existing `add_to_referenced_objects` helper (which appends to a list) leaks under a different access pattern: `params = SearchParameters(); for ...: params.sel = ...` accumulates one entry per reassignment because nothing drops the prior ref. This is a common pattern in long-lived servers that initialize a `SearchParameters` once and rebind `params.sel` per request. Fixed by using a per-field dict `self._sp_field_refs` keyed by attribute name. Reassigning the same field replaces the entry in the dict, so the prior wrapper loses its Python ref and the C++ object is freed. The else branch of `replacement_setattr` also drops any prior ref when the field is set to None or to a non-SWIG value, so explicit clear (`params.sel = None`) releases the C++ object immediately. # Subclass guard `__setattr__` is installed only once per class hierarchy. SWIG generates per-class `__init__` slots but does NOT generate per-class `__setattr__`, so subclasses (`SearchParametersIVF`, `SearchParametersHNSW`, ...) inherit the override via MRO. Without the `_protected_setattr` sentinel, calling `handle_SearchParameters(SearchParametersIVF)` would capture the inherited `replacement_setattr` as `parent_setattr`, then install a second `replacement_setattr` on top - chained calls would fire field tracking twice. The guard uses `getattr` (which walks the MRO) rather than `__dict__` lookup (own-attribute only) so subclasses correctly see the base's marker. `replacement_init` does not need this guard because each SWIG class generates its own `__init__` in its own `__dict__`; wrapping always captures SWIG's fresh init, never an inherited replacement. # Prior attempts and why this approach is targeted This bug has been attempted twice before via the SWIG layer: - #3139 (Apr 2024, abandoned) - #3810 (Sept 2024, changes requested) Both added a `%typemap(in) SWIGTYPE *` in `swigfaiss.swig` that strips the `$disown` flag globally, so SWIG never transfers ownership for any pointer field assignment anywhere in the bindings. The minimal change is one typemap. The trade-off proved untenable. PR 3810 surfaced the smoking gun while testing: `test_graph_based.test_io_no_storage` started crashing with heap-use-after-free at `index.storage = faiss.clone_index(index.storage)` with `own_fields = True`. The author worked around it by introducing a temp variable and flipping `own_fields = False` - but that flip is itself a behavior change to a documented FAISS contract that downstream users build on. The right framing is not "SWIG's default ownership transfer is always wrong" (the typemap assumption) but "SWIG's default ownership transfer is wrong specifically for `SearchParameters` because that class lacks a destructor for its borrowed pointer fields." This diff scopes the change to that class hierarchy; every other SWIG-managed field keeps its existing semantics. No `swigfaiss.swig` change, no surprising downstream breakage. The memory regression test pattern (`np.arange(5_000_000)` × N iterations, measure `faiss.get_mem_usage_kb`) was independently validated by PR 3810's `test_ownership_2` and is reused here. ASAN cannot catch this leak because the C++ object remains reachable from a Python wrapper that simply has no Python-side references - it is a Python-level reference leak, not a C++-level dangling pointer. Reviewed By: mnorris11 Differential Revision: D104750178 fbshipit-source-id: 41f865028d0dc863fe560d1fd1bbe979dc8e8e8f

facebook-github-bot added the CLA Signed label Aug 29, 2024

facebook-github-bot added the fb-exported label Aug 29, 2024

mnorris11 force-pushed the export-D61992599 branch from b888ed1 to cfa5848 Compare August 29, 2024 22:51

mnorris11 force-pushed the export-D61992599 branch from cfa5848 to 299f0b3 Compare September 12, 2024 15:49

mnorris11 force-pushed the export-D61992599 branch from 299f0b3 to d5d8caa Compare September 12, 2024 21:46

asadoughi added the Performance label Sep 15, 2024

asadoughi assigned mnorris11 Oct 2, 2024

mnorris11 added the ignore label Oct 25, 2024

alibeklfc mentioned this pull request May 12, 2026

Fix IDSelector leak via SearchParameters.sel setter #5208

Closed

alibeklfc closed this May 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix IDSelector memory leak with SWIG, add unit tests (#3704)#3810

Fix IDSelector memory leak with SWIG, add unit tests (#3704)#3810
mnorris11 wants to merge 1 commit into
facebookresearch:mainfrom
mnorris11:export-D61992599

mnorris11 commented Aug 29, 2024

Uh oh!

facebook-github-bot commented Aug 29, 2024

Uh oh!

facebook-github-bot commented Aug 29, 2024

Uh oh!

mnorris11 commented Sep 2, 2024

Uh oh!

facebook-github-bot commented Sep 12, 2024

Uh oh!

facebook-github-bot commented Sep 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

mnorris11 commented Aug 29, 2024

Background

Current status

Uh oh!

facebook-github-bot commented Aug 29, 2024

Uh oh!

facebook-github-bot commented Aug 29, 2024

Uh oh!

mnorris11 commented Sep 2, 2024

Uh oh!

facebook-github-bot commented Sep 12, 2024

Uh oh!

facebook-github-bot commented Sep 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants