Skip to content

Fix race condition, memory management, debug output, and hashtable lookup in sorting.cpp (#5078)#5078

Closed
alibeklfc wants to merge 1 commit intofacebookresearch:mainfrom
alibeklfc:export-D100317917
Closed

Fix race condition, memory management, debug output, and hashtable lookup in sorting.cpp (#5078)#5078
alibeklfc wants to merge 1 commit intofacebookresearch:mainfrom
alibeklfc:export-D100317917

Conversation

@alibeklfc
Copy link
Copy Markdown
Contributor

@alibeklfc alibeklfc commented Apr 10, 2026

Summary:

Four fixes in faiss/utils/sorting.cpp:

1. OpenMP directive fix in fvec_argsort_parallel

The initialization loop used #pragma omp parallel without the for directive. This caused every thread to execute the entire loop independently rather than distributing iterations. With nt threads, each permA[i] was written by all nt threads concurrently — a data race under the C++ memory model (multiple unsynchronized writes to the same non-atomic location), and O(n * nt) wasted work instead of O(n). Fixed by changing to #pragma omp parallel for.

In practice, all threads write the same value (permA[i] = i), so the output was always correct despite the UB. The fix eliminates the undefined behavior and the redundant work.

2. RAII memory management in fvec_argsort_parallel

Replaced new size_t[n] / delete[] perm2 with std::vector<size_t>. The old code had no realistic exception path between allocation and deallocation (all intermediate code is either C functions or non-throwing OpenMP regions), but the manual new/delete pattern is fragile against future edits that might introduce a throwing path. The std::vector provides RAII lifetime management with no behavioral change.

3. Removed debug printf in fvec_argsort_parallel

A leftover printf("merge %d %d, %d threads\n", ...) in the parallel merge loop wrote to stdout during normal operation. Removed.

4. Missing early termination in hashtable_int64_to_int64_lookup

The linear probing loop did not check for empty slots (tab[slot * 2] == -1). In an open-addressing hash table with no deletion support, an empty slot is definitive proof that the key was not inserted — the insert function would have placed it there or earlier. Without this check, lookups for absent keys probed every slot in the bucket before the wrap-around termination at slot == hk_i. The fix adds the standard empty-slot check, matching the structure of the insert function (hashtable_int64_to_int64_add). This is a performance optimization — the old code always returned the correct result (-1 after a full bucket scan), just slower.

Differential Revision: D100317917

@meta-cla meta-cla bot added the CLA Signed label Apr 10, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync bot commented Apr 10, 2026

@alibeklfc has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100317917.

…okup in sorting.cpp (facebookresearch#5078)

Summary:

Four fixes in `faiss/utils/sorting.cpp`:

**1. OpenMP directive fix in `fvec_argsort_parallel`**

The initialization loop used `#pragma omp parallel` without the `for` directive. This caused every thread to execute the entire loop independently rather than distributing iterations. With `nt` threads, each `permA[i]` was written by all `nt` threads concurrently — a data race under the C++ memory model (multiple unsynchronized writes to the same non-atomic location), and O(n * nt) wasted work instead of O(n). Fixed by changing to `#pragma omp parallel for`.

In practice, all threads write the same value (`permA[i] = i`), so the output was always correct despite the UB. The fix eliminates the undefined behavior and the redundant work.

**2. RAII memory management in `fvec_argsort_parallel`**

Replaced `new size_t[n]` / `delete[] perm2` with `std::vector<size_t>`. The old code had no realistic exception path between allocation and deallocation (all intermediate code is either C functions or non-throwing OpenMP regions), but the manual `new`/`delete` pattern is fragile against future edits that might introduce a throwing path. The `std::vector` provides RAII lifetime management with no behavioral change.

**3. Removed debug `printf` in `fvec_argsort_parallel`**

A leftover `printf("merge %d %d, %d threads\n", ...)` in the parallel merge loop wrote to stdout during normal operation. Removed.

**4. Missing early termination in `hashtable_int64_to_int64_lookup`**

The linear probing loop did not check for empty slots (`tab[slot * 2] == -1`). In an open-addressing hash table with no deletion support, an empty slot is definitive proof that the key was not inserted — the insert function would have placed it there or earlier. Without this check, lookups for absent keys probed every slot in the bucket before the wrap-around termination at `slot == hk_i`. The fix adds the standard empty-slot check, matching the structure of the insert function (`hashtable_int64_to_int64_add`). This is a performance optimization — the old code always returned the correct result (`-1` after a full bucket scan), just slower.

Differential Revision: D100317917
@meta-codesync meta-codesync bot changed the title Fix race condition, memory management, debug output, and hashtable lookup in sorting.cpp Fix race condition, memory management, debug output, and hashtable lookup in sorting.cpp (#5078) Apr 10, 2026
@meta-codesync meta-codesync bot closed this in aa3ce37 Apr 11, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync bot commented Apr 11, 2026

This pull request has been merged in aa3ce37.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant