Bump GCC pin in faiss-gpu conda recipe to fix AVX2 SIMD miscompilation#5125

Closed
algoriddle wants to merge 1 commit into
facebookresearch:main from
algoriddle:export-D101601476
@algoriddle
Contributor

Summary:
The faiss-gpu conda recipe pins `{{ compiler('cxx') }} =12.4` (GCC 12.4). GCC 12.4 miscompiles the 16-bin SIMD histogram reduction in `partitioning_simdlib256.h`, producing correct results for bins 0-7 but near-zero for bins 8-15. This causes `test_16bin_bounded_bigrange` in `TestHistograms_AVX2` to fail in the CUDA 12.6 GPU nightly.
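The symptom described above (bins 0-7 correct, bins 8-15 near zero) is the kind of mismatch a scalar reference comparison catches. The following is a minimal sketch, not the faiss test itself: `reference_histogram_16` and `upper_bins_collapsed` are hypothetical names, and the binning rule `(v - min) >> shift` with out-of-range values dropped is an assumption about what "bounded" means here.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference: 16 bins, bin index = (v - min) >> shift.
// Values below min, or landing past bin 15, are ignored (assumed behaviour).
std::array<int, 16> reference_histogram_16(
        const std::vector<uint16_t>& data, uint16_t min, int shift) {
    std::array<int, 16> hist{};
    for (uint16_t v : data) {
        if (v < min) {
            continue;
        }
        unsigned bin = static_cast<unsigned>(v - min) >> shift;
        if (bin < 16) {
            ++hist[bin];
        }
    }
    return hist;
}

// True when `got` matches `ref` on bins 0-7 but differs somewhere in
// bins 8-15, i.e. the failure pattern reported for the GCC 12.4 build.
bool upper_bins_collapsed(
        const std::array<int, 16>& ref, const std::array<int, 16>& got) {
    for (int i = 0; i < 8; ++i) {
        if (ref[i] != got[i]) {
            return false;
        }
    }
    for (int i = 8; i < 16; ++i) {
        if (ref[i] != got[i]) {
            return true;
        }
    }
    return false;
}
```

Comparing the SIMD output against such a reference on the same input separates a wrong upper half from a generally wrong histogram.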

The bug is in GCC 12's code generation for the AVX2 cross-lane reduction chain (`_mm256_hadd_epi16` → `_mm256_permute2f128_si256` → `_mm256_permutevar8x32_epi32`). GCC 13 and 14 both compile this correctly. The CPU-only `faiss/meta.yaml` leaves the compiler unpinned (gets GCC 14), which is why only the GPU nightly fails.
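To see why such a reduction needs a cross-lane step at all, it helps to model the lane semantics in plain C++. The sketch below is a portable scalar model of `_mm256_hadd_epi16` and `_mm256_permute2f128_si256` only (`_mm256_permutevar8x32_epi32` is not modelled); the `*_model` names and `horizontal_sum_model` are hypothetical and this is not the code in `partitioning_simdlib256.h`.

```cpp
#include <array>
#include <cstdint>

// Scalar stand-in for a __m256i holding sixteen 16-bit elements.
using V16 = std::array<uint16_t, 16>;

// Models _mm256_hadd_epi16: pairwise sums, computed independently inside
// each 128-bit half (elements 0-7 and 8-15). Nothing crosses the halves.
V16 hadd_epi16_model(const V16& a, const V16& b) {
    V16 r{};
    for (int half = 0; half < 2; ++half) {
        const int base = half * 8;
        for (int i = 0; i < 4; ++i) {
            r[base + i] = static_cast<uint16_t>(
                    a[base + 2 * i] + a[base + 2 * i + 1]);
            r[base + 4 + i] = static_cast<uint16_t>(
                    b[base + 2 * i] + b[base + 2 * i + 1]);
        }
    }
    return r;
}

// Models _mm256_permute2f128_si256: each 4-bit selector picks a.lo (0),
// a.hi (1), b.lo (2) or b.hi (3); bit 3 of a selector zeroes that half.
V16 permute2f128_model(const V16& a, const V16& b, int imm) {
    V16 r{};
    auto pick = [&](int sel, int dst) {
        if (sel & 0x8) {
            return;  // zeroing bit set: leave this half as zero
        }
        const V16& src = (sel & 2) ? b : a;
        const int from = (sel & 1) ? 8 : 0;
        for (int i = 0; i < 8; ++i) {
            r[dst + i] = src[from + i];
        }
    };
    pick(imm & 0xF, 0);
    pick((imm >> 4) & 0xF, 8);
    return r;
}

// Three hadd steps leave one partial sum per half (s[0] and s[8]);
// swapping the halves is what brings the upper partial sum down.
uint16_t horizontal_sum_model(const V16& v) {
    V16 s = hadd_epi16_model(v, v);
    s = hadd_epi16_model(s, s);
    s = hadd_epi16_model(s, s);
    const V16 swapped = permute2f128_model(s, s, 0x01);
    return static_cast<uint16_t>(s[0] + swapped[0]);
}
```

In this model, dropping or mishandling the permute step loses exactly the contribution held in elements 8-15, which is at least consistent with the upper-bin symptom reported above.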

The GCC 12.4 pin was introduced in D84193438 as part of a batch nightly fix — not a deliberate CUDA compatibility constraint. CUDA 12.6 supports up to GCC 13.x as host compiler (GCC 14 requires CUDA 12.9+), so we widen the pin to `>=12.4,<14`.
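The widened range can be read as a simple predicate on the compiler version. A small sketch, assuming only major and minor components matter; `GccVersion` and `satisfies_widened_pin` are hypothetical names for illustration and are not part of the recipe or of conda.

```cpp
// Hypothetical value type for a GCC version such as 13.4.
struct GccVersion {
    int major_ver;
    int minor_ver;
};

// Mirrors the version spec ">=12.4,<14": at least 12.4, and strictly
// below major version 14.
constexpr bool satisfies_widened_pin(GccVersion v) {
    return (v.major_ver > 12 || (v.major_ver == 12 && v.minor_ver >= 4)) &&
            v.major_ver < 14;
}

static_assert(satisfies_widened_pin(GccVersion{13, 4}), "13.4 is allowed");
static_assert(!satisfies_widened_pin(GccVersion{14, 2}), "14.x is excluded");
static_assert(satisfies_widened_pin(GccVersion{12, 4}), "12.4 still matches");
```

Note that 12.4 itself still satisfies the range, so the fix depends on the build resolving to a 13.x compiler rather than on 12.4 being excluded.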

Reproduced locally: GCC 12.4 fails, GCC 13.4 passes, GCC 14.2 passes — all on the same faiss source, same test, same machine.

Differential Revision: D101601476
@meta-cla meta-cla Bot added the CLA Signed label Apr 20, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 20, 2026

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101601476.

@meta-codesync
Contributor

meta-codesync Bot commented Apr 20, 2026

This pull request has been merged in 6e64c5d.
