Bump GCC pin in faiss-gpu conda recipe to fix AVX2 SIMD miscompilation by algoriddle · Pull Request #5125 · facebookresearch/faiss

algoriddle · 2026-04-20T08:54:06Z

Summary:
The faiss-gpu conda recipe pins `{{ compiler('cxx') }} =12.4` (GCC 12.4). GCC 12.4 miscompiles the 16-bin SIMD histogram reduction in `partitioning_simdlib256.h`, producing correct results for bins 0-7 but near-zero for bins 8-15. This causes `test_16bin_bounded_bigrange` in `TestHistograms_AVX2` to fail in the CUDA 12.6 GPU nightly.

The bug is in GCC 12's code generation for the AVX2 cross-lane reduction chain (`_mm256_hadd_epi16` → `_mm256_permute2f128_si256` → `_mm256_permutevar8x32_epi32`). GCC 13 and 14 both compile this correctly. The CPU-only `faiss/meta.yaml` leaves the compiler unpinned (gets GCC 14), which is why only the GPU nightly fails.

The GCC 12.4 pin was introduced in D84193438 as part of a batch nightly fix, not as a deliberate CUDA compatibility constraint. CUDA 12.6 supports up to GCC 13.x as host compiler (GCC 14 requires CUDA 12.9+), so we widen the pin to `>=12.4,<14`.
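
The pin change described above might look roughly like the following in the recipe. This is a sketch rather than the actual diff: the `requirements`/`build` nesting is the usual conda-build layout and is assumed here, and only the two version specs are taken from the summary. Note that the widened range still admits GCC 12.4, so the fix presumably relies on the solver selecting the newest matching compiler (13.x).

```yaml
# Hypothetical excerpt of the faiss-gpu recipe; surrounding keys are assumed.
requirements:
  build:
    # Before: exact pin to GCC 12.4, which miscompiles the AVX2 reduction.
    # - {{ compiler('cxx') }} =12.4
    # After: any GCC from 12.4 up to, but not including, 14,
    # since CUDA 12.6 needs a host compiler below GCC 14.
    - {{ compiler('cxx') }} >=12.4,<14
```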

Reproduced locally: GCC 12.4 fails, GCC 13.4 passes, and GCC 14.2 passes, all on the same faiss source, same test, and same machine.

Differential Revision: D101601476

meta-codesync · 2026-04-20T08:54:15Z

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101601476.

meta-codesync · 2026-04-20T10:53:59Z

This pull request has been merged in 6e64c5d.

meta-cla Bot added the CLA Signed label Apr 20, 2026

meta-codesync Bot added the fb-exported and meta-exported labels Apr 20, 2026

meta-codesync Bot closed this in 6e64c5d Apr 20, 2026

facebook-github-tools Bot added the Merged label Apr 20, 2026

algoriddle wants to merge 1 commit into facebookresearch:main from algoriddle:export-D101601476
