
Templatize simdlib types on SIMDLevel#4866

Closed
algoriddle wants to merge 1 commit into facebookresearch:main from algoriddle:export-D95392150

Conversation

@algoriddle
Contributor

Summary:
Templatize all simd wrapper types (simd16uint16, simd32uint8, simd8float32,
etc.) on SIMDLevel. This is the foundation for PQ4 fast scan Dynamic Dispatch.

Primary templates are declared in simdlib.h. Each platform header provides
explicit specializations:

  • simdlib_avx2.h: simd16uint16&lt;AVX2&gt;, simd32uint8&lt;AVX2&gt;, etc.
  • simdlib_avx512.h: simd32uint16&lt;AVX512&gt;, simd64uint8&lt;AVX512&gt;, etc.
  • simdlib_neon.h: simd16uint16&lt;ARM_NEON&gt;, etc.
  • simdlib_emulated.h: simd16uint16&lt;NONE&gt;, etc. (always included)
  • simdlib_ppc64.h: simd16uint16&lt;NONE&gt;, etc. (PPC-optimized scalar)
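
The primary-template / explicit-specialization pattern described above can be sketched as follows. This is a minimal illustration, not the Faiss code: the enumerator list and the scalar lane layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the SIMDLevel enum in simd_levels.h;
// the exact enumerators in Faiss may differ.
enum class SIMDLevel { NONE, AVX2, AVX512, ARM_NEON, SVE };

// Primary template: declared but never defined (as in simdlib.h).
// Instantiating an unspecialized level is a compile-time error.
template <SIMDLevel SL>
struct simd16uint16;

// Explicit specialization, standing in for the emulated (scalar) header.
template <>
struct simd16uint16<SIMDLevel::NONE> {
    uint16_t lanes[16];
    simd16uint16() : lanes{} {}
    explicit simd16uint16(uint16_t x) {
        for (auto& l : lanes) {
            l = x;
        }
    }
};

// A platform header would add e.g. simd16uint16<SIMDLevel::AVX2>,
// backed by an __m256i member instead of a scalar array, with the
// same member layout and function signatures.
```

Because each specialization is a complete, independent class definition, every platform keeps its own storage and intrinsics while callers name a single type template.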

SINGLE_SIMD_LEVEL (an inline constexpr in simd_levels.h) resolves to NONE in
dynamic-dispatch (DD) mode and to the compiled-in level in static mode.
SINGLE_SIMD_LEVEL_256 maps through simd256_level_selector for 256-bit types
(AVX512 -> AVX2, SVE -> NEON). Code without an explicit SIMDLevel context uses
these constants. This is migration scaffolding; subsequent diffs will replace
SINGLE_SIMD_LEVEL usages with proper SL dispatch.
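
The constant-selection logic can be sketched like this. The macro names checked here (including FAISS_DYNAMIC_DISPATCH_SKETCH) and the enumerators are assumptions; the real definitions live in simd_levels.h.

```cpp
enum class SIMDLevel { NONE, AVX2, AVX512, ARM_NEON, SVE };

// In DD mode the level is forced to NONE; in static mode it is the
// compiled-in level, approximated here by compiler feature macros.
#if defined(FAISS_DYNAMIC_DISPATCH_SKETCH)
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::NONE;
#elif defined(__AVX512F__)
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::AVX512;
#elif defined(__AVX2__)
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::AVX2;
#else
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::NONE;
#endif

// 256-bit selector: wider or scalable levels fall back to their
// 256-bit sibling so that 256-bit types always have a valid level.
constexpr SIMDLevel simd256_level_selector(SIMDLevel sl) {
    switch (sl) {
        case SIMDLevel::AVX512: return SIMDLevel::AVX2;
        case SIMDLevel::SVE:    return SIMDLevel::ARM_NEON;
        default:                return sl;
    }
}

inline constexpr SIMDLevel SINGLE_SIMD_LEVEL_256 =
        simd256_level_selector(SINGLE_SIMD_LEVEL);
```

Keeping the selector a constexpr function means the 256-bit mapping is resolved at compile time, so static builds pay nothing for it.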

simd_result_handlers.h is no longer %include'd by SWIG (the templatized types
are unparseable by SWIG). make_knn_handler methods are %ignore'd. The Python
API does not use these internal SIMD handler types.
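
The SWIG-side change has roughly this shape. This is an illustrative fragment only: the interface file name and the qualified name passed to %ignore are assumptions, not the exact Faiss directives.

```swig
// In the SWIG interface file (e.g. swigfaiss.swig):
// the header is simply no longer pulled in --
//   (removed) %include <faiss/impl/simd_result_handlers.h>
// and the factory methods returning handler types are hidden:
%ignore faiss::make_knn_handler;
```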

Pre-existing bug fixes bundled with this refactor:

  • simdlib_avx512.h: simd512bit::bin() stack buffer overflow (char[257] -> char[513])
  • simdlib_avx2.h: simd256bit constructor used aligned _mm256_load_si256 instead
    of unaligned _mm256_loadu_si256
  • All platform headers: simd16uint16/simd32uint8 operator+=/operator-= returned
    by value instead of by reference
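
The third fix above can be shown with a toy wrapper (a hypothetical 4-lane type, not the Faiss classes): compound assignment must return *this by reference, otherwise chained or reference-taking uses operate on a temporary copy.

```cpp
#include <cassert>
#include <cstdint>

// Toy 4-lane wrapper illustrating the operator+= fix.
struct vec4u16 {
    uint16_t v[4];

    // Before the fix this returned `vec4u16` (a value), so an
    // expression like (a += b) += b mutated a discarded temporary.
    vec4u16& operator+=(const vec4u16& o) {
        for (int i = 0; i < 4; ++i) {
            v[i] += o.v[i];
        }
        return *this;  // by reference, per the fix
    }
};
```

The char[257] -> char[513] fix follows the same arithmetic as the buffer itself: a 512-bit register renders as 512 characters plus a NUL terminator, so a 257-byte buffer (sized for 256 bits) overflowed by half.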

Static builds see zero performance change: the template specializations
produce layout, ABI, and codegen identical to the old plain structs.

Differential Revision: D95392150

@meta-cla meta-cla bot added the CLA Signed label Mar 5, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
@meta-codesync
Contributor

meta-codesync bot commented Mar 6, 2026

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95392150.

algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
@algoriddle algoriddle force-pushed the export-D95392150 branch 2 times, most recently from 1fccad8 to aa5bbc4 Compare March 8, 2026 13:35
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 8, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 8, 2026
Summary:
Pull Request resolved: facebookresearch#4866

Reviewed By: mdouze

Differential Revision: D95392150
@meta-codesync
Contributor

meta-codesync bot commented Mar 9, 2026

This pull request has been merged in 8d8268c.

