Templatize simdlib types on SIMDLevel #4866
Closed
algoriddle wants to merge 1 commit into facebookresearch:main from
Conversation
algoriddle added a commit to algoriddle/faiss that referenced this pull request on Mar 6, 2026
Summary: Templatize all simd wrapper types (simd16uint16, simd32uint8, simd8float32, etc.) on SIMDLevel. This is the foundation for PQ4 fast scan Dynamic Dispatch.

Primary templates are declared in simdlib.h. Each platform header provides explicit specializations:
- simdlib_avx2.h: simd16uint16<AVX2>, simd32uint8<AVX2>, etc.
- simdlib_avx512.h: simd32uint16<AVX512>, simd64uint8<AVX512>, etc.
- simdlib_neon.h: simd16uint16<ARM_NEON>, etc.
- simdlib_emulated.h: simd16uint16<NONE>, etc. (always included)
- simdlib_ppc64.h: simd16uint16<NONE>, etc. (PPC-optimized scalar)

SINGLE_SIMD_LEVEL (inline constexpr in simd_levels.h) resolves to NONE in DD mode and to the compiled-in level in static mode. SINGLE_SIMD_LEVEL_256 maps through simd256_level_selector for 256-bit types (AVX512->AVX2, SVE->NEON). Code without an explicit SL context uses these. This is migration scaffolding; subsequent diffs will replace SINGLE_SIMD_LEVEL usages with proper SL dispatch.

simd_result_handlers.h is no longer %include'd by SWIG (the templatized types are unparseable by SWIG). make_knn_handler methods are %ignore'd. The Python API does not use these internal SIMD handler types.

Pre-existing bug fixes bundled with this refactor:
- simdlib_avx512.h: simd512bit::bin() stack buffer overflow (char[257] -> char[513])
- simdlib_avx2.h: the simd256bit constructor used aligned _mm256_load_si256 instead of unaligned _mm256_loadu_si256
- All platform headers: simd16uint16/simd32uint8 operator+=/operator-= returned by value instead of by reference

Static builds: zero performance change. The template specializations produce layout, ABI, and codegen identical to the old plain structs.

Differential Revision: D95392150
107d299 to afe0885
Contributor
@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95392150.
algoriddle added a commit to algoriddle/faiss that referenced this pull request on Mar 6, 2026
afe0885 to e22bcc6
algoriddle added a commit to algoriddle/faiss that referenced this pull request on Mar 6, 2026
e22bcc6 to b0e55b5
algoriddle added a commit to algoriddle/faiss that referenced this pull request on Mar 6, 2026
1fccad8 to aa5bbc4
algoriddle added a commit to algoriddle/faiss that referenced this pull request on Mar 8, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request on Mar 8, 2026
aa5bbc4 to 8df5c5f
8df5c5f to c9a7603
Contributor
This pull request has been merged in 8d8268c.
Summary:
Templatize all simd wrapper types (simd16uint16, simd32uint8, simd8float32,
etc.) on SIMDLevel. This is the foundation for PQ4 fast scan Dynamic Dispatch.
Primary templates are declared in simdlib.h. Each platform header provides
explicit specializations:
- simdlib_avx2.h: simd16uint16<AVX2>, simd32uint8<AVX2>, etc.
- simdlib_avx512.h: simd32uint16<AVX512>, simd64uint8<AVX512>, etc.
- simdlib_neon.h: simd16uint16<ARM_NEON>, etc.
- simdlib_emulated.h: simd16uint16<NONE>, etc. (always included)
- simdlib_ppc64.h: simd16uint16<NONE>, etc. (PPC-optimized scalar)
SINGLE_SIMD_LEVEL (inline constexpr in simd_levels.h) resolves to NONE in DD
mode and to the compiled-in level in static mode. SINGLE_SIMD_LEVEL_256 maps
through simd256_level_selector for 256-bit types (AVX512->AVX2, SVE->NEON).
Code without explicit SL context uses these. This is migration scaffolding —
subsequent diffs will replace SINGLE_SIMD_LEVEL usages with proper SL dispatch.
simd_result_handlers.h is no longer %include'd by SWIG (the templatized types
are unparseable by SWIG). make_knn_handler methods are %ignore'd. The Python
API does not use these internal SIMD handler types.
Pre-existing bug fixes bundled with this refactor:
- simdlib_avx512.h: simd512bit::bin() stack buffer overflow (char[257] -> char[513])
- simdlib_avx2.h: the simd256bit constructor used aligned _mm256_load_si256 instead of unaligned _mm256_loadu_si256
- All platform headers: simd16uint16/simd32uint8 operator+=/operator-= returned by value instead of by reference
Static builds: zero performance change. The template specializations produce
layout, ABI, and codegen identical to the old plain structs.
Reviewed By: mdouze
Differential Revision: D95392150