
Templatize simdlib types on SIMDLevel#4866

Closed
algoriddle wants to merge 1 commit into facebookresearch:main from algoriddle:export-D95392150

Conversation

@algoriddle
Contributor

Summary:
Templatize all simd wrapper types (simd16uint16, simd32uint8, simd8float32,
etc.) on SIMDLevel. This is the foundation for PQ4 fast scan Dynamic Dispatch.

Primary templates are declared in simdlib.h. Each platform header provides
explicit specializations:

  • simdlib_avx2.h: simd16uint16&lt;AVX2&gt;, simd32uint8&lt;AVX2&gt;, etc.
  • simdlib_avx512.h: simd32uint16&lt;AVX512&gt;, simd64uint8&lt;AVX512&gt;, etc.
  • simdlib_neon.h: simd16uint16&lt;ARM_NEON&gt;, etc.
  • simdlib_emulated.h: simd16uint16&lt;NONE&gt;, etc. (always included)
  • simdlib_ppc64.h: simd16uint16&lt;NONE&gt;, etc. (PPC-optimized scalar)
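
The primary-template / explicit-specialization pattern described above can be sketched as follows. This is a minimal illustration, not the Faiss code: the enumerator list and the scalar lane layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the SIMDLevel enum in simd_levels.h;
// the exact enumerators in Faiss may differ.
enum class SIMDLevel { NONE, AVX2, AVX512, ARM_NEON, SVE };

// Primary template: declared but never defined (as in simdlib.h).
// Instantiating an unspecialized level is a compile-time error.
template <SIMDLevel SL>
struct simd16uint16;

// Explicit specialization, standing in for the emulated (scalar) header.
template <>
struct simd16uint16<SIMDLevel::NONE> {
    uint16_t lanes[16];
    simd16uint16() : lanes{} {}
    explicit simd16uint16(uint16_t x) {
        for (auto& l : lanes) {
            l = x;
        }
    }
};

// A platform header would add e.g. simd16uint16<SIMDLevel::AVX2>,
// backed by an __m256i member instead of a scalar array, with the
// same member layout and function signatures.
```

Because each specialization is a complete, independent class definition, every platform keeps its own storage and intrinsics while callers name a single type template.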

SINGLE_SIMD_LEVEL (an inline constexpr in simd_levels.h) resolves to NONE in
dynamic-dispatch (DD) mode and to the compiled-in level in static mode.
SINGLE_SIMD_LEVEL_256 maps through simd256_level_selector for 256-bit types
(AVX512 -> AVX2, SVE -> NEON). Code without an explicit SIMDLevel context uses
these constants. This is migration scaffolding; subsequent diffs will replace
SINGLE_SIMD_LEVEL usages with proper SL dispatch.
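
The constant-selection logic can be sketched like this. The macro names checked here (including FAISS_DYNAMIC_DISPATCH_SKETCH) and the enumerators are assumptions; the real definitions live in simd_levels.h.

```cpp
enum class SIMDLevel { NONE, AVX2, AVX512, ARM_NEON, SVE };

// In DD mode the level is forced to NONE; in static mode it is the
// compiled-in level, approximated here by compiler feature macros.
#if defined(FAISS_DYNAMIC_DISPATCH_SKETCH)
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::NONE;
#elif defined(__AVX512F__)
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::AVX512;
#elif defined(__AVX2__)
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::AVX2;
#else
inline constexpr SIMDLevel SINGLE_SIMD_LEVEL = SIMDLevel::NONE;
#endif

// 256-bit selector: wider or scalable levels fall back to their
// 256-bit sibling so that 256-bit types always have a valid level.
constexpr SIMDLevel simd256_level_selector(SIMDLevel sl) {
    switch (sl) {
        case SIMDLevel::AVX512: return SIMDLevel::AVX2;
        case SIMDLevel::SVE:    return SIMDLevel::ARM_NEON;
        default:                return sl;
    }
}

inline constexpr SIMDLevel SINGLE_SIMD_LEVEL_256 =
        simd256_level_selector(SINGLE_SIMD_LEVEL);
```

Keeping the selector a constexpr function means the 256-bit mapping is resolved at compile time, so static builds pay nothing for it.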

simd_result_handlers.h is no longer %include'd by SWIG (the templatized types
are unparseable by SWIG). make_knn_handler methods are %ignore'd. The Python
API does not use these internal SIMD handler types.
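
The SWIG-side change has roughly this shape. This is an illustrative fragment only: the interface file name and the qualified name passed to %ignore are assumptions, not the exact Faiss directives.

```swig
// In the SWIG interface file (e.g. swigfaiss.swig):
// the header is simply no longer pulled in --
//   (removed) %include <faiss/impl/simd_result_handlers.h>
// and the factory methods returning handler types are hidden:
%ignore faiss::make_knn_handler;
```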

Pre-existing bug fixes bundled with this refactor:

  • simdlib_avx512.h: simd512bit::bin() stack buffer overflow (char[257] -> char[513])
  • simdlib_avx2.h: simd256bit constructor used aligned _mm256_load_si256 instead
    of unaligned _mm256_loadu_si256
  • All platform headers: simd16uint16/simd32uint8 operator+=/operator-= returned
    by value instead of by reference
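
The third fix above can be shown with a toy wrapper (a hypothetical 4-lane type, not the Faiss classes): compound assignment must return *this by reference, otherwise chained or reference-taking uses operate on a temporary copy.

```cpp
#include <cassert>
#include <cstdint>

// Toy 4-lane wrapper illustrating the operator+= fix.
struct vec4u16 {
    uint16_t v[4];

    // Before the fix this returned `vec4u16` (a value), so an
    // expression like (a += b) += b mutated a discarded temporary.
    vec4u16& operator+=(const vec4u16& o) {
        for (int i = 0; i < 4; ++i) {
            v[i] += o.v[i];
        }
        return *this;  // by reference, per the fix
    }
};
```

The char[257] -> char[513] fix follows the same arithmetic as the buffer itself: a 512-bit register renders as 512 characters plus a NUL terminator, so a 257-byte buffer (sized for 256 bits) overflowed by half.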

Static builds see zero performance change: the template specializations
produce layout, ABI, and codegen identical to the old plain structs.

Differential Revision: D95392150

@meta-cla meta-cla bot added the CLA Signed label Mar 5, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
@meta-codesync
Contributor

meta-codesync bot commented Mar 6, 2026

@algoriddle has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95392150.

algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 6, 2026
@algoriddle algoriddle force-pushed the export-D95392150 branch 2 times, most recently from 1fccad8 to aa5bbc4 Compare March 8, 2026 13:35
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 8, 2026
algoriddle added a commit to algoriddle/faiss that referenced this pull request Mar 8, 2026
Summary:
Pull Request resolved: facebookresearch#4866

Reviewed By: mdouze

Differential Revision: D95392150
@meta-codesync
Contributor

meta-codesync bot commented Mar 9, 2026

This pull request has been merged in 8d8268c.

