
Added AVX10_2 and AVX10_2_512 targets #2395

Merged
copybara-service[bot] merged 1 commit into google:master from johnplatts:hwy_avx10_120124 on Dec 2, 2024
Contributor

@johnplatts commented Dec 1, 2024

Added preliminary support for the HWY_AVX10_2 (AVX10.2 with 256-bit vectors) and HWY_AVX10_2_512 (AVX10.2 with 512-bit vectors) targets.

Added CPUID detection for AVX10.1 and AVX10.2 support in hwy/targets.cc.
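For readers unfamiliar with AVX10 enumeration, the following is a minimal sketch of how the relevant CPUID bits can be decoded. It is not the code from hwy/targets.cc: the names `Avx10Info`, `DecodeAvx10`, `SupportsAvx10_2`, `SupportsAvx10_2_512` and `Avx10DecodeExample` are hypothetical, and the function decodes register values supplied by the caller instead of executing CPUID itself. The bit positions follow Intel's AVX10 architecture specification as I understand it: CPUID.(EAX=7,ECX=1):EDX bit 19 signals AVX10, and leaf 0x24 sub-leaf 0 reports the version in EBX[7:0] and 256-bit/512-bit vector support in EBX bits 17 and 18. Checking that the OS saves the vector register state is a separate step and is omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what the AVX10 CPUID leaves report.
struct Avx10Info {
  bool supported;         // CPUID.(EAX=7,ECX=1):EDX[19]
  std::uint32_t version;  // CPUID.(EAX=0x24,ECX=0):EBX[7:0]
  bool has_256;           // CPUID.(EAX=0x24,ECX=0):EBX[17]
  bool has_512;           // CPUID.(EAX=0x24,ECX=0):EBX[18]
};

// Decodes raw register values. The caller is assumed to pass 0 for
// leaf24_ebx when leaf 0x24 is not available.
constexpr Avx10Info DecodeAvx10(std::uint32_t leaf7_sub1_edx,
                                std::uint32_t leaf24_ebx) {
  return ((leaf7_sub1_edx >> 19) & 1u) == 0u
             ? Avx10Info{false, 0u, false, false}
             : Avx10Info{true, leaf24_ebx & 0xFFu,
                         ((leaf24_ebx >> 17) & 1u) != 0u,
                         ((leaf24_ebx >> 18) & 1u) != 0u};
}

// AVX10.2 with at least 256-bit vectors.
constexpr bool SupportsAvx10_2(const Avx10Info& info) {
  return info.supported && info.version >= 2u && info.has_256;
}

// AVX10.2 with 512-bit vectors.
constexpr bool SupportsAvx10_2_512(const Avx10Info& info) {
  return SupportsAvx10_2(info) && info.has_512;
}

// Exercises the decoder with synthetic register values.
inline void Avx10DecodeExample() {
  // Version 2, vector-length bits 16 and 17 set, bit 18 clear.
  const Avx10Info only256 = DecodeAvx10(1u << 19, 0x00030002u);
  assert(SupportsAvx10_2(only256) && !SupportsAvx10_2_512(only256));
  // Version 2, vector-length bits 16, 17 and 18 set.
  const Avx10Info with512 = DecodeAvx10(1u << 19, 0x00070002u);
  assert(SupportsAvx10_2_512(with512));
}
```

Keeping the decoding separate from the CPUID instruction makes the logic testable on machines that lack AVX10.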

Also added a new hwy/ops/x86_avx3-inl.h header and moved into it some of the AVX3/AVX10-specific ops that depend on hwy/ops/x86_512-inl.h when HWY_MAX_BYTES == 64.

Also moved some of the AVX3/AVX3_DL-specific ops that operate on 256-bit or smaller vectors into hwy/ops/x86_128-inl.h and hwy/ops/x86_256-inl.h to support AVX10.2 targets that do not support 512-bit vectors.

Also refactored some of the AVX3 target macros in hwy/ops/set_macros-inl.h as follows:

  • Added a new HWY_TARGET_STR_AVX3_VL512 macro that expands to ",evex512" for GCC/Clang versions that support the -mevex512 option, and is otherwise defined as an empty macro
  • Refactored the HWY_TARGET_STR_AVX3 macro to HWY_TARGET_STR_AVX3_256 (which does not include ",evex512" or ",no-evex512") and HWY_TARGET_STR_AVX3 (which includes ",evex512" if compiling with a GCC/Clang version that supports the -mevex512 option)
  • Refactored the HWY_TARGET_STR_AVX3_DL macro to HWY_TARGET_STR_AVX3_DL_256 (which does not include ",evex512" or ",no-evex512") and HWY_TARGET_STR_AVX3_DL (which includes ",evex512" if compiling with a GCC/Clang version that supports the -mevex512 option)
  • Refactored the HWY_TARGET_STR_AVX3_ZEN4 macro to HWY_TARGET_STR_AVX3_ZEN4_256 (which does not include ",evex512" or ",no-evex512") and HWY_TARGET_STR_AVX3_ZEN4 (which includes ",evex512" if compiling with a GCC/Clang version that supports the -mevex512 option)
  • Refactored the HWY_TARGET_STR_AVX3_SPR macro to HWY_TARGET_STR_AVX3_SPR_256 (which does not include ",evex512" or ",no-evex512") and HWY_TARGET_STR_AVX3_SPR (which includes ",evex512" if compiling with a GCC/Clang version that supports the -mevex512 option)
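The macro layering described in the list above can be sketched roughly as follows. This is an illustration only, not the contents of hwy/ops/set_macros-inl.h: the `DEMO_` macros and the helper functions are hypothetical, the feature list is deliberately shortened, and the compiler version check assumes that the evex512 target feature is available from Clang 18 and GCC 14 onward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: the "evex512" target feature exists from Clang 18 / GCC 14.
#if (defined(__clang__) && __clang_major__ >= 18) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 14)
#define DEMO_TARGET_STR_VL512 ",evex512"
#else
#define DEMO_TARGET_STR_VL512
#endif

// Illustrative, shortened feature list without any evex512 setting.
#define DEMO_TARGET_STR_AVX3_256 "avx2,avx512f,avx512vl"

// The 512-bit variant is the 256-bit string plus the optional suffix.
// Adjacent string literals are concatenated by the compiler.
#define DEMO_TARGET_STR_AVX3 DEMO_TARGET_STR_AVX3_256 DEMO_TARGET_STR_VL512

inline const char* DemoAvx3_256Str() { return DEMO_TARGET_STR_AVX3_256; }
inline const char* DemoAvx3Str() { return DEMO_TARGET_STR_AVX3; }

// Checks that the 512-bit string extends the 256-bit string by either
// nothing or ",evex512", depending on the compiler.
inline void TargetStrExample() {
  const std::size_t n = std::strlen(DemoAvx3_256Str());
  assert(std::strncmp(DemoAvx3Str(), DemoAvx3_256Str(), n) == 0);
  const char* suffix = DemoAvx3Str() + n;
  assert(std::strcmp(suffix, "") == 0 ||
         std::strcmp(suffix, ",evex512") == 0);
}
```

Splitting each target string this way lets the 256-bit AVX10.2 target reuse the common feature list without enabling 512-bit EVEX encodings.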

To compile and run the Google Highway unit tests for the HWY_AVX10_2 and HWY_AVX10_2_512 targets, Clang 19 or later and Intel SDE 9.44 or later are needed.

Compiling the HWY_AVX10_2 and HWY_AVX10_2_512 targets with Clang 18 or GCC 14 runs into problems, even though both compilers support the -mno-evex512 option. Clang 18 crashes when compiling matvec_test.cc for the HWY_AVX10_2 target, and GCC 14 emits warnings when casting an int16_t to __bf16 when compiling with -march=sapphirerapids.

The HWY_AVX10_2 and HWY_AVX10_2_512 targets are not included in HWY_ATTAINABLE_TARGETS_X86 by default due to compiler issues with GCC or Clang 18.

@johnplatts
Contributor Author

Here is the CMake command line that configures a build of only the HWY_AVX10_2 target (and no other targets) with Clang 19, where <path_to_hwy_src> is replaced with the path to the Google Highway sources and <path_to_sde_directory> with the directory containing the sde64 executable:
CC=clang-19 CXX=clang++-19 cmake <path_to_hwy_src> -DCMAKE_CXX_FLAGS='-march=sapphirerapids -mno-evex512 -DHWY_BASELINE_TARGETS=0x20 -DHWY_ATTAINABLE_TARGETS_X86=0x20 -DHWY_DISABLED_TARGETS=0x6000000000007FDF -DHWY_COMPILE_ONLY_STATIC=1' -DHWY_ENABLE_CONTRIB=OFF -DCMAKE_CXX_STANDARD=17 -DHWY_WARNINGS_ARE_ERRORS=ON -DCMAKE_CROSSCOMPILING_EMULATOR='<path_to_sde_directory>/sde64;-future;--'

Here is the CMake command line that configures a build of only the HWY_AVX10_2_512 target (and no other targets) with Clang 19, where <path_to_hwy_src> is replaced with the path to the Google Highway sources and <path_to_sde_directory> with the directory containing the sde64 executable:
CC=clang-19 CXX=clang++-19 cmake <path_to_hwy_src> -DCMAKE_CXX_FLAGS='-march=sapphirerapids -mevex512 -DHWY_BASELINE_TARGETS=0x08 -DHWY_ATTAINABLE_TARGETS_X86=0x08 -DHWY_DISABLED_TARGETS=0x6000000000007FF7 -DHWY_COMPILE_ONLY_STATIC=1' -DHWY_ENABLE_CONTRIB=OFF -DCMAKE_CXX_STANDARD=17 -DHWY_WARNINGS_ARE_ERRORS=ON -DCMAKE_CROSSCOMPILING_EMULATOR='<path_to_sde_directory>/sde64;-future;--'

Here is the CMake command line that configures a build of the HWY_AVX10_2 and HWY_AVX10_2_512 targets (and no other targets) with Clang 19, where <path_to_hwy_src> is replaced with the path to the Google Highway sources and <path_to_sde_directory> with the directory containing the sde64 executable:
CC=clang-19 CXX=clang++-19 cmake <path_to_hwy_src> -DCMAKE_CXX_FLAGS='-march=sapphirerapids -mno-evex512 -DHWY_BASELINE_TARGETS=0x20 -DHWY_ATTAINABLE_TARGETS_X86=0x28 -DHWY_DISABLED_TARGETS=0x6000000000007FD7' -DHWY_ENABLE_CONTRIB=OFF -DCMAKE_CXX_STANDARD=17 -DHWY_WARNINGS_ARE_ERRORS=ON -DCMAKE_CROSSCOMPILING_EMULATOR='<path_to_sde_directory>/sde64;-future;--'
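The HWY_DISABLED_TARGETS values in the three commands above follow a simple pattern, which the sketch below reproduces. The target bit values 0x20 (HWY_AVX10_2) and 0x08 (HWY_AVX10_2_512) are taken from the commands. The constant names and `DisabledMaskFor` are hypothetical, and the split of the mask into 0x7FFF (assumed to cover the x86 targets) and 0x6000000000000000 (assumed to cover the non-SIMD fallback targets) is my interpretation of the values, not something stated in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Target bits as used in the commands above.
constexpr std::uint64_t kAvx10_2 = 0x20;
constexpr std::uint64_t kAvx10_2_512 = 0x08;

// Assumption: the low 15 bits cover all x86 targets.
constexpr std::uint64_t kX86Bits = 0x7FFF;
// Assumption: these two high bits are the non-SIMD fallback targets.
constexpr std::uint64_t kFallbackBits = 0x6000000000000000ULL;

// Disables every x86 target except those in `keep`, plus the fallbacks.
constexpr std::uint64_t DisabledMaskFor(std::uint64_t keep) {
  return kFallbackBits | (kX86Bits & ~keep);
}

// Reproduces the three HWY_DISABLED_TARGETS values from the commands.
inline void TargetMaskExample() {
  assert(DisabledMaskFor(kAvx10_2) == 0x6000000000007FDFULL);
  assert(DisabledMaskFor(kAvx10_2_512) == 0x6000000000007FF7ULL);
  assert(DisabledMaskFor(kAvx10_2 | kAvx10_2_512) == 0x6000000000007FD7ULL);
}
```

To build a different combination of targets, the same computation applies with a different `keep` value.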

Member

@jan-wassenberg left a comment


Very nice! Thanks for adding this already. Just one small typo, but we'll go ahead and try to import/run CI in case other changes are required.

Comment thread hwy/ops/x86_avx3-inl.h
// includes base.h and shared-inl.h.
#include "hwy/ops/x86_256-inl.h"
#else
// For AVX3/AVX10 targets that support 512-byte vectors. Already includes base.h
Member


byte -> bit :)

@copybara-service (bot) merged commit b13a46f into google:master on Dec 2, 2024