simd: allow wider vector width for 32 bit types #6802

ldh4 · 2024-02-10T00:00:59Z

Currently in Kokkos SIMD, every simd type within a backend uses the same size. That size is the width of the largest simd register available in the backend divided by 64 bits. However, 32-bit types can pack twice as many elements as 64-bit types into a given simd register.

This PR adds simd types with a wider vector width for:

AVX2: float, int32_t (size 8)
AVX512: float, int32_t, uint32_t (size 16)
NEON: float, int32_t (size 4)
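
The sizes listed above follow from dividing the register width by the element width. A minimal sketch of that arithmetic, assuming register widths of 256 bits (AVX2), 512 bits (AVX512) and 128 bits (NEON), 8-bit bytes, and a 64-bit double; `lanes` is a hypothetical helper written for illustration and is not part of Kokkos:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a Kokkos API): number of elements of type T
// that fit in a SIMD register of the given width in bits.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
  return register_bits / (8 * sizeof(T));
}

// 64-bit element types: register width divided by 64.
static_assert(lanes<double>(256) == 4);         // AVX2
static_assert(lanes<double>(512) == 8);         // AVX512
static_assert(lanes<double>(128) == 2);         // NEON

// 32-bit element types fit twice as many elements in the same register.
static_assert(lanes<float>(256) == 8);          // AVX2
static_assert(lanes<std::int32_t>(512) == 16);  // AVX512
static_assert(lanes<float>(128) == 4);          // NEON
```

The `static_assert` lines without a message require C++17 or later.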

crtrott · 2024-02-13T18:37:23Z

Windows CUDA wasn't passing because of restrictions of NVCC with MSVC. I think you may need to guard the test against Windows + CUDA.

ldh4 · 2024-02-14T02:51:13Z

Modified CMakeLists.txt to prevent the simd unit test files from building in the Windows+CUDA build.
Let's see if this makes the Windows CUDA CI build pass.

masterleinad · 2024-02-14T16:33:30Z

Works on my Mac M1 with ARM_NEON.

ldh4 · 2024-02-19T23:53:53Z

Retest this please.

masterleinad · 2024-05-15T19:31:55Z

The last CI results show

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.6.0/llvm/bin

for HIP-ROCm-5.6-C++20. Do you need a workaround similar to #6449?

simd/unit_tests/CMakeLists.txt

masterleinad · 2024-05-16T19:46:55Z

simd/unit_tests/include/SIMDTesting_Utilities.hpp

+template <typename T>
+constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;

masterleinad (May 16, 2024): What is this used for? Would you mind adding some comments?

ldh4 (May 17, 2024): This is to loosely check that the type T is a complete type.
Not all data types can be paired with an abi type with an extended vector width. But because of how abi_set and data_type_set are currently defined and used in tests, this check is simply used to skip compiling tests for those datatype+abi pairs that are not defined.

masterleinad (May 20, 2024): Would you mind adding a comment to that effect?

ldh4 (May 23, 2024): Added a comment explaining the use case of these structs.
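
The completeness check discussed in this thread can be sketched in isolation. The diff shows only the partial specialization, so the primary template below is an assumption about its usual shape, and `simd_like` and `run_test_if_defined` are hypothetical stand-ins for a datatype+abi pairing and a test entry point, not Kokkos names:

```cpp
// Primary template (assumed; the diff shows only the specialization).
// It is selected whenever sizeof(T) is ill-formed.
template <typename T, typename = void>
constexpr bool is_type_v = false;

// Partial specialization as quoted in the review: it is viable only when
// sizeof(T) compiles, which requires T to be a complete type.
template <typename T>
constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;

// Hypothetical stand-in for a datatype+abi pairing: declared for every
// combination, defined only for the supported ones.
template <typename T, int Width>
struct simd_like;

template <>
struct simd_like<float, 8> {};  // a supported pairing

static_assert(is_type_v<simd_like<float, 8>>);
static_assert(!is_type_v<simd_like<double, 8>>);  // never defined

// A test body can then be skipped at compile time for undefined pairings.
template <typename T, int Width>
constexpr int run_test_if_defined() {
  if constexpr (is_type_v<simd_like<T, Width>>) {
    return 1;  // the real test body would go here
  } else {
    return 0;  // pairing not defined: nothing is instantiated
  }
}
```

The check is "loose" in the sense used above: it reports whether the type is complete at the point of first use, so a type that is defined later in the same translation unit would still be reported as incomplete.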

ldh4

The last CI results show

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.6.0/llvm/bin

for HIP-ROCm-5.6-C++20. Do you need a workaround similar to #6449?

Possibly, although neither _mm_maskload_epi32 nor _mm256_maskload_epi64 was used in this PR. I'm not sure which intrinsics are causing this issue, but I'll see if the same error occurs with a rebase.

simd/unit_tests/CMakeLists.txt

ldh4 · 2024-05-17T02:05:20Z

simd/unit_tests/include/SIMDTesting_Utilities.hpp

+template <typename T>
+constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;


This is to loosely check that the type T is a complete type.
Not all data types can be paired with an abi type with an extended vector width. But because of how abi_set and data_type_set are currently defined and used in tests, this check is simply used to skip compiling tests for those datatype+abi pairs that are not defined.

ldh4 · 2024-05-17T23:39:19Z

It seems like ROCm 5.6-6.0 can't compile _mm256_castsi256_ps when it is used outside of constructors. I applied the same treatment as #6449 and switched to _mm256_cvtepi32_ps for ROCm 5.6-6.0 builds.
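
For readers unfamiliar with the two intrinsics named here: `_mm256_castsi256_ps` reinterprets the bits of an integer vector as floats, while `_mm256_cvtepi32_ps` converts each 32-bit integer to the float with the same numeric value. The scalar sketch below only illustrates that general distinction and says nothing about how either intrinsic is used in this PR. The function names are hypothetical, and IEEE-754 binary32 `float` is assumed:

```cpp
#include <cstdint>
#include <cstring>

// Scalar analogue of a bit-level cast: the 32 bits are copied unchanged.
inline float reinterpret_bits(std::int32_t i) {
  static_assert(sizeof(float) == sizeof(std::int32_t));
  float f;
  std::memcpy(&f, &i, sizeof f);
  return f;
}

// Scalar analogue of a value conversion: the result is the float that
// represents the same number as the integer.
inline float convert_value(std::int32_t i) { return static_cast<float>(i); }
```

With these definitions, `convert_value(1)` is `1.0f`, whereas `reinterpret_bits(0x3f800000)` is the call that yields `1.0f`, because `0x3f800000` is the binary32 bit pattern of one.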

masterleinad

Apart from #6802 (comment), this looks OK to me from skimming through the implementation details. I haven't checked that all the intrinsics used are actually correct, though.

simd/src/Kokkos_SIMD_AVX2.hpp

masterleinad

Looks OK to me.

simd/src/Kokkos_SIMD_AVX512.hpp

ldh4 · 2024-06-26T00:26:07Z

Retest this please.

ldh4 force-pushed the simd_use_larger_vec_width branch from 3783385 to 1130e8b on February 10, 2024 00:07

ajpowelsnl mentioned this pull request Feb 28, 2024

Release Themes for 2024 #6804

Open

masterleinad reviewed May 16, 2024

ldh4 force-pushed the simd_use_larger_vec_width branch from a427852 to 4bb4a0e on May 17, 2024 02:08

ldh4 commented May 17, 2024

masterleinad reviewed May 21, 2024

simd/src/Kokkos_SIMD_AVX2.hpp (outdated, resolved)

masterleinad approved these changes May 24, 2024

Rombur reviewed Jun 5, 2024

simd/src/Kokkos_SIMD_AVX512.hpp (resolved)

Rombur approved these changes Jun 19, 2024

ldh4 added 8 commits June 26, 2024 10:31

Added width 8 abi for avx2

e320e00

Added for AVX512

aa83357

Added for width 4 for NEON

e02c6a3

clang-formatted

61de582

Disabling simd unit tests from building for Windows+CUDA build

1eb1abe

Workaround for the compilation failure for rocm 5.6-6.0

6e167f2

Added a comment about is_type structs

4d1278e

clang formating

b650199

ldh4 force-pushed the simd_use_larger_vec_width branch from 66b3b68 to b650199 on June 26, 2024 16:31

crtrott merged commit f562ca2 into kokkos:develop Jul 2, 2024
28 of 29 checks passed

ndellingwood mentioned this pull request Jul 3, 2024

Nightly test failure, Kokkos_UnitTest_SIMD simd.host_math_ops fail with intel/2023.2 icpc built for SKX #7111

Closed
