Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simd: allow wider vector width for 32 bit types #6802

Merged
merged 8 commits into from
Jul 2, 2024

Conversation

ldh4
Copy link
Contributor

@ldh4 ldh4 commented Feb 10, 2024

Currently in Kokkos SIMD, all simd types use a uniformly set size per simd backend. This size is determined by the size of the largest simd register available in a simd backend divided by 64 (bit). However, 32 bit types can take advantage of this and pack twice as much of data than 64 bit types could in a given simd register.

This PR adds simd types with wider vector for:

  • AVX2: float, int32_t (size 8)
  • AVX512: float, int32_t, uint32_t (size 16)
  • NEON: float, int32_t (size 4)

@crtrott
Copy link
Member

crtrott commented Feb 13, 2024

Windows CUDA wasn't passing because of restricitons of NVCC with MSVC. I think you may need to guard the test against Windows + CUDA

@ldh4
Copy link
Contributor Author

ldh4 commented Feb 14, 2024

Modified the CMakeList.txt to prevent simd unit test files from building in Windows+CUDA build.
Let's see if this makes the CI for Windows CUDA build pass.

@masterleinad
Copy link
Contributor

Works on my Mac M1 with ARM_NEON.

@ldh4
Copy link
Contributor Author

ldh4 commented Feb 19, 2024

Retest this please.

@masterleinad
Copy link
Contributor

The last CI results show

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.6.0/llvm/bin

for HIP-ROCm-5.6-C++20. Do you need a workaround similar to #6449?

simd/unit_tests/CMakeLists.txt Show resolved Hide resolved
Comment on lines +187 to +192
template <typename T>
constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? Would you mind adding some comments?

Copy link
Contributor Author

@ldh4 ldh4 May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to loosely check that the type T is a complete type.
Not all data types can be paired with an abi type with an extended vector width. But because of how abi_set and data_type_set are currently defined and used in tests, this check is simply used to skip compiling tests for those datatype+abi pairs that are not defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment to that effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment explaining the use case of these structs.

@ldh4 ldh4 force-pushed the simd_use_larger_vec_width branch from a427852 to 4bb4a0e Compare May 17, 2024 02:08
Copy link
Contributor Author

@ldh4 ldh4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last CI results show

fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.6.0/llvm/bin

for HIP-ROCm-5.6-C++20. Do you need a workaround similar to #6449?

Possibly, although neither _mm_maskload_epi32 nor _mm256_maskload_epi64 was used in this PR. I'm not sure which intrinsics are causing this issue, but I'll see if the same error occurs with a rebase.

simd/unit_tests/CMakeLists.txt Show resolved Hide resolved
Comment on lines +187 to +192
template <typename T>
constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;
Copy link
Contributor Author

@ldh4 ldh4 May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to loosely check that the type T is a complete type.
Not all data types can be paired with an abi type with an extended vector width. But because of how abi_set and data_type_set are currently defined and used in tests, this check is simply used to skip compiling tests for those datatype+abi pairs that are not defined.

@ldh4
Copy link
Contributor Author

ldh4 commented May 17, 2024

It seems like rocm 5.6-6.0 can't compile _mm256_castsi256_ps when used outside of constructors. Applying the same treatment as #6449 and converted to use _mm256_cvtepi32_ps instead for rocm 5.6-6.0 builds.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from #6802 (comment), this looks OK to me skimming through the implementation details. I haven't checked that all the intrinsic used are actually correct, though.

simd/src/Kokkos_SIMD_AVX2.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

@ldh4
Copy link
Contributor Author

ldh4 commented Jun 26, 2024

Retest this please.

@ldh4 ldh4 force-pushed the simd_use_larger_vec_width branch from 66b3b68 to b650199 Compare June 26, 2024 16:31
@crtrott crtrott merged commit f562ca2 into kokkos:develop Jul 2, 2024
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants