simd: allow wider vector width for 32 bit types #6802
Force-pushed from 3783385 to 1130e8b.
Windows CUDA wasn't passing because of restrictions of NVCC with MSVC. I think you may need to guard the test against Windows + CUDA.
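A minimal sketch of what such a guard could look like, assuming the build defines `KOKKOS_ENABLE_CUDA` for CUDA-enabled configurations and that MSVC as host compiler is detected through `_MSC_VER`. The macro `EXAMPLE_SKIP_WIDE_SIMD_TEST` and the function `run_wide_simd_checks` are hypothetical names for illustration, not identifiers from this PR.

```cpp
#include <cassert>

// Assumption: KOKKOS_ENABLE_CUDA marks a CUDA-enabled Kokkos build and
// _MSC_VER marks MSVC as the host compiler. The macro below is hypothetical.
#if defined(KOKKOS_ENABLE_CUDA) && defined(_MSC_VER)
#define EXAMPLE_SKIP_WIDE_SIMD_TEST 1
#else
#define EXAMPLE_SKIP_WIDE_SIMD_TEST 0
#endif

// Returns how many checks were executed; 0 when the test is guarded out.
inline int run_wide_simd_checks() {
#if EXAMPLE_SKIP_WIDE_SIMD_TEST
  // Windows + CUDA: the body is never compiled, so NVCC/MSVC never sees it.
  return 0;
#else
  // Placeholder body standing in for the real test.
  const float lanes[8] = {1, 1, 1, 1, 1, 1, 1, 1};
  float sum = 0.0f;
  for (float v : lanes) sum += v;
  return sum == 8.0f ? 1 : 0;
#endif
}
```

Guarding at the preprocessor level, rather than at run time, keeps the problematic code away from the compiler entirely, which is what a compiler restriction requires.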
Modified the
Works on my Mac M1 with
Retest this please.
template <typename T>
constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;
What is this used for? Would you mind adding some comments?
This is to loosely check that the type `T` is a complete type.
Not all data types can be paired with an ABI type with an extended vector width. But because of how `abi_set` and `data_type_set` are currently defined and used in tests, this check is simply used to skip compiling tests for those datatype+abi pairs that are not defined.
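As a rough illustration of the completeness check described above, the sketch below pairs the specialization from the diff with an assumed primary template (the PR's actual primary template is not shown here). `fake_simd`, `narrow_abi`, and `wide_abi` are hypothetical stand-ins, not Kokkos types.

```cpp
#include <cassert>

// Assumed primary template: by default, T is treated as not usable.
template <typename T, typename = void>
constexpr bool is_type_v = false;

// Selected only when sizeof(T) is well-formed, i.e. T is a complete type.
template <typename T>
constexpr bool is_type_v<T, decltype(void(sizeof(T)))> = true;

// Hypothetical stand-ins for ABI tags and a simd class template.
struct narrow_abi {};
struct wide_abi {};

// Declared but not defined: any pair without a specialization stays incomplete.
template <typename T, typename Abi>
struct fake_simd;

// Every data type is available with the narrow ABI.
template <typename T>
struct fake_simd<T, narrow_abi> {
  T lanes[2];
};

// Only float is available with the wide ABI in this sketch.
template <>
struct fake_simd<float, wide_abi> {
  float lanes[4];
};

static_assert(is_type_v<fake_simd<double, narrow_abi>>, "defined pair");
static_assert(is_type_v<fake_simd<float, wide_abi>>, "defined pair");
static_assert(!is_type_v<fake_simd<double, wide_abi>>, "undefined pair");
```

A test driver that iterates over every datatype+abi combination can branch on `is_type_v` to skip the pairs that have no definition. The check is "loose" because the answer depends on whether the type is complete at the point where the variable template is first instantiated.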
Would you mind adding a comment to that effect?
Added a comment explaining the use case of these structs.
Force-pushed from a427852 to 4bb4a0e.
The last CI results show
fatal error: error in backend: Instruction Combining seems stuck in an infinite loop after 1000 iterations.
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.6.0/llvm/bin
for HIP-ROCm-5.6-C++20. Do you need a workaround similar to #6449?
Possibly, although neither `_mm_maskload_epi32` nor `_mm256_maskload_epi64` was used in this PR. I'm not sure which intrinsics are causing this issue, but I'll see if the same error occurs with a rebase.
It seems like ROCm 5.6-6.0 can't compile
Apart from #6802 (comment), this looks OK to me after skimming through the implementation details. I haven't checked that all the intrinsics used are actually correct, though.
Looks OK to me.
Retest this please.
Force-pushed from 66b3b68 to b650199.
Currently in Kokkos SIMD, all simd types use a uniform size per simd backend. This size is the width in bits of the largest simd register available in that backend divided by 64. However, 32-bit types can pack twice as much data as 64-bit types into a given simd register.
This PR adds simd types with wider vectors for:
- AVX2: `float`, `int32_t` (size 8)
- AVX512: `float`, `int32_t`, `uint32_t` (size 16)
- NEON: `float`, `int32_t` (size 4)
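The sizes listed above follow from dividing the register width by the element width. The sketch below is only arithmetic to illustrate that relationship, assuming 256-bit registers for AVX2, 512-bit for AVX512, and 128-bit for NEON; `lanes_per_register` is a hypothetical helper, not part of Kokkos.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many elements of type T fit in one simd register
// that is register_bits wide.
template <typename T>
constexpr std::size_t lanes_per_register(std::size_t register_bits) {
  return register_bits / (CHAR_BIT * sizeof(T));
}

// 64-bit types: the uniform size used before this change.
static_assert(lanes_per_register<double>(256) == 4, "AVX2, 64-bit");
static_assert(lanes_per_register<double>(512) == 8, "AVX512, 64-bit");
static_assert(lanes_per_register<double>(128) == 2, "NEON, 64-bit");

// 32-bit types: twice as many lanes in the same register.
static_assert(lanes_per_register<float>(256) == 8, "AVX2, 32-bit");
static_assert(lanes_per_register<std::uint32_t>(512) == 16, "AVX512, 32-bit");
static_assert(lanes_per_register<std::int32_t>(128) == 4, "NEON, 32-bit");
```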