Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Kokkos_SIMD with AVX2 on 64-bit architectures #6075

Merged
merged 2 commits into from
Apr 26, 2023

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Apr 21, 2023

Fixes #6070. For better test coverage (and convenience for the user?), this pull request also enables the avx2 overloads if __AVX2__ is defined (which should be the case when the user sets Kokkos_ARCH_AVX2 but also for Kokkos_ARCH_NATIVE=ON).

@masterleinad masterleinad marked this pull request as ready for review April 22, 2023 13:50
@ibaned
Copy link
Contributor

ibaned commented Apr 24, 2023

Does AVX2 get enabled if the user sets Kokkos_ARCH_BDW=ON?

@masterleinad
Copy link
Contributor Author

Does AVX2 get enabled if the user sets Kokkos_ARCH_BDW=ON?

Yes,

IF (KOKKOS_ARCH_BDW)
SET(KOKKOS_ARCH_AVX2 ON)
COMPILER_SPECIFIC_FLAGS(
COMPILER_ID KOKKOS_CXX_HOST_COMPILER_ID
Cray NO-VALUE-SPECIFIED
Intel -xCORE-AVX2
MSVC /arch:AVX2
NVHPC -tp=haswell
DEFAULT -march=core-avx2 -mtune=core-avx2 -mrtm
)
ENDIF()
.

simd/src/Kokkos_SIMD_AVX2.hpp Outdated Show resolved Hide resolved
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this PR is about fixing a bug (_mm256_mask{load,store}_epi64 intrinsics expecting pointers to long long and us passing pointers to value_type) then drop the unrelated "convenience" changes.

@@ -680,6 +680,8 @@ template <>
class simd<std::int64_t, simd_abi::avx2_fixed_size<4>> {
__m256i m_value;

static_assert(sizeof(long long) == 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add that assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a minimal safeguard that the reinterpret_cast is reasonable.

@masterleinad
Copy link
Contributor Author

If this PR is about fixing a bug (_mm256_mask{load,store}_epi64 intrinsics expecting pointers to long long and us passing pointers to value_type) then drop the unrelated "convenience" changes.

Here you go!

@dalg24
Copy link
Member

dalg24 commented Apr 25, 2023

Why was this bug not caught before? Is there a gap in the testing (not necessarily the CI, presumably someone built on Broadwell before and it did not give a compile error)?

@ibaned
Copy link
Contributor

ibaned commented Apr 25, 2023

I have done Broadwell builds with no errors before this fix... I'm not sure if it is compiler or arch specific...

@masterleinad
Copy link
Contributor Author

I have done Broadwell builds with no errors before this fix... I'm not sure if it is compiler or arch specific...

My testing shows that this would only have worked on Linux 32-bit architectures.

@ibaned
Copy link
Contributor

ibaned commented Apr 25, 2023

maybe I never did un-masked loads of int64 vectors? Could be a gap in unit testing? Thanks for catching of course!

@dalg24
Copy link
Member

dalg24 commented Apr 25, 2023

I have done Broadwell builds with no errors before this fix... I'm not sure if it is compiler or arch specific...

My testing shows that this would only have worked on Linux 32-bit architectures.

@masterleinad meaning with the current develop you got a compile error? I am trying to determine whether we need to add unit tests (excluding the whether we need some more CI/nightly build)

@masterleinad
Copy link
Contributor Author

@masterleinad meaning with the current develop you got a compile error? I am trying to determine whether we need to add unit tests (excluding the whether we need some more CI/nightly build)

You see this with current develop and current tests but we don't test any host architectures in our GitHub or Jenkins CI (that's part of the reason why I am proposing to let Kokkos_ARCH_NATIVE enable SIMD intrinsic).

@dalg24
Copy link
Member

dalg24 commented Apr 25, 2023

You see this with current develop and current tests

OK

but we don't test any host architectures in our GitHub or Jenkins CI (that's part of the reason why I am proposing to let Kokkos_ARCH_NATIVE enable SIMD intrinsic).

Understood but this must be handled elsewhere.

@dalg24
Copy link
Member

dalg24 commented Apr 25, 2023

this must be handled elsewhere.

To elaborate on this a bit more, whether to enable based on compiler defines rather than user-provided configure options is a question that we need to examine with care and it is beyond the scope of this PR. I am not opposed to it, it is actually in my backlog, I was considering getting rid of that macro along with others such as KOKKOS_ARCH_AMDAVX, KOKKOS_ARCH_AMD_AVX2, KOKKOS_ARCH_AVX512MIC, or KOKKOS_ARCH_AVX512XEON.

@dalg24
Copy link
Member

dalg24 commented Apr 26, 2023

Retest this please

@dalg24 dalg24 merged commit 4b27b7d into kokkos:develop Apr 26, 2023
25 checks passed
@masterleinad masterleinad mentioned this pull request May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants