Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4942 Scratch Allocation Alignment #5687

Merged
merged 13 commits into from
Jan 5, 2023
Merged

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented Dec 15, 2022

This fixes the scratch space allocation alignment, #4942.
Note there is a default over alignment used by views. This makes sure that in the mode in which we run with 64bit shared memory banks that you don't have two conflicting scratch views resulting in really bad bank conflicts.

That default alignment was always there, and our scratch allocation mechanism was now doing a weird (faulty) mix of default alignment and requested alignment. I cleaned that up in a way that get_shmem doesn't do any alignment, get_aligned_shmem uses only the provided alignment in the argument, and Kokkos::View takes the default alignment into account for its scratch allocations.

core/src/Kokkos_ScratchSpace.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_ScratchSpace.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_View.hpp Outdated Show resolved Hide resolved
@PhilMiller
Copy link
Contributor

I edited the description slightly. Please check that my change was accurate to your meaning

@crtrott crtrott changed the title Fix #4942 Fix #4942 Scratch Allocation Alignment Dec 15, 2022
core/src/Kokkos_ScratchSpace.hpp Show resolved Hide resolved
core/src/Kokkos_ScratchSpace.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_View.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_View.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_View.hpp Outdated Show resolved Hide resolved
@crtrott crtrott mentioned this pull request Dec 15, 2022
Also fix some comment typos pointed out in review
@PhilMiller
Copy link
Contributor

HPX failure is salient - it's in scratch allocation.

Comment on lines 1643 to 1646
::Kokkos::max(::Kokkos::max(sizeof(typename traits::value_type),
alignof(typename traits::value_type)),
static_cast<size_t>(
traits::execution_space::scratch_memory_space::ALIGN));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
::Kokkos::max(::Kokkos::max(sizeof(typename traits::value_type),
alignof(typename traits::value_type)),
static_cast<size_t>(
traits::execution_space::scratch_memory_space::ALIGN));
max({sizeof(typename traits::value_type),
alignof(typename traits::value_type),
static_cast<size_t>(
traits::execution_space::scratch_memory_space::ALIGN)});

No need to qualify the call and there is an overload that takes std::initializer_list<T>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually there was an ambiguity with the unqualified cmath max function without it. But I guess that won't hit the multi argument one.

core/unit_test/TestTeam.hpp Outdated Show resolved Hide resolved
@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Dec 21, 2022
@crtrott
Copy link
Member Author

crtrott commented Dec 23, 2022

Retest this please

core/unit_test/TestTeam.hpp Outdated Show resolved Hide resolved
core/unit_test/TestTeam.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_ScratchSpace.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_ScratchSpace.hpp Outdated Show resolved Hide resolved
core/src/Kokkos_ScratchSpace.hpp Show resolved Hide resolved
Copy link
Contributor

@PhilMiller PhilMiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reiterate that I don't think the UB concerns are actually pressing, unless we have a reason to be especially scrupulous in this piece of the codebase.

auto m_iter_old = m_iter;
if constexpr (alignment_requested) {
const ptrdiff_t missalign = size_t(m_iter) % alignment;
if (missalign) m_iter += alignment - missalign;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to do this to you, but I just noticed, this has the same potential for overflow to evoke UB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just accept this here, its starting to get pretty complicated otherwise, since we also need to check that for example m_end - alignment + missalign is not going below m_start. That is actually also a problem in the other place where we compare m_iter to m_end - size or whatever it was. Because the expression m_end - size may be UB already.

// This is each thread's start pointer for its allocation
// Note: for team scratch m_offset is 0, since every
// thread will get back the same shared pointer
void* tmp = m_iter + m_offset * size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so does this

@crtrott
Copy link
Member Author

crtrott commented Jan 5, 2023

Also this PR does not make the situation worse with regard to UB.

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough.
We can live a bit longer with UB and this is holding the 4.0 release.
I suggest we merge as in and deal with the issues pointed out by Phil in a follow up PR.

Copy link
Contributor

@PhilMiller PhilMiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with moving this ahead, and addressing UB concerns in the pointer arithmetic later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs CHANGELOG Item to be included in release CHANGELOG
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cuda backend fails to allocate a single integer in the PerTeam scratch space.
4 participants