-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #4942 Scratch Allocation Alignment #5687
Conversation
I edited the description slightly. Please check that my change was accurate to your meaning |
Also fix some comment typos pointed out in review
HPX failure is salient - it's in scratch allocation. |
core/src/Kokkos_View.hpp
Outdated
::Kokkos::max(::Kokkos::max(sizeof(typename traits::value_type), | ||
alignof(typename traits::value_type)), | ||
static_cast<size_t>( | ||
traits::execution_space::scratch_memory_space::ALIGN)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
::Kokkos::max(::Kokkos::max(sizeof(typename traits::value_type), | |
alignof(typename traits::value_type)), | |
static_cast<size_t>( | |
traits::execution_space::scratch_memory_space::ALIGN)); | |
max({sizeof(typename traits::value_type), | |
alignof(typename traits::value_type), | |
static_cast<size_t>( | |
traits::execution_space::scratch_memory_space::ALIGN)}); |
No need to qualify the call and there is an overload that takes std::initializer_list<T>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually there was an ambiguity with the unqualified cmath max function without it. But I guess that won't hit the multi argument one.
Retest this please |
Can't increment a pointer past the end it points too ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll reiterate that I don't think the UB concerns are actually pressing, unless we have a reason to be especially scrupulous in this piece of the codebase.
auto m_iter_old = m_iter; | ||
if constexpr (alignment_requested) { | ||
const ptrdiff_t missalign = size_t(m_iter) % alignment; | ||
if (missalign) m_iter += alignment - missalign; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry to do this to you, but I just noticed, this has the same potential for overflow to evoke UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just accept this here, its starting to get pretty complicated otherwise, since we also need to check that for example m_end - alignment + missalign is not going below m_start. That is actually also a problem in the other place where we compare m_iter to m_end - size or whatever it was. Because the expression m_end - size may be UB already.
// This is each thread's start pointer for its allocation | ||
// Note: for team scratch m_offset is 0, since every | ||
// thread will get back the same shared pointer | ||
void* tmp = m_iter + m_offset * size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And so does this
Also this PR does not make the situation worse with regard to UB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good enough.
We can live a bit longer with UB and this is holding the 4.0 release.
I suggest we merge as in and deal with the issues pointed out by Phil in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with moving this ahead, and addressing UB concerns in the pointer arithmetic later
This fixes the scratch space allocation alignment, #4942.
Note there is a default over alignment used by views. This makes sure that in the mode in which we run with 64bit shared memory banks that you don't have two conflicting scratch views resulting in really bad bank conflicts.
That default alignment was always there, and our scratch allocation mechanism was now doing a weird (faulty) mix of default alignment and requested alignment. I cleaned that up in a way that
get_shmem
doesn't do any alignment,get_aligned_shmem
uses only the provided alignment in the argument, andKokkos::View
takes the default alignment into account for its scratch allocations.