Move allocation profiling to the allocate/deallocate calls #3084
core/src/HIP/Kokkos_HIP_Space.cpp
Outdated
#endif

m_space.deallocate(SharedAllocationRecord<void, void>::m_alloc_ptr,
m_space.deallocate(RecordBase::m_alloc_ptr->m_label,
How is this supposed to work? Doesn't the code above actually deep_copy the label back first?
This still can't work. You need to deep copy the header back to the host before accessing the label.
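To illustrate the concern, here is a minimal sketch of reading the label through a host-side copy of the header. `Header`, `copy_header_to_host`, and `label_for_dealloc` are hypothetical names invented for this note, and a plain `memcpy` stands in for the device-to-host deep copy so the sketch runs without a GPU. It is not the code in this PR.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for SharedAllocationHeader: a fixed-size label that
// sits at the front of the allocation, i.e. in device memory.
struct Header {
  char label[64];
};

// Hypothetical stand-in for a device-to-host deep copy. Real code would use
// the backend's copy routine; memcpy keeps this sketch runnable on any host.
inline void copy_header_to_host(Header& host_dst, const Header* device_src) {
  assert(device_src != nullptr);
  std::memcpy(&host_dst, device_src, sizeof(Header));
}

// Fetch the label via a host copy instead of dereferencing the
// device-resident header directly.
inline std::string label_for_dealloc(const Header* device_header) {
  Header host_copy;
  copy_header_to_host(host_copy, device_header);
  host_copy.label[sizeof(host_copy.label) - 1] = '\0';  // defensive termination
  return std::string(host_copy.label);
}
```

The point is only the ordering: the copy happens first, and the label is read from `host_copy`, never through the device pointer.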
core/src/Cuda/Kokkos_CudaSpace.cpp
Outdated
#endif

m_space.deallocate(SharedAllocationRecord<void, void>::m_alloc_ptr,
m_space.deallocate(RecordBase::m_alloc_ptr->m_label,
Don't you need to deep copy the header back first in order to get the label?
Quick note: it turns out that this exposes events from new Kokkos internal allocations to the user. Once I've fixed this and marked it ready for review, the reviewer should check that the names we're using for those are acceptable.
It turns out that Views go through this code path: https://github.com/kokkos/kokkos/blob/master/core/src/impl/Kokkos_MemorySpace.hpp#L69, which allocates the requested size plus a header. This breaks all of our Profiling: as far as the Profiling system is concerned, each allocation is now 128 bytes (sizeof(SharedAllocationHeader)) bigger than it used to be.
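The size mismatch can be sketched as follows. This is an illustration under stated assumptions, not Kokkos code: `Header` is a hypothetical 128-byte stand-in for `SharedAllocationHeader`, and `AllocSizes` and `sizes_for_view` are names made up for this note.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SharedAllocationHeader; the comment above reports
// the real header as 128 bytes, so this one is sized to match.
struct Header {
  char bytes[128];
};
static_assert(sizeof(Header) == 128, "sketch assumes a 128-byte header");

struct AllocSizes {
  std::size_t physical;  // bytes requested from the memory space (data + header)
  std::size_t logical;   // bytes the user asked for
};

// A View-style allocation requests the user's bytes plus room for the header.
inline AllocSizes sizes_for_view(std::size_t user_bytes) {
  AllocSizes s{user_bytes + sizeof(Header), user_bytes};
  assert(s.physical - s.logical == sizeof(Header));
  return s;
}
```

If the profiling hook fires inside the space's allocate call and reports `physical`, a 1000-byte View shows up as 1128 bytes. Reporting `logical` would keep tool output at the size the user requested.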
Would it make this a bit simpler if we just removed the ENABLE_PROFILING thing in this PR? (Or assume here that it is on, and then do a follow-on which removes it everywhere else and adds the "I don't have dlopen" option?)
It would make it a little less ugly to remove ENABLE_PROFILING, but I'd prefer to separate that. When I do, I'll make this less ugly. I just don't want to do something as big as that in a disconnected PR.
We need to avoid unnecessary deep copies if no profile library is loaded.
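One way to read this request, as a hedged sketch: only pay for the header copy when a tool is attached. `ProfilingState` and `deallocate_with_profiling` are hypothetical names for this note, the `library_loaded` flag stands in for whatever query the profiling interface provides, and `memcpy` again simulates the device-to-host copy.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the allocation header holding the label.
struct Header {
  char label[64];
};

// Hypothetical profiling state: whether a tool is attached, plus a counter
// that lets the sketch observe how many header copies were made.
struct ProfilingState {
  bool library_loaded = false;
  int header_copies = 0;
};

// Copy the header back only when a profiling library is loaded; otherwise
// skip the copy entirely and leave reported_label untouched.
inline void deallocate_with_profiling(ProfilingState& prof,
                                      const Header* device_header,
                                      std::string& reported_label) {
  assert(device_header != nullptr);
  if (prof.library_loaded) {
    Header host_copy;
    std::memcpy(&host_copy, device_header, sizeof(Header));  // simulated deep copy
    ++prof.header_copies;
    host_copy.label[sizeof(host_copy.label) - 1] = '\0';
    reported_label = host_copy.label;
  }
  // The actual free of the memory would happen here in either case.
}
```

With `library_loaded == false`, the function performs no copy at all, which is the behavior being asked for.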
Codecov Report
@@           Coverage Diff            @@
##           develop   #3084   +/-   ##
=======================================
  Coverage     85.6%   85.7%
=======================================
  Files          122     122
  Lines        10391   10398    +7
=======================================
+ Hits          8905    8917   +12
+ Misses        1486    1481    -5
I don't think the single-argument allocate function works, since it calls a two-argument version which doesn't exist.
@@ -201,6 +201,19 @@ CudaHostPinnedSpace::CudaHostPinnedSpace() {}
// <editor-fold desc="allocate()"> {{{1

void *CudaSpace::allocate(const size_t arg_alloc_size) const {
  return allocate("[unlabeled]", arg_alloc_size);
This call doesn't work, does it? I mean, there doesn't seem to be a two-argument allocate overload. Maybe arg_logical_size should just default to arg_alloc_size. Or we should just report the physical allocation size instead of the logical one.
So, the three-argument overload does have a default argument: it defaults to 0.
Ah, OK. The default is not in the .cpp file, I guess.
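For anyone else tripped up by this, a small sketch of why the two-argument call compiles. `Space` and its `last` member are hypothetical, invented for this note; only the parameter names and the `"[unlabeled]"` label come from the diff. In C++ a default argument is normally written on the declaration (typically in the header) and not repeated on the out-of-line definition, so it is invisible when reading the .cpp alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical miniature of a memory space. It records the arguments of the
// last three-parameter call so the forwarding can be observed.
struct Space {
  struct Call {
    std::string label;
    std::size_t alloc_size;
    std::size_t logical_size;
  };
  mutable Call last{};

  // Declarations (what would sit in the header): the default lives here.
  void* allocate(const char* arg_label, std::size_t arg_alloc_size,
                 std::size_t arg_logical_size = 0) const;
  void* allocate(std::size_t arg_alloc_size) const;
};

// Definitions (what would sit in the .cpp): the default is not repeated.
void* Space::allocate(const char* arg_label, std::size_t arg_alloc_size,
                      std::size_t arg_logical_size) const {
  assert(arg_label != nullptr);
  last = Call{arg_label, arg_alloc_size, arg_logical_size};
  return nullptr;  // no real allocation in this sketch
}

void* Space::allocate(std::size_t arg_alloc_size) const {
  // Two arguments bind to the three-parameter overload through its default.
  return allocate("[unlabeled]", arg_alloc_size);
}
```

Calling `Space{}.allocate(256)` forwards to the three-parameter overload with `arg_logical_size == 0`, which matches the explanation above.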
Addresses #3064. Pushing to catch the inevitable preprocessor branches leading to undefined arguments; will mark ready for review when that's fixed.