-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major duplicate code removal in SharedAllocationRecord specializations #3658
Conversation
2e3d69c
to
21fb670
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but I am confused how does this not fail at link time this you forgot the explicit instantiation of HostInaccessibleSharedAllocationRecordCommon<SYCLDeviceUSMSpace>
?
protected: | ||
using record_base_t::record_base_t; | ||
|
||
void _fill_host_accessible_header_info(SharedAllocationHeader& arg_header, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had settled for impl_
prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't public. impl_
is for things that need to be public for some reason (usually because friendship with some other library is difficult or impossible) but aren't part of the supported stable public API. Nothing in this class is part of the public API, but there are parts that are only used in this class hierarchy. A common convention for this is to prefix with an underscore, and since we don't really have a uniform convention for this in Kokkos, I just used that here. If you can point to somewhere else where we have a uniform convention for protected/private methods of implementation classes, I'll use that instead; otherwise, I don't see a reason to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static auto allocate(MemorySpace const& arg_space, | ||
std::string const& arg_label, size_t arg_alloc_size) | ||
-> derived_t*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static auto allocate(MemorySpace const& arg_space, | |
std::string const& arg_label, size_t arg_alloc_size) | |
-> derived_t*; | |
static derived_t* allocate(MemorySpace const& arg_space, | |
std::string const& arg_label, size_t arg_alloc_size); |
or is there a reason to defer the return type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see my response to your suggestion elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only true for the definition.
68388cf
to
4e4c681
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except that I think the string formatting for printing pointers can be simplified by using %p
. I might be wrong about that though, but at least in the simple example I checked %p
was equivalent to the uintptr_t
hex string that you wrote.
If I am missing something or %p
is weird on some systems then you can ignore this.
friend class SharedAllocationRecordCommon< | ||
Kokkos::Experimental::SYCLSharedUSMSpace>; | ||
using base_t = | ||
SharedAllocationRecordCommon<Kokkos::Experimental::SYCLSharedUSMSpace>; | ||
using RecordBase = SharedAllocationRecord<void, void>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should RecordBase
be determined from base_t
? As far as I can tell we have this because base_t
derives from SharedAllocationRecord<void, void>
. This is the same in other example, I just picked this one because it was the first one I found while scrolling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly just avoided doing that to minimize the size of the diff. That said, it's often useful to know the size of the root class of some hierarchy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by size? I just meant something like. using RecordBase = base_t::blahblah
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
// Formatting dependent on sizeof(uintptr_t) | ||
const char* format_string; | ||
|
||
if (sizeof(uintptr_t) == sizeof(unsigned long)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you just use %p
https://godbolt.org/z/vszj3o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better https://godbolt.org/z/PfG3Eo, also %#x
means you don't have to write 0x
in front of of the input. But I still think you just want %.12p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but this is old code and a different issue from the current pull request. I just moved it over from the existing code that was duplicated across several backends. Either way, if we want to change it, we shouldn't do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical of that argument. You just touched the lines and moved them as part of a clean up. I'm not saying you have to fix it, but now blame will imply that you wrote it this way instead of whoever wrote the code you copied and pasted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine with me. The failing CI checks are unrelated.
@@ -57,6 +51,13 @@ | |||
#include <impl/Kokkos_Error.hpp> | |||
#include <impl/Kokkos_MemorySpace.hpp> | |||
|
|||
#include <stdlib.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to not use cstdlib
here? (I know it is copy + paste here, but since it is changing anyway...)
friend class HostInaccessibleSharedAllocationRecordCommon< | ||
Kokkos::Experimental::HIPSpace>; | ||
using base_t = HostInaccessibleSharedAllocationRecordCommon< | ||
Kokkos::Experimental::HIPSpace>; | ||
using RecordBase = SharedAllocationRecord<void, void>; | ||
|
||
SharedAllocationRecord(const SharedAllocationRecord&) = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: should we do the Rule of 5 (and should the delete functions be public)?
static auto get_record(void* alloc_ptr) -> derived_t*; | ||
std::string get_label() const; | ||
}; | ||
|
||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this give us separate deallocate
in every translation unit? I'm not sure it matters...
arg_header.m_record = &self(); | ||
|
||
strncpy(arg_header.m_label, arg_label.c_str(), | ||
SharedAllocationHeader::maximum_label_length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can technically be SharedAllocationHeader::maximum_label_length-1
, since you always write a \0
in the last byte anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping
"0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n"; | ||
} | ||
|
||
snprintf(buffer, 256, format_string, MemorySpace::execution_space::name(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preference for sizeof(buffer)
instead of 256
, since the two are tied together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping
Based on @masterleinad suggestion. Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
4e4c681
to
468c9b0
Compare
Retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mostly OK with these changes. Please make sure you replay to Nevin's comment
kokkos#3658) * started cleanup using a CRTP base * Cuda replicates removed * host space migration done * Hip changes for shared alloc cleanup * OpenMPTargetSpace shared alloc cleanup * SYCL shared alloc cleanup * Forgot explicit instantiations in a couple of places * redoing some formatting for a slightly different clang version * introduced HostInaccessibleSharedAllocationRecordCommon * fixed a couple of incorrect default constructors * a few more small bugs in SYCL and OmpTarget * Update to more modern use of (char)0 Based on @masterleinad suggestion. Co-authored-by: Daniel Arndt <arndtd@ornl.gov> * removed CudaSharedAllocationRecordCommon * revert change accidentally included in this changeset * added missing SYCL explicit template instantiation * reverted intel workaround removal * Post-rebase cleanup Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
While working on the
View
refactor formdspan
, it became clear that there is a massive amount of (relatively technical, hard to understand, hard to recognize as identical) duplicated code across backends with respect toSharedAllocationRecord
partial specializations. SinceSharedAllocationRecord
andSharedAllocationTracker
are part of theView
refactor, I literally needed to rewrite a bunch of this code in a common place just to determine whether or not it's the same. I'd like to avoid touching individual backends in the actualView
refactor, and this is a start towards being able to do this.This pull request basically just introduces a CRTP base class named
SharedAllocationRecordCommon
that contains all of the default implementations that had been copy-pasted across backends over the past 10 years.