Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major duplicate code removal in SharedAllocationRecord specializations #3658

Merged
merged 17 commits into from
Mar 10, 2021

Conversation

dhollman
Copy link

@dhollman dhollman commented Dec 8, 2020

While working on the View refactor for mdspan, it became clear that there is a massive amount of (relatively technical, hard to understand, hard to recognize as identical) duplicated code across backends with respect to SharedAllocationRecord partial specializations. Since SharedAllocationRecord and SharedAllocationTracker are part of the View refactor, I literally needed to rewrite a bunch of this code in a common place just to determine whether or not it's the same. I'd like to avoid touching individual backends in the actual View refactor, and this is a start towards being able to do this.

This pull request basically just introduces a CRTP base class named SharedAllocationRecordCommon that contains all of the default implementations that had been copy-pasted across backends over the past 10 years.

@dhollman dhollman force-pushed the shared-alloc-cleanup branch 2 times, most recently from 2e3d69c to 21fb670 Compare December 8, 2020 20:05
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I am confused how does this not fail at link time this you forgot the explicit instantiation of HostInaccessibleSharedAllocationRecordCommon<SYCLDeviceUSMSpace>?

core/src/impl/Kokkos_ViewTracker.hpp Outdated Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc.hpp Show resolved Hide resolved
protected:
using record_base_t::record_base_t;

void _fill_host_accessible_header_info(SharedAllocationHeader& arg_header,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had settled for impl_ prefix

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't public. impl_ is for things that need to be public for some reason (usually because friendship with some other library is difficult or impossible) but aren't part of the supported stable public API. Nothing in this class is part of the public API, but there are parts that are only used in this class hierarchy. A common convention for this is to prefix with an underscore, and since we don't really have a uniform convention for this in Kokkos, I just used that here. If you can point to somewhere else where we have a uniform convention for protected/private methods of implementation classes, I'll use that instead; otherwise, I don't see a reason to change this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't buy your argument.
How standards proliferate

core/src/impl/Kokkos_SharedAlloc_timpl.hpp Show resolved Hide resolved
core/src/Kokkos_HostSpace.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Space.cpp Show resolved Hide resolved
core/src/HIP/Kokkos_HIP_Space.cpp Show resolved Hide resolved
core/src/Kokkos_HostSpace.hpp Outdated Show resolved Hide resolved
Comment on lines +264 to +275
static auto allocate(MemorySpace const& arg_space,
std::string const& arg_label, size_t arg_alloc_size)
-> derived_t*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static auto allocate(MemorySpace const& arg_space,
std::string const& arg_label, size_t arg_alloc_size)
-> derived_t*;
static derived_t* allocate(MemorySpace const& arg_space,
std::string const& arg_label, size_t arg_alloc_size);

or is there a reason to defer the return type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see my response to your suggestion elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true for the definition.

core/src/impl/Kokkos_SharedAlloc.hpp Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc.hpp Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc_timpl.hpp Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc_timpl.hpp Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc_timpl.hpp Show resolved Hide resolved
core/src/impl/Kokkos_SharedAlloc_timpl.hpp Outdated Show resolved Hide resolved
@dhollman dhollman force-pushed the shared-alloc-cleanup branch 2 times, most recently from 68388cf to 4e4c681 Compare December 9, 2020 18:33
Copy link

@calewis calewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except that I think the string formatting for printing pointers can be simplified by using %p. I might be wrong about that though, but at least in the simple example I checked %p was equivalent to the uintptr_t hex string that you wrote.

If I am missing something or %p is weird on some systems then you can ignore this.

core/src/Cuda/Kokkos_CudaSpace.cpp Show resolved Hide resolved
friend class SharedAllocationRecordCommon<
Kokkos::Experimental::SYCLSharedUSMSpace>;
using base_t =
SharedAllocationRecordCommon<Kokkos::Experimental::SYCLSharedUSMSpace>;
using RecordBase = SharedAllocationRecord<void, void>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should RecordBase be determined from base_t? As far as I can tell we have this because base_t derives from SharedAllocationRecord<void, void>. This is the same in other example, I just picked this one because it was the first one I found while scrolling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly just avoided doing that to minimize the size of the diff. That said, it's often useful to know the size of the root class of some hierarchy.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by size? I just meant something like. using RecordBase = base_t::blahblah.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

// Formatting dependent on sizeof(uintptr_t)
const char* format_string;

if (sizeof(uintptr_t) == sizeof(unsigned long)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you just use %p https://godbolt.org/z/vszj3o

Copy link

@calewis calewis Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better https://godbolt.org/z/PfG3Eo, also %#x means you don't have to write 0x in front of of the input. But I still think you just want %.12p

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but this is old code and a different issue from the current pull request. I just moved it over from the existing code that was duplicated across several backends. Either way, if we want to change it, we shouldn't do it here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical of that argument. You just touched the lines and moved them as part of a clean up. I'm not saying you have to fix it, but now blame will imply that you wrote it this way instead of whoever wrote the code you copied and pasted.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine with me. The failing CI checks are unrelated.

@@ -57,6 +51,13 @@
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_MemorySpace.hpp>

#include <stdlib.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to not use cstdlib here? (I know it is copy + paste here, but since it is changing anyway...)

friend class HostInaccessibleSharedAllocationRecordCommon<
Kokkos::Experimental::HIPSpace>;
using base_t = HostInaccessibleSharedAllocationRecordCommon<
Kokkos::Experimental::HIPSpace>;
using RecordBase = SharedAllocationRecord<void, void>;

SharedAllocationRecord(const SharedAllocationRecord&) = delete;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: should we do the Rule of 5 (and should the delete functions be public)?

core/src/impl/Kokkos_SharedAlloc.hpp Show resolved Hide resolved
static auto get_record(void* alloc_ptr) -> derived_t*;
std::string get_label() const;
};

namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this give us separate deallocate in every translation unit? I'm not sure it matters...

arg_header.m_record = &self();

strncpy(arg_header.m_label, arg_label.c_str(),
SharedAllocationHeader::maximum_label_length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can technically be SharedAllocationHeader::maximum_label_length-1, since you always write a \0 in the last byte anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping

"0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}

snprintf(buffer, 256, format_string, MemorySpace::execution_space::name(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preference for sizeof(buffer) instead of 256, since the two are tied together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping

@masterleinad
Copy link
Contributor

Retest this please.

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mostly OK with these changes. Please make sure you replay to Nevin's comment

@crtrott crtrott merged commit 64c0518 into kokkos:develop Mar 10, 2021
masterleinad added a commit to masterleinad/kokkos that referenced this pull request Mar 11, 2021
kokkos#3658)

* started cleanup using a CRTP base

* Cuda replicates removed

* host space migration done

* Hip changes for shared alloc cleanup

* OpenMPTargetSpace shared alloc cleanup

* SYCL shared alloc cleanup

* Forgot explicit instantiations in a couple of places

* redoing some formatting for a slightly different clang version

* introduced HostInaccessibleSharedAllocationRecordCommon

* fixed a couple of incorrect default constructors

* a few more small bugs in SYCL and OmpTarget

* Update to more modern use of (char)0

Based on @masterleinad suggestion.

Co-authored-by: Daniel Arndt <arndtd@ornl.gov>

* removed CudaSharedAllocationRecordCommon

* revert change accidentally included in this changeset

* added missing SYCL explicit template instantiation

* reverted intel workaround removal

* Post-rebase cleanup

Co-authored-by: Daniel Arndt <arndtd@ornl.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants