-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix device_id in profiling #4997
Conversation
You may need to instantiate the classes now to actually be able to link against the function? |
dc1c5a1
to
3043c7f
Compare
Not quite yet what the problem is... |
Yeah, once I get some free time after the ECP meeting, I'll give it a shot. |
CI is only failing because of the changed keys for the |
Retest this please. |
Only |
@masterleinad yes, this appears to fix the issue. Thanks! |
@@ -1018,6 +1018,15 @@ void CudaSpaceInitializer::print_configuration(std::ostream &msg, | |||
} | |||
|
|||
} // namespace Impl | |||
|
|||
#ifdef KOKKOS_ENABLE_CXX14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the definition only C++14. From C++17 on, static constexpr
members are implicitly inline
.
@@ -179,6 +179,7 @@ namespace Experimental { | |||
template <> | |||
struct DeviceTypeTraits<OpenMP> { | |||
static constexpr DeviceType id = DeviceType::OpenMP; | |||
static int device_id(const OpenMP&) { return 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be omp_get_default_device()
like for OpenMPTarget?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this only make sense for offloading? I choose 0 assuming that there is only ever one CPU device to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick check in the standard and I couldn't find in the standard that the CPU will always have device_id = 0
, that's why I think it's safer to use omp_get_default_device()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to say is that the device reported by omp_get_default_device
is only for target regions but the OpenMP
doesn't use target regions (as opposed to OpenMPTarget
). Calling omp_set_default_device
should only impact OpenMPTarget
and not OpenMP
.
How do others feel about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to say is that the device reported by omp_get_default_device is only for target regions
What do you mean? That function does not work inside a target region. When called from within a target region the effect of this routine is unspecified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but AFAICT a device
is only used for target
constructs, see https://splichal.eu/scripts/sphinx/libgomp/_build/html/openmp-environment-variables/ompdefaultdevice.html#omp-default-device. You use it to define a target region, not inside one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could not find any documentation on the behavior of omp_get_default_device
when a target device is not present.
Fixes #4694 (comment).
@khuck Can you please have a look if this fixes the issue for you?
The important change is the correct shift in
device_id_root()
to make sure that this impactstype
and notdevice_id
inExecutionSpaceIdentifier
. Previously, we were not recording adevice_id
at all.