Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix device_id in profiling #4997

Merged
merged 3 commits into from
May 11, 2022

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented May 3, 2022

Fixes #4694 (comment).
@khuck Can you please have a look if this fixes the issue for you?

The important change is the correct shift in device_id_root() to make sure that this impacts type and not device_id in ExecutionSpaceIdentifier. Previously, we were not recording a device_id at all.

@crtrott
Copy link
Member

crtrott commented May 4, 2022

You may need to instantiate the classes now to actually be able to link against the function?

@masterleinad
Copy link
Contributor Author

You may need to instantiate the classes now to actually be able to link against the function?

Not quite yet what the problem is...

@khuck
Copy link

khuck commented May 4, 2022

Fixes #4694 (comment). @khuck Can you please have a look of this fixes the issue for you?

The important change is the correct shift in device_id_root() to make sure that this impacts type and not device_id in ExecutionSpaceIdentifier. Previously, we were not recording a device_id at all.

Yeah, once I get some free time after the ECP meeting, I'll give it a shot.

@masterleinad
Copy link
Contributor Author

CI is only failing because of the changed keys for the CUDA containers.

@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad
Copy link
Contributor Author

Only CUDA-10.1-Clang-Tidy is failing (timeout), see https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/8680/pipeline/53, but KokkosCore_UnitTest_KokkosP (which is the one relevant for this pull request) is passing for that build as well.

@khuck
Copy link

khuck commented May 9, 2022

@masterleinad yes, this appears to fix the issue. Thanks!

@@ -1018,6 +1018,15 @@ void CudaSpaceInitializer::print_configuration(std::ostream &msg,
}

} // namespace Impl

#ifdef KOKKOS_ENABLE_CXX14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the definition only C++14. From C++17 on, static constexpr members are implicitly inline.

@@ -179,6 +179,7 @@ namespace Experimental {
template <>
struct DeviceTypeTraits<OpenMP> {
static constexpr DeviceType id = DeviceType::OpenMP;
static int device_id(const OpenMP&) { return 0; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be omp_get_default_device() like for OpenMPTarget?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this only make sense for offloading? I choose 0 assuming that there is only ever one CPU device to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick check in the standard and I couldn't find in the standard that the CPU will always have device_id = 0, that's why I think it's safer to use omp_get_default_device()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to say is that the device reported by omp_get_default_device is only for target regions but the OpenMP doesn't use target regions (as opposed to OpenMPTarget). Calling omp_set_default_device should only impact OpenMPTarget and not OpenMP.
How do others feel about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to say is that the device reported by omp_get_default_device is only for target regions

What do you mean? That function does not work inside a target region. When called from within a target region the effect of this routine is unspecified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but AFAICT a device is only used for target constructs, see https://splichal.eu/scripts/sphinx/libgomp/_build/html/openmp-environment-variables/ompdefaultdevice.html#omp-default-device. You use it to define a target region, not inside one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not find any documentation on the behavior of omp_get_default_device when a target device is not present.

@crtrott crtrott merged commit 6dc1825 into kokkos:develop May 11, 2022
@crtrott crtrott added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Patch Release labels May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Patch Release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants