Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fence Profiling Support in all backends #3966

Merged
merged 93 commits into from
Jul 30, 2021

Conversation

DavidPoliakoff
Copy link
Contributor

So, this one needs conversation, we're really at a "design review" point.

One thing I've been promising for ages, but haven't gotten around to doing, is to add fence profiling to Kokkos. We have the callbacks (#2891), but they're currently unused.

What I see, is that we basically need a Kokkos wrapper around any fencing function. cudaDeviceSynchronize,cudaStreamSynchronize, that take in at a minimum a string name, probably also a deviceID? Then we never directly call the fencing function, only the wrapper. But this change radiates out. If cudaStreamSynchronize takes a name, then so must Kokkos::Cuda::fence(), so that it can forward it to cudaStreamSynchronize. We also need compatibility, so we need a Kokkos::Cuda::fence with no args, that calls the named version.

I've sketched what this might look like in Cuda. Note that the only changes I make are to Cuda, we also need to modify generic code (CopyViews and the like), but those changes wouldn't compile.

What I need to proceed is for somebody to sign off on the design. Once you do that, I will write the infrastructure for this in every backend. My hope is that we can merge at that point. Once we have the infrastructure, I (or a backend owner) can start labeling fences, and starting to put this in generic code as well (like that in CopyViews).

@crtrott
Copy link
Member

crtrott commented Apr 26, 2021

This looks reasonable to me. We need to discuss the naming scheme, and unify that accross all of Kokkos internally named things though. I know we did a lot of ClassName::function_name naming for example in messages, etc. but its not super uniform.

So basically: Kokkos::ClassName::whatever or kokkos.classname.whatever or ...

@masterleinad
Copy link
Contributor

Retest this please.

@masterleinad
Copy link
Contributor

Looks reasonable to me. I'm not quite sure how portable deviceID is.

@DavidPoliakoff
Copy link
Contributor Author

Don't rereview this yet. I'm using CI to see what all I need to convert

@masterleinad
Copy link
Contributor

The indentation checker complains. 🙂

@DavidPoliakoff
Copy link
Contributor Author

I have to do some more technical work on it, I'm not worried about formatting until the content is ready

@masterleinad
Copy link
Contributor

Just saying that you won't see any results for the GPU-backends until then. 🙂

@DavidPoliakoff
Copy link
Contributor Author

@crtrott: how do you feel about this as a resolution for thread-safety? Basically we're only accessing the map on ExecSpace construction. This could still be a bit slower when a bunch of threads are making spaces, but I honestly think that ID lookup will be a lot cheaper than all of the scratch allocations and the like

@DavidPoliakoff DavidPoliakoff moved this from In progress to Awaiting Feedback in Developer: David P Jul 9, 2021
@DavidPoliakoff
Copy link
Contributor Author

@crtrott: let me know if you want further edits (like making those functions Impl). Personally I'm okay with this PR as-is

@crtrott crtrott added this to In progress in Kokkos Release 3.5 Jul 14, 2021
@crtrott crtrott moved this from In progress to Awaiting Feedback in Kokkos Release 3.5 Jul 14, 2021
Conflicts:
	core/src/Cuda/Kokkos_Cuda_Instance.hpp
	core/src/HIP/Kokkos_HIP_Instance.hpp
Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the new overloads for fence internal functions instead, in particular since they don't exist for every execution space.

@@ -107,8 +108,13 @@ class Threads {
/// method does not return until all dispatched functors on this
/// device have completed.
static void impl_static_fence();
static void impl_static_fence(const std::string& name);

void fence(Impl::fence_is_static is_static) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would feel better with that then being an internal function. We already distinguish between the static fence and the non-static fence technically: impl_static_fence and fence, with the static version only being there for internal purposes such as for the implementation of Kokkos::fence(); So if the reason right now is that impl_static_fence() calls fence(), then just make them both call somethign internal.

Developer: David P automation moved this from Awaiting Feedback to In progress Jul 19, 2021
@DavidPoliakoff DavidPoliakoff moved this from In progress to Awaiting Feedback in Developer: David P Jul 26, 2021
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not a thorough review)

core/src/Cuda/Kokkos_CudaSpace.cpp Outdated Show resolved Hide resolved
core/src/Cuda/Kokkos_CudaSpace.cpp Outdated Show resolved Hide resolved
@DavidPoliakoff
Copy link
Contributor Author

Good catches, I'll fix those

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants