-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fence Profiling Support in all backends #3966
Fence Profiling Support in all backends #3966
Conversation
This looks reasonable to me. We need to discuss the naming scheme, and unify that accross all of Kokkos internally named things though. I know we did a lot of ClassName::function_name naming for example in messages, etc. but its not super uniform. So basically: Kokkos::ClassName::whatever or kokkos.classname.whatever or ... |
Retest this please. |
Looks reasonable to me. I'm not quite sure how portable |
Conflicts: core/src/Kokkos_CopyViews.hpp
Don't rereview this yet. I'm using CI to see what all I need to convert |
…f/kokkos into feature/profile-gpu-fences � Conflicts: � core/src/Cuda/Kokkos_Cuda_Instance.cpp � core/src/impl/Kokkos_Profiling.hpp
The indentation checker complains. 🙂 |
I have to do some more technical work on it, I'm not worried about formatting until the content is ready |
Just saying that you won't see any results for the |
…an "accurate" deviceID
…f/kokkos into feature/profile-gpu-fences Conflicts: core/src/Cuda/Kokkos_Cuda_Instance.cpp
…f/kokkos into feature/profile-gpu-fences
@crtrott: how do you feel about this as a resolution for thread-safety? Basically we're only accessing the map on ExecSpace construction. This could still be a bit slower when a bunch of threads are making spaces, but I honestly think that ID lookup will be a lot cheaper than all of the scratch allocations and the like |
@crtrott: let me know if you want further edits (like making those functions Impl). Personally I'm okay with this PR as-is |
Conflicts: core/src/Cuda/Kokkos_Cuda_Instance.hpp core/src/HIP/Kokkos_HIP_Instance.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the new overloads for fence internal functions instead, in particular since they don't exist for every execution space.
core/src/Kokkos_Threads.hpp
Outdated
@@ -107,8 +108,13 @@ class Threads { | |||
/// method does not return until all dispatched functors on this | |||
/// device have completed. | |||
static void impl_static_fence(); | |||
static void impl_static_fence(const std::string& name); | |||
|
|||
void fence(Impl::fence_is_static is_static) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would feel better with that then being an internal function. We already distinguish between the static fence and the non-static fence technically: impl_static_fence and fence, with the static version only being there for internal purposes such as for the implementation of Kokkos::fence(); So if the reason right now is that impl_static_fence() calls fence(), then just make them both call somethign internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not a thorough review)
Good catches, I'll fix those |
…f/kokkos into feature/profile-gpu-fences
So, this one needs conversation, we're really at a "design review" point.
One thing I've been promising for ages, but haven't gotten around to doing, is to add fence profiling to Kokkos. We have the callbacks (#2891), but they're currently unused.
What I see, is that we basically need a Kokkos wrapper around any fencing function. cudaDeviceSynchronize,cudaStreamSynchronize, that take in at a minimum a string name, probably also a deviceID? Then we never directly call the fencing function, only the wrapper. But this change radiates out. If cudaStreamSynchronize takes a name, then so must Kokkos::Cuda::fence(), so that it can forward it to cudaStreamSynchronize. We also need compatibility, so we need a Kokkos::Cuda::fence with no args, that calls the named version.
I've sketched what this might look like in Cuda. Note that the only changes I make are to Cuda, we also need to modify generic code (CopyViews and the like), but those changes wouldn't compile.
What I need to proceed is for somebody to sign off on the design. Once you do that, I will write the infrastructure for this in every backend. My hope is that we can merge at that point. Once we have the infrastructure, I (or a backend owner) can start labeling fences, and starting to put this in generic code as well (like that in CopyViews).