-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NVTX support and RMM_FUNC_RANGE() macro #1558
Conversation
cmake/thirdparty/get_nvtx.cmake
Outdated
# ============================================================================= | ||
|
||
# This function finds NVTX and sets any additional necessary environment variables. | ||
function(find_and_configure_nvtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably set up NVTX in rapids-cmake since we have three repos using the same code (rmm, cudf, cuspatial -- can't recall if we finished cuspatial but it's supposed to use this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I was surprised to find it wasn't already in rapids-cmake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build failure shows:
This is because we're using NVTX as a BUILD_LOCAL_INTERFACE, so it doesn't seem to get picked up by the
I know we had to enable this for cuDF in rapidsai/cudf#15271 because not doing so led to breakage in static builds. I don't know how this should be handled for RMM as a header-only library. Maybe we just need NVTX to have a normal edit: I checked with @KyleFromNVIDIA and attempted this change in 702915b. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really makes me happy to see how simple this is :)
Depends on #rapidsai/rapids-cmake#606 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
include/rmm/detail/nvtx/ranges.hpp
Outdated
@@ -0,0 +1,61 @@ | |||
/* | |||
* Copyright (c) 2020-2024, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Copyright (c) 2020-2024, NVIDIA CORPORATION. | |
* Copyright (c) 2024, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
@@ -117,6 +118,7 @@ class device_memory_resource { | |||
*/ | |||
void* allocate(std::size_t bytes, cuda_stream_view stream = cuda_stream_view{}) | |||
{ | |||
RMM_FUNC_RANGE(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Do we want the same set of annotations for the host_memory_resource
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I added them. Note that the new pinned_host_memory_resource
does not derive from host_memory_resource
so I had to add ranges to it explicitly. This is the way it will have to be done for all MRs that just implement the concepts in the future, unfortunately.
Adds `rapids_cpm_nvtx3`, adapted from the code that is currently implemented in libcudf. This aims to help with rapidsai/rmm#1558. Authors: - Bradley Dice (https://github.com/bdice) - Mark Harris (https://github.com/harrism) - Robert Maynard (https://github.com/robertmaynard) Approvers: - Robert Maynard (https://github.com/robertmaynard) URL: #606
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small things. I would like to hear Robert's opinion on one since I think it's perpetuating a form of tech debt that we have throughout RAPIDS CMake code and I'd like to establish a different practice, but it shouldn't block merging so feel free to ping me tomorrow if it isn't addressed by the freeze deadline.
|
||
endfunction() | ||
|
||
find_and_configure_nvtx3() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I find these extra modules to be overkill when they're so trivial and would prefer that we inline them. In a few of the more recent cases we have done so. @robertmaynard WDYT? This feels like a pattern that's been copy-pasted many times into a number of scenarios where it isn't really needed.
Specifically I'm talking about just moving lines 18/19 into CMakeLists.txt and removing this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree, I also feel that this PR is consistent with the style of 3 other rapids-cmake packages used in RMM (CCCL, fmt, spdlog -- which has additional local cmake code for some reason). Therefore I would like to defer this change to a later PR, which should be applied in unison with the same style change across all of RAPIDS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake approval.
/merge |
target_compile_features(rmm INTERFACE cxx_std_17 $<BUILD_INTERFACE:cuda_std_17>) | ||
target_compile_definitions(rmm INTERFACE LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE) | ||
|
||
# Disable NVTX if necessary | ||
if(NOT USE_NVTX) | ||
target_compile_definitions(rmm INTERFACE NVTX_DISABLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option for disabling NVTX does not appear to be used. This seems like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
Let's get RMM allocate/deallocates showing up in profiler timelines.
Closes #495
Checklist