Enable correct supported activities for kineto on rocm #88207
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88207
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Failures. As of commit 01cd18c, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM
Hmm, I think according to this logic, if the code is built with GPU support, either NOROCTRACER or NOCUPTI would be defined:
pytorch/cmake/Dependencies.cmake
Lines 1908 to 2000 in 6e6f929
if(USE_KINETO)
  if((NOT USE_CUDA) OR MSVC)
    set(LIBKINETO_NOCUPTI ON CACHE STRING "" FORCE)
  else()
    set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
    message(STATUS "Using Kineto with CUPTI support")
  endif()

  if(NOT USE_ROCM)
    set(LIBKINETO_NOROCTRACER ON CACHE STRING "" FORCE)
  else()
    set(LIBKINETO_NOROCTRACER OFF CACHE STRING "")
    message(STATUS "Using Kineto with Roctracer support")
  endif()

  if(LIBKINETO_NOCUPTI AND LIBKINETO_NOROCTRACER)
    message(STATUS "Using CPU-only version of Kineto")
  endif()

  set(CAFFE2_THIRD_PARTY_ROOT "${PROJECT_SOURCE_DIR}/third_party" CACHE STRING "")
  set(KINETO_SOURCE_DIR "${CAFFE2_THIRD_PARTY_ROOT}/kineto/libkineto" CACHE STRING "")
  set(KINETO_BUILD_TESTS OFF CACHE BOOL "")
  set(KINETO_LIBRARY_TYPE "static" CACHE STRING "")

  message(STATUS "Configuring Kineto dependency:")
  message(STATUS "  KINETO_SOURCE_DIR = ${KINETO_SOURCE_DIR}")
  message(STATUS "  KINETO_BUILD_TESTS = ${KINETO_BUILD_TESTS}")
  message(STATUS "  KINETO_LIBRARY_TYPE = ${KINETO_LIBRARY_TYPE}")

  if(NOT LIBKINETO_NOCUPTI)
    set(CUDA_SOURCE_DIR "${CUDA_TOOLKIT_ROOT_DIR}" CACHE STRING "")
    message(STATUS "  CUDA_SOURCE_DIR = ${CUDA_SOURCE_DIR}")
    message(STATUS "  CUDA_INCLUDE_DIRS = ${CUDA_INCLUDE_DIRS}")
    if(NOT MSVC)
      if(USE_CUPTI_SO)
        set(CUPTI_LIB_NAME "libcupti.so")
      else()
        set(CUPTI_LIB_NAME "libcupti_static.a")
      endif()
    else()
      set(CUPTI_LIB_NAME "cupti.lib")
    endif()

    find_library(CUPTI_LIBRARY_PATH ${CUPTI_LIB_NAME} PATHS
        ${CUDA_SOURCE_DIR}
        ${CUDA_SOURCE_DIR}/extras/CUPTI/lib64
        ${CUDA_SOURCE_DIR}/lib64
        NO_DEFAULT_PATH)
    find_path(CUPTI_INCLUDE_DIR cupti.h PATHS
        ${CUDA_SOURCE_DIR}/extras/CUPTI/include
        ${CUDA_INCLUDE_DIRS}
        ${CUDA_SOURCE_DIR}
        ${CUDA_SOURCE_DIR}/include
        NO_DEFAULT_PATH)

    if(CUPTI_LIBRARY_PATH AND CUPTI_INCLUDE_DIR)
      message(STATUS "  CUPTI_INCLUDE_DIR = ${CUPTI_INCLUDE_DIR}")
      set(CUDA_cupti_LIBRARY ${CUPTI_LIBRARY_PATH})
      message(STATUS "  CUDA_cupti_LIBRARY = ${CUDA_cupti_LIBRARY}")
      message(STATUS "Found CUPTI")
      set(LIBKINETO_NOCUPTI OFF CACHE STRING "" FORCE)

      # I've only tested this sanity check on Linux; if someone
      # runs into this bug on another platform feel free to
      # generalize it accordingly
      if(NOT USE_CUPTI_SO AND UNIX)
        include(CheckCXXSourceRuns)
        # rt is handled by the CMAKE_REQUIRED_LIBRARIES set above
        if(NOT APPLE)
          set(CMAKE_REQUIRED_LIBRARIES ${CMAKE_REQUIRED_LIBRARIES} "dl")
        endif()
        set(CMAKE_REQUIRED_LINK_OPTIONS "-Wl,--whole-archive,${CUPTI_LIBRARY_PATH},--no-whole-archive")
        check_cxx_source_runs("#include <stdexcept>
          int main() {
            try {
              throw std::runtime_error(\"error\");
            } catch (...) {
              return 0;
            }
            return 1;
          }" EXCEPTIONS_WORK)
        set(CMAKE_REQUIRED_LINK_OPTIONS "")
        if(NOT EXCEPTIONS_WORK)
          message(FATAL_ERROR "Detected that statically linking against CUPTI causes exceptions to stop working. See https://github.com/pytorch/pytorch/issues/57744 for more details. Perhaps try: USE_CUPTI_SO=1 python setup.py develop --cmake")
        endif()
      endif()
    else()
      message(STATUS "Could not find CUPTI library, using CPU-only Kineto build")
      set(LIBKINETO_NOCUPTI ON CACHE STRING "" FORCE)
    endif()
Correct: we need to enable ActivityType::CUDA (which means GPU in this case) whenever direct kernel measurements are available, i.e. when CUPTI or Roctracer is present. So we need that activity enabled unless BOTH are missing.

An observation: it would probably be better to query the compiled libkineto at runtime for its actual capabilities than to guess its capabilities from the CMake variables we passed to its build. Also, the negatively-named define variables seem like a bad choice, as you immediately end up with double negatives. LIBKINETO_USE_CUPTI and !LIBKINETO_USE_CUPTI speak to me more clearly, and you then wouldn't need to apply De Morgan's law to make sense of, e.g., this patch. Which may have been done on purpose. :)
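For illustration, a minimal stand-in sketch of the guard shape being discussed (the enum and function below are hypothetical simplifications, not the actual PyTorch sources): the GPU activity is advertised unless both LIBKINETO_NOCUPTI and LIBKINETO_NOROCTRACER are defined.

#include <cstdio>
#include <set>

enum class ActivityType { CPU, CUDA }; // "CUDA" covers HIP kernels on ROCm here

std::set<ActivityType> supportedActivities() {
  std::set<ActivityType> activities{ActivityType::CPU};
#if !defined(LIBKINETO_NOCUPTI) || !defined(LIBKINETO_NOROCTRACER)
  // Direct kernel measurement is available (CUPTI or Roctracer was built in),
  // so enable the GPU activity instead of the event-based fallback.
  activities.insert(ActivityType::CUDA);
#endif
  return activities;
}

int main() {
  std::printf("GPU activity enabled: %zu\n",
              supportedActivities().count(ActivityType::CUDA));
  return 0;
}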
It looks very confusing, but if CI is green (assuming we are testing kineto in CI), looks good to me.
So the second line is the same as saying: overall, if either CUPTI or Roctracer is present, then we want to use the GPU profiling library. And no argument that it is confusing. I was just dealing with what was there to begin with.
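To make the De Morgan step concrete, here is a tiny standalone check (made-up booleans, nothing PyTorch-specific) that "enable unless BOTH are missing", !(NOCUPTI && NOROCTRACER), agrees with "enable if EITHER is present", !NOCUPTI || !NOROCTRACER, on all four combinations:

#include <cstdio>

int main() {
  for (int nocupti = 0; nocupti <= 1; ++nocupti) {
    for (int noroctracer = 0; noroctracer <= 1; ++noroctracer) {
      // "Unless both backends are compiled out" ...
      bool unless_both_missing = !(nocupti && noroctracer);
      // ... is the same condition as "if either backend is present".
      bool either_present = !nocupti || !noroctracer;
      std::printf("NOCUPTI=%d NOROCTRACER=%d -> enable GPU activity: %d (same as %d)\n",
                  nocupti, noroctracer, unless_both_missing, either_present);
    }
  }
  return 0;
}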
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR: @pytorchbot rebase
@pytorchbot rebase
You don't have permissions to rebase this PR; only people with write permissions may rebase PRs.
@mwootton Can you please double-check this CI failure: https://github.com/pytorch/pytorch/actions/runs/3463699947/jobs/5784908770
@pytorchbot merge -f "ROCM tests are green"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchbot revert -m "Broke test_kineto on trunk / win-vs2019-cuda11.6-py3 / test (default, 4, 5, windows.8xlarge.nvidia.gpu)" -c ignoredsignal

@pytorchbot successfully started a revert job. Check the current status here.

@mwootton your PR has been successfully reverted.
This reverts commit 35093fc. Reverted #88207 on behalf of https://github.com/kit1980 due to Broke test_kineto on trunk / win-vs2019-cuda11.6-py3 / test (default, 4, 5, windows.8xlarge.nvidia.gpu)
A compile time guard was preventing ActivityType::CUDA from being available on ROCm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time, so operators were being charged GPU time for both the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) CUDA times in, e.g., table().

Pull Request resolved: pytorch#88207
Approved by: https://github.com/malfet, https://github.com/jeffdaily
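As a toy illustration of the symptom (made-up numbers, not real profiler output): self time is computed by subtracting children's time from an op's total, so charging a child kernel's time twice can drive the result negative.

#include <cstdio>

int main() {
  const double parent_range_us = 100.0; // hipEventRecord range around the op
  const double child_kernel_us = 80.0;  // actual kernel execution time

  // Correct attribution: the kernel's time is subtracted from the parent once.
  double self_correct = parent_range_us - child_kernel_us;      // 20 us

  // Buggy attribution: with GPU_FALLBACK and CUDA both active, the kernel is
  // charged via the event range AND as a separate kernel event, so it ends
  // up subtracted twice.
  double self_buggy = parent_range_us - 2.0 * child_kernel_us;  // -60 us

  std::printf("self (correct) = %.0f us, self (double-charged) = %.0f us\n",
              self_correct, self_buggy);
  return 0;
}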
* Update to kineto with fixes
* Enable correct supported activities for kineto on rocm (pytorch#88207)

A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time. So operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) cuda times, in e.g. table().

Pull Request resolved: pytorch#88207
Approved by: https://github.com/malfet, https://github.com/jeffdaily
Co-authored-by: Michael Wootton <michael.wootton@amd.com>
Continuation of #88207. A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time. So operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) cuda times, in e.g. table(). Previously a cmake variable was not being propagated to a '-D', causing an issue on Windows, which uses CUDA but not CUPTI.

Pull Request resolved: #89785
Approved by: https://github.com/jeffdaily, https://github.com/malfet
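A small standalone demo (hypothetical program, not the PyTorch sources) of why a CMake variable that never becomes a '-D' matters: the preprocessor only sees compile definitions, so if LIBKINETO_NOCUPTI stays a CMake-side variable the code below silently takes the "CUPTI present" branch.

#include <cstdio>

int main() {
#if defined(LIBKINETO_NOCUPTI)
  std::printf("built without CUPTI: GPU tracing via Roctracer only\n");
#else
  // This is the branch a build takes when the define is dropped,
  // even if the CMake variable was set to ON.
  std::printf("assuming CUPTI is present\n");
#endif
  return 0;
}

Compiling with "g++ demo.cpp" versus "g++ -DLIBKINETO_NOCUPTI demo.cpp" shows the two branches.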
…ineto (#92889)

PR #88207 enabled ActivityType::CUDA for ROCm. TestProfiler.test_kineto needs an update in the test code to look for the correct pattern for gemm kernels for ROCm.

Pull Request resolved: #92889
Approved by: https://github.com/malfet
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport