Resolve CUPTI cbid names via cuptiGetCallbackName #1400
Closed
scotts wants to merge 1 commit into
Summary:

**Problem**: Newer CUDA Runtime cbids (anything CUPTI added from CUDA 12.2 onward) render as `INVALID` in PyTorch profiler traces, and most CUDA Driver cbids render as `unknown`. This makes recent CUDA APIs invisible in profile timelines unless someone manually updates kineto's name tables.

**Why**: Kineto resolves cbids through two hand-maintained structures: a positional array of ~446 Runtime names in `cupti_strings.cpp` and a hash map of ~8 explicitly-registered Driver names in `CuptiCbidRegistry`. Anything outside those tables fails to resolve. New cbids are added at the end of the CUPTI enums with every CUDA release, but the parallel name tables aren't kept in sync. Updating the table is manual and error-prone.

**Fix**: CUPTI ships [`cuptiGetCallbackName(domain, cbid, &name)`](https://docs.nvidia.com/cupti/api/group__CUPTI__CALLBACK__API.html#group__cupti__callback__api_1ga0fe2357995aa7861a37e5896c6a18635), which returns the authoritative name for any cbid the loaded CUPTI knows about, in either domain. Calling it directly eliminates both failure modes permanently and removes ~500 lines of hand-maintained name data. The same helper backs the new `driverCbidName()` and `runtimeCbidName()` (which now returns `std::string`).

CUPTI returns identifiers with the CUDA-version-introduced suffix attached (e.g. `cudaLaunchKernel_v7000`). To preserve the existing trace-label convention, the implementation strips a trailing `_v` followed by 4+ digits. This is unambiguous because the lowest CUDA-version suffix is `_v3020` (CUDA 3.2). Single-digit API-generation suffixes like `_v2`/`_v3` are preserved; `cudaStreamGetCaptureInfo_v3_v12030` correctly becomes `cudaStreamGetCaptureInfo_v3`.

`CuptiCbidRegistry` keeps its flow-correlation, blocklist, and registration tracking. Only the name-lookup path is removed; `DriverActivity::name()` now calls `driverCbidName()` directly.

Differential Revision: D104926326
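The suffix-stripping rule described above can be sketched as follows. This is a minimal illustration of the stated rule (drop a trailing `_v` followed by 4+ digits), not the code from this PR; the function name `stripCudaVersionSuffix` and its signature are assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: the name and signature are assumptions for
// illustration, not kineto's actual implementation.
// Removes a trailing "_v" followed by four or more decimal digits,
// e.g. "_v7000" or "_v12030". Shorter suffixes such as "_v2" are kept.
std::string stripCudaVersionSuffix(const std::string& name) {
  // Walk backwards over the trailing run of decimal digits.
  std::size_t pos = name.size();
  while (pos > 0 &&
         std::isdigit(static_cast<unsigned char>(name[pos - 1]))) {
    --pos;
  }
  const std::size_t digits = name.size() - pos;

  // Strip only when there are at least 4 digits and they are
  // immediately preceded by "_v"; otherwise return the name unchanged.
  if (digits >= 4 && pos >= 2 &&
      name[pos - 2] == '_' && name[pos - 1] == 'v') {
    return name.substr(0, pos - 2);
  }
  return name;
}
```

With this sketch, `cudaLaunchKernel_v7000` maps to `cudaLaunchKernel` and `cudaStreamGetCaptureInfo_v3_v12030` maps to `cudaStreamGetCaptureInfo_v3`, while a name that ends in only `_v3` is returned unchanged because its digit run is shorter than four.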
@scotts has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104926326.
This pull request has been merged in e07e121. |
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request May 22, 2026
Includes the following commits:
- ci: declare workflow-level `contents: read` on 5 workflows (pytorch/kineto#1404) 5902263
- Remove deprecated `REQUEST_TIMESTAMP` config key (pytorch/kineto#1409) 55883de
- Fix intermittent Mac CI failure from conda channel reset (pytorch/kineto#1407) ee27b5c
- Add nlohmann/json as a top-level third_party submodule (pytorch/kineto#1406) c044281
- Remove SIGUSR2 on-demand profiling path (pytorch/kineto#1408) 471ed38
- Fix ROCm HtoD memcpy stream attribution (pytorch/kineto#1398) 799b5f4
- Fix UST_LOGGER_MARK_COMPLETED build failure in manifold_trace_logger (pytorch/kineto#1389) 60967ce
- Remove `DefaultTimeConverterIsIdentity` test (pytorch/kineto#1401) 81d31cd
- Re-enable most PyTorch tests (pytorch/kineto#1403) 212f9a5
- Daily `arc lint --take CLANGFORMAT` (pytorch/kineto#1402) 6481fac
- Resolve CUPTI cbid names via cuptiGetCallbackName (pytorch/kineto#1400) e07e121
- XPUPTI: Fix ts=0 trace events on Windows (pytorch/kineto#1381) 4c8d01c
- Remove LIBKINETO_NO* compatibility shim (pytorch/kineto#1399) ea8bc18
- Upgrade Kineto to C++20 (pytorch/kineto#1397) 77e2b46
- Update the rocm api filtering (pytorch/kineto#1392) e0ac578

Pull Request resolved: #184784
Approved by: https://github.com/NicolasHug, https://github.com/malfet
pytorchmergebot pushed a commit to khushi-411/pytorch that referenced this pull request May 24, 2026