Resolve CUPTI cbid names via cuptiGetCallbackName #1400

Closed
scotts wants to merge 1 commit into pytorch:main from scotts:export-D104926326

@scotts commented May 13, 2026

Summary:
**Problem**: Newer CUDA Runtime cbids — anything CUPTI added from CUDA 12.2 onward — render as `INVALID` in PyTorch profiler traces, and most CUDA Driver cbids render as `unknown`. This makes recent CUDA APIs invisible in profile timelines unless someone manually updates kineto's name tables.

**Why**: Kineto resolves cbids through two hand-maintained structures: a positional array of ~446 Runtime names in `cupti_strings.cpp` and a hash map of ~8 explicitly-registered Driver names in `CuptiCbidRegistry`. Anything outside those tables fails to resolve. New cbids are added at the end of the CUPTI enums with every CUDA release, but the parallel name tables aren't kept in sync. Updating the table is manual and error-prone.
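The two failure modes described above can be illustrated with a toy model. This is only a sketch: the table contents, the function names `tableRuntimeName` and `mapDriverName`, and the three-entry array are hypothetical stand-ins, not kineto's actual code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy positional table where index == cbid. Three placeholder entries stand
// in for the ~446 real names; the names themselves are made up.
constexpr std::array<const char*, 3> kRuntimeNames = {"INVALID", "apiA", "apiB"};

// Hypothetical Runtime lookup: any cbid past the end of the table (that is,
// one added by a newer CUDA release) cannot be named and renders as INVALID.
const char* tableRuntimeName(std::uint32_t cbid) {
  if (cbid < kRuntimeNames.size()) {
    return kRuntimeNames[cbid];
  }
  return "INVALID";
}

// Hypothetical Driver lookup: only explicitly registered cbids resolve,
// everything else renders as unknown.
std::string mapDriverName(
    const std::unordered_map<std::uint32_t, std::string>& registered,
    std::uint32_t cbid) {
  const auto it = registered.find(cbid);
  if (it != registered.end()) {
    return it->second;
  }
  return "unknown";
}
```

In this model, both lookups only stay correct if someone appends to the table or registers a new entry after every CUDA release.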

**Fix**: CUPTI ships [`cuptiGetCallbackName(domain, cbid, &name)`](https://docs.nvidia.com/cupti/api/group__CUPTI__CALLBACK__API.html#group__cupti__callback__api_1ga0fe2357995aa7861a37e5896c6a18635), which returns the authoritative name for any cbid the loaded CUPTI knows about, in either domain. Calling it directly eliminates both failure modes permanently and removes ~500 lines of hand-maintained name data. The same helper backs the new `driverCbidName()` and `runtimeCbidName()` (which now returns `std::string`).
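A minimal sketch of how one helper might back both name functions. To stay compilable without the CUPTI headers, the real call is replaced by an injectable callback with the same out-parameter shape; `Domain`, `NameLookup`, `resolveCbidName`, and the fallback argument are all assumptions made for illustration and are not taken from the PR.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for CUPTI's callback domains.
enum class Domain { Runtime, Driver };

// Stand-in for the CUPTI lookup: returns true and sets *name on success.
// In real code this would wrap cuptiGetCallbackName(domain, cbid, &name)
// and compare its result against CUPTI_SUCCESS.
using NameLookup =
    std::function<bool(Domain domain, std::uint32_t cbid, const char** name)>;

// One helper for both domains: ask the lookup, and fall back to a
// caller-supplied label if the loaded library does not know the cbid.
// The fallback behaviour is an assumption, not something the PR specifies.
std::string resolveCbidName(const NameLookup& lookup,
                            Domain domain,
                            std::uint32_t cbid,
                            const std::string& fallback) {
  const char* name = nullptr;
  if (lookup && lookup(domain, cbid, &name) && name != nullptr) {
    return std::string(name);
  }
  return fallback;
}
```

Injecting the lookup keeps the sketch testable with a fake. A production version would presumably call CUPTI directly.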

CUPTI returns each identifier with a suffix naming the CUDA version that introduced it (e.g. `cudaLaunchKernel_v7000`). To preserve the existing trace-label convention, the implementation strips a trailing `_v` followed by four or more digits. This is unambiguous because the lowest CUDA-version suffix is `_v3020` (CUDA 3.2), so every version suffix has at least four digits. Single-digit API-generation suffixes like `_v2`/`_v3` are preserved; `cudaStreamGetCaptureInfo_v3_v12030` correctly becomes `cudaStreamGetCaptureInfo_v3`.
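The stripping rule can be sketched as a small string helper. The function name `stripCudaVersionSuffix` is hypothetical and the PR's actual implementation may differ; this version only encodes the rule as stated: drop a trailing `_v` plus at least four digits, and leave everything else alone.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (the name is illustrative, not taken from the PR).
// Drops one trailing "_v" + four-or-more digits, e.g. "_v7000" or "_v12030".
// Shorter suffixes such as "_v2" are API generations and are kept.
std::string stripCudaVersionSuffix(std::string_view name) {
  // Walk backwards over the trailing run of decimal digits.
  std::size_t pos = name.size();
  while (pos > 0 &&
         std::isdigit(static_cast<unsigned char>(name[pos - 1])) != 0) {
    --pos;
  }
  const std::size_t digits = name.size() - pos;
  // Require at least four digits, immediately preceded by the literal "_v".
  if (digits >= 4 && pos >= 2 && name[pos - 2] == '_' && name[pos - 1] == 'v') {
    name.remove_suffix(digits + 2);
  }
  return std::string(name);
}
```

With this rule, `cudaLaunchKernel_v7000` becomes `cudaLaunchKernel`, `cudaStreamGetCaptureInfo_v3_v12030` becomes `cudaStreamGetCaptureInfo_v3`, and a name ending in `_v2` is returned unchanged.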

`CuptiCbidRegistry` keeps its flow-correlation, blocklist, and registration tracking. Only the name-lookup path is removed; `DriverActivity::name()` now calls `driverCbidName()` directly.

Differential Revision: D104926326

meta-cla bot added the `cla signed` label May 13, 2026

meta-codesync Bot commented May 13, 2026

@scotts has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104926326.

meta-codesync Bot commented May 14, 2026

This pull request has been merged in e07e121.

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request May 22, 2026
Includes the following commits:

- ci: declare workflow-level `contents: read` on 5 workflows (pytorch/kineto#1404) 5902263
- Remove deprecated `REQUEST_TIMESTAMP` config key (pytorch/kineto#1409) 55883de
- Fix intermittent Mac CI failure from conda channel reset (pytorch/kineto#1407) ee27b5c
- Add nlohmann/json as a top-level third_party submodule (pytorch/kineto#1406) c044281
- Remove SIGUSR2 on-demand profiling path (pytorch/kineto#1408) 471ed38
- Fix ROCm HtoD memcpy stream attribution (pytorch/kineto#1398) 799b5f4
- Fix UST_LOGGER_MARK_COMPLETED build failure in manifold_trace_logger (pytorch/kineto#1389) 60967ce
- Remove `DefaultTimeConverterIsIdentity` test (pytorch/kineto#1401) 81d31cd
- Re-enable most PyTorch tests (pytorch/kineto#1403) 212f9a5
- Daily `arc lint --take CLANGFORMAT` (pytorch/kineto#1402) 6481fac
- Resolve CUPTI cbid names via cuptiGetCallbackName (pytorch/kineto#1400) e07e121
- XPUPTI: Fix ts=0 trace events on Windows (pytorch/kineto#1381) 4c8d01c
- Remove LIBKINETO_NO* compatibility shim (pytorch/kineto#1399) ea8bc18
- Upgrade Kineto to C++20 (pytorch/kineto#1397) 77e2b46
- Update the rocm api filtering (pytorch/kineto#1392) e0ac578
Pull Request resolved: #184784
Approved by: https://github.com/NicolasHug, https://github.com/malfet
pytorchmergebot pushed a commit to khushi-411/pytorch that referenced this pull request May 24, 2026