Update ROCPROFILER_CALLBACK_* references to ROCPROFILER_BUFFER_*#1295

Closed
ryanzhang22 wants to merge 1 commit into pytorch:main from ryanzhang22:export-D96124233


@ryanzhang22
Contributor

Summary:
While debugging the test_disable_external_correlation profiler test, I found that ROCm profiles were not producing "gpu_memcpy" events, which led to a validation error (see code ref). The payload in this test makes .cuda() calls, which should trigger a memcpy.

In RocprofLogger.cpp, we ask Rocprofiler to return ROCPROFILER_BUFFER_TRACING_MEMORY_COPY events through rocprofiler_configure_buffer_tracing_service. However, the rest of the Rocprof logic expects ROCPROFILER_CALLBACK_TRACING_MEMORY_COPY. In the switch statements modified in this diff, the mismatch makes memory copy records fall through to the default kernel type, so gpu_memcpy events are missing from the GPU trace and we cannot determine the kind of memory copy.
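
The mismatch can be illustrated with a minimal, self-contained sketch. It does not use the real rocprofiler-sdk headers: the two enums, their numeric values, and the `activityTypeBefore` / `activityTypeAfter` helpers are hypothetical stand-ins, and the sketch assumes only that the record kind is carried as a plain integer and that the BUFFER and CALLBACK constants for memory copy have different values.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the two tracing-kind enum families.
// The numeric values are invented for illustration; the only assumption
// that matters is that the two MEMORY_COPY constants differ.
enum BufferKind : std::uint32_t {
  BUFFER_KERNEL_DISPATCH = 1,
  BUFFER_MEMORY_COPY = 2,
};

enum CallbackKind : std::uint32_t {
  CALLBACK_KERNEL_DISPATCH = 1,
  CALLBACK_MEMORY_COPY = 3,
};

// Before the fix: records arrive tagged with a BUFFER kind, but the switch
// tests against the CALLBACK constant, so memory copies hit the default.
std::string activityTypeBefore(std::uint32_t kind) {
  switch (kind) {
    case CALLBACK_MEMORY_COPY:
      return "gpu_memcpy";
    default:
      return "kernel";
  }
}

// After the fix: the switch tests against the same enum family that was
// requested when configuring the tracing service.
std::string activityTypeAfter(std::uint32_t kind) {
  switch (kind) {
    case BUFFER_MEMORY_COPY:
      return "gpu_memcpy";
    default:
      return "kernel";
  }
}
```

Because the kind is an integer in this sketch, the compiler has no way to flag the wrong enum family, which is one plausible reason a bug of this shape can go unnoticed until a test checks for gpu_memcpy events.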

Differential Revision: D96124233

@meta-cla meta-cla bot added the cla signed label Mar 11, 2026
@meta-codesync

meta-codesync bot commented Mar 11, 2026

@ryanzhang22 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96124233.

ryanzhang22 added a commit to ryanzhang22/kineto that referenced this pull request Mar 11, 2026
…orch#1295)

Summary:

While debugging the test_disable_external_correlation profiler test, I found that ROCm profiles were not producing "gpu_memcpy" events, which led to a validation error (see [code ref](https://www.internalfb.com/code/fbsource/[475c7d69dbbe]/fbcode/caffe2/test/profiler/test_profiler.py?lines=2392%2C2419)). The payload in this test makes .cuda() calls, which should trigger a memcpy.

In RocprofLogger.cpp, we ask Rocprofiler to return ROCPROFILER_BUFFER_TRACING_MEMORY_COPY events through [`rocprofiler_configure_buffer_tracing_service`](https://www.internalfb.com/code/fbsource/[475c7d69dbbe]/fbcode/kineto/libkineto/src/RocprofLogger.cpp?lines=427). However, the rest of the Rocprof logic expects ROCPROFILER_CALLBACK_TRACING_MEMORY_COPY. In the switch statements modified in this diff, the mismatch makes memory copy records fall through to the default kernel type, so gpu_memcpy events are missing from the GPU trace and we cannot determine the kind of memory copy.

Differential Revision: D96124233
@ryanzhang22 ryanzhang22 force-pushed the export-D96124233 branch 2 times, most recently from 85002c0 to ee0e447 March 11, 2026 13:56
ryanzhang22 added a commit to ryanzhang22/kineto that referenced this pull request Mar 11, 2026
…orch#1295)

Summary:
Pull Request resolved: pytorch#1295

Reviewed By: scotts

Differential Revision: D96124233
@meta-codesync

meta-codesync bot commented Mar 12, 2026

This pull request has been merged in 03ab8cb.

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Mar 19, 2026
#177745)

1. Validates that important keys are present in the metadata for kernel events in the output JSON. Previously "grid" was not being returned in the JSON, so we now check that "device", "stream", "correlation", "grid", and "block" are all present.
2. Modifies the `payload()` used in the profiler tests so that the tensor size is configurable. On ROCm, `test_disable_external_correlation` was failing because gpu_memcpy events were not showing up in the trace. Half of this was fixed in pytorch/kineto#1295, but gpu_memcpy is only triggered when the tensor being copied is larger than some size, so we increase the tensor size to fix the test.
Pull Request resolved: #177745
Approved by: https://github.com/jiannanWang, https://github.com/divyanshk
ryanzhang22 added a commit to ryanzhang22/pytorch that referenced this pull request Mar 19, 2026
pytorch#177745)
AaronWang04 pushed a commit to AaronWang04/pytorch that referenced this pull request Mar 31, 2026
pytorch#177745)