
profiler.export_stacks doesn't return stack trace unless experimental_config is provided #100253

Open

NicolayP opened this issue Apr 28, 2023 · 20 comments

Labels: high priority, module: regression (it used to work, and now it doesn't), oncall: profiler (profiler-related issues: cpu, gpu, kineto)

NicolayP commented Apr 28, 2023

🐛 Describe the bug

Since I upgraded torch from 1.13.0+cu117 to 2.0.0+cu117, the following code no longer logs or prints the stack trace.

import torch
from torch.profiler import profile, record_function

a = torch.rand(100, 100)
b = torch.rand(100, 100)

with profile(with_stack=True, profile_memory=True) as prof:
    with record_function("model_inference"):
        torch.add(a, b)

prof.export_stacks("myFile", "self_cpu_time_total")

print(prof.key_averages(group_by_stack_n=5))

For 1.13.0+cu117 the file is written correctly:

test_prof.py(8):_<module>;torch/autograd/profiler.py(478):___init__;<built-in_method_zeros_of_type_object_at_0x7f55776329c0> 45
test_prof.py(8):_<module>;torch/autograd/profiler.py(478):___init__;<built-in_method_zeros_of_type_object_at_0x7f55776329c0> 30
test_prof.py(8):_<module>;torch/autograd/profiler.py(478):___init__;<built-in_method_zeros_of_type_object_at_0x7f55776329c0> 5
test_prof.py(8):_<module>;torch/autograd/profiler.py(487):___enter__;torch/_ops.py(437):___call__;<built-in_method__record_function_enter_of_PyCapsule_object_at_0x7f549a9189f0> 143
test_prof.py(8):_<module>;torch/autograd/profiler.py(487):___enter__;torch/_ops.py(437):___call__;<built-in_method__record_function_enter_of_PyCapsule_object_at_0x7f549a9189f0> 3
test_prof.py(8):_<module>;<built-in_method_add_of_type_object_at_0x7f55776329c0> 85
test_prof.py(8):_<module>;torch/profiler/profiler.py(475):___exit__;torch/profiler/profiler.py(484):_stop;torch/profiler/profiler.py(511):__transit_action;torch/profiler/profiler.py(117):_stop_trace;torch/autograd/profiler.py(223):___exit__;torch/cuda/__init__.py(556):_synchronize;torch/cuda/__init__.py(201):__lazy_init;<built-in_function__cuda_init> 1269
test_prof.py(8):_<module>;torch/profiler/profiler.py(475):___exit__;torch/profiler/profiler.py(484):_stop;torch/profiler/profiler.py(511):__transit_action;torch/profiler/profiler.py(117):_stop_trace;torch/autograd/profiler.py(223):___exit__;torch/cuda/__init__.py(556):_synchronize;torch/cuda/__init__.py(201):__lazy_init;torch/cuda/__init__.py(108):__check_capability;torch/cuda/__init__.py(344):_get_device_capability;torch/cuda/__init__.py(361):_get_device_properties;<built-in_method__get_device_properties_of_PyCapsule_object_at_0x7f5597a3d8d0> 175
test_prof.py(8):_<module>;torch/profiler/profiler.py(475):___exit__;torch/profiler/profiler.py(484):_stop;torch/profiler/profiler.py(511):__transit_action;torch/profiler/profiler.py(117):_stop_trace;torch/autograd/profiler.py(223):___exit__;torch/cuda/__init__.py(556):_synchronize;<built-in_function__cuda_synchronize> 252937

For 2.0.0+cu117, however, I get an empty file. Is there something I'm missing?

Versions

For the env with torch 1.13.0

Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.9.15 (main, Nov 24 2022, 14:31:59)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-147-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650
Nvidia driver version: 525.105.17
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Stepping:            13
CPU MHz:             964.671
CPU max MHz:         5000.0000
CPU min MHz:         800.0000
BogoMIPS:            4800.00
Virtualisation:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0
[pip3] torchvision==0.14.0
[conda] cudatoolkit               11.0.3              h88f8997_10    conda-forge
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] torch                     1.13.0                   pypi_0    pypi
[conda] torchaudio                0.13.0                   pypi_0    pypi
[conda] torchvision               0.14.0                   pypi_0    pypi

For torch 2.0.0:

Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.27

Python version: 3.9.16 (main, Mar  8 2023, 14:00:05)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-147-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650
Nvidia driver version: 525.105.17
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Stepping:            13
CPU MHz:             1672.804
CPU max MHz:         5000.0000
CPU min MHz:         800.0000
BogoMIPS:            4800.00
Virtualisation:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.0
[pip3] torchvision==0.15.1
[pip3] triton==2.0.0
[conda] numpy                     1.24.3                   pypi_0    pypi
[conda] torch                     2.0.0                    pypi_0    pypi
[conda] torchvision               0.15.1                   pypi_0    pypi
[conda] triton                    2.0.0                    pypi_0    pypi

cc @ezyang @gchanan @zou3519 @robieta @chaekit @aaronenyeshi @ngimel @nbcsm @guotuofeng @guyang3532 @gaoteng-git @tiffzhaofb @dzhulgakov @davidberard98

Zeyad-Abdelreheem commented

I traced your code and found that the issue is in the profile object (prof), specifically in this line.

That line gives two different results in the two torch versions; it should return the list of events that gets parsed to build the stack trace you need.

It was working fine in torch 1.13.0, but in 2.0.0 it doesn't seem to work; I couldn't find its source code.

In version 2.0.0 the returned list of events can't be parsed by the subsequent functions, so the stack traces are all empty, which is why you get an empty file.

I traced all the other functions and found they are exactly the same between the two versions.
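
To see this from Python, here's a minimal check, as a sketch against the repro above (evt.stack is the Python stack the profiler attaches to each event when with_stack=True):

import torch
from torch.profiler import profile, record_function

a = torch.rand(100, 100)
b = torch.rand(100, 100)

with profile(with_stack=True, profile_memory=True) as prof:
    with record_function("model_inference"):
        torch.add(a, b)

# export_stacks only writes entries for events that carry a stack,
# so if this count is 0 the exported file comes out empty.
events_with_stack = [evt for evt in prof.events() if evt.stack]
print(len(events_with_stack))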

NicolayP (Author) commented

Hi @Zeyad-Abdelreheem, thanks! At least I can rule out a problem on my side if it's reproducible. Any idea who to @ to get help?

Zeyad-Abdelreheem commented

Not sure, probably @albanD or @apaszke.

ngimel added the oncall: profiler label May 1, 2023
albanD (Collaborator) commented May 1, 2023

Hey!
We have a process to triage issues, and automatic ccs are added once the right label is applied by the triager.
Hopefully someone from the profiler team can help; cc @chaekit in particular.

Also marking as tentative high priority.

nihaljn commented May 27, 2023

Any updates on this?

albanD removed the oncall: profiler label Jun 5, 2023
malfet added the module: regression label Jun 5, 2023
malfet (Contributor) commented Jun 5, 2023

Will take an action item to find a PoC and figure out how this missed regression testing during the release.

janeyx99 added the oncall: profiler label and removed the triage review label Jun 5, 2023
davidberard98 self-assigned this Jun 5, 2023
aaronenyeshi (Member) commented Jun 5, 2023

Hi @NicolayP, our profiler team has recorded this issue and will assign a POC to look into it. The profiler underwent a significant rework in the past year, and this regression may have been caused by that. Sorry for the inconvenience.

davidberard98 (Contributor) commented

The bisect is still in progress, but the regression looks likely to come from #86263; I'll take a closer look tomorrow after the bisect finishes.

davidberard98 (Contributor) commented

It looks like you can turn this back on with this:

...
with profile(with_stack=True, profile_memory=True, experimental_config=torch._C._profiler._ExperimentalConfig(verbose=True)) as prof:
...
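
Applied to the repro at the top of the thread, that looks like this (a sketch; the only change from the original code is the extra experimental_config argument):

import torch
from torch.profiler import profile, record_function

a = torch.rand(100, 100)
b = torch.rand(100, 100)

with profile(
    with_stack=True,
    profile_memory=True,
    # verbose=True re-enables the Python stack entries
    experimental_config=torch._C._profiler._ExperimentalConfig(verbose=True),
) as prof:
    with record_function("model_inference"):
        torch.add(a, b)

prof.export_stacks("myFile", "self_cpu_time_total")
print(prof.key_averages(group_by_stack_n=5))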

However, this doesn't show all the same information as it used to; I'll continue investigating.

davidberard98 (Contributor) commented

@nihaljn @NicolayP, for now you can use the experimental_config described above. We'll consider rewriting export_stacks to avoid needing this extra config.

davidberard98 changed the title from "Profiler doesn't return stack trace" to "profiler.export_stacks doesn't return stack trace unless experimental_config is provided" Jun 7, 2023
yundai424 commented

Hi @davidberard98, friendly check-in: do we by any chance have an update on this issue? Thanks!

davidberard98 (Contributor) commented

We don't have any immediate plans to change this behavior; you should use the experimental_config option if you want this feature.

srdecny commented Nov 1, 2023

It seems like with torch==2.1.0+cu118, the suggested experimental_config workaround no longer works and yields an empty file when calling export_stacks.

oliver-batchelor commented

Any update on this? I've been running into the same issue in both 2.0 and 2.1: no stack information.

MilkClouds commented

No stack information on torch 2.2.0, but experimental_config works.

fzyzcjy (Contributor) commented Mar 22, 2024

Hi, are there any updates? This bug seems to still be there with the latest PyTorch.

mx commented Apr 20, 2024

Can confirm that on 2.2.0, even with the experimental_config as described above, output stack traces are empty.

DuaneNielsen commented

This worked for me on PyTorch 2.2.2+cu121:

import torch
from torch.profiler import profile, ProfilerActivity

def trace_handler(p):
    # Print the top stacks sorted by CUDA memory usage, then dump a Chrome trace
    output = p.key_averages(group_by_stack_n=5).table(sort_by="self_cuda_memory_usage", row_limit=20)
    print(output)
    p.export_chrome_trace("./trace_" + str(p.step_num) + ".json")

with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=torch.profiler.schedule(
            wait=1,
            warmup=1,
            active=2),
        on_trace_ready=trace_handler,
        profile_memory=True,
        with_stack=True,
        # verbose=True is what brings the Python stacks back
        experimental_config=torch._C._profiler._ExperimentalConfig(verbose=True)
) as p:
    # the schedule needs p.step() calls to advance through wait/warmup/active
    for _ in range(4):
        torch.add(torch.rand(100, 100), torch.rand(100, 100))  # workload from the repro above
        p.step()

I have to say, there is a huge difference between knowing you have 3.8 GB allocated from aten::empty and knowing you have 3.8 GB allocated from torch/nn/functional.py(2451): batch_norm.

The latter is a lot more actionable than the former.

botbw commented Apr 30, 2024

It seems experimental_config is still needed when using torch==2.3.0.

Linusnie commented Jun 2, 2024

For me it's not working, with or without the experimental config, on 2.3.0+cu118.

sraikund16 added a commit to sraikund16/pytorch that referenced this issue Jun 4, 2024
Summary:
Many users have been complaining that with_stack does not work on its own, as described in our PyTorch tutorials. If used, it returns an empty Python stack; it only returns a stack if JIT is enabled.
pytorch#89406
pytorch#95238
pytorch#100253
pytorch#117515

Test Plan: Edited the unit test to only contain with_stack.

Differential Revision: D58149768
sraikund16 reopened this Jun 5, 2024