Profiler does not record CUDA times #124547
@mmeendez8 This is because you are not allocating any of the data onto the device. If you move the tensors onto the GPU, you will see the following chart:
Closing because this is just a user error.
I am sorry @sraikund16, I copied the CPU example from my code instead of the CUDA one. This is what I was trying:

```python
import torch
from torch.profiler import ProfilerActivity, schedule
from torch import Tensor

def my_normalize(input: Tensor, mean: Tensor, std: Tensor):
    mean = mean.view(-1, 1, 1)
    std = std.view(-1, 1, 1)
    return (input - mean) / std

# image, mean and std were defined earlier in the original snippet;
# placeholder tensors (standard ImageNet normalization values) shown here
image = torch.rand(3, 224, 224)
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

device = torch.device("cuda")
image_cuda = image.to(device)
mean_cuda = mean.to(device)
std_cuda = std.to(device)

with torch.profiler.profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=9, active=90, repeat=1),
    record_shapes=True,
) as prof:
    for i in range(1000):
        r = my_normalize(image_cuda, mean_cuda, std_cuda)
        prof.step()

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

And this is the output I get:
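As a side note, `key_averages().table()` can also sort by device time, which makes missing CUDA events obvious at a glance. A minimal sketch (the tensor shape and the `cuda_time_total` sort key are my additions, assuming a CUDA-capable build):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Minimal sketch: create the tensor directly on the GPU so CUDA kernels
# actually run, then sort the summary table by device time.
if torch.cuda.is_available():
    x = torch.rand(3, 224, 224, device="cuda")
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(10):
            y = (x - x.mean()) / x.std()
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
else:
    print("CUDA not available; the profiler will only record CPU events.")
```

If the `cuda_time_total` column is all zeros here, the problem is in the build or environment rather than in the profiled code.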
Also, if I change to the experimental profiler (as I saw mentioned in a couple of issues) and run this code:

```python
import torch
from torch.profiler import profile, ProfilerActivity, schedule
from torch import Tensor

def my_normalize(input: Tensor, mean: Tensor, std: Tensor):
    mean = mean.view(-1, 1, 1)
    std = std.view(-1, 1, 1)
    return (input - mean) / std

# image, mean and std were defined earlier in the original snippet;
# placeholder tensors (standard ImageNet normalization values) shown here
image = torch.rand(3, 224, 224)
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

device = torch.device("cuda")
image_cuda = image.to(device)
mean_cuda = mean.to(device)
std_cuda = std.to(device)

with profile(with_stack=True,
             profile_memory=True,
             experimental_config=torch._C._profiler._ExperimentalConfig(verbose=True),
             schedule=schedule(wait=1, warmup=9, active=90, repeat=1)) as prof:
    for i in range(1000):
        r = my_normalize(image_cuda, mean_cuda, std_cuda)
        prof.step()

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

I get a different output that includes CUDA memory usage but not CUDA times:
I have the same problem as you. Have you solved it?
Not at all, @lu-renjie.
@mmeendez8 Sorry for missing this. I ran the following block of code:
and this was the result:
Received, thank you.
@mmeendez8 On second thought, it sounds like you may not have Kineto installed, since the same code block works for me. Please check here: https://github.com/pytorch/kineto
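One quick way to verify this is to check whether the installed build can record GPU-side events at all. A sketch (assuming `torch.profiler.kineto_available()` is exposed in the installed version):

```python
import torch

# Sanity checks for GPU profiling support:
# - kineto_available(): was this PyTorch build compiled with Kineto, the
#   library the profiler needs in order to record CUDA (device-side) events?
# - torch.version.cuda: None on CPU-only builds.
# - torch.cuda.is_available(): is a usable GPU present at runtime?
print("Kineto available:", torch.profiler.kineto_available())
print("CUDA build version:", torch.version.cuda)
print("CUDA runtime available:", torch.cuda.is_available())
```

If `kineto_available()` returns False, the profiler can still report CPU times and memory but will never show CUDA kernel times, which matches the symptom above.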
🐛 Describe the bug
I got the following result:
Versions
cc @robieta @chaekit @aaronenyeshi @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise