CUDA irfft may be doing unnecessary cloning of input #38413
Labels
module: cuda
Related to torch.cuda, and CUDA support in general
module: fft
module: performance
Issues related to performance, either of kernel code or framework glue
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Context: `pytorch/aten/src/ATen/native/cuda/CuFFTPlanCache.h`, lines 177 to 192 at commit 899a075
We should figure out why the existing check fails to detect all the cases, and whether the bug is in our check or in cuFFT. I don't have access to a T4, so I am writing this issue to document the situation in case anyone wants to take a look.
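For background, the clone exists because cuFFT's complex-to-real (C2R) transforms are documented as potentially overwriting their input buffer, so PyTorch must copy the input unless its check can prove the plan is out-of-place. Below is a minimal, hypothetical sketch (using NumPy, whose `irfft` always works on a copy, rather than cuFFT) of the kind of input-preservation property the check is trying to guarantee; the helper name is made up for illustration.

```python
import numpy as np

def irfft_preserves_input(n=8):
    """Hypothetical check: run an inverse real FFT (C2R) and verify
    the complex half-spectrum input was not modified in place.
    NumPy's irfft never mutates its input, so this returns True;
    with cuFFT's C2R transforms the input may be overwritten,
    which is why PyTorch clones it defensively."""
    x = np.fft.rfft(np.random.rand(n))  # complex half-spectrum input
    before = x.copy()
    np.fft.irfft(x, n=n)                # inverse real FFT (C2R)
    return np.array_equal(x, before)
```

A repro on a T4 would replace the NumPy call with `torch.fft.irfft` on a CUDA tensor and compare the input before and after, to pin down which shapes slip past the existing check.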
cc @ngimel @mruberry @peterbell10 @VitalyFedyunin