CUDA irfft may be doing unnecessary cloning of input #38413

ssnl · 2020-05-13T18:19:09Z

Context:

In cuFFT,

Out-of-place complex-to-real FFT will overwrite input buffer if custom strides are set by the user.

We therefore had this check in CuFFTPlanCache.h

pytorch/aten/src/ATen/native/cuda/CuFFTPlanCache.h

Lines 177 to 192 in 899a075

    
           // Note that this is before the actual cloning. This is intentional so we can 
        
           // check for advanced data layout with complex-to-real transform. cuFFT 
        
           // out-of-place complex-to-real transforms with advanced layout may overwrite 
        
           // input, and we need to clone the input. 
        
           // 
        
           // This just needs contiguity in cases except for twosided real-to-complex 
        
           // transform where we won't have simple data layout as output is two sided. 
        
           // 
        
           // See NOTE [ cuFFT Embedded Strides ] in native/cuda/SpectralOps.cu. 
        
           bool simple_layout = !(!complex_input && complex_output && !onesided) &&  // not twosided R2C 
        
                                (clone_input || input.is_contiguous());              // contiguous 
        
           if (!simple_layout && complex_input && !complex_output) { 
        
             clone_input = true; 
        
             simple_layout = true; 
        
           }

Users still reported irfft input being modified on T4 Multidimensional CUDA irfft modifies input #34551
An unconditional input cloning for irfft seemed to have fixed it Fix input overwriting in irfft #35219

We should figure out why the previous check doesn't detect all the cases, whether it is a bug in our check or in cuFFT. I don't have access to a T4 so I write this issue to document the situation in case anyone wants to take a look.

cc @ngimel @mruberry @peterbell10 @VitalyFedyunin

The text was updated successfully, but these errors were encountered:

peterbell10 · 2020-10-14T17:08:04Z

In cuFFT,

Out-of-place complex-to-real FFT will overwrite input buffer if custom strides are set by the user.

Today, that same documentation says:

Out-of-place complex-to-real FFT will always overwrite input buffer.

So, it seems this was just an error in the CuFFT documentation.

ngimel · 2020-10-14T17:26:41Z

Thanks for checking, @peterbell10 ! Closing the issue, the clone looks to be necessary.

ssnl added module: cuda Related to torch.cuda, and CUDA support in general module: operators labels May 13, 2020

jerryzh168 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 15, 2020

mruberry added module: fft module: performance Issues related to performance, either of kernel code or framework glue and removed module: operators (deprecated) labels Oct 7, 2020

mruberry mentioned this issue Oct 7, 2020

torch.fft tracking issue #42175

Closed

37 tasks

ngimel closed this as completed Oct 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CUDA irfft may be doing unnecessary cloning of input #38413

CUDA irfft may be doing unnecessary cloning of input #38413

ssnl commented May 13, 2020 •

edited by pytorch-probot bot

peterbell10 commented Oct 14, 2020

ngimel commented Oct 14, 2020

CUDA irfft may be doing unnecessary cloning of input #38413

CUDA irfft may be doing unnecessary cloning of input #38413

Comments

ssnl commented May 13, 2020 • edited by pytorch-probot bot

peterbell10 commented Oct 14, 2020

ngimel commented Oct 14, 2020

ssnl commented May 13, 2020 •

edited by pytorch-probot bot