Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA irfft may be doing unnecessary cloning of input #38413

Closed
ssnl opened this issue May 13, 2020 · 2 comments
Closed

CUDA irfft may be doing unnecessary cloning of input #38413

ssnl opened this issue May 13, 2020 · 2 comments
Labels
module: cuda Related to torch.cuda, and CUDA support in general module: fft module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ssnl
Copy link
Collaborator

ssnl commented May 13, 2020

Context:

  1. In cuFFT,

    Out-of-place complex-to-real FFT will overwrite input buffer if custom strides are set by the user.

  2. We therefore had this check in CuFFTPlanCache.h
    // Note that this is before the actual cloning. This is intentional so we can
    // check for advanced data layout with complex-to-real transform. cuFFT
    // out-of-place complex-to-real transforms with advanced layout may overwrite
    // input, and we need to clone the input.
    //
    // This just needs contiguity in cases except for twosided real-to-complex
    // transform where we won't have simple data layout as output is two sided.
    //
    // See NOTE [ cuFFT Embedded Strides ] in native/cuda/SpectralOps.cu.
    bool simple_layout = !(!complex_input && complex_output && !onesided) && // not twosided R2C
    (clone_input || input.is_contiguous()); // contiguous
    if (!simple_layout && complex_input && !complex_output) {
    clone_input = true;
    simple_layout = true;
    }
  3. Users still reported irfft input being modified on T4 Multidimensional CUDA irfft modifies input #34551
  4. An unconditional input cloning for irfft seemed to have fixed it Fix input overwriting in irfft #35219

We should figure out why the previous check doesn't detect all the cases, whether it is a bug in our check or in cuFFT. I don't have access to a T4 so I write this issue to document the situation in case anyone wants to take a look.

cc @ngimel @mruberry @peterbell10 @VitalyFedyunin

@ssnl ssnl added module: cuda Related to torch.cuda, and CUDA support in general module: operators labels May 13, 2020
@jerryzh168 jerryzh168 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 15, 2020
@mruberry mruberry added module: fft module: performance Issues related to performance, either of kernel code or framework glue and removed module: operators (deprecated) labels Oct 7, 2020
@mruberry mruberry mentioned this issue Oct 7, 2020
37 tasks
@peterbell10
Copy link
Collaborator

  1. In cuFFT,

Out-of-place complex-to-real FFT will overwrite input buffer if custom strides are set by the user.

Today, that same documentation says:

Out-of-place complex-to-real FFT will always overwrite input buffer.

So, it seems this was just an error in the CuFFT documentation.

@ngimel
Copy link
Collaborator

ngimel commented Oct 14, 2020

Thanks for checking, @peterbell10 ! Closing the issue, the clone looks to be necessary.

@ngimel ngimel closed this as completed Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: cuda Related to torch.cuda, and CUDA support in general module: fft module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

5 participants