Thanks for your great work. Here I want to ask why fft(k) is not fused into the kernel, is it a performance issue?
I mean why is it implemented as follows:
    def fftconv_fast(u, k, D, dropout_mask):
        """Fuse padding + rfft + pointwise mult + ifft + multiply with D + gelu + dropout"""
        seqlen = u.shape[-1]
        fft_size = 2 * seqlen
        k_f = torch.fft.rfft(k, n=fft_size)
        out = fftconv_fwd(u, k_f, D, dropout_mask, fft_size)
        return out
instead of:
    def fftconv_fast(u, k, D, dropout_mask):
        """Fuse padding + rfft + pointwise mult + ifft + multiply with D + gelu + dropout"""
        seqlen = u.shape[-1]
        fft_size = 2 * seqlen
        out = fftconv_fwd(u, k, D, dropout_mask, fft_size)
        return out
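For reference, the pipeline that the docstring describes can be sketched in plain PyTorch. This is only a sketch of the unfused semantics, not the actual CUDA kernel: the shapes u: (B, H, L), k: (H, L), D: (H,) follow the maintainer's reply, and the gelu/dropout ordering is an assumption.

```python
import torch
import torch.nn.functional as F

def fftconv_ref(u, k, D, dropout_mask, gelu=True):
    """Unfused reference: pad + rfft + pointwise mult + irfft + D skip + gelu + dropout.

    u: (B, H, L) input, k: (H, L) kernel, D: (H,) skip weights.
    """
    seqlen = u.shape[-1]
    # Zero-pad to 2L so the circular FFT convolution matches a linear (causal) one.
    fft_size = 2 * seqlen
    k_f = torch.fft.rfft(k, n=fft_size)   # (H, fft_size // 2 + 1)
    u_f = torch.fft.rfft(u, n=fft_size)   # (B, H, fft_size // 2 + 1)
    # Pointwise multiply in frequency space, invert, keep the first L outputs.
    y = torch.fft.irfft(u_f * k_f, n=fft_size)[..., :seqlen]
    out = y + u * D.unsqueeze(-1)         # skip connection scaled by D
    if gelu:
        out = F.gelu(out)
    if dropout_mask is not None:
        out = out * dropout_mask
    return out
```

The fused kernel performs these same steps in one pass over u, which is why everything except fft(k) lives inside fftconv_fwd.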
The shape of k is (H, L), so the IO cost of computing its FFT outside the kernel is small. The shape of u is (B, H, L), so it dominates the IO cost. We also parallelize over B and H, so naively fusing fft(k) would redundantly recompute it in every block along the batch dimension.
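To make the IO argument concrete (with hypothetical shapes chosen for illustration, not values from this thread): reading u costs B times more than reading k, so precomputing k_f once on the host side is cheap relative to the kernel's total memory traffic.

```python
# Hypothetical shapes for illustration only.
B, H, L = 64, 768, 2048

u_elems = B * H * L   # input u: (B, H, L)
k_elems = H * L       # kernel k: (H, L)

# u dominates the kernel's IO by a factor of the batch size.
print(u_elems // k_elems)  # -> 64
```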