
The fp32fft option #2

Closed
liuzhuang13 opened this issue Jul 10, 2021 · 5 comments

@liuzhuang13

Hello, thanks for your nice work!

I wonder what the fp32fft option does. In my experiments, the input and output of the fft function are already torch.float32, so I'm not sure why there is an option for converting to fp32. Thanks in advance!
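For reference, a minimal dtype check around the FFT in a plain (non-AMP) forward pass; the rfft2/irfft2 calls and shapes are my own illustration, not necessarily the exact code in the repo:

```python
import torch

# Check what dtypes flow through a 2D FFT over the spatial dims.
x = torch.randn(1, 14, 8, 384)                                # (B, H, W, C)
print(x.dtype)                                                # torch.float32
X = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
print(X.dtype)                                                # torch.complex64
y = torch.fft.irfft2(X, s=(14, 8), dim=(1, 2), norm="ortho")
print(y.dtype)                                                # torch.float32
```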

@raoyongming
Owner

Hi, thanks for pointing it out.

I just checked the dtype here. It's true that the input and output are already float32.

We kept this option because we tried a linear -> fft -> global filter -> ifft architecture in our early experiments, where the tensors after the linear layer are converted to fp16 since we use Automatic Mixed Precision (AMP) training following the implementation of DeiT. The extra fc did not improve performance, so we didn't use that architecture in our final models.

It seems the LayerNorm layer will convert the input back to fp32, so the fp32 option is redundant here. But it may be useful if the architecture is slightly modified.
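A rough illustration of this AMP behavior (assuming a CUDA device; the layer sizes are arbitrary): under autocast, Linear outputs come out as fp16 while LayerNorm outputs stay fp32, which is why a linear -> fft ordering would feed fp16 tensors into torch.fft unless they are converted back explicitly.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 196, 384, device="cuda")
fc = nn.Linear(384, 384).cuda()
norm = nn.LayerNorm(384).cuda()

with torch.cuda.amp.autocast():
    print(fc(x).dtype)    # torch.float16 (matmul runs in half precision)
    print(norm(x).dtype)  # torch.float32 (LayerNorm runs in full precision)
```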

@liuzhuang13
Author

Thanks for your timely answer! Did you find a performance difference in your original linear -> fft -> global filter -> ifft structure between using fp32fft and not using it?

@raoyongming
Owner

Oops, it seems the fft functions don't support complex fp16 tensors. My logs show fp32 is slightly better than fp16, but the gap may just be run-to-run variance between otherwise identical runs. The inputs should always be converted to fp32 when using the fft functions. I have updated the code to avoid further confusion.
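A sketch of the "always convert to fp32 before the FFT" pattern; the function name, shapes, and weight layout here are illustrative, not the repo's exact code.

```python
import torch

def global_filter_fp32(x, complex_weight):
    """Cast to fp32 around the FFT so AMP-produced fp16 activations never reach torch.fft.

    x: (B, H, W, C) activations, possibly fp16 under AMP.
    complex_weight: (H, W // 2 + 1, C, 2) real-valued learnable filter.
    """
    B, H, W, C = x.shape
    orig_dtype = x.dtype
    x = x.to(torch.float32)                                # fp16 -> fp32 for torch.fft
    weight = torch.view_as_complex(complex_weight.to(torch.float32).contiguous())
    X = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")       # (B, H, W//2+1, C), complex64
    X = X * weight                                         # apply the learnable global filter
    x = torch.fft.irfft2(X, s=(H, W), dim=(1, 2), norm="ortho")
    return x.to(orig_dtype)                                # cast back for the rest of the AMP pipeline
```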

@liuzhuang13
Author

Hello, I tested it and the torch fft functions indeed don't support float16. But given that the fft functions don't accept fp16 inputs, how did you get an fp16 result that is slightly worse than fp32? Thanks
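For what it's worth, a minimal reproduction of that check; whether it errors depends on the PyTorch version and device (newer CUDA builds accept half precision for power-of-two sizes).

```python
import torch

x16 = torch.randn(1, 14, 8, 384, dtype=torch.float16)
try:
    torch.fft.rfft2(x16, dim=(1, 2))
except RuntimeError as err:
    print("fp16 FFT not supported here:", err)
```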

@raoyongming
Owner

raoyongming commented Jul 15, 2021

I suspect there is an inconsistency between my old logs and the actual implementation. I may have run two identical experiments in that case, with the difference coming from randomness during training. Since the above-mentioned model and the fp32fft option were only used in our early experiments, I didn't re-run the experiments to check this result. I believe the correct implementation is to always convert the input to fp32/fp64 before using the fft functions, and I have removed the option from our code. Sorry for the confusion.
