Added FFTMod #67

Chillee · 2019-04-23T03:14:22Z

I think I've seen a lot of people have this in their library. It allows us to perform FFT over arbitrary moduli without needing to fiddle with primes close to a power of 2 and such. Really, if we ever need more space, I don't think there's anywhere NTT can be used that FFT-MOD can't. An optimized FFT-MOD is only about twice as slow as an optimized NTT.

On the other hand, NTT is pretty cool, and it is seriously nice to not need to think about precision at all.

Here's an example demonstrating the precision issues (I'll turn it into a fuzz test later). Basically, as long as the coefficients don't surpass ~1e15, FFT is fine. Beyond that, you start to see precision issues.
https://ideone.com/lUrDCT
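
As a rough illustration of why a threshold near 1e15 is plausible: a `double` has a 53-bit mantissa, so integers above 2^53 (about 9e15) are no longer all representable, and an FFT accumulates rounding error on top of that. The helper name `exactInDouble` below is hypothetical and only checks representability; it is a sketch, not a substitute for the linked example.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the non-negative integer x survives a round trip
// through double unchanged (i.e. it is exactly representable).
bool exactInDouble(int64_t x) {
    // Stay well below 2^63 so the cast back to int64_t is defined.
    assert(x >= 0 && x <= (int64_t(1) << 62));
    return (int64_t)(double)x == x;
}
```

Values up to 2^53 round-trip exactly, while 2^53 + 1 does not, which leaves very little headroom once coefficients approach 1e15.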

It is kind of ugly though :'(. I tried my best to shorten it, but it's still uglier than I'd like.

This is somewhat WIP - I still need to add a description about it. But I'd appreciate feedback on the necessity as well as the code itself.

simonlindholm · 2019-04-23T10:35:56Z

> Here's an example exemplifying the precision issues (I'll turn it into a fuzz test later). Basically, as long as the coefficients don't surpass ~1e15 FFT is fine. Higher though, and you start to see precision issues.

So basically, if MOD ~ 1e9, we start seeing precision issues at around n ~ 1e6? That's uncomfortably small :( Especially so since this is on random inputs; the worst case may be, well, worse. I know I researched this before, and most theoretical guarantees were pretty weak -- the FFT currently pessimistically says the following: "For integers rounding works if $(|a| + |b|)\max(a, b) < {\sim} 10^9$, or in theory maybe $10^6$." And it would take an annoying amount of research to determine whether one can actually get closer to the worst case...
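
One way to arrive at the n ~ 1e6 figure, assuming FFT-MOD splits every input coefficient into two pieces each smaller than about $\sqrt{M}$ (with $M$ = MOD), so that each output coefficient is a sum of at most $n$ products of such pieces:

$$
\Bigl|\sum_{i+j=k} a_i b_j\Bigr| \;<\; n \cdot \bigl(\sqrt{M}\bigr)^2 \;=\; nM \;\approx\; 10^6 \cdot 10^9 \;=\; 10^{15}
$$

This is only a magnitude estimate matching the ~1e15 threshold quoted above, not a proven error bound.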

> It is kind of ugly though :'(. I tried my best to shorten it, but it's still uglier than I'd like.

Yeah. :/ Is there any chance of shortening it by making it call the existing conv method?

One alternative, though it loses on code size, is to say that if NTT * 2 ~ FFT-MOD, then can't we run two NTT's and use CRT for the result for about the same performance? It avoids the precision problems, and with luck it might be relatively short since it can reuse a lot of existing KACTL code. (This is currently noted in the FFT description, but the implementation is left as an exercise to the reader.)
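
A minimal sketch of the CRT step in that idea, assuming the two NTTs are run modulo the well-known NTT-friendly primes 998244353 and 167772161. The names `P1`, `P2`, `modpow` and `crt2` are illustrative choices here, not existing KACTL code, and the result is only correct when the true (unreduced) coefficient is below `P1 * P2` (about 1.67e17); larger coefficients would need bigger or additional primes.

```cpp
#include <cassert>

typedef long long ll;

// Two NTT-friendly primes (assumed for illustration).
const ll P1 = 998244353, P2 = 167772161;

// Fast modular exponentiation: b^e mod m, for m below ~3e9.
ll modpow(ll b, ll e, ll m) {
    ll r = 1;
    for (b %= m; e > 0; e >>= 1) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
    }
    return r;
}

// Given x mod P1 and x mod P2, recover x in [0, P1 * P2).
ll crt2(ll r1, ll r2) {
    assert(0 <= r1 && r1 < P1 && 0 <= r2 && r2 < P2);
    ll inv = modpow(P1 % P2, P2 - 2, P2);        // P1^{-1} mod P2 (Fermat)
    ll k = ((r2 - r1) % P2 + P2) % P2 * inv % P2; // solves r1 + P1*k = r2 (mod P2)
    return r1 + P1 * k;
}
```

The final answer would then be `crt2(r1, r2) % MOD` for each output coefficient.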

> necessity

I don't know, it's tricky making these sorts of decisions... vague preference towards keeping just the NTT and existing FFT description, but it depends on code size and stuff... Though even if we don't include this, it's always nice to have a high-quality implementation that's not included in the pdf, which enterprising teams can uncomment if they think it's important to them.

content/numerical/FastFourierTransform.h

Chillee · 2019-04-24T02:18:18Z

I'll check out what a CRT + NTT based FFTMod equivalent might look like.

Chillee · 2019-04-25T03:37:00Z

Unfortunately, CRT+NTT is 3x slower. Even though our NTT is very fast, the bottleneck isn't the NTT; it's the CRT with 128-bit numbers.

The benchmarks look like this:

254ms elapsed [neal]
201ms elapsed [mine]

722ms elapsed [ntt crt]
    283ms elapsed [ntt portion]
    436ms elapsed [crt portion]

~~Which is a shame, since it's actually quite easy to implement.~~
Actually, implementing it in KACTL's fashion is fairly tricky, due to the need to template our mods. We would probably need to add a templated mod to modpow, to ntt, and to conv. In addition, we'd need to make CRT take in __int128_t (the best way would probably be a template for the type as well).

Chillee · 2019-04-25T03:51:28Z

So, TL;DR: I think we should still include FFT-MOD. I'll think more about whether we can simplify it.

Chillee · 2019-04-25T05:02:11Z

> Yeah. :/ Is there any chance of shortening it by making it call the existing conv method?

Unfortunately, I doubt it. Not without significant runtime losses, at least (probably 2x). The issue is that if we make repeated calls to the existing conv, we're going to redo a bunch of FFTs (twice for each polynomial): a total of 8 FFTs when reusing the existing conv versus 4 FFTs currently.

Chillee · 2019-04-25T05:25:12Z

So I tried it out, and it would look like this:

template <int MOD> vector<ll> convMod(const vi &a, const vi &b) {
    if (a.empty() || b.empty()) return {};
    // Split each coefficient x as x = hi * cut + lo, with cut ~ sqrt(MOD).
    ll cut = (ll)sqrt(MOD);
    vector<double> ah(sz(a)), al(ah), bh(sz(b)), bl(bh);
    rep(i, 0, sz(a)) ah[i] = a[i] / cut, al[i] = a[i] % cut;
    rep(i, 0, sz(b)) bh[i] = b[i] / cut, bl[i] = b[i] % cut;
    // Four calls to the existing real-valued conv: hi*hi, lo*lo, and both cross terms.
    auto h = conv(ah, bh), l = conv(al, bl);
    auto m1 = conv(ah, bl), m2 = conv(al, bh);
    vector<ll> res(sz(h));
    rep(i, 0, sz(res)) {
        ll av = llround(h[i]);
        ll bv = llround(m1[i]) + llround(m2[i]);
        ll cv = llround(l[i]);
        av %= MOD, bv %= MOD, cv %= MOD;
        // Recombine: a*b = hh*cut^2 + (hl + lh)*cut + ll (mod MOD).
        res[i] = (av * cut * cut + bv * cut + cv) % MOD;
    }
    return res;
}

Significantly shorter, but still kind of weighty. Also ~2x slower.

241ms elapsed [mine]
497ms elapsed [reuse]
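
The splitting identity behind this version can be checked independently of FFT precision. The sketch below swaps the FFT-based `conv` for an exact O(n*m) integer convolution; `naiveConv` and `splitConvMod` are hypothetical names, and inputs are assumed to lie in `[0, mod)` with `mod` around 1e9.

```cpp
#include <cassert>
#include <cmath>
#include <vector>
using namespace std;

typedef long long ll;

// Exact quadratic convolution, standing in for the FFT-based conv.
vector<ll> naiveConv(const vector<ll>& a, const vector<ll>& b) {
    vector<ll> r(a.size() + b.size() - 1, 0);
    for (size_t i = 0; i < a.size(); i++)
        for (size_t j = 0; j < b.size(); j++)
            r[i + j] += a[i] * b[j];
    return r;
}

// Convolution mod `mod` via x = hi * cut + lo with cut ~ sqrt(mod).
vector<ll> splitConvMod(const vector<ll>& a, const vector<ll>& b, ll mod) {
    assert(!a.empty() && !b.empty() && mod > 1);
    ll cut = (ll)sqrt((double)mod);
    vector<ll> ah, al, bh, bl;
    for (ll x : a) ah.push_back(x / cut), al.push_back(x % cut);
    for (ll x : b) bh.push_back(x / cut), bl.push_back(x % cut);
    auto h = naiveConv(ah, bh), l = naiveConv(al, bl);
    auto m1 = naiveConv(ah, bl), m2 = naiveConv(al, bh);
    vector<ll> res(h.size());
    for (size_t i = 0; i < res.size(); i++) {
        ll av = h[i] % mod, bv = (m1[i] + m2[i]) % mod, cv = l[i] % mod;
        // a*b = hh*cut^2 + (hl + lh)*cut + ll, reduced step by step.
        res[i] = (av * cut % mod * cut + bv * cut + cv) % mod;
    }
    return res;
}
```

Every intermediate product in the four sub-convolutions is below roughly `mod`, which is what keeps the double-based FFT version within its precision budget for moderate n.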

Chillee · 2019-04-25T06:57:16Z

One thing I did was to move the root computation from the conv routines to the FFT routine itself. Computing the roots should only take a small percentage (~5%) of the total time needed, so this isn't much of a loss. In regular FFT, this should correspond to a 5% slowdown since we would be computing roots unnecessarily once. In FFT-MOD this should correspond to a 15% slowdown (I actually get 243 vs 212 ms).

I think this is a worthy tradeoff.

simonlindholm · 2019-04-25T15:39:32Z

That seems reasonable. (I imagine it's a larger percentage for small n, though.) I think convMod should probably go in a separate file. Also, can you fix the -Wconversion warnings?

Chillee · 2019-04-29T05:57:11Z

I did some proper benchmarking and it seems like recalculating the roots (with long double calculations) slows it down by 25%. If we're only using double it slows it down by ~10%.

I think that's decently substantial and worth fixing (if we can do it with just a few lines).

simonlindholm

Code mostly looks good; still need to look into precision issues and description wordsmithing.

content/numerical/FastFourierTransform.h

10% slower, but reused across FFT's so should be fine for any reasonable problem.
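
One way to get higher-precision roots while paying for them only once is to compute them in `long double`, store them as `double`, and cache the table in function-local statics so later calls reuse it. The sketch below is written in that spirit; the function name `rootTable` is hypothetical and the layout (entries `k..2k-1` hold `exp(i*pi*j/k)` for `j = 0..k-1`) is an assumption made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>
using namespace std;

typedef complex<double> C;

// Returns a table of roots of unity covering FFT size n (a power of two).
// The table only grows; the returned reference may be invalidated by a
// later call with a larger n.
const vector<C>& rootTable(int n) {
    assert(n >= 2 && (n & (n - 1)) == 0);
    static vector<complex<long double>> R(2, 1); // high-precision copy
    static vector<C> rt(2, 1);                   // what the FFT would read
    for (static int k = 2; k < n; k *= 2) {
        R.resize(2 * k);
        rt.resize(2 * k);
        auto x = polar(1.0L, acos(-1.0L) / k);   // exp(i * pi / k)
        for (int i = k; i < 2 * k; i++)
            rt[i] = R[i] = i & 1 ? R[i / 2] * x : R[i / 2];
    }
    return rt;
}
```

Because `k` is static, a second call with the same or a smaller `n` skips the loop entirely, so the `long double` cost is amortized across FFTs.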

simonlindholm · 2019-06-22T14:18:19Z

https://github.com/simonlindholm/fft-precision/blob/master/fft-precision.md has a bunch of analysis of FFT precision. I'll try to merge this soon.

Chillee · 2019-06-24T21:43:12Z

:O. Awesome!

Added FFTMod

6011aa5

Chillee changed the title ~~Added FFTMod as well~~ Added FFTMod Apr 23, 2019

simonlindholm reviewed Apr 23, 2019

Shortened a bit

ba0cc27

Chillee added 2 commits April 25, 2019 02:40

Shortened FFTMod and refactored roots code into FFT itself

bba7a32

Updated formatting

a2b9413

removed another 2 lines

e83e65b

Chillee added 11 commits April 25, 2019 16:05

Moved FFTMod to different file and fixed -Wconversion errors

5b5a4f4

Updated headers for FFTMod

12f0a0a

Updated header

75ba4da

Moved numerical precision commnets to description

81e2ae8

Merge branch 'master' into fftmod

77adbd4

Fixed typo

5243182

Merge branch 'fftmod' of github.com:Chillee/kactl into fftmod

ba5bc7d

Made things fit within 63 columns

ace08ab

Fixed some formatting issues

7d26dee

Fixed rep space issues

7c16aeb

Fixed spacing issues

c77b038

Chillee force-pushed the fftmod branch 4 times, most recently from b886871 to 951f07d Compare April 26, 2019 18:38

Fixed the extraneous use of Cd

dcb957b

Chillee added 2 commits April 29, 2019 02:02

Fixed some issues and simplified input API

93b71fd

Removed n from function parameters

5526de4

simonlindholm reviewed Apr 29, 2019

Chillee and others added 16 commits April 29, 2019 21:50

Fixed formatting issue

f37262a

Updated in response to comments

5136a02

Higher precision FFT

7ec3cfc

10% slower, but reused across FFT's so should be fine for any reasonable problem.

Shorter description

7d00189

Merge branch 'master' into fftmod

c104ed3

Preliminary description updates

1d527d1

Fix FFTMod after root computation updates

a36aee5

Shave off a few chars

675d08c

Naive FFTMod fuzz-test

6e161a9

Remove recommendation about CRT

5bd55d9

Update theoretical bounds

94c31ef

Remove complex<long double> typedef

b8d00da

Update test

2a2957b

Proof of FFT-MOD bound was buggy :(

12c374d

Description updates

3f09df5

Comment about long double perf

df8c82c

simonlindholm added 4 commits June 24, 2019 01:57

Update FFT fuzz-test

3cfc69e

Don't subtly break FFT-MOD on mod > 2^32

597b5fa

Merge branch 'master' into fftmod

4251223

Fuzz-test consistency

0fa3607

simonlindholm merged commit b3be665 into kth-competitive-programming:master Jun 24, 2019

simonlindholm added the addition label Nov 8, 2019

hockyy pushed a commit to hockyy/kactl that referenced this pull request Oct 24, 2023

Added FFTMod (kth-competitive-programming#67)

5561a7d
