Added FFTMod #67
So basically, if MOD ~ 1e9, we start seeing precision issues at around n ~ 1e6? That's uncomfortably small :( Especially so since this is at random inputs; the worst case may be, well, worse. I know I researched this before, and most theoretical guarantees were pretty weak -- the FFT currently pessimistically says the following: "For integers rounding works if
Yeah. :/ Is there any chance of shortening it by making it call the existing conv method? One alternative, though it loses on code size, is to say that if NTT * 2 ~ FFT-MOD, then can't we run two NTT's and use CRT for the result for about the same performance? It avoids the precision problems, and with luck it might be relatively short since it can reuse a lot of existing KACTL code. (This is currently noted in the FFT description, but the implementation is left as an exercise to the reader.)
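For reference, the CRT step of a two-prime approach could look roughly like the sketch below. This is not KACTL code: `modpow` and `crt2` are hypothetical helper names, `unsigned __int128` is a GCC/Clang extension, and the function assumes the true (unreduced) coefficient is smaller than `p1 * p2`. With MOD ~ 1e9 and n ~ 1e6 the true coefficients can reach about n * MOD^2 ~ 1e24, so two 30-bit primes would not be enough; larger primes or a third prime would be needed.

```cpp
#include <cstdint>

using u64 = uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension, assumed available

// Modular exponentiation; used to get an inverse via Fermat's little theorem.
u64 modpow(u64 b, u64 e, u64 m) {
    u64 r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = (u64)((u128)r * b % m);
        b = (u64)((u128)b * b % m);
        e >>= 1;
    }
    return r;
}

// Given x = r1 (mod p1) and x = r2 (mod p2) for distinct primes p1, p2 < 2^62,
// returns x % mod, assuming the true x satisfies 0 <= x < p1 * p2.
u64 crt2(u64 r1, u64 r2, u64 p1, u64 p2, u64 mod) {
    u64 inv = modpow(p1 % p2, p2 - 2, p2);  // p1^{-1} mod p2
    u64 diff = (r2 + p2 - r1 % p2) % p2;    // (r2 - r1) mod p2
    u64 k = (u64)((u128)diff * inv % p2);
    u128 x = (u128)k * p1 + r1;             // exact value, below p1 * p2
    return (u64)(x % mod);
}
```

Each output coefficient would go through `crt2` once, which is where the 128-bit multiplications and modulo operations come in.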
I don't know, it's tricky making these sorts of decisions... vague preference towards keeping just the NTT and existing FFT description, but it depends on code size and stuff... Though even if we don't include this, it's always nice to have a high-quality implementation that's not included in the pdf, which enterprising teams can uncomment if they think it's important to them.
I'll check out what a CRT + NTT based FFTMod equivalent might look like.
Unfortunately, CRT+NTT is 3x slower. Even though our NTT is very fast, the bottleneck isn't the NTT, it's the CRT with 128-bit numbers. The benchmarks look like this:
So, TL;DR: I think we should still include FFT-MOD. I'll think more about whether we can simplify it.
Unfortunately, I doubt it. Not without significant runtime losses, at least (probably 2x). The issue is that if we do a bunch of repeated calls to existing
So I tried it out, and it would look like this:
template <int MOD> vector<ll> convMod(const vi &a, const vi &b) {
if (a.empty() || b.empty()) return {};
ll cut = sqrt(MOD);
vector<double> ah(sz(a)), al(ah), bh(sz(b)), bl(bh);
rep(i, 0, sz(a)) ah[i] = a[i] / cut, al[i] = a[i] % cut;
rep(i, 0, sz(b)) bh[i] = b[i] / cut, bl[i] = b[i] % cut;
auto h = conv(ah, bh), l = conv(al, bl);
auto m1 = conv(ah, bl), m2 = conv(al, bh);
vector<ll> res(h.size());
for (int i = 0; i < res.size(); i++) {
ll av = round(h[i]);
ll bv = round(m1[i]) + round(m2[i]);
ll cv = round(l[i]);
av %= MOD, bv %= MOD, cv %= MOD;
res[i] = (av * cut * cut + bv * cut + cv) % MOD;
}
return res;
}
Significantly shorter, but still kind of weighty. Also ~2x slower.
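For readers following along, the four `conv` calls correspond to splitting each coefficient around `cut`. Writing $c$ for `cut`, $a = a_h c + a_l$ and $b = b_h c + b_l$ (notation introduced here, not taken from the code), the product expands as:

$$
(a_h c + a_l)(b_h c + b_l) = a_h b_h\, c^2 + (a_h b_l + a_l b_h)\, c + a_l b_l
$$

The three terms match `av`, `bv` and `cv` in the loop. Since every piece is at most about $\sqrt{\text{MOD}}$, each convolution coefficient is bounded by roughly $n \cdot \text{MOD}$ rather than $n \cdot \text{MOD}^2$, which is what keeps the doubles in a workable range.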
One thing I did was to move the root computation from the I think this is a worthy tradeoff. |
That seems reasonable. (I imagine it's a larger percentage for small n, though.) I think convMod should probably go in a separate file. Also, can you fix the -Wconversion warnings?
b886871 to 951f07d
I did some proper benchmarking and it seems like recalculating the roots (with I think that's decently substantial and worth fixing (if we can do it with only a few lines).
Code mostly looks good, still need to look into precision issues and description wordsmithing
10% slower, but reused across FFT's so should be fine for any reasonable problem.
https://github.com/simonlindholm/fft-precision/blob/master/fft-precision.md has a bunch of analysis of FFT precision. I'll try to merge this soon.
:O. Awesome!
I think I've seen a lot of people have this in their library. It allows us to perform FFT over arbitrary moduli without having to fiddle with primes close to a power of 2 and such. Really, if we ever need more space, I don't think there's anywhere NTT can be used that FFT-MOD can't. An optimized FFT-MOD is only about twice as slow as an optimized NTT.
On the other hand, NTT is pretty cool, and it is seriously nice to not need to think about precision at all.
Here's an example demonstrating the precision issues (I'll turn it into a fuzz test later). Basically, as long as the coefficients don't surpass ~1e15, FFT is fine. Go higher, though, and you start to see precision issues.
https://ideone.com/lUrDCT
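Separately from the linked example, here is a minimal sketch of the hard ceiling underneath that threshold: an IEEE 754 double has a 53-bit significand, so not every integer above 2^53 (about 9e15) can be represented at all. `roundTrips` is a hypothetical helper name. The ~1e15 figure observed for FFT is lower than this ceiling, presumably because rounding errors accumulate across the butterfly stages.

```cpp
// Returns true if x survives a conversion to double and back unchanged.
bool roundTrips(long long x) {
    double d = static_cast<double>(x);
    return static_cast<long long>(d) == x;
}
```

With this helper, `roundTrips(1LL << 53)` holds while `roundTrips((1LL << 53) + 1)` does not, because the latter value rounds to the nearest representable double.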
It is kind of ugly though :'(. I tried my best to shorten it, but it's still uglier than I'd like.
This is somewhat WIP - I still need to add a description about it. But I'd appreciate feedback on the necessity as well as the code itself.