Summary
acb_dft_rad2 (and acb_dft for power-of-two lengths) returns rigorous balls that are
much wider than achievable at the same working precision — about 2490× (~11 bits)
at n = 2¹⁶, with the gap growing ~linearly in n. The cause is the rad2 plan's
root-of-unity table: it is built at the transform's call precision via a multiplicative
recurrence, so the rounding error of the stored roots grows ~linearly with the root index.
Outputs remain rigorous (verified with acb_overlaps), so this is an accuracy/tightness
issue (enhancement), not a correctness bug.
Reproducer
Self-contained; build with cc -O2 repro.c -lflint -lm -o repro && ./repro. It transforms
an exact integer vector (zero input radius, so the output radius is purely the transform's
own error) and varies only the rad2 plan's root precision.
#include <flint/acb.h>
#include <flint/acb_dft.h>
static void fill_pattern(acb_ptr v, slong n) {
for (slong i = 0; i < n; i++) {
arb_set_si(acb_realref(v + i), (i % 19) - 9); /* exact integers: zero input radius, */
arb_set_si(acb_imagref(v + i), (i % 13) - 6); /* so output radius is the FFT's own error */
}
}
static double max_rad(acb_srcptr v, slong n) {
arb_t m, r;
arb_init(m);
arb_init(r);
for (slong i = 0; i < n; i++) {
arb_get_rad_arb(r, acb_realref(v + i));
if (arb_gt(r, m)) arb_set(m, r);
arb_get_rad_arb(r, acb_imagref(v + i));
if (arb_gt(r, m)) arb_set(m, r);
}
double d = arf_get_d(arb_midref(m), ARF_RND_NEAR);
arb_clear(m);
arb_clear(r);
return d;
}
static int all_overlap(acb_srcptr a, acb_srcptr b, slong n) {
for (slong i = 0; i < n; i++)
if (!acb_overlaps(a + i, b + i)) return 0;
return 1;
}
/* (1) isolation: same input and same transform precision; vary only the plan's
root-table precision (prec vs prec+64). */
static void isolation(int e, slong prec) {
slong n = (slong)1 << e;
acb_ptr in = _acb_vec_init(n), wlo = _acb_vec_init(n), whi = _acb_vec_init(n);
fill_pattern(in, n);
acb_dft_rad2_t lo, hi;
acb_dft_rad2_init(lo, e, prec); /* roots at call precision */
acb_dft_rad2_init(hi, e, prec + 64); /* roots 64 bits tighter */
acb_dft_rad2_precomp(wlo, in, lo, prec);
acb_dft_rad2_precomp(whi, in, hi, prec); /* SAME arithmetic precision (prec) */
acb_dft_rad2_clear(lo);
acb_dft_rad2_clear(hi);
double rlo = max_rad(wlo, n), rhi = max_rad(whi, n);
flint_printf(" n=2^%-2d prec=%wd : plan@prec radius=%.3g plan@prec+64 radius=%.3g"
" -> %.0fx looser (overlap=%d)\n",
e, prec, rlo, rhi, rhi > 0 ? rlo / rhi : 0.0, all_overlap(wlo, whi, n));
_acb_vec_clear(in, n);
_acb_vec_clear(wlo, n);
_acb_vec_clear(whi, n);
}
/* (2) the looseness is in the shared root table, not rad2's butterflies:
acb_dft_naive is equally loose; both are ~Nx wider than the same rad2
transform with high-precision plan roots (prec+64). */
static void vs_naive(int e, slong prec) {
slong n = (slong)1 << e;
acb_ptr in = _acb_vec_init(n), wn = _acb_vec_init(n), wr = _acb_vec_init(n), whi = _acb_vec_init(n);
fill_pattern(in, n);
acb_dft_naive(wn, in, n, prec);
acb_dft_rad2_t lo, hi;
acb_dft_rad2_init(lo, e, prec);
acb_dft_rad2_init(hi, e, prec + 64);
acb_dft_rad2_precomp(wr, in, lo, prec);
acb_dft_rad2_precomp(whi, in, hi, prec);
acb_dft_rad2_clear(lo);
acb_dft_rad2_clear(hi);
double rn = max_rad(wn, n), rr = max_rad(wr, n), rh = max_rad(whi, n);
flint_printf(" n=2^%-2d prec=%wd : naive=%.3g rad2=%.3g rad2(plan@prec+64)=%.3g"
" -> naive %.0fx, rad2 %.0fx wider than the high-precision-root reference\n",
e, prec, rn, rr, rh, rh > 0 ? rn / rh : 0.0, rh > 0 ? rr / rh : 0.0);
_acb_vec_clear(in, n);
_acb_vec_clear(wn, n);
_acb_vec_clear(wr, n);
_acb_vec_clear(whi, n);
}
int main(void) {
slong prec = 128;
flint_printf("(1) acb_dft_rad2: plan root precision is the bottleneck "
"(only the plan's root precision differs)\n");
isolation(11, prec);
isolation(12, prec);
isolation(14, prec);
isolation(16, prec);
flint_printf("(2) acb_dft_rad2 vs acb_dft_naive at the same precision\n");
vs_naive(11, prec);
vs_naive(12, prec);
flint_cleanup();
return 0;
}
Output (identical on FLINT 3.0.1 and 3.6.0 / main @ edc75bc)
(1) acb_dft_rad2: plan root precision is the bottleneck (only the plan's root precision differs)
n=2^11 prec=128 : plan@prec radius=1.86e-32 plan@prec+64 radius=2.24e-34 -> 83x looser (overlap=1)
n=2^12 prec=128 : plan@prec radius=1.07e-31 plan@prec+64 radius=5.58e-34 -> 191x looser (overlap=1)
n=2^14 prec=128 : plan@prec radius=1.93e-30 plan@prec+64 radius=3.38e-33 -> 571x looser (overlap=1)
n=2^16 prec=128 : plan@prec radius=5.55e-29 plan@prec+64 radius=2.23e-32 -> 2490x looser (overlap=1)
(2) acb_dft_rad2 vs acb_dft_naive at the same precision
n=2^11 prec=128 : naive=3.09e-32 rad2=1.86e-32 rad2(plan@prec+64)=2.24e-34 -> naive 138x, rad2 83x wider than the high-precision-root reference
n=2^12 prec=128 : naive=9.18e-32 rad2=1.07e-31 rad2(plan@prec+64)=5.58e-34 -> naive 165x, rad2 191x wider than the high-precision-root reference
Expected vs actual
At prec = 128, n = 2¹⁶, exact input:
- actual
acb_dft_rad2 output radius: 5.55e-29
- achievable (same transform, plan roots computed at
prec + 64): 2.23e-32
i.e. ~2490× / ~11 bits of avoidable widening. Part (1) isolates the cause: the only
thing that differs between the two columns is the precision of the plan's root table — the
transform arithmetic runs at prec in both. Part (2) shows acb_dft_naive is equally
loose, so the problem is the shared root table, not the rad2 butterflies.
Root cause
_acb_dft_rad2_init (src/acb_dft/rad2.c) fills the plan's root table at the call precision:
_acb_vec_unit_roots(t->z, -t->n, t->nz, prec); /* prec == the transform's call precision */
and _acb_vec_unit_roots (src/acb/vec_unit_roots.c) builds the table by multiplicative
recurrence — acb_unit_root for a base root, then _acb_vec_set_powers (z[k] = z[k-1]·base),
with symmetry-filling for the remainder. The recurrence accumulates rounding error
∝ index k, so the largest roots carry ~(n/2)·2⁻ᵖʳᵉᶜ of error and the transform
inherits it, giving the observed ~n-linear widening (predicted ~2¹¹≈2000× at n=2¹⁶; observed 2490×).
Workaround (for callers)
Build the precomputed plan at prec + O(log₂ n) (e.g. +32 suffices up to n = 2¹⁶) while
running the transform at prec. The roots are then tighter than the working-precision
arithmetic and the output is as tight as a direct-twiddle FFT, at no extra runtime cost
(the plan is built once).
Suggested fixes
- Compute the rad2/dft plan root table at an internally boosted precision
prec + O(log₂ n) by default, so acb_dft delivers ~prec-accurate results.
- Or make
_acb_vec_set_powers error grow like log k instead of k (binary splitting,
or periodic re-anchoring of the recurrence to freshly-computed roots).
- At minimum, document that precomputed
acb_dft plans should be built at boosted
precision for tight results.
Summary
acb_dft_rad2(andacb_dftfor power-of-two lengths) returns rigorous balls that aremuch wider than achievable at the same working precision — about 2490× (~11 bits)
at n = 2¹⁶, with the gap growing ~linearly in n. The cause is the rad2 plan's
root-of-unity table: it is built at the transform's call precision via a multiplicative
recurrence, so the rounding error of the stored roots grows ~linearly with the root index.
Outputs remain rigorous (verified with
acb_overlaps), so this is an accuracy/tightnessissue (enhancement), not a correctness bug.
Reproducer
Self-contained; build with
cc -O2 repro.c -lflint -lm -o repro && ./repro. It transformsan exact integer vector (zero input radius, so the output radius is purely the transform's
own error) and varies only the rad2 plan's root precision.
Output (identical on FLINT 3.0.1 and 3.6.0 /
main@edc75bc)Expected vs actual
At
prec = 128,n = 2¹⁶, exact input:acb_dft_rad2output radius:5.55e-29prec + 64):2.23e-32i.e. ~2490× / ~11 bits of avoidable widening. Part (1) isolates the cause: the only
thing that differs between the two columns is the precision of the plan's root table — the
transform arithmetic runs at
precin both. Part (2) showsacb_dft_naiveis equallyloose, so the problem is the shared root table, not the rad2 butterflies.
Root cause
_acb_dft_rad2_init(src/acb_dft/rad2.c) fills the plan's root table at the call precision:and
_acb_vec_unit_roots(src/acb/vec_unit_roots.c) builds the table by multiplicativerecurrence —
acb_unit_rootfor a base root, then_acb_vec_set_powers(z[k] = z[k-1]·base),with symmetry-filling for the remainder. The recurrence accumulates rounding error
∝ index k, so the largest roots carry ~
(n/2)·2⁻ᵖʳᵉᶜof error and the transforminherits it, giving the observed ~
n-linear widening (predicted ~2¹¹≈2000× at n=2¹⁶; observed 2490×).Workaround (for callers)
Build the precomputed plan at
prec + O(log₂ n)(e.g.+32suffices up to n = 2¹⁶) whilerunning the transform at
prec. The roots are then tighter than the working-precisionarithmetic and the output is as tight as a direct-twiddle FFT, at no extra runtime cost
(the plan is built once).
Suggested fixes
prec + O(log₂ n)by default, soacb_dftdelivers ~prec-accurate results._acb_vec_set_powerserror grow likelog kinstead ofk(binary splitting,or periodic re-anchoring of the recurrence to freshly-computed roots).
acb_dftplans should be built at boostedprecision for tight results.