Skip to content

acb_dft_rad2 loses ~log2(n) bits of accuracy: rad2 plan root table built at call precision via a recurrence #2709

@edgarcosta

Description

@edgarcosta

Summary

acb_dft_rad2 (and acb_dft for power-of-two lengths) returns rigorous balls that are
much wider than achievable at the same working precision — about 2490× (~11 bits)
at n = 2¹⁶
, with the gap growing ~linearly in n. The cause is the rad2 plan's
root-of-unity table: it is built at the transform's call precision via a multiplicative
recurrence, so the rounding error of the stored roots grows ~linearly with the root index.

Outputs remain rigorous (verified with acb_overlaps), so this is an accuracy/tightness
issue (enhancement), not a correctness bug.

Reproducer

Self-contained; build with cc -O2 repro.c -lflint -lm -o repro && ./repro. It transforms
an exact integer vector (zero input radius, so the output radius is purely the transform's
own error) and varies only the rad2 plan's root precision.

#include <flint/acb.h>
#include <flint/acb_dft.h>

static void fill_pattern(acb_ptr v, slong n) {
  for (slong i = 0; i < n; i++) {
    arb_set_si(acb_realref(v + i), (i % 19) - 9); /* exact integers: zero input radius, */
    arb_set_si(acb_imagref(v + i), (i % 13) - 6); /* so output radius is the FFT's own error */
  }
}
static double max_rad(acb_srcptr v, slong n) {
  arb_t m, r;
  arb_init(m);
  arb_init(r);
  for (slong i = 0; i < n; i++) {
    arb_get_rad_arb(r, acb_realref(v + i));
    if (arb_gt(r, m)) arb_set(m, r);
    arb_get_rad_arb(r, acb_imagref(v + i));
    if (arb_gt(r, m)) arb_set(m, r);
  }
  double d = arf_get_d(arb_midref(m), ARF_RND_NEAR);
  arb_clear(m);
  arb_clear(r);
  return d;
}
static int all_overlap(acb_srcptr a, acb_srcptr b, slong n) {
  for (slong i = 0; i < n; i++)
    if (!acb_overlaps(a + i, b + i)) return 0;
  return 1;
}

/* (1) isolation: same input and same transform precision; vary only the plan's
       root-table precision (prec vs prec+64). */
static void isolation(int e, slong prec) {
  slong n = (slong)1 << e;
  acb_ptr in = _acb_vec_init(n), wlo = _acb_vec_init(n), whi = _acb_vec_init(n);
  fill_pattern(in, n);

  acb_dft_rad2_t lo, hi;
  acb_dft_rad2_init(lo, e, prec);      /* roots at call precision  */
  acb_dft_rad2_init(hi, e, prec + 64); /* roots 64 bits tighter    */
  acb_dft_rad2_precomp(wlo, in, lo, prec);
  acb_dft_rad2_precomp(whi, in, hi, prec); /* SAME arithmetic precision (prec) */
  acb_dft_rad2_clear(lo);
  acb_dft_rad2_clear(hi);

  double rlo = max_rad(wlo, n), rhi = max_rad(whi, n);
  flint_printf("  n=2^%-2d prec=%wd : plan@prec radius=%.3g  plan@prec+64 radius=%.3g"
               "  -> %.0fx looser   (overlap=%d)\n",
               e, prec, rlo, rhi, rhi > 0 ? rlo / rhi : 0.0, all_overlap(wlo, whi, n));

  _acb_vec_clear(in, n);
  _acb_vec_clear(wlo, n);
  _acb_vec_clear(whi, n);
}

/* (2) the looseness is in the shared root table, not rad2's butterflies:
       acb_dft_naive is equally loose; both are ~Nx wider than the same rad2
       transform with high-precision plan roots (prec+64). */
static void vs_naive(int e, slong prec) {
  slong n = (slong)1 << e;
  acb_ptr in = _acb_vec_init(n), wn = _acb_vec_init(n), wr = _acb_vec_init(n), whi = _acb_vec_init(n);
  fill_pattern(in, n);

  acb_dft_naive(wn, in, n, prec);

  acb_dft_rad2_t lo, hi;
  acb_dft_rad2_init(lo, e, prec);
  acb_dft_rad2_init(hi, e, prec + 64);
  acb_dft_rad2_precomp(wr, in, lo, prec);
  acb_dft_rad2_precomp(whi, in, hi, prec);
  acb_dft_rad2_clear(lo);
  acb_dft_rad2_clear(hi);

  double rn = max_rad(wn, n), rr = max_rad(wr, n), rh = max_rad(whi, n);
  flint_printf("  n=2^%-2d prec=%wd : naive=%.3g  rad2=%.3g  rad2(plan@prec+64)=%.3g"
               "  -> naive %.0fx, rad2 %.0fx wider than the high-precision-root reference\n",
               e, prec, rn, rr, rh, rh > 0 ? rn / rh : 0.0, rh > 0 ? rr / rh : 0.0);

  _acb_vec_clear(in, n);
  _acb_vec_clear(wn, n);
  _acb_vec_clear(wr, n);
  _acb_vec_clear(whi, n);
}

int main(void) {
  slong prec = 128;
  flint_printf("(1) acb_dft_rad2: plan root precision is the bottleneck "
               "(only the plan's root precision differs)\n");
  isolation(11, prec);
  isolation(12, prec);
  isolation(14, prec);
  isolation(16, prec);
  flint_printf("(2) acb_dft_rad2 vs acb_dft_naive at the same precision\n");
  vs_naive(11, prec);
  vs_naive(12, prec);
  flint_cleanup();
  return 0;
}

Output (identical on FLINT 3.0.1 and 3.6.0 / main @ edc75bc)

(1) acb_dft_rad2: plan root precision is the bottleneck (only the plan's root precision differs)
  n=2^11 prec=128 : plan@prec radius=1.86e-32  plan@prec+64 radius=2.24e-34  -> 83x looser   (overlap=1)
  n=2^12 prec=128 : plan@prec radius=1.07e-31  plan@prec+64 radius=5.58e-34  -> 191x looser  (overlap=1)
  n=2^14 prec=128 : plan@prec radius=1.93e-30  plan@prec+64 radius=3.38e-33  -> 571x looser  (overlap=1)
  n=2^16 prec=128 : plan@prec radius=5.55e-29  plan@prec+64 radius=2.23e-32  -> 2490x looser (overlap=1)
(2) acb_dft_rad2 vs acb_dft_naive at the same precision
  n=2^11 prec=128 : naive=3.09e-32  rad2=1.86e-32  rad2(plan@prec+64)=2.24e-34  -> naive 138x, rad2 83x wider than the high-precision-root reference
  n=2^12 prec=128 : naive=9.18e-32  rad2=1.07e-31  rad2(plan@prec+64)=5.58e-34  -> naive 165x, rad2 191x wider than the high-precision-root reference

Expected vs actual

At prec = 128, n = 2¹⁶, exact input:

  • actual acb_dft_rad2 output radius: 5.55e-29
  • achievable (same transform, plan roots computed at prec + 64): 2.23e-32

i.e. ~2490× / ~11 bits of avoidable widening. Part (1) isolates the cause: the only
thing that differs between the two columns is the precision of the plan's root table — the
transform arithmetic runs at prec in both. Part (2) shows acb_dft_naive is equally
loose, so the problem is the shared root table, not the rad2 butterflies.

Root cause

_acb_dft_rad2_init (src/acb_dft/rad2.c) fills the plan's root table at the call precision:

_acb_vec_unit_roots(t->z, -t->n, t->nz, prec);   /* prec == the transform's call precision */

and _acb_vec_unit_roots (src/acb/vec_unit_roots.c) builds the table by multiplicative
recurrence
acb_unit_root for a base root, then _acb_vec_set_powers (z[k] = z[k-1]·base),
with symmetry-filling for the remainder. The recurrence accumulates rounding error
∝ index k, so the largest roots carry ~(n/2)·2⁻ᵖʳᵉᶜ of error and the transform
inherits it, giving the observed ~n-linear widening (predicted ~2¹¹≈2000× at n=2¹⁶; observed 2490×).

Workaround (for callers)

Build the precomputed plan at prec + O(log₂ n) (e.g. +32 suffices up to n = 2¹⁶) while
running the transform at prec. The roots are then tighter than the working-precision
arithmetic and the output is as tight as a direct-twiddle FFT, at no extra runtime cost
(the plan is built once).

Suggested fixes

  1. Compute the rad2/dft plan root table at an internally boosted precision
    prec + O(log₂ n) by default, so acb_dft delivers ~prec-accurate results.
  2. Or make _acb_vec_set_powers error grow like log k instead of k (binary splitting,
    or periodic re-anchoring of the recurrence to freshly-computed roots).
  3. At minimum, document that precomputed acb_dft plans should be built at boosted
    precision for tight results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions