acb_dft_rad2 loses ~log2(n) bits of accuracy: rad2 plan root table built at call precision via a recurrence

### Summary

`acb_dft_rad2` (and `acb_dft` for power-of-two lengths) returns rigorous balls that are
**much wider than achievable at the same working precision** — about **2490× (~11 bits)
at n = 2¹⁶**, with the gap growing **~linearly in n**. The cause is the rad2 plan's
root-of-unity table: it is built at the transform's *call* precision via a multiplicative
recurrence, so the rounding error of the stored roots grows ~linearly with the root index.

Outputs remain rigorous (verified with `acb_overlaps`), so this is an accuracy/tightness
issue (enhancement), not a correctness bug.



### Reproducer

Self-contained; build with `cc -O2 repro.c -lflint -lm -o repro && ./repro`. It transforms
an exact integer vector (zero input radius, so the output radius is purely the transform's
own error) and varies **only** the rad2 plan's root precision.

```c
#include <flint/acb.h>
#include <flint/acb_dft.h>

static void fill_pattern(acb_ptr v, slong n) {
  for (slong i = 0; i < n; i++) {
    arb_set_si(acb_realref(v + i), (i % 19) - 9); /* exact integers: zero input radius, */
    arb_set_si(acb_imagref(v + i), (i % 13) - 6); /* so output radius is the FFT's own error */
  }
}
static double max_rad(acb_srcptr v, slong n) {
  arb_t m, r;
  arb_init(m);
  arb_init(r);
  for (slong i = 0; i < n; i++) {
    arb_get_rad_arb(r, acb_realref(v + i));
    if (arb_gt(r, m)) arb_set(m, r);
    arb_get_rad_arb(r, acb_imagref(v + i));
    if (arb_gt(r, m)) arb_set(m, r);
  }
  double d = arf_get_d(arb_midref(m), ARF_RND_NEAR);
  arb_clear(m);
  arb_clear(r);
  return d;
}
static int all_overlap(acb_srcptr a, acb_srcptr b, slong n) {
  for (slong i = 0; i < n; i++)
    if (!acb_overlaps(a + i, b + i)) return 0;
  return 1;
}

/* (1) isolation: same input and same transform precision; vary only the plan's
       root-table precision (prec vs prec+64). */
static void isolation(int e, slong prec) {
  slong n = (slong)1 << e;
  acb_ptr in = _acb_vec_init(n), wlo = _acb_vec_init(n), whi = _acb_vec_init(n);
  fill_pattern(in, n);

  acb_dft_rad2_t lo, hi;
  acb_dft_rad2_init(lo, e, prec);      /* roots at call precision  */
  acb_dft_rad2_init(hi, e, prec + 64); /* roots 64 bits tighter    */
  acb_dft_rad2_precomp(wlo, in, lo, prec);
  acb_dft_rad2_precomp(whi, in, hi, prec); /* SAME arithmetic precision (prec) */
  acb_dft_rad2_clear(lo);
  acb_dft_rad2_clear(hi);

  double rlo = max_rad(wlo, n), rhi = max_rad(whi, n);
  flint_printf("  n=2^%-2d prec=%wd : plan@prec radius=%.3g  plan@prec+64 radius=%.3g"
               "  -> %.0fx looser   (overlap=%d)\n",
               e, prec, rlo, rhi, rhi > 0 ? rlo / rhi : 0.0, all_overlap(wlo, whi, n));

  _acb_vec_clear(in, n);
  _acb_vec_clear(wlo, n);
  _acb_vec_clear(whi, n);
}

/* (2) the looseness is in the shared root table, not rad2's butterflies:
       acb_dft_naive is equally loose; both are ~Nx wider than the same rad2
       transform with high-precision plan roots (prec+64). */
static void vs_naive(int e, slong prec) {
  slong n = (slong)1 << e;
  acb_ptr in = _acb_vec_init(n), wn = _acb_vec_init(n), wr = _acb_vec_init(n), whi = _acb_vec_init(n);
  fill_pattern(in, n);

  acb_dft_naive(wn, in, n, prec);

  acb_dft_rad2_t lo, hi;
  acb_dft_rad2_init(lo, e, prec);
  acb_dft_rad2_init(hi, e, prec + 64);
  acb_dft_rad2_precomp(wr, in, lo, prec);
  acb_dft_rad2_precomp(whi, in, hi, prec);
  acb_dft_rad2_clear(lo);
  acb_dft_rad2_clear(hi);

  double rn = max_rad(wn, n), rr = max_rad(wr, n), rh = max_rad(whi, n);
  flint_printf("  n=2^%-2d prec=%wd : naive=%.3g  rad2=%.3g  rad2(plan@prec+64)=%.3g"
               "  -> naive %.0fx, rad2 %.0fx wider than the high-precision-root reference\n",
               e, prec, rn, rr, rh, rh > 0 ? rn / rh : 0.0, rh > 0 ? rr / rh : 0.0);

  _acb_vec_clear(in, n);
  _acb_vec_clear(wn, n);
  _acb_vec_clear(wr, n);
  _acb_vec_clear(whi, n);
}

int main(void) {
  slong prec = 128;
  flint_printf("(1) acb_dft_rad2: plan root precision is the bottleneck "
               "(only the plan's root precision differs)\n");
  isolation(11, prec);
  isolation(12, prec);
  isolation(14, prec);
  isolation(16, prec);
  flint_printf("(2) acb_dft_rad2 vs acb_dft_naive at the same precision\n");
  vs_naive(11, prec);
  vs_naive(12, prec);
  flint_cleanup();
  return 0;
}
```

### Output (identical on FLINT 3.0.1 and 3.6.0 / `main` @ `edc75bc`)

```
(1) acb_dft_rad2: plan root precision is the bottleneck (only the plan's root precision differs)
  n=2^11 prec=128 : plan@prec radius=1.86e-32  plan@prec+64 radius=2.24e-34  -> 83x looser   (overlap=1)
  n=2^12 prec=128 : plan@prec radius=1.07e-31  plan@prec+64 radius=5.58e-34  -> 191x looser  (overlap=1)
  n=2^14 prec=128 : plan@prec radius=1.93e-30  plan@prec+64 radius=3.38e-33  -> 571x looser  (overlap=1)
  n=2^16 prec=128 : plan@prec radius=5.55e-29  plan@prec+64 radius=2.23e-32  -> 2490x looser (overlap=1)
(2) acb_dft_rad2 vs acb_dft_naive at the same precision
  n=2^11 prec=128 : naive=3.09e-32  rad2=1.86e-32  rad2(plan@prec+64)=2.24e-34  -> naive 138x, rad2 83x wider than the high-precision-root reference
  n=2^12 prec=128 : naive=9.18e-32  rad2=1.07e-31  rad2(plan@prec+64)=5.58e-34  -> naive 165x, rad2 191x wider than the high-precision-root reference
```

### Expected vs actual

At `prec = 128`, `n = 2¹⁶`, exact input:

- **actual** `acb_dft_rad2` output radius: `5.55e-29`
- **achievable** (same transform, plan roots computed at `prec + 64`): `2.23e-32`

i.e. ~**2490× / ~11 bits** of avoidable widening. Part (1) isolates the cause: the *only*
thing that differs between the two columns is the precision of the plan's root table — the
transform arithmetic runs at `prec` in both. Part (2) shows `acb_dft_naive` is equally
loose, so the problem is the **shared root table**, not the rad2 butterflies.

### Root cause

`_acb_dft_rad2_init` (`src/acb_dft/rad2.c`) fills the plan's root table at the call precision:

```c
_acb_vec_unit_roots(t->z, -t->n, t->nz, prec);   /* prec == the transform's call precision */
```

and `_acb_vec_unit_roots` (`src/acb/vec_unit_roots.c`) builds the table by **multiplicative
recurrence** — `acb_unit_root` for a base root, then `_acb_vec_set_powers` (`z[k] = z[k-1]·base`),
with symmetry-filling for the remainder. The recurrence accumulates rounding error
**∝ index k**, so the largest roots carry ~`(n/2)·2⁻ᵖʳᵉᶜ` of error and the transform
inherits it, giving the observed ~`n`-linear widening (predicted ~2¹¹≈2000× at n=2¹⁶; observed 2490×).

### Workaround (for callers)

Build the precomputed plan at `prec + O(log₂ n)` (e.g. `+32` suffices up to n = 2¹⁶) while
running the transform at `prec`. The roots are then tighter than the working-precision
arithmetic and the output is as tight as a direct-twiddle FFT, at no extra runtime cost
(the plan is built once).

### Suggested fixes

1. Compute the rad2/dft plan root table at an internally boosted precision
   `prec + O(log₂ n)` by default, so `acb_dft` delivers ~`prec`-accurate results.
2. Or make `_acb_vec_set_powers` error grow like `log k` instead of `k` (binary splitting,
   or periodic re-anchoring of the recurrence to freshly-computed roots).
3. At minimum, document that precomputed `acb_dft` plans should be built at boosted
   precision for tight results.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

acb_dft_rad2 loses ~log2(n) bits of accuracy: rad2 plan root table built at call precision via a recurrence #2709

Summary

Reproducer

Output (identical on FLINT 3.0.1 and 3.6.0 / `main` @ `edc75bc`)

Expected vs actual

Root cause

Workaround (for callers)

Suggested fixes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

acb_dft_rad2 loses ~log2(n) bits of accuracy: rad2 plan root table built at call precision via a recurrence #2709

Description

Summary

Reproducer

Output (identical on FLINT 3.0.1 and 3.6.0 / main @ edc75bc)

Expected vs actual

Root cause

Workaround (for callers)

Suggested fixes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Output (identical on FLINT 3.0.1 and 3.6.0 / `main` @ `edc75bc`)