Toom-Cook multiplication with arbitrary number of interpolation points by fredrik-johansson · Pull Request #2536 · flintlib/flint

fredrik-johansson · 2025-12-23T08:14:33Z

Adds _gr_poly_mullow_toom_serial / gr_poly_mullow_toom_serial for Toom-Cook polynomial multiplication with $r$ evaluation/interpolation points for arbitrary $r$.

This can be used to multiply in time $O(N^{1+\varepsilon})$ in rings that lack a preexisting fast multiplication routine (based on FFT or Kronecker substitution), at least when the characteristic is large enough. However, the main intended application is memory-efficient multiplication in rings that already have fast multiplication, by splitting the inputs into smaller chunks, as discussed in #2535.
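To make the idea concrete, here is a minimal Python sketch of $r$-point Toom-Cook over $\mathbb{Z}/p\mathbb{Z}$. It illustrates the technique under our own assumptions and is not the FLINT code. The assumptions are:

- $r$ is odd.
- Each input is cut into $k = (r+1)/2$ chunks, so the product has $2k - 1 = r$ chunks.
- The evaluation points are $0, 1, \ldots, r-1$. The sketch therefore needs a prime $p \ge r$, which is one concrete form of "the characteristic is large enough".
- Schoolbook multiplication stands in for the fast pointwise products.

The names `schoolbook`, `lagrange_basis` and `toom_mul_serial` are hypothetical. The points are processed one at a time, and each pointwise product is folded into the output through the Lagrange basis immediately. This is one way to keep only a single product of length about $2N/k$ in memory at once.

```python
# Hypothetical sketch of r-point Toom-Cook over Z/pZ (not FLINT's implementation).
# Assumptions: r odd, evaluation points 0..r-1 (requires a prime p >= r),
# schoolbook multiplication as a stand-in for a fast pointwise product.

def schoolbook(a, b, p):
    """Quadratic product of two coefficient lists (lowest degree first) mod p."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return c

def lagrange_basis(pts, p):
    """Coefficient lists of the Lagrange basis polynomials L_j(y) for pts, mod p."""
    basis = []
    for j, tj in enumerate(pts):
        num, den = [1], 1
        for i, ti in enumerate(pts):
            if i != j:
                # num <- num * (y - ti)
                num = [((num[d - 1] if d else 0) - ti * (num[d] if d < len(num) else 0)) % p
                       for d in range(len(num) + 1)]
                den = den * (tj - ti) % p
        inv = pow(den, -1, p)  # needs distinct points mod p (Python 3.8+)
        basis.append([c * inv % p for c in num])
    return basis

def toom_mul_serial(a, b, r, p):
    """a*b mod p for nonempty lists a, b, one evaluation point at a time."""
    k = (r + 1) // 2                  # chunks per input; the product has 2k-1 = r chunks
    m = -(-max(len(a), len(b)) // k)  # chunk length, ceil(n / k)
    pts = list(range(r))
    L = lagrange_basis(pts, p)
    out = [0] * (len(a) + len(b) - 1)
    for j, t in enumerate(pts):
        # Evaluate A(y) = sum_i a_i(x) y^i and B(y) at y = t (length-m vectors).
        ea, eb = [0] * m, [0] * m
        for i in range(k):
            w = pow(t, i, p)
            for d, x in enumerate(a[i * m:(i + 1) * m]):
                ea[d] = (ea[d] + w * x) % p
            for d, x in enumerate(b[i * m:(i + 1) * m]):
                eb[d] = (eb[d] + w * x) % p
        v = schoolbook(ea, eb, p)     # the only pointwise product held at this moment
        # Chunk l of the product is sum_j L_j[l] * v_j; add this point's share.
        for l in range(r):
            for d, x in enumerate(v):
                if l * m + d < len(out):  # positions past the true length sum to zero
                    out[l * m + d] = (out[l * m + d] + L[j][l] * x) % p
    return out
```

For example, with `p = 251` the result of `toom_mul_serial(a, b, r, 251)` should agree with `schoolbook(a, b, 251)` for every $r$ used in the benchmarks, since all of them are at most 251.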

Some sample results are shown below. Explanation of the tables:

- $N$ is the length of the inputs.
- $r$ is the Toom-Cook order, i.e. the number of evaluation/interpolation points. The row labelled "-" (labelled $r = 1$ in the last three tables) is a direct fast (FFT-based) multiplication with _gr_poly_mul, which is essentially equivalent to taking $r = 1$.
- "Time, first" is the time in seconds for a single multiplication. The ratio in parentheses is the slowdown of Toom-Cook relative to _gr_poly_mul, so a lower ratio is better.
- "Time, second" is the time in seconds for a follow-up multiplication. This is generally faster because FLINT caches some data (roots of unity for fft_small, etc.).
- For peak memory usage, the ratio in parentheses is the memory saving of Toom-Cook relative to _gr_poly_mul, so a higher ratio is better.
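The parenthesized ratios can be recomputed from the raw numbers. The short Python sketch below does this for a few rows of the first table; the helper names are ours.

One detail is inferred rather than stated: the memory ratios only match if 1 GB is taken as 1024 MB. With 1000 MB per GB, the last nmod8 ratio would come out near 12.7x instead of 13.04x.

```python
# Recompute the parenthesized ratios of the benchmark tables from the raw numbers.
# Assumption inferred from the data: 1 GB = 1024 MB in the memory column.

UNITS = {"MB": 1, "GB": 1024}

def to_mb(s):
    """Parse a memory entry such as '7.86 GB' or '617 MB' into MB."""
    value, unit = s.split()
    return float(value) * UNITS[unit]

def slowdown(t_toom, t_direct):
    """Time ratio: Toom-Cook time / direct _gr_poly_mul time (lower is better)."""
    return t_toom / t_direct

def memory_saving(mem_direct, mem_toom):
    """Memory ratio: direct peak / Toom-Cook peak (higher is better)."""
    return to_mb(mem_direct) / to_mb(mem_toom)

# First nmod8 table: the r = 3 and r = 81 rows against the direct row.
print(f"{slowdown(8.094, 6.703):.2f}x")               # 1.21x
print(f"{memory_saving('7.86 GB', '4.30 GB'):.2f}x")  # 1.83x
print(f"{memory_saving('7.86 GB', '617 MB'):.2f}x")   # 13.04x
```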

$N = 10^8$, integers mod 251 (nmod8):

| $r$ | Time, first (s) | Time, second (s) | Peak memory |
|---|---|---|---|
| - | 6.703 (1.00x) | 5.174 (1.00x) | 7.86 GB (1.00x) |
| 3 | 8.094 (1.21x) | 7.335 (1.42x) | 4.30 GB (1.83x) |
| 5 | 8.476 (1.26x) | 8.094 (1.56x) | 2.75 GB (2.86x) |
| 7 | 9.459 (1.41x) | 9.093 (1.76x) | 2.34 GB (3.36x) |
| 11 | 10.252 (1.53x) | 10.069 (1.95x) | 1.56 GB (5.04x) |
| 15 | 11.669 (1.74x) | 11.501 (2.22x) | 1.36 GB (5.78x) |
| 21 | 14.215 (2.12x) | 14.030 (2.71x) | 1.24 GB (6.34x) |
| 41 | 19.642 (2.93x) | 19.645 (3.80x) | 838 MB (9.60x) |
| 81 | 31.112 (4.64x) | 31.413 (6.07x) | 617 MB (13.04x) |

$N = 10^7$, integers mod 9223372036854775837 (nmod):

| $r$ | Time, first (s) | Time, second (s) | Peak memory |
|---|---|---|---|
| - | 2.137 (1.00x) | 1.520 (1.00x) | 1.99 GB (1.00x) |
| 3 | 2.659 (1.24x) | 2.328 (1.53x) | 1.30 GB (1.53x) |
| 5 | 2.747 (1.29x) | 2.575 (1.69x) | 854 MB (2.39x) |
| 7 | 3.163 (1.48x) | 3.005 (1.98x) | 822 MB (2.48x) |
| 11 | 2.589 (1.21x) | 2.541 (1.67x) | 534 MB (3.82x) |
| 15 | 3.083 (1.44x) | 3.004 (1.98x) | 518 MB (3.93x) |
| 21 | 3.316 (1.55x) | 3.278 (2.16x) | 425 MB (4.79x) |
| 41 | 4.839 (2.26x) | 4.807 (3.16x) | 369 MB (5.52x) |
| 81 | 8.090 (3.79x) | 8.217 (5.41x) | 340 MB (5.99x) |

$N = 10^8$, integers mod 1108307720798209 $= 63 \cdot 2^{44} + 1$ (nmod; this is an FFT prime):

| $r$ | Time, first (s) | Time, second (s) | Peak memory |
|---|---|---|---|
| 1 | 5.313 (1.00x) | 3.781 (1.00x) | 7.49 GB (1.00x) |
| 3 | 7.274 (1.37x) | 6.538 (1.73x) | 6.73 GB (1.11x) |
| 5 | 8.042 (1.51x) | 7.681 (2.03x) | 5.23 GB (1.43x) |
| 7 | 9.749 (1.83x) | 9.572 (2.53x) | 4.86 GB (1.54x) |
| 11 | 12.456 (2.34x) | 12.292 (3.25x) | 4.11 GB (1.82x) |
| 15 | 15.947 (3.00x) | 15.850 (4.19x) | 3.92 GB (1.91x) |
| 21 | 21.538 (4.05x) | 21.457 (5.67x) | 3.82 GB (1.96x) |
| 41 | 39.950 (7.52x) | 39.375 (10.41x) | 3.41 GB (2.20x) |
| 81 | 71.582 (13.47x) | 72.004 (19.04x) | 3.20 GB (2.34x) |
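The modulus in the table above is $1108307720798209 = 63 \cdot 2^{44} + 1$. A prime $p$ is called an FFT prime when $p - 1$ is divisible by a large power of two. Then $\mathbb{Z}/p\mathbb{Z}$ contains the $2^e$-th roots of unity needed for radix-2 number-theoretic transforms, so a direct fast multiplication is available without extra work.

The plain-Python sketch below checks both properties. It uses helpers of our own: a Miller-Rabin test that is deterministic below about $3.3 \cdot 10^{24}$ when the first twelve primes are used as bases, and a 2-adic valuation.

```python
# Check that p = 63 * 2^44 + 1 is prime and count the factors of 2 in p - 1,
# which bounds the power-of-two transform lengths available mod p.

def is_prime(n):
    """Miller-Rabin with the first 12 primes as bases (deterministic for n < 3.3e24)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for q in bases:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def two_adic_valuation(n):
    """Largest e such that 2^e divides n (n > 0)."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    return e

p = 1108307720798209
print(p == 63 * 2**44 + 1)       # True
print(is_prime(p))               # True
print(two_adic_valuation(p - 1)) # 44
```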

$N = 10^5$, integers with 10000-bit coefficients (fmpz):

| $r$ | Time, first (s) | Time, second (s) | Peak memory |
|---|---|---|---|
| 1 | 5.986 (1.00x) | 5.291 (1.00x) | 5.40 GB (1.00x) |
| 3 | 7.259 (1.21x) | 6.408 (1.21x) | 3.42 GB (1.58x) |
| 5 | 7.457 (1.25x) | 7.105 (1.34x) | 2.37 GB (2.28x) |
| 7 | 8.111 (1.35x) | 7.982 (1.51x) | 2.07 GB (2.61x) |
| 11 | 8.925 (1.49x) | 9.378 (1.77x) | 1.54 GB (3.51x) |
| 15 | 10.791 (1.80x) | 11.858 (2.24x) | 1.40 GB (3.86x) |
| 21 | 11.042 (1.84x) | 12.107 (2.29x) | 1.24 GB (4.35x) |
| 41 | 17.378 (2.90x) | 19.815 (3.75x) | 1.03 GB (5.24x) |
| 81 | 44.022 (7.35x) | 47.56 (8.99x) | 982 MB (5.63x) |

$N = 10^6$, integers with 1000-bit coefficients (fmpz):

| $r$ | Time, first (s) | Time, second (s) | Peak memory |
|---|---|---|---|
| 1 | 6.213 (1.00x) | 5.416 (1.00x) | 5.56 GB (1.00x) |
| 3 | 7.669 (1.23x) | 6.533 (1.21x) | 3.67 GB (1.51x) |
| 5 | 8.392 (1.35x) | 7.583 (1.40x) | 2.74 GB (2.03x) |
| 7 | 8.536 (1.37x) | 7.946 (1.47x) | 2.27 GB (2.45x) |
| 11 | 10.31 (1.66x) | 9.955 (1.84x) | 1.82 GB (3.05x) |
| 15 | 12.633 (2.03x) | 12.711 (2.35x) | 1.65 GB (3.37x) |
| 21 | 12.688 (2.04x) | 13.374 (2.47x) | 1.40 GB (3.97x) |
| 41 | 22.8 (3.67x) | 26.986 (4.98x) | 1.26 GB (4.41x) |
| 81 | 82.393 (13.26x) | 103.886 (19.18x) | 1.27 GB (4.38x) |

Observations:

This works especially well for nmod8. It makes the least sense with an FFT prime, where the memory saving is at most about 2x. In most other cases, memory can be reduced at least 5x with a slowdown of no more than about 2-4x. It's also worth noting that a small $r$ can often give a ~2x reduction in memory with a <1.5x slowdown (first-call time).
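One way to read these observations is as a lookup: given a target memory saving, pick the $r$ with the smallest slowdown that reaches it. The sketch below does this over (r, first-call slowdown, memory saving) triples copied from the nmod8 and nmod tables; the helper name `best_r` is ours.

It also shows that the slowdown is not monotone in $r$. For nmod, $r = 11$ is both faster (on first-call time) and more memory-efficient than $r = 3$, $5$ or $7$.

```python
# (r, first-call slowdown, memory saving) copied from the benchmark tables above.
NMOD8 = [(3, 1.21, 1.83), (5, 1.26, 2.86), (7, 1.41, 3.36), (11, 1.53, 5.04),
         (15, 1.74, 5.78), (21, 2.12, 6.34), (41, 2.93, 9.60), (81, 4.64, 13.04)]
NMOD = [(3, 1.24, 1.53), (5, 1.29, 2.39), (7, 1.48, 2.48), (11, 1.21, 3.82),
        (15, 1.44, 3.93), (21, 1.55, 4.79), (41, 2.26, 5.52), (81, 3.79, 5.99)]

def best_r(rows, target_saving):
    """Row with the smallest slowdown whose memory saving meets the target, else None."""
    ok = [row for row in rows if row[2] >= target_saving]
    return min(ok, key=lambda row: row[1]) if ok else None

print(best_r(NMOD8, 2.0))  # (5, 1.26, 2.86)
print(best_r(NMOD8, 5.0))  # (11, 1.53, 5.04)
print(best_r(NMOD, 2.0))   # (11, 1.21, 3.82)
print(best_r(NMOD, 6.0))   # None
```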

In practice, if you want to minimize memory usage, you wouldn't use something like $r = 81$; it should be more efficient to use smaller $r$ together with another recursive round of Toom-Cook.
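Here is a minimal recursive variant, again a Python sketch over $\mathbb{Z}/p\mathbb{Z}$ under our own assumptions: points $0, \ldots, r-1$ with a prime $p \ge r$, $k = (r+1)/2$ chunks, a schoolbook base case, and hypothetical function names. Each pointwise product is itself computed by Toom-Cook with the same small $r$, for `depth` rounds.

With $r = 5$ and `depth = 2`, the leaf products have length about $N/9$. That is the same length as a single round with $k = 9$ chunks ($r = 17$), but each round only solves a 5-point interpolation problem.

```python
# Hypothetical sketch: nested rounds of small-r Toom-Cook over Z/pZ (not FLINT code).
# Assumptions: points 0..r-1 (prime p >= r), k = (r + 1) // 2 chunks, schoolbook leaves.

def schoolbook(a, b, p):
    """Quadratic product of two coefficient lists (lowest degree first) mod p."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return c

def lagrange_basis(r, p):
    """Lagrange basis polynomials (lowest degree first) for the points 0..r-1 mod p."""
    basis = []
    for j in range(r):
        num, den = [1], 1
        for i in range(r):
            if i != j:
                # num <- num * (y - i)
                num = [((num[d - 1] if d else 0) - i * (num[d] if d < len(num) else 0)) % p
                       for d in range(len(num) + 1)]
                den = den * (j - i) % p
        inv = pow(den, -1, p)
        basis.append([c * inv % p for c in num])
    return basis

def toom_mul_rec(a, b, r, p, depth):
    """a*b mod p with `depth` nested rounds of r-point Toom-Cook (a, b nonempty)."""
    n = max(len(a), len(b))
    if depth == 0 or n < r:
        return schoolbook(a, b, p)
    k = (r + 1) // 2
    m = -(-n // k)                   # chunk length, ceil(n / k)
    L = lagrange_basis(r, p)         # recomputed per call for brevity
    out = [0] * (len(a) + len(b) - 1)
    for t in range(r):
        # Evaluate the chunked inputs at y = t; both vectors have length m.
        ea, eb = [0] * m, [0] * m
        for i in range(k):
            w = pow(t, i, p)
            for d, x in enumerate(a[i * m:(i + 1) * m]):
                ea[d] = (ea[d] + w * x) % p
            for d, x in enumerate(b[i * m:(i + 1) * m]):
                eb[d] = (eb[d] + w * x) % p
        # The pointwise product is itself a smaller Toom-Cook multiplication.
        v = toom_mul_rec(ea, eb, r, p, depth - 1)
        for l in range(r):
            for d, x in enumerate(v):
                if l * m + d < len(out):
                    out[l * m + d] = (out[l * m + d] + L[t][l] * x) % p
    return out
```

For instance, `toom_mul_rec(a, b, 5, 251, 2)` splits each input into 3 chunks twice and should match `schoolbook(a, b, 251)`.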

For fmpz with small coefficients, this algorithm isn't ideal, as the evaluation/interpolation coefficients cause some blowup (with 1000-bit coefficients, $r = 81$ actually uses more memory than $r = 41$). Apart from recursive Toom-Cook, a better alternative there would be incremental CRT.
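Incremental CRT means multiplying modulo one prime at a time and merging each modular product into the running result with the Chinese remainder theorem. The process stops once the combined modulus exceeds twice a bound on the output coefficients.

The Python sketch below is a hedged illustration of that technique, not FLINT code. The short list of roughly 30-bit primes and the schoolbook modular product are stand-ins; a real implementation would use word-size primes and a fast multiplication modulo each prime. The function names are ours. Besides the accumulator, only one modular product is alive at a time.

```python
# Hypothetical sketch of integer polynomial multiplication by incremental CRT.
PRIMES = [2147483647, 1000000009, 1000000007, 998244353]  # known primes, ~2^30 to 2^31

def mul_mod(a, b, p):
    """Schoolbook product of integer coefficient lists, reduced mod p."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return c

def mul_crt(a, b):
    """Exact product of nonempty integer coefficient lists a, b by incremental CRT."""
    # Every product coefficient satisfies |c_i| <= min(len) * max|a| * max|b|.
    bound = min(len(a), len(b)) * max(map(abs, a)) * max(map(abs, b))
    acc, M = [0] * (len(a) + len(b) - 1), 1
    for p in PRIMES:
        if M > 2 * bound:
            break
        res = mul_mod(a, b, p)
        inv = pow(M, -1, p)
        # x' = x + M * ((y - x) * M^-1 mod p) is x mod M and y mod p, with 0 <= x' < M*p.
        acc = [x + M * ((y - x) * inv % p) for x, y in zip(acc, res)]
        M *= p
    if M <= 2 * bound:
        raise ValueError("not enough primes for this coefficient size")
    # Map residues in [0, M) to the symmetric range to recover negative coefficients.
    return [x - M if x > M // 2 else x for x in acc]
```

With 40-bit inputs of length 20 to 30, the bound is below $2^{85}$, so three of the four primes suffice.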

fredrik-johansson added 3 commits December 23, 2025 08:19

- Add _gr_vec_addmul_scalar_fmpz and specialize some gr_fmpz methods (e2edd0e)
- Arbitrary-order Toom-Cook multiplication (serial version) for gr_poly (d310e21)
- fix prototypes (6189ac1)

fredrik-johansson added the "performance", "new feature" and "generics" labels on Dec 23, 2025

fredrik-johansson merged commit bbde32c into flintlib:main Dec 25, 2025
13 checks passed

fredrik-johansson deleted the toom branch December 25, 2025 00:15

fredrik-johansson merged 3 commits into flintlib:main from fredrik-johansson:toom
