Toom-Cook multiplication with arbitrary number of interpolation points #2536
Merged
fredrik-johansson merged 3 commits into flintlib:main on Dec 25, 2025
Adds `_gr_poly_mullow_toom_serial`/`gr_poly_mullow_toom_serial` for Toom-Cook polynomial multiplication with $r$ evaluation/interpolation points for arbitrary $r$.
This can be used to multiply in time $O(N^{1+\varepsilon})$ in rings without a preexisting fast multiplication routine based on FFT or Kronecker substitution, at least when the characteristic is large enough. However, the main intended application is to support memory-efficient multiplication in rings which already have fast multiplication, by splitting into smaller chunks, as discussed in #2535.
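To make the idea concrete, here is a minimal Python sketch of textbook Toom-Cook over a prime field. It is not the FLINT implementation: the modulus `P`, the helper names, the choice of evaluation points $0, 1, \ldots, r-1$, and the relation $r = 2k - 1$ for $k$ chunks per operand are all assumptions made for illustration, and it computes a full product rather than a truncated (`mullow`) one.

```python
P = 1_000_003  # illustrative prime modulus (assumption); must be larger than r


def mul_naive(a, b):
    """Schoolbook product of coefficient lists modulo P."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def eval_chunks(chunks, t):
    """Evaluate sum_i chunks[i] * t^i by Horner's rule, chunk-wise."""
    acc = [0] * len(chunks[0])
    for c in reversed(chunks):
        acc = [(t * u + v) % P for u, v in zip(acc, c)]
    return acc


def lagrange_basis(points):
    """Coefficient lists of the Lagrange basis polynomials modulo P."""
    basis = []
    for i, ti in enumerate(points):
        num, den = [1], 1
        for j, tj in enumerate(points):
            if j == i:
                continue
            nxt = [0] * (len(num) + 1)  # num * (y - tj)
            for d, c in enumerate(num):
                nxt[d] = (nxt[d] - tj * c) % P
                nxt[d + 1] = (nxt[d + 1] + c) % P
            num = nxt
            den = den * (ti - tj) % P
        inv = pow(den, P - 2, P)  # needs den != 0 mod P, i.e. P > r
        basis.append([c * inv % P for c in num])
    return basis


def toom_mul(a, b, k):
    """Multiply a and b using k chunks per operand and r = 2k - 1 points."""
    if not a or not b:
        return []
    n = max(len(a), len(b))
    m = -(-n // k)  # chunk length, ceil(n / k)
    pad = lambda f: list(f) + [0] * (k * m - len(f))
    A = [pad(a)[i * m:(i + 1) * m] for i in range(k)]
    B = [pad(b)[i * m:(i + 1) * m] for i in range(k)]
    r = 2 * k - 1
    points = list(range(r))
    basis = lagrange_basis(points)
    # coeffs[d] accumulates the coefficient of y^d, where y stands for x^m
    coeffs = [[0] * (2 * m - 1) for _ in range(r)]
    for i, t in enumerate(points):
        # one small product at a time: only this product is live
        prod = mul_naive(eval_chunks(A, t), eval_chunks(B, t))
        for d in range(r):
            w = basis[i][d]
            for j, v in enumerate(prod):
                coeffs[d][j] = (coeffs[d][j] + w * v) % P
    out = [0] * (2 * k * m - 1)
    for d in range(r):  # substitute y = x^m and add overlapping pieces
        for j, v in enumerate(coeffs[d]):
            out[d * m + j] = (out[d * m + j] + v) % P
    return out[:len(a) + len(b) - 1]
```

The inversion in `lagrange_basis` is where the "characteristic large enough" condition shows up in this sketch: the differences between evaluation points must be invertible. In a real setting `mul_naive` would be replaced by the ring's existing fast multiplication, applied to operands of length about $N/k$.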
Some sample results below. Explanations:
- [...] `_gr_poly_mul`, so essentially equivalent to taking [...]
- [...] `_gr_poly_mul`, i.e. lower ratio is better.
- [...] `fft_small`, etc.).
- [...] `_gr_poly_mul`, i.e. higher ratio is better.
[Benchmark tables not preserved. Surviving caption fragments: `nmod8`; `nmod`; `nmod` (this is an FFT prime); `fmpz`; `fmpz`.]
Observations:
This works especially well for `nmod8`. It makes the least sense to use when working with an FFT prime since the memory saving is at most going to be ~2x. In most other cases memory can be reduced at least 5x with just a 2-4x slowdown. It's also worth noting that small $r$ in many cases can give a ~2x reduction in memory with <1.5x slowdown.
In practice, if you want to minimize memory usage, you wouldn't use something like $r = 81$; it should be more efficient to use smaller $r$ together with another recursive round of Toom-Cook.
For `fmpz` with small coefficients, this algorithm isn't ideal as the evaluation/interpolation coefficients cause some blowup (with 1000-bit coefficients, you can see that $r = 81$ uses more memory than $r = 41$). A better alternative there apart from recursive Toom-Cook would be to do incremental CRT.
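The coefficient blowup for integer coefficients can be estimated with a short calculation. The sketch below is a rough worst-case bound under assumptions that are not taken from the PR: $r = 2k - 1$ points for $k$ chunks per operand, and symmetric integer evaluation points $0, \pm 1, \ldots, \pm (r-1)/2$. The function names are hypothetical.

```python
def eval_growth_bits(k, t):
    """Worst-case extra bits when evaluating sum_{i<k} c_i * t^i.

    |sum c_i t^i| <= S * max|c_i| with S = sum_{i<k} |t|^i, so the
    evaluated coefficients gain at most ceil(log2(S)) bits.
    """
    s = sum(abs(t) ** i for i in range(k))
    return (s - 1).bit_length()  # equals ceil(log2(s)) for s >= 1


def worst_growth_bits(r):
    """Growth at the largest point, assuming r = 2k - 1 and points
    0, +-1, ..., +-(r - 1) / 2 (an illustrative choice)."""
    k = (r + 1) // 2
    t_max = r // 2
    return eval_growth_bits(k, t_max)


if __name__ == "__main__":
    for r in (5, 11, 41, 81):
        print(r, worst_growth_bits(r))
```

Each operand grows by this amount before the pointwise products, so the effect on the products is roughly doubled. Under these assumptions the overhead grows faster than linearly in $r$, which is consistent with large $r$ paying off less when the input coefficients are only moderately large.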