Generic nmod_redc and nmod_redc_fast rings; use in nmod_poly_gcd #2527
Merged
fredrik-johansson merged 8 commits into flintlib:main on Dec 16, 2025
This PR adds constructors `gr_ctx_init_nmod_redc` and `gr_ctx_init_nmod_redc_fast` to create $\mathbb{Z}/n\mathbb{Z}$ rings with Montgomery representation. Various scalar operations and methods are overloaded with fast versions. This is faster than standard `nmod` for some things, but not for everything; the code is also a bit sloppy in places. My main goal in this PR is just to get enough functionality in place for polynomial GCD, while other things can be improved in the future.

Indeed, this PR adds `nmod_poly_gcd_euclidean_redc_half` and uses this instead of the standard Euclidean algorithm in `nmod_poly_gcd` for moduli between 33 and 62 bits. This achieves up to 30% speedup for `nmod_poly_gcd`.

[Figure: speedup for `nmod_poly_gcd`]

[Figure: cumulative speedup compared to flint-4.3, for reference]

It should be noted that `nmod_poly_gcd_hgcd` currently doesn't profit from this as it specifically calls `_gr_poly_gcd_euclidean` for shorter GCDs. That code will need some modification to be able to use overloaded basecases. One could also run the HGCD algorithm in the `nmod_redc_fast` representation but currently that isn't faster than doing it in the `nmod` representation.

Unfortunately, I don't see how to speed up GCD for moduli with 63 and 64 bits; `nmod_redc` looks doomed to be slower than ordinary `nmod` here. The problem is that $a + bc + de$ has to be done with two modular reductions in Montgomery form since the terms aren't homogeneous (there is a factor $R$ to remove from $bc + de$ but not from $a$). The modular addition by $a$ works fine in `redc_fast` using the branch-free `_nmod_add` algorithm, but it has to be done as an `nmod_add` for 64-bit moduli which kills performance. For 63-bit moduli, it should be OK but probably won't be faster than standard `nmod`. Ideas welcome.

This PR also refactors some code and improves GCD tunings.
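To make the inhomogeneity issue with $a + bc + de$ concrete, here is a minimal standalone sketch of Montgomery arithmetic with $R = 2^{64}$. It is not the code from this PR and does not use the FLINT API: the names `mont_ctx`, `mont_ctx_init`, `redc`, `to_mont`, `from_mont`, `add_small`, `add_full`, `fmma_mont` and `fmma_plain` are all hypothetical, and the sketch assumes an odd modulus below $2^{62}$ and a compiler with `unsigned __int128` (GCC or Clang). It shows that $bc + de$ needs one REDC to strip a factor $R$, after which $a$ must be added with a separate modular addition.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension (assumption) */

/* Hypothetical context, not FLINT's: odd modulus n < 2^62, R = 2^64. */
typedef struct
{
    uint64_t n;     /* modulus */
    uint64_t nneg;  /* -n^(-1) mod 2^64 */
    uint64_t r2;    /* R^2 mod n, used to convert into Montgomery form */
} mont_ctx;

static void mont_ctx_init(mont_ctx *ctx, uint64_t n)
{
    uint64_t inv = n;                  /* n * n == 1 mod 8: 3 correct bits */
    u128 r = ((u128) 1 << 64) % n;     /* R mod n */
    int i;

    assert((n & 1) && n < (UINT64_C(1) << 62));
    for (i = 0; i < 5; i++)            /* Newton steps: 6, 12, 24, 48, 96 bits */
        inv *= 2 - n * inv;
    ctx->n = n;
    ctx->nneg = -inv;
    ctx->r2 = (uint64_t) (r * r % n);
}

/* Montgomery reduction: returns t / R mod n, valid for t < 2 n^2. */
static uint64_t redc(const mont_ctx *ctx, u128 t)
{
    uint64_t m = (uint64_t) t * ctx->nneg;
    uint64_t s = (uint64_t) ((t + (u128) m * ctx->n) >> 64);  /* s < 2n */
    return s - (ctx->n & -(uint64_t) (s >= ctx->n));
}

static uint64_t to_mont(const mont_ctx *ctx, uint64_t x)   /* x < n */
{
    return redc(ctx, (u128) x * ctx->r2);
}

static uint64_t from_mont(const mont_ctx *ctx, uint64_t x)
{
    return redc(ctx, x);
}

/* Branch-free modular addition; a + b cannot wrap when n < 2^63. */
static uint64_t add_small(uint64_t a, uint64_t b, uint64_t n)
{
    uint64_t s = a + b;
    return s - (n & -(uint64_t) (s >= n));
}

/* Modular addition valid for any n up to 64 bits, at the cost of a
   data-dependent select. */
static uint64_t add_full(uint64_t a, uint64_t b, uint64_t n)
{
    uint64_t neg = n - a;
    return (neg > b) ? a + b : b - neg;
}

/* a + b*c + d*e with all inputs and the output in Montgomery form.
   b*c + d*e carries a factor R^2, so one REDC brings it down to R;
   a already carries exactly one factor R and must be added afterwards. */
static uint64_t fmma_mont(const mont_ctx *ctx, uint64_t a, uint64_t b,
                          uint64_t c, uint64_t d, uint64_t e)
{
    uint64_t t = redc(ctx, (u128) b * c + (u128) d * e);
    return add_small(a, t, ctx->n);
}

/* Reference value computed with plain residues below n. */
static uint64_t fmma_plain(uint64_t n, uint64_t a, uint64_t b,
                           uint64_t c, uint64_t d, uint64_t e)
{
    u128 s = (u128) a + (u128) b * c % n + (u128) d * e % n;
    return (uint64_t) (s % n);
}
```

Converting the five operands with `to_mont`, calling `fmma_mont`, and converting back with `from_mont` should agree with `fmma_plain`. In this sketch, a modulus of 64 bits would force `add_full` in place of `add_small`, which is the kind of extra cost the description refers to.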