Generic nmod_redc and nmod_redc_fast rings; use in nmod_poly_gcd #2527

Merged
fredrik-johansson merged 8 commits into flintlib:main from fredrik-johansson:redc on Dec 16, 2025

@fredrik-johansson commented on Dec 15, 2025

This PR adds constructors gr_ctx_init_nmod_redc and gr_ctx_init_nmod_redc_fast to create $\mathbb{Z}/n\mathbb{Z}$ rings with Montgomery representation. Various scalar operations and methods are overloaded with fast versions. This is faster than standard nmod for some things, but not for everything; the code is also a bit sloppy in places. My main goal in this PR is just to get enough functionality in place for polynomial GCD, while other things can be improved in the future.
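For readers less familiar with Montgomery representation, here is a minimal standalone sketch of the idea. It is not the code in this PR and does not use FLINT: it assumes a 32-bit word with $R = 2^{32}$ and an odd modulus $n < 2^{31}$, and every name in it (`redc32_ctx`, `redc32_mul`, and so on) is hypothetical. An element $a$ is stored as $aR \bmod n$, and a product needs one REDC step to remove the extra factor of $R$.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context (not a FLINT type): odd modulus n < 2^31, R = 2^32. */
typedef struct {
    uint32_t n;     /* the modulus */
    uint32_t ninv;  /* -n^{-1} mod 2^32 */
    uint32_t r2;    /* R^2 mod n, used to convert into Montgomery form */
} redc32_ctx;

void redc32_ctx_init(redc32_ctx *ctx, uint32_t n)
{
    assert((n & 1u) && n < 0x80000000u);
    /* Newton iteration for n^{-1} mod 2^32; n itself is correct to
       3 bits because n*n = 1 mod 8 for odd n, and each step doubles
       the number of correct bits. */
    uint32_t inv = n;
    for (int i = 0; i < 5; i++)
        inv *= 2u - n * inv;
    ctx->n = n;
    ctx->ninv = 0u - inv;
    uint64_t r = ((uint64_t) 1 << 32) % n;   /* R mod n */
    ctx->r2 = (uint32_t) (r * r % n);        /* R^2 mod n */
}

/* Montgomery reduction: for t < n * 2^32, return t * R^{-1} mod n in [0, n). */
uint32_t redc32(uint64_t t, const redc32_ctx *ctx)
{
    uint32_t m = (uint32_t) t * ctx->ninv;   /* makes t + m*n divisible by 2^32 */
    uint32_t u = (uint32_t) ((t + (uint64_t) m * ctx->n) >> 32);  /* u < 2n */
    return (u >= ctx->n) ? u - ctx->n : u;
}

/* Convert a in [0, n) to Montgomery form a*R mod n. */
uint32_t redc32_to(uint32_t a, const redc32_ctx *ctx)
{
    return redc32((uint64_t) a * ctx->r2, ctx);
}

/* Convert x = a*R mod n back to a. */
uint32_t redc32_from(uint32_t x, const redc32_ctx *ctx)
{
    return redc32(x, ctx);
}

/* Product of two Montgomery-form values, again in Montgomery form. */
uint32_t redc32_mul(uint32_t x, uint32_t y, const redc32_ctx *ctx)
{
    return redc32((uint64_t) x * y, ctx);
}
```

The restriction $n < 2^{31}$ is only there so that the sum inside `redc32` cannot overflow 64 bits; it plays the same role as keeping headroom in the top bits of a 64-bit word.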

Specifically, this PR adds nmod_poly_gcd_euclidean_redc_half and uses it instead of the standard Euclidean algorithm in nmod_poly_gcd for moduli between 33 and 62 bits. This achieves a speedup of up to 30% for nmod_poly_gcd:

[Image: gcdredc]

For reference, the cumulative speedup compared to flint-4.3:

[Image: gcdredc2]
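As background for the timings, the standard Euclidean algorithm mentioned above can be sketched as follows. This is a plain textbook version using ordinary `%` arithmetic, not FLINT's implementation and not the redc variant added in this PR. It assumes a prime modulus $p < 2^{32}$ and coefficient arrays stored lowest degree first; all function names are hypothetical. The inner loop is a run of updates of the form $a - qb \bmod p$, which is where faster modular arithmetic pays off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x^e mod p, for p < 2^32. */
uint32_t gcdsk_powmod(uint32_t x, uint32_t e, uint32_t p)
{
    uint64_t r = 1, b = x % p;
    while (e) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return (uint32_t) r;
}

/* Replace A by A mod B in place and return its new length.
   Requires lenB >= 1 and B[lenB - 1] != 0; p must be prime. */
long gcdsk_rem(uint32_t *A, long lenA, const uint32_t *B, long lenB, uint32_t p)
{
    /* inverse of the leading coefficient via Fermat's little theorem */
    uint32_t binv = gcdsk_powmod(B[lenB - 1], p - 2, p);
    while (lenA >= lenB) {
        uint64_t q = (uint64_t) A[lenA - 1] * binv % p;
        long shift = lenA - lenB;
        for (long j = 0; j < lenB; j++) {
            /* A[shift + j] -= q * B[j]  (mod p) */
            uint64_t t = q * B[j] % p;
            A[shift + j] = (uint32_t) (((uint64_t) A[shift + j] + p - t) % p);
        }
        /* the leading term cancels; strip it and any further zeros */
        while (lenA > 0 && A[lenA - 1] == 0) lenA--;
    }
    return lenA;
}

/* Monic GCD of A and B, written to G; returns its length.
   A and B are used as scratch space. Requires lenA >= 1, nonzero
   leading coefficients, and G with room for max(lenA, lenB) entries. */
long gcdsk_gcd(uint32_t *G, uint32_t *A, long lenA,
               uint32_t *B, long lenB, uint32_t p)
{
    while (lenB > 0) {
        lenA = gcdsk_rem(A, lenA, B, lenB, p);
        uint32_t *tp = A; A = B; B = tp;         /* (A, B) = (B, A mod B) */
        long tl = lenA; lenA = lenB; lenB = tl;
    }
    uint32_t inv = gcdsk_powmod(A[lenA - 1], p - 2, p);
    for (long i = 0; i < lenA; i++)
        G[i] = (uint32_t) ((uint64_t) A[i] * inv % p);
    return lenA;
}
```

For example, with $p = 10^9 + 7$, calling `gcdsk_gcd` on $(x-1)(x-2)$ and $(x-1)(x-3)$ yields the two coefficients of $x - 1$.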

It should be noted that nmod_poly_gcd_hgcd currently doesn't benefit from this, as it specifically calls _gr_poly_gcd_euclidean for shorter GCDs. That code will need some modification to be able to use overloaded basecases. One could also run the HGCD algorithm in the nmod_redc_fast representation, but currently that isn't faster than doing it in the nmod representation.

Unfortunately, I don't see how to speed up GCD for moduli with 63 or 64 bits; nmod_redc looks doomed to be slower than ordinary nmod here. The problem is that $a + bc + de$ has to be done with two modular reductions in Montgomery form since the terms aren't homogeneous (there is a factor $R$ to remove from $bc + de$ but not from $a$). The modular addition of $a$ works fine in redc_fast using the branch-free _nmod_add algorithm, but it has to be done as an nmod_add for 64-bit moduli, which kills performance. For 63-bit moduli, it should be OK but probably won't be faster than standard nmod. Ideas welcome.
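To make the non-homogeneity concrete, here is a minimal standalone sketch, scaled down to 32-bit words with $R = 2^{32}$ and an odd modulus $n < 2^{31}$. It is an illustration under those assumptions, not the code in this PR, and all names are hypothetical. With inputs in Montgomery form ($A = aR$, $B = bR$, and so on), the two products can share a single REDC, but $A$ already has the right scaling and must be added in a separate modular step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: -n^{-1} mod 2^32 for odd n, by Newton iteration. */
uint32_t fma_neg_inv32(uint32_t n)
{
    uint32_t inv = n;               /* correct to 3 bits: n*n = 1 mod 8 */
    for (int i = 0; i < 5; i++)
        inv *= 2u - n * inv;        /* doubles the number of correct bits */
    return 0u - inv;
}

/* Hypothetical helper: a in [0, n) to Montgomery form a*R mod n. */
uint32_t fma_to_mont(uint32_t a, uint32_t n)
{
    return (uint32_t) (((uint64_t) a << 32) % n);
}

/* REDC: for t < n * 2^32 with n odd and n < 2^31,
   return t * R^{-1} mod n in [0, n). */
uint32_t fma_redc32(uint64_t t, uint32_t n, uint32_t ninv)
{
    uint32_t m = (uint32_t) t * ninv;
    uint32_t u = (uint32_t) ((t + (uint64_t) m * n) >> 32);   /* u < 2n */
    return (u >= n) ? u - n : u;
}

/* Montgomery form of a + b*c + d*e, given Montgomery-form inputs. */
uint32_t fma_addmul2(uint32_t A, uint32_t B, uint32_t C,
                     uint32_t D, uint32_t E, uint32_t n, uint32_t ninv)
{
    /* Reduction 1: B*C + D*E = (bc + de) * R^2, so one REDC brings it
       down to (bc + de) * R. The sum is below 2*n^2 < n * 2^32. */
    uint64_t t = (uint64_t) B * C + (uint64_t) D * E;
    uint32_t s = fma_redc32(t, n, ninv);

    /* Reduction 2: A = a*R has no extra factor of R to remove, so it
       cannot be folded into t; it needs its own modular addition.
       A + s < 2n < 2^32, so the word cannot overflow here. */
    uint32_t r = A + s;
    return (r >= n) ? r - n : r;
}
```

In this scaled-down setting the final addition is cheap only because $n < 2^{31}$ leaves a spare top bit; a modulus that fills the whole word would need an overflow-aware addition instead, which is the analogue of falling back from _nmod_add to nmod_add.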

This PR also refactors some code and improves GCD tunings.

@fredrik-johansson fredrik-johansson merged commit 808ecb3 into flintlib:main Dec 16, 2025
13 checks passed