Optimize fullword inner loop in ``nmod_poly_gcd_euclidean`` by fredrik-johansson · Pull Request #2528 · flintlib/flint

fredrik-johansson · 2025-12-16T09:40:49Z

Found two more time savers in _nmod_poly_divrem_q1_preinv1_fullword:

- In general, r = a + q0*b + q1*c can have a high limb larger than n and can even overflow two limbs, but in practice this rarely happens (and almost surely doesn't happen when n is just larger than 2^(FLINT_BITS-1)). We can inspect the actual values of q0 and q1 before entering the loop to verify that it is safe to do a regular addition instead of a modular addition for the high limb.
- Specialize the modular reduction knowing that norm == 0.
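
For readers who want to see the two ideas in isolation, here is a minimal standalone sketch. It is not the code from this PR: the function names (`preinv_fullword`, `red2_fullword`, `plain_add_is_safe`, `coeff_fast`, `coeff_ref`) are hypothetical, it assumes 64-bit limbs and the GCC/Clang `unsigned __int128` extension, and the safety test shown (q0 + q1 does not wrap) is just one sufficient condition, not necessarily the check used in FLINT. The reasoning is that for a, b, c < n we have a + q0*b + q1*c <= (q0 + q1 + 1)*(n - 1), which is below 2^64 * n whenever q0 + q1 < 2^64, so the two-limb sum cannot overflow and its high limb is already smaller than n. The reduction is a standard precomputed-inverse 2-by-1 reduction with the normalization shifts dropped because the top bit of n is set.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* assumption: GCC/Clang extension */

/* Precomputed inverse floor((2^128 - 1) / n) - 2^64 for a full-word n
   (top bit set, i.e. norm == 0). */
static uint64_t preinv_fullword(uint64_t n)
{
    /* The quotient lies in [2^64, 2^65), so truncating drops the 2^64. */
    return (uint64_t) (~(u128) 0 / n);
}

/* Reduce hi*2^64 + lo modulo n. Requires hi < n and norm == 0, so no
   shifts by norm are needed before or after the reduction. */
static uint64_t red2_fullword(uint64_t hi, uint64_t lo,
                              uint64_t n, uint64_t ninv)
{
    u128 q = (u128) ninv * hi + (((u128) hi << 64) | lo);
    uint64_t q0 = (uint64_t) q;
    uint64_t q1 = (uint64_t) (q >> 64) + 1;
    uint64_t r = lo - q1 * n;    /* arithmetic modulo 2^64 */

    if (r > q0)                  /* quotient estimate was one too large */
        r += n;
    if (r >= n)                  /* rare final correction */
        r -= n;
    return r;
}

/* Sufficient condition, checked once before the loop: if q0 + q1 does
   not wrap, then a + q0*b + q1*c < 2^64 * n for all a, b, c < n. */
static int plain_add_is_safe(uint64_t q0, uint64_t q1)
{
    return q0 + q1 >= q0;
}

/* One coefficient (a + q0*b + q1*c) mod n using plain two-limb addition.
   Only valid when plain_add_is_safe(q0, q1) holds and a, b, c < n. */
static uint64_t coeff_fast(uint64_t a, uint64_t b, uint64_t c,
                           uint64_t q0, uint64_t q1,
                           uint64_t n, uint64_t ninv)
{
    u128 s = (u128) q0 * b + (u128) q1 * c + a;
    return red2_fullword((uint64_t) (s >> 64), (uint64_t) s, n, ninv);
}

/* Slow reference that reduces each product separately. */
static uint64_t coeff_ref(uint64_t a, uint64_t b, uint64_t c,
                          uint64_t q0, uint64_t q1, uint64_t n)
{
    u128 t = (u128) q0 * b % n + (u128) q1 * c % n + a;
    return (uint64_t) (t % n);
}
```

In a real loop the `plain_add_is_safe` test would be done once per division, with a fallback to the modular-addition path when it fails; the sketch only shows the per-coefficient arithmetic.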

The asymptotic improvement is about 15%. This is quite nice since it affects the moduli used by fmpz_poly_gcd_modular.

Effect on nmod_poly_gcd:

Optimize fullword inner loop in nmod_poly_gcd_euclidean

59426c5

fredrik-johansson added the performance label Dec 16, 2025

fredrik-johansson merged commit 2a05be4 into flintlib:main Dec 16, 2025
11 of 12 checks passed

fredrik-johansson deleted the gcd8 branch December 16, 2025 18:06
