Optimize fullword inner loop in nmod_poly_gcd_euclidean #2528

Merged
fredrik-johansson merged 1 commit into flintlib:main from fredrik-johansson:gcd8 on Dec 16, 2025


@fredrik-johansson
Collaborator

fredrik-johansson commented Dec 16, 2025

Found two more time savers in _nmod_poly_divrem_q1_preinv1_fullword:

  • In general, r = a + q0*b + q1*c can have a high limb larger than n and can even overflow two limbs, but in practice this rarely happens (and almost surely doesn't happen when n is just larger than 2^(FLINT_BITS-1)). We can inspect the actual values of q0 and q1 before entering the loop to verify that it is safe to use a regular addition instead of a modular addition for the high limb.
  • Specialize the modular reduction using the fact that norm == 0 (a fullword modulus n already has its top bit set, so no normalizing shift is needed).
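Both tricks can be illustrated with a minimal C sketch. It assumes 64-bit limbs, all inputs already reduced mod n, n >= 2^63, and a compiler supporting the GCC/Clang unsigned __int128 extension. All function names (inv_fullword, mod2_fullword, coeff_plain, coeff_general, remainder_loop) are hypothetical, and this is not FLINT's implementation. The safety test shown is one sufficient condition chosen for illustration: if q0 + q1 does not carry out of a word, then 1 + q0 + q1 <= 2^64, so a + q0*b + q1*c <= (n - 1)(1 + q0 + q1) < n * 2^64 and the high limb stays below n. The reduction is a 2-by-1 division with a precomputed inverse in the style of Möller and Granlund. Because norm == 0, neither the modulus nor the numerator has to be shifted first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension used for brevity; FLINT uses its own two-limb macros. */
typedef unsigned __int128 u128;

/* Precomputed inverse v = floor((2^128 - 1) / n) - 2^64.
   Valid only for fullword n (top bit set, i.e. norm == 0). */
static uint64_t inv_fullword(uint64_t n)
{
    u128 num = ((u128) ~n << 64) | UINT64_MAX;   /* = 2^128 - 1 - n * 2^64 */
    return (uint64_t) (num / n);
}

/* (hi * 2^64 + lo) mod n for fullword n; requires hi < n.
   Since norm == 0, neither n nor (hi, lo) needs a normalizing shift. */
static uint64_t mod2_fullword(uint64_t hi, uint64_t lo, uint64_t n, uint64_t ninv)
{
    assert(hi < n);
    u128 p = (u128) ninv * hi + (((u128) hi << 64) | lo);
    uint64_t qhi = (uint64_t) (p >> 64) + 1;   /* quotient estimate */
    uint64_t qlo = (uint64_t) p;
    uint64_t r = lo - qhi * n;                 /* computed mod 2^64 */
    if (r > qlo)        /* estimate was one too large */
        r += n;
    if (r >= n)         /* rare: estimate was one too small */
        r -= n;
    return r;
}

/* x + y mod n for x, y < n; the sum may carry out of the word for fullword n. */
static uint64_t addmod_fullword(uint64_t x, uint64_t y, uint64_t n)
{
    uint64_t s = x + y;
    return (s < x || s >= n) ? s - n : s;
}

/* General case: the high limbs are combined with modular additions. */
static uint64_t coeff_general(uint64_t a, uint64_t b, uint64_t c,
                              uint64_t q0, uint64_t q1, uint64_t n, uint64_t ninv)
{
    u128 t = (u128) q0 * b + a;   /* <= n(n - 1), so high limb < n */
    u128 u = (u128) q1 * c;       /* high limb < n */
    uint64_t lo = (uint64_t) t + (uint64_t) u;
    uint64_t carry = lo < (uint64_t) u;
    uint64_t hi = addmod_fullword((uint64_t) (t >> 64), (uint64_t) (u >> 64), n);
    hi = addmod_fullword(hi, carry, n);
    return mod2_fullword(hi, lo, n, ninv);
}

/* Fast case: the caller has checked that the full sum is < n * 2^64. */
static uint64_t coeff_plain(uint64_t a, uint64_t b, uint64_t c,
                            uint64_t q0, uint64_t q1, uint64_t n, uint64_t ninv)
{
    u128 t = (u128) q0 * b + (u128) q1 * c + a;   /* plain addition, no overflow */
    return mod2_fullword((uint64_t) (t >> 64), (uint64_t) t, n, ninv);
}

/* r[i] = a[i] + q0*b[i] + q1*c[i] mod n, all inputs < n, n >= 2^63. */
static void remainder_loop(uint64_t *r, const uint64_t *a, const uint64_t *b,
                           const uint64_t *c, size_t len,
                           uint64_t q0, uint64_t q1, uint64_t n)
{
    uint64_t ninv = inv_fullword(n);
    size_t i;

    /* If q0 + q1 does not carry, then 1 + q0 + q1 <= 2^64 and every sum is
       at most (n - 1)(1 + q0 + q1) < n * 2^64: the high limb stays below n. */
    if ((uint64_t) (q0 + q1) >= q0)
    {
        for (i = 0; i < len; i++)
            r[i] = coeff_plain(a[i], b[i], c[i], q0, q1, n, ninv);
    }
    else
    {
        for (i = 0; i < len; i++)
            r[i] = coeff_general(a[i], b[i], c[i], q0, q1, n, ninv);
    }
}
```

Checking q0 + q1 once, outside the loop, keeps the common path free of per-coefficient modular additions on the high limb. The general path only runs when q0 + q1 carries.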

The asymptotic speedup is about 15%. This is quite nice since the fullword case covers the moduli used by fmpz_poly_gcd_modular.

Effect on nmod_poly_gcd:

[Image: gcd64]

fredrik-johansson merged commit 2a05be4 into flintlib:main on Dec 16, 2025
11 of 12 checks passed
fredrik-johansson deleted the gcd8 branch on December 16, 2025 18:06