Miscellaneous nmod optimizations and helper functions#2636

Merged
fredrik-johansson merged 8 commits into flintlib:main from fredrik-johansson:n2
Apr 17, 2026
@fredrik-johansson
Collaborator

  • Add n_mulhi and n_mod_barrett, n_mod_barrett_sloppy, n_mod_lemire (plus their respective precomputation methods as public functions), replacing some previous private reimplementations of these functions
  • Extend the internal n_divrem functions in the radix module (not yet made public in this PR, as I'm not yet sure about the interface)
  • Micro-optimize modular reduction for fullword moduli in fft_small (up to 2.5% speedup for polynomial multiplication)
  • Optimize arithmetic in sparse nmod_poly division (affecting polmod operations and fq_nmod arithmetic)
    • make check MOD=fq_nmod_poly (with 10x test multiplier): 7.371 s -> 5.995 s (23% speedup)
    • build/examples/minimal_irreducibles 5 3125 3125: 3.972 s -> 3.256 s (22% speedup)
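
For readers unfamiliar with the reductions named in the first bullet, here is a minimal textbook sketch of a high-word multiply, a single-word Barrett reduction (with a partially reduced variant), and a Lemire-style reduction. All `sketch_*` names, signatures, and the choice of precomputed constants are assumptions made for illustration; they are not the FLINT functions added in this PR, whose interfaces may differ.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b (portable schoolbook version). */
static uint64_t sketch_mulhi(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Carry out of the middle 32-bit column. */
    uint64_t mid = (p0 >> 32) + (uint32_t) p1 + (uint32_t) p2;
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Barrett precomputation (hypothetical choice): m = floor((2^64 - 1) / n), n >= 1. */
static uint64_t sketch_barrett_precomp(uint64_t n)
{
    return UINT64_MAX / n;
}

/* Partially reduced result: r == a (mod n) with 0 <= r < 2n.
   The estimated quotient q is either floor(a / n) or one less. */
static uint64_t sketch_mod_barrett_lazy(uint64_t a, uint64_t n, uint64_t m)
{
    uint64_t q = sketch_mulhi(a, m);
    return a - q * n;
}

/* Fully reduced result in [0, n): one conditional subtraction on top. */
static uint64_t sketch_mod_barrett(uint64_t a, uint64_t n, uint64_t m)
{
    uint64_t r = sketch_mod_barrett_lazy(a, n, m);
    return (r >= n) ? r - n : r;
}

/* Lemire-style precomputation for 32-bit operands: M = floor((2^64 - 1) / d) + 1. */
static uint64_t sketch_lemire_precomp(uint32_t d)
{
    return UINT64_MAX / d + 1;
}

/* Lemire-style remainder a mod d for 32-bit a and d: the low 64 bits of
   M * a hold the fractional part of a / d, which is then scaled by d. */
static uint32_t sketch_mod_lemire(uint32_t a, uint32_t d, uint64_t M)
{
    uint64_t lowbits = M * a;
    return (uint32_t) sketch_mulhi(lowbits, d);
}
```

The partially reduced variant skips the final comparison, which is the behaviour that the `_sloppy` suffix in this PR refers to.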

@vneiger
Collaborator

vneiger commented Apr 17, 2026

A minor terminology question, to make things consistent: what should the _sloppy variant be called? It may be worth thinking about, since this will appear (or already does) in a few other similar places. For example, _nmod_poly_evaluate_nmod_precomp_lazy uses a "sloppy" mulmod_shoup (it does so implicitly; this should be made explicit with a dedicated sloppy/lazy mulmod_shoup function at some point).

"sloppy" suggests that the result may be wrong, which is maybe misleading (although strictly speaking the result is wrong in the sense that it is not fully reduced). I've seen "lazy" used for this in some places, but mostly in articles or notes specifically about NTT/FFT implementations (see e.g. section 3.3 here). This is maybe also because such reductions or modular multiplications are used in "lazy approaches" where one tries to accumulate computations before reducing. For example, within the draft PR #2107 I encountered a lot of such cases, so I chose names that indicate how lazy the reduction is (e.g. lazy_2 if the function reduces to [0, 2n), or lazy_4_2 if it has input in [0, 4n) and output in [0, 2n), etc.).
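
To make the range-based naming concrete, here is a minimal sketch of a Shoup-style modular multiplication whose output lies in [0, 2n), together with a [0, 4n) to [0, 2n) correction step. The `sketch_*` names and signatures are hypothetical, chosen only to mirror the lazy_2 and lazy_4_2 convention described above; they are not taken from FLINT or from the draft PR. The sketch assumes n < 2^63 for the multiplication (so that 2n fits in a word) and n < 2^62 for the correction step (so that 4n fits).

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b (portable schoolbook version). */
static uint64_t sketch_mulhi(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;
    uint64_t p1 = a_lo * b_hi, p2 = a_hi * b_lo;
    uint64_t mid = ((a_lo * b_lo) >> 32) + (uint32_t) p1 + (uint32_t) p2;
    return a_hi * b_hi + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Shoup precomputation for a fixed multiplier w < n < 2^63:
   returns floor(w * 2^64 / n), computed by binary long division. */
static uint64_t sketch_shoup_precomp(uint64_t w, uint64_t n)
{
    uint64_t rem = w, quo = 0;
    for (int i = 0; i < 64; i++)
    {
        rem <<= 1;              /* no overflow: rem < n < 2^63 */
        quo <<= 1;
        if (rem >= n)
        {
            rem -= n;
            quo |= 1;
        }
    }
    return quo;
}

/* "lazy_2" multiplication: returns r == w * a (mod n) with 0 <= r < 2n,
   for any 64-bit a. The two products wrap modulo 2^64, but their
   difference is exact because the true value is below 2n < 2^64. */
static uint64_t sketch_mulmod_shoup_lazy_2(uint64_t w, uint64_t w_pr,
                                           uint64_t a, uint64_t n)
{
    uint64_t q = sketch_mulhi(w_pr, a);
    return w * a - q * n;
}

/* "lazy_4_2" correction: input in [0, 4n), output in [0, 2n), n < 2^62. */
static uint64_t sketch_reduce_lazy_4_2(uint64_t a, uint64_t n)
{
    uint64_t n2 = 2 * n;
    return (a >= n2) ? a - n2 : a;
}

/* Final correction: input in [0, 2n), output fully reduced in [0, n). */
static uint64_t sketch_reduce_lazy_2_1(uint64_t a, uint64_t n)
{
    return (a >= n) ? a - n : a;
}
```

In an accumulation loop, the lazy variants can be chained and the final `sketch_reduce_lazy_2_1` applied once at the end, which is the saving that lazy approaches aim for.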

Any suggestion of a third name for this, or any preference between "sloppy" and "lazy"?

@fredrik-johansson
Collaborator Author

Good question. There is also redc_fast with the same meaning. I could rename barrett_sloppy to barrett_lazy to reduce the number of adjectives floating around.

@vneiger
Collaborator

vneiger commented Apr 17, 2026

> Good question. There is also redc_fast with the same meaning. I could rename barrett_sloppy to barrett_lazy to reduce the number of adjectives floating around.

OK, if we go for lazy, I will find the few places where I had implicitly used a lazy mulmod_shoup and make it explicit. (I don't have any comments that are really about this PR; it looks good to me.)

@fredrik-johansson fredrik-johansson merged commit 490c239 into flintlib:main Apr 17, 2026
13 checks passed