
Speed up nmod_addmul and fullword _nmod_vec_scalar_mul_nmod, _nmod_vec_scalar_addmul_nmod #2529

Merged
fredrik-johansson merged 1 commit into flintlib:main from fredrik-johansson:nmod10
Dec 17, 2025

@fredrik-johansson
Collaborator

nmod_addmul is replaced with simpler and faster code, which in turn simplifies and speeds up the generic fallback in _nmod_vec_scalar_addmul_nmod. The old code for 64-bit moduli in _nmod_vec_scalar_addmul_nmod was particularly bad; the new code is more than 3x faster for vectors of length 15 and up.
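
For readers unfamiliar with the operation, here is a minimal reference sketch of what a modular addmul and its vector counterpart compute. It is not FLINT's implementation: the names addmul_mod_ref and vec_scalar_addmul_ref are hypothetical, and it assumes a compiler with the unsigned __int128 extension (GCC or Clang). It relies on a generic 128-bit remainder, so it only pins down the semantics and says nothing about the speed of the old or new code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension, assumed available */

/* Reference semantics only (hypothetical helper, not the FLINT code):
   returns (a + b*c) mod n for a, b, c < n.
   Since a, b, c <= n - 1 < 2^64, the sum a + b*c fits in 128 bits. */
static uint64_t addmul_mod_ref(uint64_t a, uint64_t b, uint64_t c, uint64_t n)
{
    assert(n != 0 && a < n && b < n && c < n);
    return (uint64_t) (((u128) b * c + a) % n);
}

/* res[i] = (res[i] + vec[i] * c) mod n, with one reduction per entry. */
static void vec_scalar_addmul_ref(uint64_t *res, const uint64_t *vec,
                                  size_t len, uint64_t c, uint64_t n)
{
    for (size_t i = 0; i < len; i++)
        res[i] = addmul_mod_ref(res[i], vec[i], c, n);
}
```

The point of fusing the addition into the multiplication is that each entry needs a single reduction of a + b*c rather than a modular multiplication followed by a separate modular addition.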

Also, nmod_redc_mul is used to speed up the 64-bit case in _nmod_vec_scalar_mul_nmod, where Shoup multiplication can't be used. Asymptotically this is about 20% faster than the old code; the only restriction is that n needs to be odd to benefit.
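
As an illustration of why the modulus has to be odd, here is a minimal sketch of a scalar-vector product via Montgomery (REDC) multiplication. It assumes nmod_redc_mul is a Montgomery-style multiply, which the name suggests but the description does not spell out. All function names below are hypothetical, the code is not taken from FLINT, and it assumes the unsigned __int128 extension. The reduction needs the inverse of n modulo 2^64, which exists only for odd n. Converting the scalar c to Montgomery form once means each product comes out already in ordinary representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension, assumed available */

/* Inverse of odd n modulo 2^64 by Newton iteration.
   x = n is correct to 3 bits (n*n = 1 mod 8); each step doubles that. */
static uint64_t inv_mod_2_64(uint64_t n)
{
    uint64_t x = n;
    for (int i = 0; i < 5; i++)
        x *= 2 - n * x;
    return x;
}

/* Montgomery product a*b / 2^64 mod n (hypothetical helper).
   Assumes a < n, n odd, and n * ninv = 1 mod 2^64. */
static uint64_t redc_mul_sketch(uint64_t a, uint64_t b, uint64_t n,
                                uint64_t ninv)
{
    u128 t = (u128) a * b;
    uint64_t m = (uint64_t) t * ninv;      /* m*n = t mod 2^64 */
    uint64_t t_hi = (uint64_t) (t >> 64);
    uint64_t mn_hi = (uint64_t) (((u128) m * n) >> 64);
    /* (t - m*n) / 2^64 lies in (-n, n); add n back if it is negative. */
    return (t_hi >= mn_hi) ? t_hi - mn_hi : t_hi - mn_hi + n;
}

/* res[i] = vec[i] * c mod n for odd n, with all vec[i] < n and c < n. */
static void vec_scalar_mul_redc_sketch(uint64_t *res, const uint64_t *vec,
                                       size_t len, uint64_t c, uint64_t n)
{
    assert((n & 1) && c < n);
    uint64_t ninv = inv_mod_2_64(n);
    /* One-time conversion of the scalar: c * 2^64 mod n. */
    uint64_t c_mont = (uint64_t) (((u128) c << 64) % n);
    for (size_t i = 0; i < len; i++)
        res[i] = redc_mul_sketch(vec[i], c_mont, n, ninv);
}
```

The per-entry work is two full multiplications, one low multiplication, and a conditional correction, with no division. The setup cost (the inverse and the conversion of c) is paid once per call, which fits with a gain that only shows up for longer vectors.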

Timings on Zen 3:

           vec_mul               vec_scalar_addmul            speedup
bits len   old       new         old        new             mul     addmul
63   1     4.66e-09  4.69e-09    4.21e-09   4.23e-09        0.994   0.995
63   2     6.02e-09  6.02e-09    6.14e-09   6.92e-09        1.000   0.887
63   3     6.77e-09  6.79e-09    7.37e-09   7.89e-09        0.997   0.934
63   4     8.52e-09  8.52e-09    1e-08      9.85e-09        1.000   1.015
63   5     1.04e-08  1.03e-08    1.16e-08   1.12e-08        1.010   1.036
63   6     1.18e-08  1.17e-08    1.27e-08   1.26e-08        1.009   1.008
63   7     1.25e-08  1.26e-08    1.48e-08   1.42e-08        0.992   1.042
63   8     1.46e-08  1.44e-08    1.71e-08   1.59e-08        1.014   1.075
63  10     1.22e-08  1.21e-08    1.53e-08   1.59e-08        1.008   0.962
63  12     1.28e-08  1.28e-08    1.77e-08   1.76e-08        1.000   1.006
63  15     1.58e-08  1.58e-08    2.14e-08   2.14e-08        1.000   1.000
63  18     1.7e-08   1.69e-08    2.53e-08   2.52e-08        1.006   1.004
63  22     1.98e-08  1.99e-08    2.95e-08   2.98e-08        0.995   0.990
63  27     2.38e-08  2.34e-08    3.74e-08   3.73e-08        1.017   1.003
63  33     2.81e-08  2.77e-08    4.51e-08   4.5e-08         1.014   1.002
63  41     3.26e-08  3.25e-08    5.52e-08   5.59e-08        1.003   0.987
63  51     4.01e-08  4.01e-08    6.94e-08   6.88e-08        1.000   1.009
63  63     4.84e-08  4.83e-08    8.41e-08   8.41e-08        1.002   1.000
63  78     5.86e-08  5.86e-08    1.03e-07   1.02e-07        1.000   1.010
63  97     7.24e-08  7.24e-08    1.28e-07   1.28e-07        1.000   1.000
63 121     8.99e-08  9e-08       1.61e-07   1.6e-07         0.999   1.006

64   1     3.9e-09   3.78e-09    4.01e-09   4.14e-09        1.032   0.969
64   2     5.01e-09  5.29e-09    6.01e-09   5.44e-09        0.947   1.105
64   3     5.8e-09   6.43e-09    8.85e-09   6.57e-09        0.902   1.347
64   4     7.18e-09  7.17e-09    1.21e-08   7.8e-09         1.001   1.551
64   5     8.7e-09   8.76e-09    1.47e-08   9.2e-09         0.993   1.598
64   6     1.03e-08  1.02e-08    2.26e-08   1.05e-08        1.010   2.152
64   7     1.11e-08  1.17e-08    2e-08      1.18e-08        0.949   1.695
64   8     1.41e-08  1.39e-08    3.46e-08   1.32e-08        1.014   2.621
64  10     1.6e-08   1.6e-08     3.19e-08   1.6e-08         1.000   1.994
64  12     1.85e-08  1.88e-08    4.1e-08    1.87e-08        0.984   2.193
64  15     2.22e-08  2.19e-08    7.66e-08   2.31e-08        1.014   3.316
64  18     2.7e-08   2.44e-08    8.16e-08   2.72e-08        1.107   3.000
64  22     3.22e-08  2.9e-08     9.9e-08    3.3e-08         1.110   3.000
64  27     3.89e-08  3.64e-08    1.45e-07   4.07e-08        1.069   3.563
64  33     4.71e-08  4.34e-08    1.68e-07   4.83e-08        1.085   3.478
64  41     5.9e-08   5.25e-08    2.21e-07   5.94e-08        1.124   3.721
64  51     7.29e-08  6.26e-08    2.82e-07   7.3e-08         1.165   3.863
64  63     8.93e-08  7.72e-08    3.34e-07   9.48e-08        1.157   3.523
64  78     1.11e-07  9.12e-08    4.3e-07    1.16e-07        1.217   3.707
64  97     1.37e-07  1.14e-07    5.47e-07   1.43e-07        1.202   3.825
64 121     1.72e-07  1.41e-07    6.96e-07   1.77e-07        1.220   3.932

fredrik-johansson merged commit 96250ac into flintlib:main on Dec 17, 2025
13 checks passed
fredrik-johansson deleted the nmod10 branch on December 17, 2025, 09:03