
Improvements eval/interp at geometric progression for unequal parameters #2518

Merged
vneiger merged 5 commits into flintlib:main from vneiger:improvements_geometric_for_unequal_params
Dec 5, 2025


@vneiger
Collaborator

@vneiger vneiger commented Dec 5, 2025

This improves evaluation and interpolation at a geometric progression in the case where the parameters are not all essentially identical. It also calls the fft_small middle product more directly when possible.

For the moment, the evaluation part is done; it is faster when at least one of the following holds:

  • the number of evaluation points is less than the length used for the precomputation data
  • the degree of the evaluated polynomial is less than the number of evaluation points
  • the valuation of the evaluated polynomial is not zero (which should not happen too often, but maybe some specific usages have this)
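For readers unfamiliar with the technique, here is a minimal Python sketch of evaluation at a geometric progression 1, q, q^2, ... modulo a prime p. It uses the standard identity i*j = T(i+j) - T(i) - T(j) with T(k) = k(k-1)/2, which turns the evaluation into a correlation (a middle product) against a precomputed table of powers of q. All function names are hypothetical and this is not FLINT's code. The last function shows one way a nonzero valuation could be exploited; the PR does not say whether it does exactly this.

```python
def eval_geometric_naive(coeffs, q, npoints, p):
    """Reference: evaluate sum(coeffs[j] * x**j) at x = q**i, i < npoints, mod p."""
    out = []
    x = 1
    for _ in range(npoints):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % p
        out.append(acc)
        x = (x * q) % p
    return out


def tri(k):
    """Triangular-type exponent T(k) = k*(k-1)/2."""
    return k * (k - 1) // 2


def eval_geometric_chirp(coeffs, q, npoints, p):
    """Same values, using q**(i*j) = q**T(i+j) * q**(-T(i)) * q**(-T(j)).

    Assumes q is invertible modulo p.
    """
    d = len(coeffs)
    if d == 0:
        return [0] * npoints
    qinv = pow(q, -1, p)
    # Input scaling: a_j = c_j * q**(-T(j)).
    a = [c * pow(qinv, tri(j), p) % p for j, c in enumerate(coeffs)]
    # Table b_k = q**T(k); it depends only on q and on npoints + d - 1,
    # so it can be precomputed once, and a shorter problem needs only a prefix.
    b = [pow(q, tri(k), p) for k in range(npoints + d - 1)]
    out = []
    for i in range(npoints):
        # Correlation of a with b: this is the middle-product step.
        s = sum(a[j] * b[i + j] for j in range(d)) % p
        out.append(s * pow(qinv, tri(i), p) % p)  # output scaling by q**(-T(i))
    return out


def eval_geometric_valuation(coeffs, q, npoints, p):
    """Strip the valuation v first: f(x) = x**v * g(x), so f(q**i) = q**(v*i) * g(q**i)."""
    v = 0
    while v < len(coeffs) and coeffs[v] % p == 0:
        v += 1
    vals = eval_geometric_chirp(coeffs[v:], q, npoints, p)
    qv = pow(q, v, p)
    scale = 1  # holds q**(v*i) for the current i
    out = []
    for val in vals:
        out.append(val * scale % p)
        scale = scale * qv % p
    return out


# Consistency check on a small example: the three functions agree.
f = [0, 0, 1, 4]  # x^2 + 4x^3, valuation 2
assert eval_geometric_chirp(f, 5, 6, 97) == eval_geometric_naive(f, 5, 6, 97)
assert eval_geometric_valuation(f, 5, 6, 97) == eval_geometric_naive(f, 5, 6, 97)
```

In this sketch the table b has length npoints + d - 1, which is why a precomputation done for larger parameters can still serve a smaller problem: only a prefix of the table is read.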

@vneiger vneiger changed the title Improvements eval/interp at geometric for unequal parameters Improvements eval/interp at geometric progression for unequal parameters Dec 5, 2025
@vneiger
Collaborator Author

vneiger commented Dec 5, 2025

Example timings when the precomputation has been done for 10 * (number of evaluation points). The speed-up factor is almost 10 at the largest lengths. The interesting column is the second-to-last one (w/ prec), which measures the evaluation time excluding precomputation.

BEFORE:

==== nb eval points == 1 * (poly length) ====
len     points |       GEOMETRIC PROGRESSION
len     points |iter    fast    w/ prec precomp
1             1|6.1e-03 1.5e-01 5.6e-02 7.4e-01
2             2|1.3e-02 1.8e-01 1.3e-01 1.4e+00
3             3|2.4e-02 2.4e-01 3.4e-01 2.1e+00
4             4|4.2e-02 3.0e-01 4.8e-01 2.6e+00
6             6|9.9e-02 4.3e-01 8.0e-01 3.9e+00
8             8|2.0e-01 5.7e-01 1.2e+00 5.3e+00
10           10|3.4e-01 6.9e-01 1.6e+00 6.5e+00
12           12|5.1e-01 8.9e-01 3.4e+00 7.8e+00
16           16|9.8e-01 1.4e+00 6.1e+00 1.1e+01
20           20|1.6e+00 1.7e+00 1.5e+01 1.3e+01
30           30|3.9e+00 2.7e+00 2.6e+01 2.0e+01
45           45|8.9e+00 4.6e+00 5.2e+01 2.9e+01
70           70|2.2e+01 8.5e+00 9.5e+01 4.3e+01
100         100|4.6e+01 1.4e+01 1.7e+02 6.6e+01
200         200|1.9e+02 2.4e+01 1.1e+02 1.3e+02
400         400|7.3e+02 4.2e+01 2.1e+02 2.5e+02
800         800|2.9e+03 8.5e+01 4.4e+02 4.8e+02
1600       1600|0.0e+00 1.8e+02 9.4e+02 9.6e+02
3200       3200|0.0e+00 3.7e+02 2.0e+03 1.9e+03
6400       6400|0.0e+00 7.4e+02 4.2e+03 3.8e+03
12800     12800|0.0e+00 1.5e+03 9.1e+03 7.6e+03

AFTER:

==== nbits = 62====
==== nb eval points == 1 * (poly length) ====
len     points |       GEOMETRIC PROGRESSION
len     points |iter    fast    w/ prec precomp
1             1|6.2e-03 1.5e-01 3.2e-02 7.4e-01
2             2|1.3e-02 1.8e-01 4.5e-02 1.4e+00
3             3|2.4e-02 2.5e-01 6.0e-02 2.1e+00
4             4|4.1e-02 2.9e-01 7.6e-02 2.7e+00
6             6|9.8e-02 4.2e-01 1.1e-01 4.0e+00
8             8|2.0e-01 5.5e-01 1.5e-01 5.5e+00
10           10|3.4e-01 6.8e-01 1.9e-01 6.4e+00
12           12|5.1e-01 8.9e-01 3.1e-01 7.6e+00
16           16|1.0e+00 1.3e+00 5.6e-01 1.0e+01
20           20|1.6e+00 1.7e+00 7.1e-01 1.3e+01
30           30|3.8e+00 2.7e+00 1.3e+00 1.9e+01
45           45|9.1e+00 4.6e+00 2.5e+00 2.8e+01
70           70|2.2e+01 8.5e+00 5.3e+00 4.3e+01
100         100|4.6e+01 1.5e+01 9.7e+00 6.2e+01
200         200|1.9e+02 2.4e+01 1.5e+01 1.3e+02
400         400|7.5e+02 4.3e+01 2.5e+01 2.5e+02
800         800|2.9e+03 8.5e+01 5.1e+01 5.0e+02
1600       1600|0.0e+00 1.9e+02 1.1e+02 1.0e+03
3200       3200|0.0e+00 3.8e+02 2.2e+02 2.0e+03
6400       6400|0.0e+00 7.5e+02 4.5e+02 3.9e+03
12800     12800|0.0e+00 1.5e+03 9.7e+02 7.8e+03

@vneiger
Collaborator Author

vneiger commented Dec 5, 2025

Example timings when the number of evaluation points is larger than the polynomial length. This is the evaluation time, excluding precomputation (which is the same before and after).

len    | 2*len points   |  3*len points    |  4*len points
len    |BEFORE  AFTER   |  BEFORE  AFTER   | BEFORE  AFTER
1      |3.9e-02 3.8e-02 |  4.4e-02 4.2e-02 | 4.9e-02 4.4e-02
2      |6.2e-02 5.4e-02 |  7.7e-02 6.2e-02 | 8.8e-02 6.8e-02
3      |1.0e-01 8.2e-02 |  1.4e-01 1.0e-01 | 1.8e-01 1.2e-01
4      |1.4e-01 1.1e-01 |  1.9e-01 1.4e-01 | 2.4e-01 1.6e-01
6      |2.1e-01 1.6e-01 |  3.0e-01 2.1e-01 | 3.8e-01 2.6e-01
8      |2.9e-01 2.2e-01 |  4.2e-01 2.9e-01 | 5.5e-01 3.5e-01
10     |3.8e-01 2.9e-01 |  5.6e-01 3.8e-01 | 7.4e-01 4.7e-01
12     |7.0e-01 5.0e-01 |  1.1e+00 7.1e-01 | 1.4e+00 9.6e-01
16     |1.2e+00 8.8e-01 |  1.9e+00 1.2e+00 | 2.6e+00 1.6e+00
20     |1.6e+00 1.1e+00 |  2.4e+00 1.6e+00 | 3.2e+00 2.0e+00
30     |3.0e+00 2.2e+00 |  4.5e+00 3.0e+00 | 6.2e+00 3.8e+00
45     |5.8e+00 4.2e+00 |  8.8e+00 5.8e+00 | 1.2e+01 7.3e+00
70     |1.2e+01 8.8e+00 |  2.9e+01 1.2e+01 | 3.9e+01 1.6e+01
100    |3.3e+01 1.6e+01 |  5.0e+01 3.3e+01 | 6.7e+01 4.1e+01
200    |2.1e+01 1.9e+01 |  3.4e+01 2.1e+01 | 4.2e+01 2.8e+01
400    |4.2e+01 3.6e+01 |  6.3e+01 4.3e+01 | 8.4e+01 5.7e+01
800    |8.7e+01 7.1e+01 |  1.3e+02 8.7e+01 | 1.7e+02 1.1e+02
1600   |1.8e+02 1.5e+02 |  2.7e+02 1.8e+02 | 3.5e+02 2.3e+02
3200   |3.8e+02 3.0e+02 |  5.6e+02 3.8e+02 | 7.5e+02 4.9e+02
6400   |8.1e+02 6.3e+02 |  1.2e+03 8.1e+02 | 1.6e+03 1.1e+03
12800  |1.7e+03 1.4e+03 |  2.6e+03 1.7e+03 | 3.3e+03 2.2e+03

@vneiger
Collaborator Author

vneiger commented Dec 5, 2025

And finally, a comparison of the previous implementation, which uses mullow, against calling the fft_small midprod directly. Using midprod directly is a bad idea for small lengths, so I inserted a branching condition that seems fairly accurate at picking the faster variant. This is the version in the commits to be pushed in a minute.

len   |   npoints == len       |    npoints == 2*len     |    npoints == 3*len
len   |mullow  midprod branch  | mullow  midprod branch  | mullow  midprod branch
1     |3.3e-02 3.0e+00 3.3e-02 | 3.8e-02 3.0e+00 3.9e-02 | 4.3e-02 3.0e+00 4.3e-02
2     |4.5e-02 3.1e+00 4.5e-02 | 5.4e-02 3.0e+00 5.5e-02 | 6.2e-02 3.0e+00 6.4e-02
3     |6.0e-02 2.9e+00 6.0e-02 | 8.1e-02 3.0e+00 8.3e-02 | 1.0e-01 3.0e+00 1.2e-01
4     |7.6e-02 2.9e+00 7.7e-02 | 1.1e-01 3.0e+00 1.1e-01 | 1.3e-01 3.1e+00 1.4e-01
6     |1.1e-01 3.0e+00 1.1e-01 | 1.6e-01 3.1e+00 1.6e-01 | 2.1e-01 3.2e+00 2.1e-01
8     |1.5e-01 3.0e+00 1.5e-01 | 2.2e-01 3.2e+00 2.3e-01 | 2.9e-01 3.3e+00 3.0e-01
10    |1.9e-01 3.0e+00 2.0e-01 | 2.9e-01 3.2e+00 2.9e-01 | 3.8e-01 3.4e+00 3.9e-01
12    |3.2e-01 3.1e+00 3.2e-01 | 5.0e-01 3.3e+00 5.1e-01 | 6.8e-01 3.6e+00 7.1e-01
16    |5.5e-01 3.2e+00 5.6e-01 | 8.7e-01 3.4e+00 8.9e-01 | 1.2e+00 3.6e+00 1.3e+00
20    |7.2e-01 3.4e+00 7.2e-01 | 1.1e+00 3.7e+00 1.2e+00 | 1.6e+00 3.8e+00 1.6e+00
30    |1.4e+00 3.6e+00 1.4e+00 | 2.2e+00 4.0e+00 2.3e+00 | 3.0e+00 4.3e+00 3.1e+00
45    |2.6e+00 3.9e+00 2.6e+00 | 4.2e+00 4.5e+00 4.2e+00 | 5.8e+00 5.1e+00 5.1e+00
70    |5.4e+00 4.5e+00 4.6e+00 | 8.8e+00 8.3e+00 8.5e+00 | 1.2e+01 8.2e+00 8.2e+00
100   |9.7e+00 8.0e+00 8.1e+00 | 1.6e+01 8.4e+00 8.7e+00 | 3.4e+01 9.6e+00 9.7e+00
200   |1.5e+01 1.3e+01 1.3e+01 | 1.9e+01 1.9e+01 1.8e+01 | 2.1e+01 2.0e+01 1.9e+01
400   |2.6e+01 2.3e+01 2.3e+01 | 3.6e+01 3.4e+01 3.4e+01 | 4.3e+01 4.1e+01 4.0e+01
800   |5.3e+01 4.7e+01 4.8e+01 | 7.1e+01 6.4e+01 6.6e+01 | 8.9e+01 8.0e+01 8.1e+01
1600  |1.1e+02 9.7e+01 1.0e+02 | 1.5e+02 1.3e+02 1.4e+02 | 1.8e+02 1.7e+02 1.7e+02
3200  |2.3e+02 2.0e+02 2.1e+02 | 3.0e+02 2.8e+02 2.8e+02 | 3.9e+02 3.5e+02 3.5e+02
6400  |4.7e+02 4.3e+02 4.2e+02 | 6.3e+02 5.8e+02 5.8e+02 | 8.1e+02 7.3e+02 7.3e+02
12800 |1.0e+03 8.7e+02 8.7e+02 | 1.4e+03 1.2e+03 1.2e+03 | 1.7e+03 1.6e+03 1.6e+03
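The two variants compared in the table can be illustrated with a small Python sketch. A middle product returns only a window of coefficients of a product. It can be obtained either by computing a truncated (low) product and discarding the bottom part, or by computing the window directly. The function names and the threshold value below are hypothetical; the PR's actual branching condition is not shown in this conversation. In pure Python both paths are quadratic, so the sketch shows the structure of the dispatch, not the timings.

```python
def mul_low(a, b, n, p):
    """First n coefficients of a*b modulo p (schoolbook)."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j >= n:
                break  # j only grows, so the rest of this row is truncated
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def middle_via_mullow(a, b, lo, hi, p):
    """Coefficients lo..hi-1 of a*b: compute the low product, drop the bottom."""
    return mul_low(a, b, hi, p)[lo:hi]


def middle_direct(a, b, lo, hi, p):
    """Coefficients lo..hi-1 of a*b, computing only the requested window."""
    out = []
    for k in range(lo, hi):
        s = 0
        # i ranges over indices with 0 <= i < len(a) and 0 <= k - i < len(b)
        for i in range(max(0, k - len(b) + 1), min(len(a), k + 1)):
            s += a[i] * b[k - i]
        out.append(s % p)
    return out


# Hypothetical crossover, chosen only for illustration.
DIRECT_MIDPROD_THRESHOLD = 64


def middle_branch(a, b, lo, hi, p):
    """Pick a variant from the operand size (illustrative condition)."""
    if min(len(a), len(b)) < DIRECT_MIDPROD_THRESHOLD:
        return middle_via_mullow(a, b, lo, hi, p)
    return middle_direct(a, b, lo, hi, p)


# (1 + 2x + 3x^2) * (4 + 5x + 6x^2 + 7x^3 + 8x^4), coefficients 2..4
assert middle_branch([1, 2, 3], [4, 5, 6, 7, 8], 2, 5, 101) == [28, 34, 40]
```

In the measurements above, the direct midprod column stays near 3.0 for small lengths, which suggests a fixed overhead, and it overtakes mullow somewhere between len 45 and len 70 when npoints == len. A size-based branch of this kind is one simple way to capture such a crossover.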

@vneiger
Collaborator Author

vneiger commented Dec 5, 2025

And, to conclude, the improvement over what is in main when the precomputation length is the same as len (otherwise, the speed-up is greater).

len   |  npoints == len | npts == 2*len   | npts == 3*len
len   | old     new     | old     new     | old     new
1     | 3.8e-02 3.3e-02 | 4.3e-02 3.9e-02 | 4.9e-02 4.3e-02
2     | 5.2e-02 4.5e-02 | 6.8e-02 5.5e-02 | 7.8e-02 6.4e-02
3     | 6.8e-02 6.0e-02 | 1.1e-01 8.3e-02 | 1.5e-01 1.2e-01
4     | 8.6e-02 7.7e-02 | 1.4e-01 1.1e-01 | 2.0e-01 1.4e-01
6     | 1.2e-01 1.1e-01 | 2.2e-01 1.6e-01 | 3.0e-01 2.1e-01
8     | 1.7e-01 1.5e-01 | 3.0e-01 2.3e-01 | 4.3e-01 3.0e-01
10    | 2.1e-01 2.0e-01 | 3.8e-01 2.9e-01 | 5.6e-01 3.9e-01
12    | 3.5e-01 3.2e-01 | 7.4e-01 5.1e-01 | 1.1e+00 7.1e-01
16    | 5.7e-01 5.6e-01 | 1.2e+00 8.9e-01 | 1.9e+00 1.3e+00
20    | 7.4e-01 7.2e-01 | 1.6e+00 1.2e+00 | 2.4e+00 1.6e+00
30    | 1.4e+00 1.4e+00 | 3.0e+00 2.3e+00 | 4.6e+00 3.1e+00
45    | 2.6e+00 2.6e+00 | 5.9e+00 4.2e+00 | 8.9e+00 5.1e+00
70    | 5.4e+00 4.6e+00 | 1.2e+01 8.5e+00 | 3.0e+01 8.2e+00
100   | 9.8e+00 8.1e+00 | 3.3e+01 8.7e+00 | 4.9e+01 9.7e+00
200   | 1.5e+01 1.3e+01 | 2.1e+01 1.8e+01 | 3.3e+01 1.9e+01
400   | 2.7e+01 2.3e+01 | 4.2e+01 3.4e+01 | 6.4e+01 4.0e+01
800   | 5.4e+01 4.8e+01 | 8.8e+01 6.6e+01 | 1.3e+02 8.1e+01
1600  | 1.1e+02 1.0e+02 | 1.8e+02 1.4e+02 | 2.7e+02 1.7e+02
3200  | 2.4e+02 2.1e+02 | 3.8e+02 2.8e+02 | 5.7e+02 3.5e+02
6400  | 4.7e+02 4.2e+02 | 8.1e+02 5.8e+02 | 1.2e+03 7.3e+02
12800 | 1.0e+03 8.7e+02 | 1.7e+03 1.2e+03 | 2.6e+03 1.6e+03

@vneiger vneiger marked this pull request as ready for review December 5, 2025 14:26
@vneiger vneiger merged commit 65764b5 into flintlib:main Dec 5, 2025
12 checks passed
@albinahlback
Collaborator

Nice!!
