Optimize `nmod_mat_lu_classical_delayed` by fredrik-johansson · Pull Request #2640 · flintlib/flint

fredrik-johansson · 2026-04-21T07:26:44Z

Applies the same optimizations as #2637 to classical Gaussian elimination for nmod_mat, and adjusts the algorithm selection in nmod_mat_lu. This mainly speeds up linear algebra up to dimension about 100, though for some small moduli (20 and 27 bits in the tables below) there is a speedup of roughly 1.6x to 1.9x without BLAS up to dimension about 1000.
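To illustrate the general idea, here is a minimal self-contained sketch of classical elimination with delayed modular reduction, assuming that "delayed" in the function name refers to postponing reductions. It is not FLINT's code and does not use the FLINT API: `det_mod`, its `delayed` flag, and the overflow bound checked by the assertion are hypothetical choices made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* b^e mod p; products fit in 64 bits because p < 2^32 is assumed. */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1 % p;
    b %= p;
    while (e) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

/* Determinant of the n x n row-major matrix `in` modulo a prime p.
   With delayed != 0, row updates are accumulated unreduced and an entry
   is reduced only when it enters the pivot column or pivot row.
   With delayed == 0, every update is reduced immediately.
   Hypothetical helper for illustration, not a FLINT function. */
static uint64_t det_mod(const uint64_t *in, int n, uint64_t p, int delayed)
{
    assert(n >= 1 && p >= 2 && p < (UINT64_C(1) << 32));
    /* Each entry receives at most n updates of size <= (p-1)^2. */
    assert((UINT64_MAX - p) / ((p - 1) * (p - 1)) >= (uint64_t) n);

    uint64_t *a = malloc((size_t) n * n * sizeof *a);
    assert(a != NULL);
    for (int i = 0; i < n * n; i++) a[i] = in[i] % p;

    uint64_t det = 1 % p;
    for (int k = 0; k < n; k++) {
        /* Reduce column k: needed for the pivot search and multipliers. */
        for (int i = k; i < n; i++) a[i * n + k] %= p;

        int piv = k;
        while (piv < n && a[piv * n + k] == 0) piv++;
        if (piv == n) { det = 0; break; }      /* singular matrix */
        if (piv != k) {                        /* row swap flips the sign */
            for (int j = 0; j < n; j++) {
                uint64_t t = a[k * n + j];
                a[k * n + j] = a[piv * n + j];
                a[piv * n + j] = t;
            }
            det = (p - det) % p;
        }

        /* Reduce the pivot row so that each product below is < p^2. */
        for (int j = k + 1; j < n; j++) a[k * n + j] %= p;

        det = det * a[k * n + k] % p;
        uint64_t inv = powmod(a[k * n + k], p - 2, p);  /* Fermat inverse */

        for (int i = k + 1; i < n; i++) {
            /* m = -a[i][k] / a[k][k] mod p */
            uint64_t m = (p - a[i * n + k] * inv % p) % p;
            for (int j = k + 1; j < n; j++) {
                a[i * n + j] += m * a[k * n + j];
                if (!delayed) a[i * n + j] %= p;
            }
        }
    }
    free(a);
    return det;
}
```

In this sketch the delayed variant performs on the order of n^2 reductions in total instead of n^3, which is one plausible reason such a scheme pays off most when the modulus leaves plenty of headroom in a 64-bit word.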

Near-optimal cutoffs between classical and recursive LU have been found by brute force for each modulus bit size, both with and without --with-blas. Recursive LU relies on matrix multiplication, so the very large cutoffs for some bit sizes (some over 1400) indicate that matrix multiplication is poorly optimized.
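A cutoff table of this kind could be consulted roughly as in the sketch below. Everything in it is hypothetical: the names `lu_cutoff`, `cutoffs` and `choose_lu` are invented for illustration, and the numbers are placeholders, not the values tuned in this PR.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LU_CLASSICAL, LU_RECURSIVE } lu_algorithm;

/* One row per range of modulus bit sizes. All numbers are placeholders. */
typedef struct {
    int max_bits;      /* row applies to moduli of at most this many bits */
    long cutoff;       /* dimension cutoff without BLAS */
    long cutoff_blas;  /* dimension cutoff with BLAS */
} lu_cutoff;

static const lu_cutoff cutoffs[] = {
    {  8, 1000, 800 },
    { 32,  400, 200 },
    { 64,  100,  60 },
};

/* Pick classical elimination below the cutoff and recursive LU at or
   above it, using the first table row that covers the bit size. */
static lu_algorithm choose_lu(long dim, int bits, int have_blas)
{
    assert(dim >= 0 && bits >= 1 && bits <= 64);
    for (size_t i = 0; i < sizeof cutoffs / sizeof cutoffs[0]; i++) {
        if (bits <= cutoffs[i].max_bits) {
            long c = have_blas ? cutoffs[i].cutoff_blas : cutoffs[i].cutoff;
            return dim < c ? LU_CLASSICAL : LU_RECURSIVE;
        }
    }
    return LU_RECURSIVE;  /* not reached: the last row covers 64 bits */
}
```

Keeping separate columns for the BLAS and non-BLAS builds mirrors the fact that the cutoffs were tuned separately for each configuration.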

Speedup for nmod_mat_lu, without BLAS:

dim \ bits    2      8     20     27     28     32     40     56     60     62     63     64     64(near UWORD_MAX)
       4    1.324  1.263  1.211  1.164  1.200  1.000  1.017  1.009  1.018  1.012  1.074  1.004  1.008
       6    1.349  1.393  1.357  1.251  1.218  1.009  1.003  1.002  1.000  0.981  1.006  0.992  1.004
       8    1.370  1.353  1.341  1.283  1.264  1.208  1.243  1.202  1.198  1.156  1.155  1.074  1.077
      12    1.472  1.445  1.324  1.313  1.269  1.264  1.281  1.262  1.258  1.191  1.224  1.174  1.174
      16    1.503  1.491  1.354  1.331  1.343  1.180  1.248  1.237  1.229  1.215  1.232  1.266  1.245
      24    1.392  1.408  1.285  1.264  1.277  1.214  1.285  1.268  1.264  1.251  1.256  1.273  1.259
      32    1.418  1.394  1.283  1.263  1.269  1.208  1.224  1.198  1.193  1.248  1.208  1.239  1.229
      48    1.496  1.484  1.376  1.379  1.379  1.157  1.200  1.187  1.182  1.182  1.168  1.185  1.173
      64    1.182  1.933  1.472  1.462  1.452  1.123  1.203  1.203  1.202  1.123  1.134  1.156  1.141
      96    1.246  1.626  1.533  1.527  1.527  1.094  1.098  1.102  1.107  1.097  1.070  1.082  1.071
     128    1.128  1.646  1.530  1.541  1.534  1.073  1.093  1.092  1.092  1.055  1.054  1.065  1.055
     192    1.125  1.424  1.905  1.908  1.911  1.055  1.040  1.033  1.033  1.045  1.028  1.034  1.056
     256    0.991  1.366  1.843  1.792  1.827  1.039  1.034  1.031  1.043  1.035  1.024  1.024  1.024
     384    1.039  1.085  1.758  1.718  1.183  1.018  1.019  1.019  1.019  1.018  1.016  1.014  1.014
     512    0.985  1.076  1.648  1.626  1.169  1.000  0.992  0.992  1.008  0.996  1.007  1.003  1.003
     768    1.031  0.985  1.637  1.645  1.067  1.014  0.999  1.001  1.001  0.988  1.009  1.020  1.040
    1024    1.003  0.957  1.697  1.664  1.140  0.995  1.023  1.023  1.029  1.026  1.035  1.018  1.022
    1536    1.086  1.000  1.221  1.212  1.057  1.012  1.022  1.016  1.013  1.010  1.025  1.007  1.015
    2048    1.060  1.009  1.072  1.077  1.057  0.973  1.009  1.005  1.021  1.006  1.006  1.007  1.008

Speedup for nmod_mat_lu, with BLAS:

dim \ bits    2      8     20     27     28     32     40     56     60     62     63     64     64(near UWORD_MAX)
       4    1.150  1.089  1.167  1.149  1.153  1.012  1.023  1.000  1.018  1.004  1.012  1.000  1.008
       6    1.313  1.225  1.215  1.212  1.181  1.003  0.997  1.002  1.002  0.998  1.004  1.002  1.004
       8    1.304  1.276  1.253  1.214  1.256  1.212  1.191  1.181  1.177  1.154  1.155  1.076  1.078
      12    1.466  1.421  1.319  1.290  1.268  1.262  1.267  1.223  1.227  1.199  1.206  1.172  1.167
      16    1.432  1.405  1.334  1.277  1.290  1.223  1.242  1.248  1.235  1.219  1.191  1.264  1.242
      24    1.397  1.364  1.282  1.254  1.261  1.204  1.260  1.250  1.261  1.239  1.256  1.274  1.257
      32    1.415  1.404  1.304  1.286  1.289  1.178  1.209  1.212  1.209  1.217  1.207  1.237  1.227
      48    1.495  1.498  1.386  1.379  1.385  1.164  1.179  1.179  1.175  1.199  1.173  1.191  1.181
      64    1.185  1.894  1.477  1.467  1.472  1.112  1.164  1.164  1.170  1.105  1.148  1.156  1.143
      96    1.245  1.623  1.519  1.521  1.559  1.094  1.076  1.144  1.115  1.090  1.050  1.109  1.105
     128    1.124  1.667  1.519  1.515  1.515  1.084  1.084  1.077  1.083  1.053  1.048  1.056  1.054
     192    1.121  1.419  1.905  1.926  1.891  1.055  1.026  1.033  1.040  1.052  0.984  1.020  1.015
     256    0.972  1.373  1.800  1.810  1.837  1.045  1.026  1.031  1.031  1.024  1.031  1.026  1.022
     384    1.043  1.103  1.767  1.685  1.183  1.029  1.009  1.000  1.009  1.027  0.984  0.993  0.986
     512    0.985  1.127  1.235  1.698  1.176  1.008  0.988  0.992  0.992  1.004  1.003  1.003  1.006
     768    1.012  0.989  1.256  1.643  1.067  1.017  1.000  0.947  0.996  1.011  0.977  1.020  1.010
    1024    0.986  0.983  1.127  1.591  1.091  0.992  1.000  1.006  0.978  1.005  1.005  1.005  1.009
    1536    1.069  0.970  1.200  1.210  1.090  1.055  1.031  1.022  1.032  1.039  1.020  1.018  1.012
    2048    1.073  1.026  1.149  1.036  1.081  1.028  0.997  0.997  0.998  1.011  1.007  1.007  1.009

fredrik-johansson added 2 commits April 21, 2026 08:50

Optimize nmod_mat_lu_classical_delayed

6f8cf18

Improve test coverage

b0d7cea

fredrik-johansson merged commit 77b81af into flintlib:main Apr 21, 2026
12 checks passed

fredrik-johansson deleted the n3 branch April 24, 2026 13:33
