Optimize nmod_mat_lu_classical_delayed#2640

Merged
fredrik-johansson merged 2 commits into flintlib:main from fredrik-johansson:n3
Apr 21, 2026

@fredrik-johansson
Collaborator

Applies the same optimizations as #2637 to classical Gaussian elimination for nmod_mat, and adjusts the algorithm selection in nmod_mat_lu. This mainly speeds up linear algebra up to dimension about 100, though for some small moduli the speedup persists up to dimension about 1000 (for example, roughly 1.6x to 1.7x for 27-bit moduli at dimension 1024 in the tables below).
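For readers unfamiliar with the idea behind the `_delayed` suffix, here is a minimal, self-contained sketch of delayed modular reduction in plain C. It is not FLINT code and does not reproduce the optimizations from #2637; the function name `combine_rows_delayed` and its signature are made up for illustration. It forms a linear combination of matrix rows modulo `n`, accumulating unreduced products in a 64-bit word and reducing only once per output entry, which is valid as long as the accumulated sum cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FLINT).
   mat is an nrows x ncols matrix in row-major order, c has nrows entries.
   Computes out[j] = (sum over i of c[i] * mat[i][j]) mod n, with a single
   reduction per output entry instead of one reduction per product. */
void combine_rows_delayed(uint64_t *out, const uint64_t *mat,
                          const uint64_t *c, size_t nrows, size_t ncols,
                          uint64_t n)
{
    /* Assumption: every entry of mat and c is already reduced mod n. */
    assert(n >= 1 && n <= UINT32_MAX);  /* so (n-1)^2 fits in 64 bits */
    /* The whole sum of nrows products must also fit in 64 bits. */
    assert(nrows == 0 || (n - 1) * (n - 1) <= UINT64_MAX / nrows);

    for (size_t j = 0; j < ncols; j++) {
        uint64_t acc = 0;
        for (size_t i = 0; i < nrows; i++)
            acc += c[i] * mat[i * ncols + j];  /* no reduction here */
        out[j] = acc % n;                      /* one reduction at the end */
    }
}
```

The overflow bound is why this kind of trick pays off most for small moduli: the smaller `n` is, the more products can be accumulated before a reduction becomes necessary.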

Nearly-optimal cutoffs between classical and recursive LU have been brute-forced for each bit size with and without --with-blas. Very large threshold values (some over 1400) indicate that matrix multiplication is poorly optimized.
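As a rough illustration of what brute-forcing a cutoff can look like, the sketch below picks a crossover dimension from two arrays of measured timings. The PR does not say how the search was actually done, so this is only one plausible approach; `find_cutoff` and its parameters are hypothetical names, and the timings would have to come from real benchmark runs for a fixed bit size.

```c
#include <stddef.h>

/* Hypothetical helper (not part of FLINT).
   dims[] holds the sampled dimensions in ascending order (all > 0);
   t_classical[i] and t_recursive[i] are the timings measured at dims[i].
   Returns the smallest sampled dimension from which the recursive
   algorithm is at least as fast as the classical one at every larger
   sample, or 0 if recursive does not win even at the largest sample. */
size_t find_cutoff(const size_t *dims, const double *t_classical,
                   const double *t_recursive, size_t count)
{
    size_t cutoff = 0;

    /* Walk down from the largest sample; stop at the first dimension
       where the classical algorithm is still strictly faster. */
    for (size_t i = count; i-- > 0; ) {
        if (t_recursive[i] <= t_classical[i])
            cutoff = dims[i];
        else
            break;
    }
    return cutoff;
}
```

A return value of 0 on a large sample range corresponds to the situation described above, where the classical algorithm stays competitive far longer than expected.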

Speedup for nmod_mat_lu, without BLAS:

dim \ bits    2      8     20     27     28     32     40     56     60     62     63     64     64(near UWORD_MAX)
       4    1.324  1.263  1.211  1.164  1.200  1.000  1.017  1.009  1.018  1.012  1.074  1.004  1.008
       6    1.349  1.393  1.357  1.251  1.218  1.009  1.003  1.002  1.000  0.981  1.006  0.992  1.004
       8    1.370  1.353  1.341  1.283  1.264  1.208  1.243  1.202  1.198  1.156  1.155  1.074  1.077
      12    1.472  1.445  1.324  1.313  1.269  1.264  1.281  1.262  1.258  1.191  1.224  1.174  1.174
      16    1.503  1.491  1.354  1.331  1.343  1.180  1.248  1.237  1.229  1.215  1.232  1.266  1.245
      24    1.392  1.408  1.285  1.264  1.277  1.214  1.285  1.268  1.264  1.251  1.256  1.273  1.259
      32    1.418  1.394  1.283  1.263  1.269  1.208  1.224  1.198  1.193  1.248  1.208  1.239  1.229
      48    1.496  1.484  1.376  1.379  1.379  1.157  1.200  1.187  1.182  1.182  1.168  1.185  1.173
      64    1.182  1.933  1.472  1.462  1.452  1.123  1.203  1.203  1.202  1.123  1.134  1.156  1.141
      96    1.246  1.626  1.533  1.527  1.527  1.094  1.098  1.102  1.107  1.097  1.070  1.082  1.071
     128    1.128  1.646  1.530  1.541  1.534  1.073  1.093  1.092  1.092  1.055  1.054  1.065  1.055
     192    1.125  1.424  1.905  1.908  1.911  1.055  1.040  1.033  1.033  1.045  1.028  1.034  1.056
     256    0.991  1.366  1.843  1.792  1.827  1.039  1.034  1.031  1.043  1.035  1.024  1.024  1.024
     384    1.039  1.085  1.758  1.718  1.183  1.018  1.019  1.019  1.019  1.018  1.016  1.014  1.014
     512    0.985  1.076  1.648  1.626  1.169  1.000  0.992  0.992  1.008  0.996  1.007  1.003  1.003
     768    1.031  0.985  1.637  1.645  1.067  1.014  0.999  1.001  1.001  0.988  1.009  1.020  1.040
    1024    1.003  0.957  1.697  1.664  1.140  0.995  1.023  1.023  1.029  1.026  1.035  1.018  1.022
    1536    1.086  1.000  1.221  1.212  1.057  1.012  1.022  1.016  1.013  1.010  1.025  1.007  1.015
    2048    1.060  1.009  1.072  1.077  1.057  0.973  1.009  1.005  1.021  1.006  1.006  1.007  1.008

Speedup for nmod_mat_lu, with BLAS:

dim \ bits    2      8     20     27     28     32     40     56     60     62     63     64     64(near UWORD_MAX)
       4    1.150  1.089  1.167  1.149  1.153  1.012  1.023  1.000  1.018  1.004  1.012  1.000  1.008
       6    1.313  1.225  1.215  1.212  1.181  1.003  0.997  1.002  1.002  0.998  1.004  1.002  1.004
       8    1.304  1.276  1.253  1.214  1.256  1.212  1.191  1.181  1.177  1.154  1.155  1.076  1.078
      12    1.466  1.421  1.319  1.290  1.268  1.262  1.267  1.223  1.227  1.199  1.206  1.172  1.167
      16    1.432  1.405  1.334  1.277  1.290  1.223  1.242  1.248  1.235  1.219  1.191  1.264  1.242
      24    1.397  1.364  1.282  1.254  1.261  1.204  1.260  1.250  1.261  1.239  1.256  1.274  1.257
      32    1.415  1.404  1.304  1.286  1.289  1.178  1.209  1.212  1.209  1.217  1.207  1.237  1.227
      48    1.495  1.498  1.386  1.379  1.385  1.164  1.179  1.179  1.175  1.199  1.173  1.191  1.181
      64    1.185  1.894  1.477  1.467  1.472  1.112  1.164  1.164  1.170  1.105  1.148  1.156  1.143
      96    1.245  1.623  1.519  1.521  1.559  1.094  1.076  1.144  1.115  1.090  1.050  1.109  1.105
     128    1.124  1.667  1.519  1.515  1.515  1.084  1.084  1.077  1.083  1.053  1.048  1.056  1.054
     192    1.121  1.419  1.905  1.926  1.891  1.055  1.026  1.033  1.040  1.052  0.984  1.020  1.015
     256    0.972  1.373  1.800  1.810  1.837  1.045  1.026  1.031  1.031  1.024  1.031  1.026  1.022
     384    1.043  1.103  1.767  1.685  1.183  1.029  1.009  1.000  1.009  1.027  0.984  0.993  0.986
     512    0.985  1.127  1.235  1.698  1.176  1.008  0.988  0.992  0.992  1.004  1.003  1.003  1.006
     768    1.012  0.989  1.256  1.643  1.067  1.017  1.000  0.947  0.996  1.011  0.977  1.020  1.010
    1024    0.986  0.983  1.127  1.591  1.091  0.992  1.000  1.006  0.978  1.005  1.005  1.005  1.009
    1536    1.069  0.970  1.200  1.210  1.090  1.055  1.031  1.022  1.032  1.039  1.020  1.018  1.012
    2048    1.073  1.026  1.149  1.036  1.081  1.028  0.997  0.997  0.998  1.011  1.007  1.007  1.009

@fredrik-johansson fredrik-johansson merged commit 77b81af into flintlib:main Apr 21, 2026
12 checks passed
@fredrik-johansson fredrik-johansson deleted the n3 branch April 24, 2026 13:33