Karp-Markstein radix division #2588

Merged
fredrik-johansson merged 1 commit into flintlib:main from fredrik-johansson:radix3
Mar 4, 2026
@fredrik-johansson
Collaborator

Update Newton division for radix to use the Karp-Markstein trick: the multiplication by the numerator is fused with the last Newton iteration, which allows that final step to be performed at half precision. From about 28 limbs upward, typical speedups are 10-20% for division with remainder (divrem) and 20-30% for quotient-only division (tdiv_q), with maximal speedups of 40% and 90% respectively. Below 28 limbs, divrem is essentially unchanged and tdiv_q gains 6-18%:

                              speedup       speedup
    nlimbs     ndigits        divrem        tdiv_q
         1          19         1.00x         1.18x
         2          38         1.01x         1.07x
         3          57         0.99x         1.06x
         4          76         0.99x         1.10x
         6         114         1.00x         1.07x
         9         171         1.00x         1.07x
        13         247         1.00x         1.06x
        19         361         0.99x         1.06x
        28         532         1.12x         1.26x
        42         798         1.14x         1.26x
        63        1197         1.21x         1.38x
        94        1786         1.39x         1.91x
       141        2679         1.37x         1.82x
       211        4009         1.14x         1.28x
       316        6004         1.09x         1.19x
       474        9006         1.12x         1.21x
       711       13509         1.12x         1.22x
      1066       20254         1.13x         1.25x
      1599       30381         1.12x         1.26x
      2398       45562         1.13x         1.25x
      3597       68343         1.13x         1.23x
      5395      102505         1.12x         1.24x
      8092      153748         1.13x         1.26x
     12138      230622         1.13x         1.24x
     18207      345933         1.14x         1.22x
     27310      518890         1.13x         1.21x
     40965      778335         1.13x         1.23x
     61447     1167493         1.12x         1.29x
     92170     1751230         1.14x         1.24x
    138255     2626845         1.15x         1.29x
    207382     3940258         1.14x         1.25x
    311073     5910387         1.16x         1.25x
    466609     8865571         1.13x         1.23x
    699913    13298347         1.15x         1.28x
   1049869    19947511         1.16x         1.25x
   1574803    29921257         1.14x         1.24x
   2362204    44881876         1.22x         1.35x
   3543306    67322814         1.20x         1.33x
   5314959   100984221         1.14x         1.21x
   7972438   151476322         1.13x         1.25x
  11958657   227214483         1.13x         1.21x
  17937985   340821715         1.12x         1.22x
  26906977   511232563         1.12x         1.23x
  40360465   766848835         1.12x         1.21x
  60540697  1150273243         1.14x         1.25x
  90811045  1725409855         1.15x         1.25x
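
For readers unfamiliar with the trick, here is a minimal sketch in Python. It is not the FLINT code: it uses binary fixed-point arithmetic on Python integers rather than radix limbs, the names `recip` and `km_divrem` are made up for illustration, and the guard-bit counts are conservative guesses rather than tuned values. The point it tries to show is that the reciprocal is only computed to about half the quotient precision, and the quotient is then formed as a half-precision estimate `q0` plus a half-precision correction `q1` obtained from the residual `a - b*q0`.

```python
def recip(d: int, p: int) -> int:
    """Return x ~= 2**(2*p) / d, where d has exactly p bits (top bit set)."""
    if p <= 64:
        return (1 << (2 * p)) // d        # small base case, done directly
    h = p // 2 + 4                        # half precision plus guard bits
    xh = recip(d >> (p - h), h)           # ~= 2**(h + p) / d, about h good bits
    X = xh << (p - h)                     # same value rescaled to 2**(2*p) / d
    e = (1 << (2 * p)) - d * X            # residual 2**(2*p) - d*X
    return X + ((xh * e) >> (p + h))      # Newton step: X + X*e / 2**(2*p)


def km_divrem(a: int, b: int):
    """Return (a // b, a % b) for a >= 0, b > 0 (illustrative sketch)."""
    if a < 0 or b <= 0:
        raise ValueError("expected a >= 0 and b > 0")
    if a < b:
        return 0, a
    la, n = a.bit_length(), b.bit_length()
    m = la - n + 1                        # upper bound on quotient bits
    p = m // 2 + 8                        # reciprocal precision: about half of m
    # Top p bits of b (padded with zeros if b is shorter than p bits).
    d = b >> (n - p) if n >= p else b << (p - n)
    x = recip(d, p)                       # x / 2**(p + n) ~= 1 / b

    # Half-precision quotient estimate from the top p bits of a.
    s = max(0, la - p)
    q0 = ((a >> s) * x) >> (p + n - s)

    # Full-precision residual; it is small, so only its top p bits are
    # multiplied by the half-precision reciprocal.
    r0 = a - b * q0
    t = max(0, r0.bit_length() - p)
    q1 = ((r0 >> t) * x) >> (p + n - t)

    q, r = q0 + q1, r0 - b * q1
    # Final adjustment: the estimate may be off by a few units.
    while r < 0:
        q -= 1
        r += b
    while r >= b:
        q += 1
        r -= b
    return q, r


# Quick self-check against Python's built-in division.
a, b = 10**400 + 7, 10**150 + 3
assert km_divrem(a, b) == divmod(a, b)
```

A plain Newton division would instead refine the reciprocal to full precision and then multiply it by the full numerator; in the sketch above, both of those full-size products are replaced by products with the half-precision `x`, which is where the saving comes from.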

Profile outputs:

$ build/radix/profile/p-divrem
   decimal      mpn  decimal        time        time    relative
    digits    limbs    limbs flint_mpn_divrem radix_divrem  time

        19        1        1    1.22e-08    5.22e-09    0.428x
        38        2        2    1.67e-08    6.92e-08    4.144x
        57        3        3    3.94e-08    1.18e-07    2.995x
        76        4        4    4.38e-08    1.63e-07    3.721x
       114        6        6    6.37e-08    2.53e-07    3.972x
       171        9        9    1.14e-07    4.26e-07    3.737x
       247       13       13    1.82e-07    7.42e-07    4.077x
       361       19       19    3.03e-07    1.38e-06    4.554x
       532       28       28    5.34e-07    1.82e-06    3.408x
       798       42       42    1.03e-06    2.91e-06    2.825x
      1197       63       63     2.1e-06    4.77e-06    2.271x
      1786       93       94    3.86e-06    1.08e-05    2.798x
      2679      140      141    7.62e-06    1.96e-05    2.572x
      4009      209      211     1.5e-05    3.63e-05    2.420x
      6004      312      316    2.82e-05    5.59e-05    1.982x
      9006      468      474    5.34e-05    8.21e-05    1.537x
     13509      702      711    0.000105    0.000134    1.276x
     20254     1052     1066    0.000186    0.000197    1.059x
     30381     1577     1599    0.000319    0.000293    0.918x
     45562     2365     2398    0.000484     0.00044    0.909x
     68343     3548     3597    0.000757    0.000665    0.878x
    102505     5321     5395     0.00122     0.00102    0.836x
    153748     7981     8092     0.00184      0.0015    0.815x
    230622    11971    12138     0.00278     0.00238    0.856x
    345933    17956    18207     0.00423     0.00368    0.870x
    518890    26934    27310     0.00637     0.00558    0.876x
    778335    40400    40965      0.0101     0.00858    0.850x
   1167493    60599    61447       0.015       0.013    0.867x
   1751230    90898    92170      0.0239      0.0206    0.862x
   2626845   136347   138255      0.0356      0.0332    0.933x
   3940258   204520   207382      0.0577      0.0511    0.886x
   5910387   306780   311073      0.0902      0.0811    0.899x
   8865571   460169   466609       0.143       0.126    0.881x
  13298347   690253   699913       0.222       0.218    0.982x
  19947511  1035379  1049869       0.339       0.357    1.053x
  29921257  1553067  1574803       0.553       0.537    0.971x
  44881876  2329600  2362204       0.845       0.943    1.116x
  67322814  3494400  3543306       1.382       1.397    1.011x
 100984221  5241599  5314959       2.262        2.49    1.101x
 151476322  7862398  7972438       3.612       3.709    1.027x
 227214483 11793597 11958657        5.53       6.232    1.127x
 340821715 17690395 17937985       8.909       10.08    1.131x
 511232563 26535591 26906977      15.526      15.124    0.974x
 766848835 39803386 40360465      24.433      23.914    0.979x
1150273243 59705079 60540697      39.613      36.201    0.914x
1725409855 89557617 90811045      61.607      59.401    0.964x

$ build/radix/profile/p-tdiv_q 
   decimal      mpn  decimal        time        time    relative
    digits    limbs    limbs flint_mpn_tdiv_q radix_tdiv_q  time

        19        1        1    1.27e-08    5.37e-09    0.423x
        38        2        2    1.74e-08    5.96e-08    3.425x
        57        3        3    4.13e-08    9.55e-08    2.312x
        76        4        4    4.61e-08    1.35e-07    2.928x
       114        6        6    6.62e-08     2.1e-07    3.172x
       171        9        9    1.19e-07     3.5e-07    2.941x
       247       13       13    1.87e-07    6.22e-07    3.326x
       361       19       19    3.22e-07    1.17e-06    3.634x
       532       28       28    5.65e-07    1.43e-06    2.531x
       798       42       42    1.09e-06    2.22e-06    2.037x
      1197       63       63     2.2e-06    3.38e-06    1.536x
      1786       93       94    4.14e-06    5.81e-06    1.403x
      2679      140      141    8.16e-06    1.04e-05    1.275x
      4009      209      211    1.61e-05    2.59e-05    1.609x
      6004      312      316    2.98e-05    4.11e-05    1.379x
      9006      468      474    5.63e-05    6.24e-05    1.108x
     13509      702      711    0.000112    0.000103    0.920x
     20254     1052     1066    0.000151    0.000148    0.980x
     30381     1577     1599    0.000254    0.000223    0.878x
     45562     2365     2398    0.000364    0.000331    0.909x
     68343     3548     3597    0.000584    0.000507    0.868x
    102505     5321     5395    0.000942    0.000768    0.815x
    153748     7981     8092     0.00146     0.00115    0.788x
    230622    11971    12138     0.00218     0.00184    0.844x
    345933    17956    18207     0.00327     0.00283    0.865x
    518890    26934    27310     0.00503     0.00433    0.861x
    778335    40400    40965     0.00771     0.00659    0.855x
   1167493    60599    61447      0.0117     0.00998    0.853x
   1751230    90898    92170      0.0183       0.016    0.874x
   2626845   136347   138255      0.0279      0.0249    0.892x
   3940258   204520   207382      0.0442      0.0389    0.880x
   5910387   306780   311073      0.0692      0.0601    0.868x
   8865571   460169   466609        0.11       0.095    0.864x
  13298347   690253   699913       0.171       0.166    0.971x
  19947511  1035379  1049869       0.259        0.26    1.004x
  29921257  1553067  1574803       0.419       0.402    0.959x
  44881876  2329600  2362204       0.689       0.624    0.906x
  67322814  3494400  3543306       1.076       0.961    0.893x
 100984221  5241599  5314959       1.783       1.793    1.006x
 151476322  7862398  7972438        2.77       2.714    0.980x
 227214483 11793597 11958657        4.33       4.669    1.078x
 340821715 17690395 17937985       6.875       7.258    1.056x
 511232563 26535591 26906977      11.062       11.09    1.003x
 766848835 39803386 40360465      17.786      17.354    0.976x
1150273243 59705079 60540697      28.989      25.786    0.890x
1725409855 89557617 90811045      44.381      43.392    0.978x

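The radix division is now substantially faster than the mpn/fmpz division code for many sizes (relative times of roughly 0.8x-0.9x from about 30,000 to 9 million decimal digits); quite likely the mpn/fmpz code could be made 20-30% faster than its current speed by adopting all the tricks currently used in radix.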

fredrik-johansson merged commit 269f4df into flintlib:main on Mar 4, 2026
12 of 13 checks passed