
Implement Kinoshita-Li power series composition #2658

Merged
fredrik-johansson merged 4 commits into flintlib:main from fredrik-johansson:compose
May 5, 2026

Conversation

@fredrik-johansson
Collaborator

Adds _gr_poly_compose_series_kinoshita_li / gr_poly_compose_series_kinoshita_li for power series composition in $O(M(n) \log n)$ over generic rings using the Kinoshita-Li algorithm (by @EntropyIncreaser). See discussion in #1911.

The threshold where this starts to beat the Brent-Kung baby-step giant-step algorithm seems to be between 100 and 1000, obviously depending on the ring and input; see below for benchmarks. Note that this algorithm might be preferable to Brent-Kung even in cases where it is not faster, as it uses less memory asymptotically ($O(n \log n)$ vs $O(n^{1.5})$ coefficients).

I have not yet switched over any other series composition and reversion functions (e.g. fmpz_poly_compose_series) to use this algorithm.
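For reference, the operation in question is truncated composition: given $f$ and $g$ with $g(0) = 0$, compute $h = f(g(x)) \bmod x^n$. The following is a minimal Python sketch of the semantics only (a naive Horner-style oracle, nothing like either algorithm discussed here), useful as a correctness reference:

```python
# Naive reference for truncated composition h = f(g(x)) mod x^n with
# g[0] == 0.  Coefficient lists of ints; illustrative only -- both
# Brent-Kung and Kinoshita-Li are asymptotically much faster.

def mullow(a, b, n):
    """Product of coefficient lists a and b, truncated to length n."""
    c = [0] * min(n, len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] += ai * bj
    return c

def compose_series_naive(f, g, n):
    """f(g(x)) mod x^n by Horner's rule; requires g[0] == 0."""
    assert g[0] == 0
    h = [0]
    for c in reversed(f):
        h = mullow(h, g, n)
        h[0] += c
    return (h + [0] * n)[:n]
```

For example, with $f = 1 + 2x + 3x^2$ and $g = x + x^2$ one gets $f(g) \equiv 1 + 2x + 5x^2 + 6x^3 \pmod{x^4}$.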

This code was written using Claude Sonnet. Methodology: I first asked Claude to read the paper and the comments in #1911 and implement the algorithm following FLINT coding conventions, using gr_poly_compose_series_brent_kung as a guideline for the API. I explicitly instructed it to represent the bivariate polynomials in flat format and to implement KS multiplication. The first draft looked OK structurally but failed the tests. When I asked Claude to debug, it asked me for a copy of @vneiger's Sage implementation. With this as a blueprint, Claude was able to fix the bugs. After that, I reviewed the code and prompted Claude to make a series of improvements:

  • Convert from recursive to iterative form
  • Avoid unnecessary temporary vectors and copying
  • Fix an unsafe pointer swap of temporary vectors
  • Avoid excessive zero-padding and tighten allocations
  • Switch the Kronecker substitution format for bivariate multiplication from Claude's initial choice of x-major to y-major. I did extensive benchmarking to verify that y-major consistently performs better, sometimes dramatically so with arb, whose univariate polynomial multiplication is designed for typical univariate input and is poorly tuned for coefficients in the scrambled KS order
  • (Not kept in the final version): use Harvey's multipoint KS instead of standard KS; this performed worse on every example I tested, though that may have been due to a suboptimal implementation, and it may be worth retrying in the future
  • Fix errors in comments
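To illustrate what the x-major/y-major choice refers to: Kronecker substitution flattens a bivariate polynomial into a univariate one, and the two formats differ only in which variable becomes the slow-varying index of the flattened array. A small Python sketch follows; note that taking "y-major" to mean position $j \cdot D_X + i$ for the coefficient of $x^i y^j$ is my reading of the terminology, not necessarily FLINT's convention.

```python
# Illustration of Kronecker substitution (KS) for bivariate
# multiplication.  DX and DY must bound the x- and y-degrees of the
# *product* so that no flattened positions collide.  "y_major" here
# means y is the slow-varying index (position j*DX + i) -- an assumed
# reading of the PR's terminology, not necessarily FLINT's convention.

def ks_pack(P, DX, DY, y_major=True):
    """Flatten coefficients P[i][j] of x^i y^j into a 1D array."""
    u = [0] * (DX * DY)
    for i, row in enumerate(P):
        for j, c in enumerate(row):
            u[j * DX + i if y_major else i * DY + j] = c
    return u

def mul(a, b):
    """Plain univariate multiplication."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def ks_mul(P, Q, DX, DY, y_major=True):
    """Multiply bivariate P and Q through one univariate product."""
    u = mul(ks_pack(P, DX, DY, y_major), ks_pack(Q, DX, DY, y_major))
    u += [0] * max(0, DX * DY - len(u))
    if y_major:
        return [[u[j * DX + i] for j in range(DY)] for i in range(DX)]
    return [[u[i * DY + j] for j in range(DY)] for i in range(DX)]
```

Both orders produce the same bivariate product; the performance difference observed in this PR comes from how the underlying univariate multiplication handles the resulting coefficient layout.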

nmod example

Timings in seconds to compute $f(g(x)) \bmod x^n$ with random $f$ and $g$, nmod coefficients mod $p = 2^{63} + 29$. Speedup ratios are relative to Brent-Kung.

        n     Brent-Kung Kinoshita-Li (speedup)
       100      7.79e-05    0.000162  (0.481)
       200      0.000285    0.000347  (0.821)
       400      0.000869    0.000839  (1.036)
       800       0.00254     0.00193  (1.316)
      1600       0.00763     0.00434  (1.758)
      3200        0.0238     0.00974  (2.444)
      6400        0.0776      0.0218  (3.560)
     12800         0.257      0.0487  (5.277)
     25600         0.882       0.115  (7.670)
     51200         3.036       0.266  (11.414)
    102400         10.82       0.646  (16.749)
    204800         37.76       1.556  (24.267)
    409600       140.573       3.736  (37.627)
    819200       497.694      10.274  (48.442)

Excellent speedup with evident quasilinear asymptotics!

fmpz example

Example timings with fmpz coefficients, $f = g = \sum_{k \ge 0} F_k x^k$ (Fibonacci generating function):

        n     Brent-Kung Kinoshita-Li (speedup)
       100      0.000298    0.000443  (0.673)
       200       0.00243     0.00284  (0.856)
       400        0.0153      0.0144  (1.062)
       800         0.108      0.0683  (1.581)
      1600         0.682       0.339  (2.012)
      3200         4.651       1.716  (2.710)
      6400        33.263       8.462  (3.931)
     12800        266.59      43.331  (6.152)

fmpq example

Example with fmpq coefficients, $f = g = \log(1+x)$:

        n     Brent-Kung Kinoshita-Li (speedup)
       100       0.00516      0.0053  (0.974)
       200        0.0333      0.0268  (1.243)
       400         0.235       0.169  (1.391)
       800         1.821       1.046  (1.741)
      1600        15.511       6.986  (2.220)
      3200       153.219      48.735  (3.144)

Note that the implementation has not been optimized specifically for fraction fields; one can presumably gain something by handling denominators less generically.

arb example

Example with arb coefficients, prec = 128, $f = g = \log(1+x)$:

        n     Brent-Kung Kinoshita-Li (speedup)      [x^(n-1)]  (Brent-Kung)      [x^(n-1)]  (Kinoshita-Li)
       100      0.000866     0.00178  (0.487)           [5e+17 +/- 9.13e-20]           [5e+17 +/- 8.90e-20] 
       200       0.00392     0.00606  (0.647)               [2e+37 +/- 7.35]               [2e+37 +/- 6.40] 
       400        0.0144       0.017  (0.847)           [8e+76 +/- 4.50e+40]           [8e+76 +/- 4.14e+40] 
       800        0.0597      0.0603  (0.990)         [2e+156 +/- 2.22e+120]         [2e+156 +/- 1.91e+120] 
      1600         0.218       0.174  (1.253)         [2e+315 +/- 4.61e+279]         [2e+315 +/- 4.32e+279] 
      3200         0.946       0.545  (1.736)         [5e+633 +/- 2.55e+598]         [5e+633 +/- 2.26e+598] 
      6400         4.441       2.008  (2.212)       [8e+1270 +/- 6.44e+1235]       [8e+1270 +/- 6.22e+1235] 
     12800        23.467       8.449  (2.777)       [3e+2545 +/- 5.20e+2510]       [3e+2545 +/- 4.75e+2510] 
     25600       130.642      37.419  (3.491)       [8e+5094 +/- 2.77e+5060]       [8e+5094 +/- 2.74e+5060] 

Same polynomials with prec = 3333:

        n     Brent-Kung Kinoshita-Li (speedup)      [x^(n-1)]  (Brent-Kung)      [x^(n-1)]  (Kinoshita-Li)
       100        0.0175       0.023  (0.761)          [5e+17 +/- 1.53e-984]          [5e+17 +/- 1.61e-984] 
       200        0.0621      0.0575  (1.080)          [2e+37 +/- 1.23e-964]          [2e+37 +/- 1.17e-964] 
       400         0.225       0.154  (1.461)          [8e+76 +/- 8.11e-925]          [8e+76 +/- 7.63e-925] 
       800         0.827        0.39  (2.121)         [2e+156 +/- 3.75e-845]         [2e+156 +/- 3.55e-845] 
      1600         2.538       1.008  (2.518)         [2e+315 +/- 8.40e-686]         [2e+315 +/- 8.05e-686] 
      3200         8.114       2.715  (2.989)         [5e+633 +/- 4.32e-367]         [5e+633 +/- 4.20e-367] 
      6400         27.16       8.036  (3.380)        [8e+1270 +/- 1.18e+271]        [8e+1270 +/- 1.15e+271] 
     12800       100.991      24.848  (4.064)       [3e+2545 +/- 8.82e+1545]       [3e+2545 +/- 8.76e+1545] 

Same polynomials with prec = 333333:

        n     Brent-Kung Kinoshita-Li (speedup)      [x^(n-1)]  (Brent-Kung)      [x^(n-1)]  (Kinoshita-Li)
       100         3.965       3.735  (1.062)       [5e+17 +/- 1.93e-100324]       [5e+17 +/- 2.04e-100324] 
       200        15.886       9.082  (1.749)       [2e+37 +/- 1.56e-100304]       [2e+37 +/- 1.48e-100304] 
       400        62.491      22.944  (2.724)       [8e+76 +/- 1.02e-100264]       [8e+76 +/- 9.63e-100265] 
       800       244.042      63.805  (3.825)      [2e+156 +/- 4.74e-100185]      [2e+156 +/- 4.48e-100185] 

Numerical stability with arb coefficients seems to be quite similar to that of Brent-Kung. I was worried in advance that Kinoshita-Li might be numerically unstable, but it turns out to be very well behaved! The only catch is that it isn't consistently faster than Brent-Kung at low precision (where numerically stable polynomial multiplication is not necessarily quasilinear), but it clearly wins when the precision grows.

qqbar example

Example in a ring where we currently only use schoolbook multiplication: $f = g = \sum_{k \ge 1} (\sqrt{2}+k) x^k$ with qqbar coefficients.

        n     Brent-Kung Kinoshita-Li (speedup)
        25          0.13       0.138  (0.942)
        50          1.05       0.852  (1.232)
       100         7.355       4.968  (1.480)
       200        54.726      29.232  (1.872)
       400       393.888     179.693  (2.192)

Multivariate example

Example with gr_poly(nmod) coefficients, $f = g = \sum_k (k+t) x^k \in ((\mathbb{Z}/17 \mathbb{Z})[t])[x]$.

        n     Brent-Kung Kinoshita-Li (speedup)
        25      7.29e-05    0.000201  (0.363)
        50      0.000329    0.000852  (0.386)
       100       0.00151      0.0038  (0.397)
       200       0.00757      0.0183  (0.414)
       400        0.0419       0.134  (0.313)
       800         0.305       0.699  (0.436)
      1600         2.068       3.821  (0.541)
      3200         13.34      17.828  (0.748)
      6400        81.608      89.788  (0.909)

Here Kinoshita-Li seems to underperform. We might want some heuristic to detect such rings before using it by default in gr_poly_compose_series.

@fredrik-johansson
Collaborator Author

fredrik-johansson commented May 3, 2026

I have now added fmpq_poly_compose_series_kinoshita_li which is much faster than the gr version with fmpq coefficients as it works with polynomials on common denominators. Claude was able to generate this translation of the gr version essentially in one shot, requiring only minor manual cleanup.
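The common-denominator representation (an integer-coefficient polynomial paired with a single denominator, as in fmpq_poly) can be sketched in Python with exact rationals; the function names below are hypothetical, not FLINT API:

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Hypothetical sketch (not FLINT API) of a common-denominator
# representation: integer coefficients over a single denominator.
# Arithmetic on (coeffs, den) pairs replaces per-coefficient rational
# arithmetic with integer arithmetic plus one final gcd reduction.

def to_common_den(fracs):
    """Convert a list of Fractions to (integer coeffs, common den)."""
    den = reduce(lambda a, f: a * f.denominator // gcd(a, f.denominator),
                 fracs, 1)
    return [f.numerator * (den // f.denominator) for f in fracs], den

def mul_common_den(a, da, b, db):
    """Multiply polynomials a/da and b/db, reducing to lowest terms."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    g = reduce(gcd, c, da * db)
    return [x // g for x in c], da * db // g
```

E.g. $1/2 + x/3$ becomes $(3 + 2x)/6$, and squaring it stays entirely in integer arithmetic until the final reduction.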

In addition, I have optimized fmpq_poly_compose_series_brent_kung by using fmpz_mat with separate denominators instead of fmpq_mat, which speeds up composition by roughly 1.5x for small to moderate $n$.

The main function fmpq_poly_compose_series has been updated to choose Kinoshita-Li instead of Brent-Kung for $n \ge 250$.

Also, fmpq_poly_revert_series has been updated to use Newton iteration except for small $n$ now that composition is significantly faster.
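Newton iteration for reversion solves $f(g(x)) = x$ by iterating $g \leftarrow g - (f(g) - x)/f'(g)$ with precision doubling, so each step costs essentially one composition plus one series division; with quasilinear composition the total is dominated by the last step. A self-contained Python sketch with exact rationals (illustrative only, not the FLINT implementation):

```python
from fractions import Fraction as Q

# Sketch of series reversion by Newton iteration over the rationals.
# Each Newton step doubles the working precision and costs one
# composition f(g) plus one series division.

def mullow(a, b, n):
    """a*b mod x^n."""
    c = [Q(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def compose(f, g, n):
    """f(g(x)) mod x^n by Horner's rule; requires g[0] == 0."""
    h = [Q(0)] * n
    for c in reversed(f):
        h = mullow(h, g, n)
        h[0] += c
    return h

def inv_series(a, n):
    """1/a mod x^n by Newton iteration; requires a[0] != 0."""
    b = [Q(1) / a[0]]
    while len(b) < n:
        m = min(2 * len(b), n)
        e = mullow(a, b, m)                       # a*b = 1 + error
        b = [2 * bi - ci for bi, ci in
             zip(b + [Q(0)] * (m - len(b)), mullow(b, e, m))]
    return b

def derivative(a):
    return [i * ai for i, ai in enumerate(a)][1:]

def revert_series(f, n):
    """g with f(g(x)) = x mod x^n; requires f[0] == 0, f[1] != 0."""
    g = [Q(0), Q(1) / f[1]]
    m = 2
    while m < n:
        m = min(2 * m, n)
        fg = compose(f, g, m)
        fg[1] -= 1                                # f(g) - x
        fpg = compose(derivative(f), g, m)        # f'(g)
        step = mullow(fg, inv_series(fpg, m), m)
        g = [gi - si for gi, si in zip(g + [Q(0)] * (m - len(g)), step)]
    return g[:n]
```

As a sanity check, the reversion of $f = x/(1-x)$ is $x/(1+x) = x - x^2 + x^3 - \cdots$.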

Some timing results for fmpq_poly, old code vs new (speedup for small $n$ due to optimized Brent-Kung; for larger $n$ due to using Kinoshita-Li):

Composition, $f = g = \log(1+x)$:

       n            old           new    (speedup)
      12       6.12e-06       2.5e-06    (2.448)
      25       4.33e-05      2.76e-05    (1.569)
      50       0.000417       0.00024    (1.738)
     100         0.0034       0.00184    (1.848)
     200         0.0217        0.0134    (1.619)
     400          0.155        0.0769    (2.016)
     800          1.195         0.512    (2.334)
    1600          9.823         3.053    (3.217)
    3200         97.485        18.235    (5.346)

Composition, $f = g = \exp(x)-1$:

       n            old           new    (speedup)
      12       5.47e-06      2.16e-06    (2.532)
      25        7.5e-05      4.89e-05    (1.534)
      50       0.000592      0.000357    (1.658)
     100        0.00402        0.0026    (1.546)
     200         0.0265        0.0195    (1.359)
     400          0.205         0.136    (1.507)
     800           1.64         0.768    (2.135)
    1600         13.877         4.818    (2.880)

Composition, $f = g = \sqrt{1+x}-1$:

       n            old           new    (speedup)
      12        5.7e-06       2.1e-06    (2.714)
      25        3.7e-05      2.24e-05    (1.652)
      50       0.000333       0.00019    (1.753)
     100        0.00158        0.0011    (1.436)
     200        0.00824       0.00762    (1.081)
     400         0.0482        0.0346    (1.393)
     800          0.321         0.177    (1.814)
    1600            2.3         0.859    (2.678)
    3200         17.427         4.327    (4.028)

Composition, $f = \log(1+x), g = x/(1-x)/3$:

       n            old           new    (speedup)
      12       5.45e-06      1.81e-06    (3.011)
      25       1.85e-05      6.78e-06    (2.729)
      50       8.65e-05      6.42e-05    (1.347)
     100       0.000389       0.00028    (1.389)
     200        0.00265       0.00203    (1.305)
     400         0.0151        0.0144    (1.049)
     800         0.0809        0.0753    (1.074)
    1600          0.472         0.395    (1.195)
    3200          2.568         1.888    (1.360)
    6400         15.632         9.701    (1.611)

Composition, $f = g = x/(1-x)/3$:

       n            old           new    (speedup)
      12       4.63e-06      1.46e-06    (3.171)
      25       1.45e-05      4.11e-06    (3.528)
      50       5.44e-05      2.44e-05    (2.230)
     100       0.000202      0.000113    (1.788)
     200         0.0012      0.000851    (1.410)
     400        0.00678       0.00902    (0.752)
     800         0.0357        0.0416    (0.858)
    1600          0.219         0.231    (0.948)
    3200          1.279         1.116    (1.146)
    6400          7.179         5.902    (1.216)

Reversion, $f = \log(1+x)$:

       n            old           new    (speedup)
      12       4.42e-06      4.46e-06    (0.991)
      25       6.59e-05      3.62e-05    (1.820)
      50        0.00039      0.000349    (1.117)
     100        0.00259       0.00277    (0.935)
     200         0.0195        0.0172    (1.134)
     400          0.184         0.134    (1.373)
     800          1.928         0.814    (2.369)
    1600         21.251         4.987    (4.261)
    3200        265.237        33.305    (7.964)

Reversion, $f = \exp(x)-1$:

       n            old           new    (speedup)
      12       3.62e-06      3.59e-06    (1.008)
      25       5.91e-05      3.58e-05    (1.651)
      50       0.000384      0.000305    (1.259)
     100        0.00264       0.00213    (1.239)
     200         0.0198        0.0148    (1.338)
     400          0.195         0.113    (1.726)
     800          1.903          0.63    (3.021)
    1600         20.784         3.826    (5.432)
    3200        251.899        23.058    (10.925)

Reversion, $f = \sqrt{1+x}-1$:

       n            old           new    (speedup)
      12       2.59e-06      2.58e-06    (1.004)
      25       2.21e-05      9.22e-06    (2.397)
      50       0.000165      6.34e-05    (2.603)
     100         0.0012      0.000262    (4.580)
     200        0.00575        0.0015    (3.833)
     400         0.0316       0.00642    (4.922)
     800          0.219        0.0321    (6.822)
    1600          1.669         0.418    (3.993)
    3200         13.752         3.536    (3.889)
    6400         132.18        19.758    (6.690)

Reversion, $f = x/(1-x)/3$:

       n            old           new    (speedup)
      12       1.38e-06      1.43e-06    (0.965)
      25       3.66e-06      6.47e-06    (0.566)
      50       1.96e-05       0.00011    (0.178)
     100       7.59e-05      0.000949    (0.080)
     200       0.000516       0.00525    (0.098)
     400        0.00257        0.0273    (0.094)
     800         0.0108         0.173    (0.062)
    1600          0.053         1.193    (0.044)
    3200          0.351         7.977    (0.044)
    6400          2.061        56.239    (0.037)

(The last example has a special form for which the previous default of fast Lagrange inversion is particularly efficient; it would be possible to detect such special input, but I'm not sure it's worth it.)

@fredrik-johansson
Collaborator Author

Now enabled by default for all series composition functions, and reversion functions have been updated to favor Newton iteration for sufficiently large $n$ now that composition is asymptotically fast.

Algorithm cutoffs are always going to be suboptimal for some inputs, but hopefully this will improve performance in most cases.

@fredrik-johansson fredrik-johansson merged commit 65198c1 into flintlib:main May 5, 2026
13 checks passed
