Implement Kinoshita-Li power series composition#2658
fredrik-johansson merged 4 commits into flintlib:main
…orithm selection for composition and reversion
I have now added […] In addition, I have optimized […] The main function […] Also, […]

Some timing results for […]

[Timing tables not recovered: five composition examples and four reversion examples.]

(The last example has a special form where the previous default of fast Lagrange inversion is particularly efficient; it would be possible to detect such special input, but I'm not sure if it's worth it.)
Now enabled by default for all series composition functions, and reversion functions have been updated to favor Newton iteration for sufficiently large $n$. Algorithm cutoffs are always going to be suboptimal for some inputs, but hopefully this will improve performance in most cases.
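For readers unfamiliar with Newton iteration for reversion, here is a minimal Python sketch. It is an illustration only, not the FLINT code: it works over the rationals with schoolbook multiplication and Horner composition, and the helper names (`mul_trunc`, `compose`, `inv_series`, `revert_newton`) are made up for this example. Given $f$ with $f(0) = 0$ and $f'(0) \ne 0$, it doubles the precision of $h$ at each step via $h \leftarrow h - (f(h) - x)/f'(h)$ until $f(h(x)) = x \bmod x^n$.

```python
from fractions import Fraction


def mul_trunc(a, b, n):
    """Schoolbook product of coefficient lists a and b, truncated mod x^n."""
    res = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[:n - i]):
                res[i + j] += ai * bj
    return res


def compose(f, g, n):
    """f(g(x)) mod x^n by Horner's rule; assumes g(0) = 0 and n >= 1."""
    res = [Fraction(0)] * n
    for c in reversed(f[:n]):
        res = mul_trunc(res, g, n)
        res[0] += c
    return res


def inv_series(a, n):
    """1/a mod x^n by the term-by-term recurrence; needs a[0] != 0."""
    a = (list(a) + [0] * n)[:n]
    b = [Fraction(1) / a[0]]
    for k in range(1, n):
        s = sum(a[i] * b[k - i] for i in range(1, k + 1))
        b.append(-s * b[0])
    return b


def revert_newton(f, n):
    """Return h with f(h(x)) = x mod x^n, using Newton iteration."""
    assert len(f) >= 2 and f[0] == 0 and f[1] != 0
    df = [k * c for k, c in enumerate(f)][1:]   # coefficients of f'
    h = [Fraction(0), Fraction(1) / f[1]]       # correct mod x^2
    m = 2
    while m < n:
        m = min(2 * m, n)                       # double the precision
        h = (h + [Fraction(0)] * m)[:m]
        num = compose(f, h, m)
        num[1] -= 1                             # f(h) - x
        den = compose(df, h, m)                 # f'(h), constant term f'(0)
        corr = mul_trunc(num, inv_series(den, m), m)
        h = [hi - ci for hi, ci in zip(h, corr)]
    return h[:n]
```

In this sketch each Newton step costs two compositions, which is why a faster composition algorithm makes Newton iteration more attractive for large $n$. For example, reverting $x - x^2$ yields the Catalan numbers: `revert_newton([0, 1, -1], 6)` gives coefficients `0, 1, 1, 2, 5, 14`.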
Adds `_gr_poly_compose_series_kinoshita_li` / `gr_poly_compose_series_kinoshita_li` for power series composition in $O(M(n) \log n)$ over generic rings using the Kinoshita-Li algorithm (by @EntropyIncreaser). See discussion in #1911.

The threshold where this starts to beat the Brent-Kung baby-step giant-step algorithm seems to be between 100 and 1000, obviously depending on the ring and input; see below for benchmarks. Note that this algorithm might be preferable to Brent-Kung even in cases where it is not faster, as it uses less memory asymptotically ($O(n \log n)$ vs $O(n^{1.5})$ coefficients).
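To make the baseline concrete, here is a small Python sketch of what is being computed and of the baby-step giant-step idea. It is not the FLINT implementation and does not show Kinoshita-Li itself: the function names are invented for illustration, multiplication is schoolbook, and the block evaluation uses plain linear combinations where a real Brent-Kung implementation would use matrix multiplication. The modulus is simply borrowed from the nmod benchmark. The table of $\lceil \sqrt{n} \rceil + 1$ stored powers of $g$, each with $n$ coefficients, is where the $O(n^{1.5})$ memory comes from.

```python
import math
import random

P = 2**63 + 29  # example modulus, borrowed from the nmod benchmark


def mul_trunc(a, b, n, p=P):
    """Schoolbook product of coefficient lists a and b, truncated mod x^n."""
    res = [0] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[:n - i]):
                res[i + j] = (res[i + j] + ai * bj) % p
    return res


def compose_horner(f, g, n, p=P):
    """Reference: f(g(x)) mod x^n by Horner's rule; needs g(0) = 0, n >= 1."""
    assert g[0] == 0
    res = [0] * n
    for c in reversed(f[:n]):
        res = mul_trunc(res, g, n, p)
        res[0] = (res[0] + c) % p
    return res


def compose_baby_giant(f, g, n, p=P):
    """f(g(x)) mod x^n with baby-step giant-step blocking of f."""
    assert g[0] == 0
    m = math.isqrt(n - 1) + 1                 # block size, about sqrt(n)
    f = (list(f) + [0] * n)[:n]
    # Baby steps: store g^0, ..., g^m (about sqrt(n) series of length n).
    powers = [[1 % p] + [0] * (n - 1)]
    for _ in range(m):
        powers.append(mul_trunc(powers[-1], g, n, p))
    gm = powers[m]
    # Giant steps: Horner's rule in g^m over blocks of m coefficients of f.
    res = [0] * n
    for start in reversed(range(0, n, m)):
        block = [0] * n
        for j, c in enumerate(f[start:start + m]):
            if c:
                for k in range(n):
                    block[k] = (block[k] + c * powers[j][k]) % p
        res = mul_trunc(res, gm, n, p)
        res = [(r + b) % p for r, b in zip(res, block)]
    return res


def random_series(n, p=P, zero_constant=False):
    """Random length-n coefficient list mod p, optionally with constant term 0."""
    s = [random.randrange(p) for _ in range(n)]
    if zero_constant:
        s[0] = 0
    return s
```

Both functions should agree on any input with $g(0) = 0$; for instance $f = 1 + x + x^2$ and $g = x + x^2$ give $1 + x + 2x^2 + 2x^3 \bmod x^4$.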
I have not yet switched over any other series composition and reversion functions (e.g. `fmpz_poly_compose_series`) to use this algorithm.

This code was written using Claude Sonnet. Methodology: I first asked Claude to read the paper and the comments in #1911 and implement the algorithm following FLINT coding conventions, using `gr_poly_compose_series_brent_kung` as a guideline for the API. I explicitly instructed it to represent the bivariate polynomials in flat format and to implement KS multiplication. The first draft looked OK structurally but failed the tests. When I asked Claude to debug, it asked me for a copy of @vneiger's Sage implementation. With this as a blueprint, Claude was able to fix the bugs. After that, I reviewed the code and prompted Claude to make a series of improvements:

[…] `arb` whose univariate polynomial multiplication is designed for typical univariate input and poorly tuned for coefficients in the scrambled KS order

nmod example
Timings in seconds to compute $f(g(x)) \bmod x^n$ with random $f$ and $g$, `nmod` coefficients mod $p = 2^{63} + 29$. Speedup ratios are relative to Brent-Kung.

Excellent speedup with evident quasilinear asymptotics!
fmpz example
Example timings with `fmpz` coefficients, $f = g = \sum_{k \ge 0} F_k x^k$ (Fibonacci generating function):

fmpq example
Example with `fmpq` coefficients, $f = g = \log(1+x)$:

Note that the implementation has not been optimized specifically for fraction fields; one can presumably gain something by handling denominators less generically.
arb example
Example with `arb` coefficients, prec = 128, $f = g = \log(1+x)$:

Same polynomials with prec = 3333:

Same polynomials with prec = 333333:
Numerical stability with `arb` coefficients seems to be quite similar to that of Brent-Kung. I was worried in advance that Kinoshita-Li might be numerically unstable, but it turns out to be very well behaved! The only catch is that it isn't consistently faster than Brent-Kung at low precision (where numerically stable polynomial multiplication is not necessarily quasilinear), but it clearly wins when the precision grows.

qqbar example
Example in a ring where we currently only use schoolbook multiplication: $f = g = \sum_{k \ge 1} (\sqrt{2}+k) x^k$ with `qqbar` coefficients.

Multivariate example
Example with `gr_poly(nmod)` coefficients, $f = g = \sum_k (k+t) x^k \in ((\mathbb{Z}/17 \mathbb{Z})[t])[x]$.

Here Kinoshita-Li seems to underperform. We might want some heuristic to detect such rings before using it by default in `gr_poly_compose_series`.