gh-90716: Use subquadratic algorithms for int(string) #97550
Conversation
@oscarbenjamin, please look at Neil's gh-96673. There's no need to have two PRs aiming at the same thing. The str->int there is also limited by Karatsuba, but takes special advantage of base 10. The int->str there takes advantage of the decimal module. Note that Neil's PR doesn't try to improve the speed of str->int for non-power-of-2 bases other than 10. Nobody cares 😉. But, if you care, I suggest building on Neil's PR's approach, which leaves this miserable code in Python instead of adding mountains of micro-level C fiddling. For inputs large enough for this code to pay off at all, essentially all the time is spent in CPython's int multiplication anyway.
Does the special advantage exceed anything in this PR?

Okay I see where you're going with this. I'll take a closer look at what the decimal module has...
One thing I want to note is that saying the time complexity is O(M(n)*log(n)) is only an upper bound. If you do the analysis carefully, then you will notice that if you assume M(n) ~ n**a for some a > 1 (as with Karatsuba), the log(n) factor disappears and the total cost is O(M(n)).
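To spell out the kind of analysis being alluded to, here is a sketch of the standard recurrence argument (my sketch, assuming the usual divide-and-conquer formulation; not part of the original comment):

```latex
% Recurrence for divide-and-conquer string-to-int conversion: split the
% n-digit string in half, convert each half recursively, and recombine
% with one big multiplication.
\[
  T(n) = 2\,T(n/2) + O\!\bigl(M(n/2)\bigr) + O(n)
\]
% If M(n) \sim c\,n^{a} with a > 1 (Karatsuba: a \approx 1.58), level k of the
% recursion costs 2^{k} c (n/2^{k+1})^{a}; the levels form a geometric series
% with ratio 2^{1-a} < 1, so the top level dominates and
\[
  T(n) = O\!\bigl(M(n)\bigr).
\]
% If instead M(n) \sim n \log n (FFT-based multiplication), every level costs
% \Theta(n \log n) and there are \Theta(\log n) levels, so the general bound
% O\!\bigl(M(n) \log n\bigr) is tight in that case.
```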
You need to decide for yourself. At least 5 people have contributed to the code in his PR so far, and that's what I'm encouraging you to join in on. I don't have the time or the inclination to stare at piles of C code here 😉.

The str->int in the other PR, for example, doesn't need to allocate memory for lists. It does need memory to store a dict with the values it reuses across the recursion.

The other PR's algorithms are recursive, which allows them to be compact and elegant. Since we spend almost all cycles in CPython's int operations anyway, the overhead of the recursive Python driver hardly matters.
My timings don't show any benefit resulting from taking "special advantage of base 10". Both PRs seem to be about the same for large inputs, and gh-96673 seems to be slower than this PR for inputs of around 1000-2000 digits:

```python
import random

def randstr(N):
    # Random N-digit decimal string ('0' is excluded, so no leading zeros).
    return ''.join(random.choices('123456789', k=N))
```
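For reference, a comparison of that sort might be timed with something like the following sketch (illustrative only; this is not the timing script whose results are quoted above, and it assumes the randstr helper just defined):

```python
import sys
import timeit

if hasattr(sys, "set_int_max_str_digits"):
    # Lift the str/int digit limit introduced by gh-95778 so large inputs can be parsed.
    sys.set_int_max_str_digits(0)

def time_int_parse(ndigits, repeats=5):
    # Best-of-repeats timing of int(s) for a random decimal string of ndigits digits.
    s = randstr(ndigits)
    return min(timeit.repeat(lambda: int(s), number=1, repeat=repeats))

for n in (1_000, 2_000, 10_000, 100_000):
    print(f"{n} digits: {time_int_parse(n):.6f} s")
```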
Fair enough, but I don't think that the code here is particularly complicated.

There is also the question of memory usage. I think that this PR could be made to have a peak memory usage not much bigger than the input string (as mentioned in the code comments). I haven't analysed the approach in gh-96673 in detail, but from a quick look I expect that the memory overhead is a few times greater.
Just a quick question (I haven't read any actual code). Is there any reason why Cython can't be used to get the C-like performance whilst coding in ~Python (i.e. without having to worry about the various low-level C details)?

[Edit] The licensing seems fine: cython/COPYING.txt says Cython is licensed under the Apache License version 2.0.
If I'm not mistaken, it's not intended to be used for such short strings (see lines 2696 to 2705 in 59c81da).
Yes, that's what I thought. It's probably just a blip somehow in the timing script I used (I remember that I did repeat the timings because it seemed odd but got the same results again). The script was not really intended to target small run times.

Closing after gh-96673 was merged.
As identified in gh-95778, the algorithm used for decimal to binary conversion by int(string) has quadratic complexity. Following on from the refactor of PyLong_FromString in gh-96808, this commit implements a subquadratic algorithm for parsing strings from decimal and other bases, leveraging the subquadratic complexity of integer multiplication.
This PR presents the algorithm discussed here:
https://discuss.python.org/t/int-str-conversions-broken-in-latest-python-bugfix-releases/18889/14?u=oscarbenjamin
Roughly this is algorithm 1.25 (FastIntegerInput) for base conversion as described in Richard P. Brent and Paul Zimmermann, Modern Computer Arithmetic:
https://members.loria.fr/PZimmermann/mca/mca-cup-0.5.9.pdf
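To illustrate the idea, here is a rough pure-Python sketch of the divide-and-conquer scheme (my sketch, not the C implementation in this PR; the function name and the threshold of 64 are arbitrary):

```python
def str_to_int_dc(s, base=10):
    """Pure-Python sketch of divide-and-conquer string-to-int conversion.

    The work is pushed into big-integer multiplications, so the cost is
    O(M(n)*log(n)) where M(n) is the cost of multiplying n-digit numbers.
    """
    n = len(s)
    if n <= 64:                        # small inputs: ordinary conversion
        return int(s, base)
    mid = n // 2
    hi = str_to_int_dc(s[:mid], base)  # high-order digits
    lo = str_to_int_dc(s[mid:], base)  # low-order digits
    # Recombine with one large multiplication: hi * base**(n - mid) + lo.
    return hi * base ** (n - mid) + lo
```

A real implementation would cache the repeated powers of the base and only switch to this scheme above a much larger threshold, as discussed below.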
The algorithm here delegates the computational work of base conversion to integer multiplication so that the cost of base conversion is O(M(n)*log(n)) where M(n) is the cost of multiplying n bit integers. CPython's implementation of multiplication currently tops out at the Karatsuba algorithm meaning that for large integers M(n) ~ n**1.58 but any improvements in multiplication would also apply to the routine added here. For now though what this means is that the complexity of parsing decimal or other base strings is such that a doubling of the input size means a tripling of the run time (rather than a quadrupling for the quadratic case). For example the times.py script shown below gives:

For comparison with main we have:
The CVE referenced by gh-95778 is:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10735
and has the text:
For comparison with this PR:
That shows a slowdown of 15x when increasing the input size by 10x, whereas the CVE refers to a 100x slowdown. The actual slowdown that I measure depends on the precise size, but for very large inputs the expectation should be (as shown in the graph below) that the slowdown for a 10x bigger input is roughly:

It should be understood, though, that these relative slowdowns are compounded when further increasing the input size: for large enough inputs this PR can be arbitrarily many times faster than main, but the slowdown can also be arbitrarily many times worse than linear.
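As a back-of-envelope check (my arithmetic, ignoring the log factor and lower-order terms, not a measurement from this PR), the asymptotic slowdowns for a 10x larger input are:

```latex
\[
  \text{main (quadratic):}\ \frac{(10n)^{2}}{n^{2}} = 100\times,
  \qquad
  \text{this PR (Karatsuba-bound):}\ \frac{(10n)^{1.58}}{n^{1.58}} = 10^{1.58} \approx 38\times.
\]
```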
I have not subjected this to any significant testing or benchmarking yet (any help with testing is obviously appreciated). Also there are probably no examples in the existing test suite that would exercise the codepaths introduced here because of the high value of the threshold for the algorithm. There are two key parameters that can probably be optimised but I've only tried a few values to gauge roughly what the best values might be (and only on one system).
This script can give an idea of what this PR could mean for performance:
This script can plot the results:
With those scripts you can do:
Then you should see this plot showing the different asymptotic complexity for this PR as compared to the quadratic algorithm used in main:


There are a few factors that lead to GMP having better performance than this PR, but most notable for larger integers is the use of FFT-based multiplication, giving M(n) ~ n*log(n). In any case this PR has much better performance than main, as is more dramatically illustrated in a plot using a linear scale:
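For anyone wanting to reproduce the comparison against GMP, a minimal sketch using the gmpy2 bindings (an assumption on my part; this is not the script used to produce the plots above):

```python
import random
import sys
import timeit

import gmpy2  # third-party Python bindings for GMP

if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)  # lift the gh-95778 digit limit

s = ''.join(random.choices('123456789', k=1_000_000))  # one-million-digit input

cpython_time = min(timeit.repeat(lambda: int(s), number=1, repeat=3))
gmp_time = min(timeit.repeat(lambda: gmpy2.mpz(s), number=1, repeat=3))
print(f"int(s): {cpython_time:.3f} s   gmpy2.mpz(s): {gmp_time:.3f} s")
```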