Improve performance of integer exponentiation #88542
Naively, I assumed that the performance of x**2 would be comparable to that of x*x. Instead, I have found the ratio of the time for x**2 to the time for x*x for various X.Y:

2.4: 1.1  # ratio of time for x**2 / time for x*x

In the 2.x series, performance was quite close. In 3.x, the ratio has increased significantly. Either integer multiplication has gotten much faster, or exponentiation much slower, or both. Shockingly (to me at least), an exponent of 1 is an order of magnitude slower than a multiplicand of 1:

2.7: 1.3  # ratio of time for x**1 / time for x*1

Even an exponent of 10 is a little slower than repeated multiplication in 3.9.
It would be nice if we could improve the performance of exponentiation. (OS: Linux)
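A minimal sketch of how a ratio like those above can be measured (the base value 115 and the timing settings are arbitrary choices for illustration, not values from the report):

import timeit

setup = "x = 115"
# Best-of-five timings, one million evaluations each.
t_pow = min(timeit.repeat("x**2", setup=setup, number=1_000_000, repeat=5))
t_mul = min(timeit.repeat("x*x", setup=setup, number=1_000_000, repeat=5))
print(f"time(x**2) / time(x*x) = {t_pow / t_mul:.2f}")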
Is it because exponentiation became slower, or because multiplication became faster? What are the absolute numbers?
I can't reproduce this on my Mac laptop (using Python builds from MacPorts). Numbers for both x**2 and x*x are fairly stable across Python 3.2 to Python 3.10. There's some variation, but nothing close to the same extent that Steven is seeing. Here are my raw numbers:

lovelace:cpython mdickinson$ /opt/local/bin/python3.2 -m timeit -s "x=115" "x*x"
Under the released 3.9.5 for 64-bit Win10, raising to the power 2 is clearly much slower than multiplying directly:

C:\Windows\System32>py -3 -m timeit -s "x=151" "x*x"
C:\Windows\System32>py -3 -m timeit -s "x=151" "x**2"

Since the multiplication itself is cheap, overheads must account for it. Offhand, looks to me like this loop in long_pow() is the culprit:

for (i = Py_SIZE(b) - 1; i >= 0; --i) {
    digit bi = b->ob_digit[i];
    for (j = (digit)1 << (PyLong_SHIFT-1); j != 0; j >>= 1) {
        MULT(z, z, z);
        if (bi & j)
            MULT(z, a, z);
    }
}

Python ints on a 64-bit box are stored internally in base 2**30 (PyLong_SHIFT is 30). z starts life at 1. The first 28 trips through the loop are chewing up the 28 leading zero bits in exponent 2, so MULT(z, z, z) multiplies 1 by 1 to get 1, 28 times. Then again on the 29th iteration, but now "bi & j" is finally true (we finally found the leading one bit in exponent 2), so z is replaced by 1 times the base = the base. On the final, 30th, iteration, MULT(z, z, z) replaces z with its square, and we're done.

It would probably be worthwhile to add code special-casing the leading Python "digit" of the exponent, fast-forwarding without any multiplies to the leading one bit, and setting z directly to the base then.
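For clarity, here is that loop transcribed into Python (a sketch for illustration only, not CPython's actual implementation, which also handles modular pow and other cases):

PyLong_SHIFT = 30   # exponents are processed in 30-bit digits on 64-bit builds

def binary_pow(a, digits):
    # digits: the exponent's base-2**30 digits, least significant first
    z = 1
    for bi in reversed(digits):          # i = Py_SIZE(b) - 1 ... 0
        j = 1 << (PyLong_SHIFT - 1)      # start at the digit's top bit
        while j != 0:
            z = z * z                    # MULT(z, z, z)
            if bi & j:
                z = z * a                # MULT(z, a, z)
            j >>= 1
    return z

# With exponent 2 (digits == [2]), the first 28 iterations just compute 1*1:
# every squaring before the leading one bit is wasted work.
print(binary_pow(151, [2]))   # 22801 == 151**2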
This is a stab at reducing overhead for small exponents, along the lines I sketched. Unfortunately, I've been unable to convince BPO and GitHub to recognize that the PR is related to this report. Did something basic change?

Incidentally, at first this change triggered rare shutdown deaths due to negative refcounts, in the collection of small integer objects. That was a head-scratcher! Turns out that was, I believe, due to a "temp = NULL" line missing from earlier code introduced to implement powers of modular inverses.
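In terms of the Python transcription above, the fast-forward idea looks roughly like this (a sketch of the approach described, not the code from the actual pull request):

PyLong_SHIFT = 30

def binary_pow_fast_forward(a, digits):
    # Assumes a positive exponent: digits is non-empty and its most
    # significant (last) digit is non-zero.
    top = digits[-1]
    j = 1 << (PyLong_SHIFT - 1)
    while not (top & j):                 # fast-forward past leading zero
        j >>= 1                          # bits: no multiplies at all here
    z = a                                # set z directly to the base
    j >>= 1
    while j != 0:                        # finish the leading digit normally
        z = z * z
        if top & j:
            z = z * a
        j >>= 1
    for bi in reversed(digits[:-1]):     # any remaining digits, as before
        j = 1 << (PyLong_SHIFT - 1)
        while j != 0:
            z = z * z
            if bi & j:
                z = z * a
            j >>= 1
    return z

print(binary_pow_fast_forward(151, [2]))   # 22801, now with a single multiply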
Closing this now because the pull request did, I believe, all that can be done at the function level. Exponents of 1 and 2 are well within a factor of 2 of repeated multiplication now, and it's essentially a tie at exponent 3 now. Above that, pow() wins now. On my box. Doing better would require a smarter compiler, which, e.g., knew that x**2 can be computed as x*x.
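To check where that crossover lies on a given build, a comparison along these lines works (a minimal sketch; the base 151 and the timing settings are arbitrary choices):

import timeit

setup = "x = 151"
for n, mul in [(1, "x*1"), (2, "x*x"), (3, "x*x*x"), (4, "x*x*x*x")]:
    t_pow = min(timeit.repeat(f"x**{n}", setup=setup, number=1_000_000, repeat=5))
    t_mul = min(timeit.repeat(mul, setup=setup, number=1_000_000, repeat=5))
    print(f"exponent {n}: pow/mul ratio = {t_pow / t_mul:.2f}")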