-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
Optimize floor division for ints #70477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The attached patch optimizes floor division for ints. ### spectral_norm ### -m timeit -s "x=22331" "x//2;x//3;x//4;x//5;x//6;x//7;x//8;x/99;x//100;" with patch: 0.298
without: 0.515 |
Added comments on Rietveld. |
Serhiy, Victor, thanks for the review. Attaching an updated version of the patch. |
This change looks related to the issue bpo-21955. IMHO we should take the same decision. I mean, maybe it's better to implement the fast-path only in ceval.c? Or maybe in ceval.c and longobject.c? |
There is no drastic difference on where you implement the fast path. I'd implement all specializations/optimizations in longobject.c and optimize ceval to call slots directly. That way, the implact on ceval performance would be minimal. |
Also, every other operation for longs (except %, for which I created issue bpo-26315) is optimized for single digit longs. This optimization is also important for users of operator.floordiv etc. Even if we decide to provide a fast path in ceval, it's going to be another matter. |
A slightly neater way to compute the result in the case that the signs differ is div = ~((left - 1) / right) That saves the extra multiplication and equality check. |
... though on second thoughts, it's probably better to spell that
to avoid compiler warnings about bit operations on signed types. A good compiler should be able to optimise |
STINNER Victor:
Yury Selivanov:
Oh wait, I was confused by my own patch for bpo-21955 inlining int operations in ceval.c. Since recent benchmarks showed slow-down when ceval.c is modified, I think that it's ok to modify longobject.c rather than ceval.c (and maybe only longobject.c, but let's discuss that in issue bpo-21955). |
Since the patch is changed (and may be changed further if accept Mark's suggestion), benchmarks results should be updated. Is it worth to move the optimization inside l_divmod? Will this speed up or slow down other operations that use l_divmod? |
Attaching a new patch -- big thanks to Mark and Serhiy.
The updated code works slightly faster - ~0.285 usec vs ~0.3 usec. |
Attaching a new patch -- fast_divmod.patch It combines patches for this issue and issue bpo-26315. Individual timeit benchmarks work as fast, but ** op becomes faster: -m timeit -s "x=223" "x**2;x**-2;x**2;x**-3;x**3;x**-3;x**4.5;x**-4.5" |
Just for the record, there's a less-branchy analog of the floor division code I suggested for modulo. In Python-land, it looks like this: def propermod(a, b):
# mimic the C setup
assert a != 0 and b != 0
left = abs(a)
size_a = -1 if a < 0 else 1
right = abs(b)
size_b = -1 if b < 0 else 1
# Compute mod: only one branch needed.
mod = left % right if size_a == size_b else right - 1 - (left - 1) % right
return mod * size_b
# Verify that we get the same results as the regular mod.
for n in range(-100, 100):
if n == 0:
continue
for d in range(-100, 100):
if d == 0:
continue
assert propermod(n, d) == n % d It may well not have any significant effect here, though. |
This works, Mark! Again, the difference in performance is very subtle, but the code is more compact. -m timeit -s "x=22331" "x//2;x//-3;x//4;x//5;x//-6;x//7;x//8;x//-99;x//100;" -m timeit -s "x=22331" "x%2;x%3;x%-4;x%5;x%6;x%-7;x%8;x%99;x%-100;" -m timeit "divmod(100,20);divmod(7,3);divmod(121,99);divmod(123121,123);divmod(-1000,7);divmod(23423,-111)" I'm in favour of fast_divmod_2.patch for solving both this issue and issue bpo-26315. Serhiy, what do you think? |
A couple more observations:
|
Attaching an updated patch.
Thanks. Not sure how this happened :(
Done.
Tried it, the difference is very small. For modulo division it's ~0.225 usec vs ~0.23 usec for [-m timeit -s "x=22331" "x%2;x%3;x%4;x%5;x%6;x%7;x%8;x%99;x%100;"] |
Thanks for the updates! No further comments from me - the patch looks good as far as I'm concerned. |
New changeset 37bacf3fa1f5 by Yury Selivanov in branch 'default': |
Committed. Thank you Serhiy, Mark and Victor for helping with the patch! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: