bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
* bpo-46504: faster code for trial quotient in x_divrem()

This brings x_divrem() back into synch with x_divrem1(), which was changed
in bpo-46406 to generate faster code to find machine-word division
quotients and remainders. Modern processors compute both with a single
machine instruction, but convincing C to exploit that requires writing
_less_ "clever" C code.
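As a rough illustration of the point above, the two ways of getting the remainder can be placed side by side in a standalone sketch. The typedefs only mimic CPython's `digit` and `twodigits` (assumed here to be 32-bit and 64-bit unsigned types), and the function names `divrem_mulsub` and `divrem_plain` are hypothetical, not part of longobject.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for CPython's digit types on a 30-bit-digit build. */
typedef uint32_t digit;
typedef uint64_t twodigits;

/* "Clever" form: divide once, then rebuild the remainder with a
 * multiply and a subtract. */
static void divrem_mulsub(twodigits vv, digit w, digit *q, digit *r)
{
    *q = (digit)(vv / w);
    *r = (digit)(vv - (twodigits)w * *q);
}

/* Plain form: write / and % directly on the same operands, which lets
 * an optimizing compiler reuse one divide for both results. */
static void divrem_plain(twodigits vv, digit w, digit *q, digit *r)
{
    *q = (digit)(vv / w);
    *r = (digit)(vv % w);
}
```

Both functions return the same values whenever the quotient fits in a `digit`; only the generated code is expected to differ. On x86-64, for example, the `div` instruction yields quotient and remainder together, so the plain form can typically compile to a single divide, though the exact output depends on the compiler and target.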
tim-one committed Jan 25, 2022
1 parent b18fd54 commit 7c26472
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion Objects/longobject.c
@@ -2767,8 +2767,15 @@ x_divrem(PyLongObject *v1, PyLongObject *w1, PyLongObject **prem)
         vtop = vk[size_w];
         assert(vtop <= wm1);
         vv = ((twodigits)vtop << PyLong_SHIFT) | vk[size_w-1];
+        /* The code used to compute the remainder via
+         *     r = (digit)(vv - (twodigits)wm1 * q);
+         * and compilers generally generated code to do the * and -.
+         * But modern processors generally compute q and r with a single
+         * instruction, and modern optimizing compilers exploit that if we
+         * _don't_ try to optimize it.
+         */
         q = (digit)(vv / wm1);
-        r = (digit)(vv - (twodigits)wm1 * q); /* r = vv % wm1 */
+        r = (digit)(vv % wm1);
         while ((twodigits)wm2 * q > (((twodigits)r << PyLong_SHIFT)
                                      | vk[size_w-2])) {
             --q;
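For context, the trial-quotient step touched by this hunk can be sketched as a standalone function. This is only an approximation under stated assumptions: `trial_quotient`, `SHIFT`, and `BASE` are hypothetical names, the digit types are assumed to be 32-bit and 64-bit unsigned integers with 30-bit digits, and the loop body after `--q` is not shown in the hunk, so the version here follows the usual schoolbook long-division correction rather than the exact CPython source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for CPython's digit types with 30-bit digits. */
typedef uint32_t digit;
typedef uint64_t twodigits;
#define SHIFT 30
#define BASE ((digit)1 << SHIFT)

/* Hypothetical helper: estimate one quotient digit from the top three
 * digits of the running dividend (vtop, vmid, vlow) and the top two
 * digits of the normalized divisor (wm1, wm2). */
static digit trial_quotient(digit vtop, digit vmid, digit vlow,
                            digit wm1, digit wm2)
{
    twodigits vv = ((twodigits)vtop << SHIFT) | vmid;
    digit q = (digit)(vv / wm1);   /* first guess from the top digits */
    digit r = (digit)(vv % wm1);   /* matching remainder */

    /* Lower the guess while the second divisor digit shows it is too big. */
    while ((twodigits)wm2 * q > (((twodigits)r << SHIFT) | vlow)) {
        --q;
        r += wm1;
        if (r >= BASE)   /* remainder no longer fits in one digit: stop */
            break;
    }
    return q;
}
```

With `wm1 = 1 << 29` and `wm2 = 5`, a dividend prefix equal to the divisor's top digits gives 1, while a prefix one smaller in the last digit starts at 1 and is corrected down to 0.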