On path with a known exact float, extract the double with a fast macro. #21072
I don't know why, but Python 3 changed `math.floor()` to return an int instead of a float, so the larger the absolute value, the more time it takes to create an ever-larger int object. So it's not surprising that the time depends on the magnitude of the argument. I suppose `PyLong_FromDouble()` could be micro-optimized to exploit the fact that, eventually, the trailing bits of the potentially giant int must all be 0.
>>> math.floor(3.14e32)
314000000000000005680822245916672
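The trailing-zero observation can be checked from Python itself. The sketch below is only an illustration and says nothing about how `PyLong_FromDouble()` is actually written; the helper names `floor_via_shift` and `trailing_zero_bits` are hypothetical. It rebuilds the result for a large float from its 53-bit mantissa plus a left shift, assuming IEEE 754 doubles.

```python
import math


def floor_via_shift(x: float) -> int:
    """Rebuild math.floor(x) for a large finite float from mantissa and shift."""
    m, e = math.frexp(x)            # x == m * 2**e, with 0.5 <= abs(m) < 1
    if e < 53:
        return math.floor(x)        # small values may still carry fractional bits
    mantissa = int(m * 2**53)       # exact: a double has at most 53 significant bits
    return mantissa << (e - 53)     # every bit below the mantissa is zero


def trailing_zero_bits(n: int) -> int:
    """Count the consecutive zero bits at the low end of a nonzero int."""
    return (n & -n).bit_length() - 1


n = math.floor(3.14e32)
print(n.bit_length(), trailing_zero_bits(n))
print(floor_via_shift(3.14e32) == n)
```

Once the exponent reaches 53 or more, the value is already an integer, so the only work left is placing the mantissa at the right bit offset. That is the property a micro-optimization could exploit.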
Thanks @rhettinger for the PR 🌮🎉. I'm working now to backport this PR to: 3.9.
Sorry @rhettinger, I had trouble checking out the |
…o. (pythonGH-21072) (cherry picked from commit 930f451) Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
GH-21102 is a backport of this pull request to the 3.9 branch.
GH-22108 is a backport of this pull request to the 3.9 branch.
This matches a similar optimisation done for math.floor in python#21072

Before:
```
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=3.14' 'ceil(x)'
20000000 loops, best of 11: 13.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=0.0' 'ceil(x)'
20000000 loops, best of 11: 13.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-3.14E32' 'ceil(x)'
10000000 loops, best of 11: 35.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-323452345.14' 'ceil(x)'
10000000 loops, best of 11: 21.8 nsec per loop
```

After:
```
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=3.14' 'ceil(x)'
20000000 loops, best of 11: 11.8 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=0.0' 'ceil(x)'
20000000 loops, best of 11: 11.7 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-3.14E32' 'ceil(x)'
10000000 loops, best of 11: 32.7 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-323452345.14' 'ceil(x)'
10000000 loops, best of 11: 20.1 nsec per loop
```
We're already testing for an exact float, so take advantage of that information and extract the double with the fast macro.
Baseline timings
Timings with the patch:
While the timings all show improvements, I don't understand why the timings for floor() also depend on the magnitude of the inputs.
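For anyone wanting to reproduce the magnitude dependence, here is a minimal sketch using the `timeit` module from a script rather than the command line. The helper name `time_floor` is hypothetical, and the input values are borrowed from the `ceil` benchmark quoted in this thread, since the original `floor()` timings are not reproduced here. Absolute numbers will vary by machine and build.

```python
import timeit


def time_floor(value: float, number: int = 200_000, repeat: int = 5) -> float:
    """Return the best-of-`repeat` time per math.floor call, in nanoseconds."""
    timer = timeit.Timer(
        stmt="floor(x)",
        setup=f"from math import floor; x = {value!r}",
    )
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


# Inputs of increasing magnitude; larger results need larger int objects.
for value in (0.0, 3.14, -323452345.14, -3.14e32):
    print(f"floor({value!r}): {time_floor(value):.1f} nsec per call")
```

Taking the minimum over several repeats mirrors what `python -m timeit` reports as "best of N" and reduces noise from other processes.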