bpo-41513: Improve speed and accuracy of math.hypot() #21803
If it requires making the tests less strict, it is a potentially breaking change and should therefore be mentioned in What's New.
for (i=0 ; i < n ; i++) {
    x = vec[i];
    assert(Py_IS_FINITE(x) && fabs(x) <= max);
    x *= scale;
What is faster, x *= scale or x = ldexp(x, -max_e)?
- The test as written was over-specified. It should have been written this way from the start.
- The x *= scale is faster than x = ldexp(x, -max_e). The former is a single, fast in-line instruction and the latter is an external library call.
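A minimal sketch of why the two forms can be swapped without changing the result. It assumes scale is the exact power of two ldexp(1.0, -max_e), which the thread implies but does not show, and that the product stays in the normal double range. The helper name scale_matches_ldexp is hypothetical and exists only for illustration.

```c
#include <math.h>

/* Hypothetical helper: returns 1 when multiplying by a precomputed
   power-of-two scale gives the same double as calling ldexp() directly.
   Assumption: scale == ldexp(1.0, -max_e) and the result is a normal double. */
static int
scale_matches_ldexp(double x, int max_e)
{
    double scale = ldexp(1.0, -max_e);   /* computed once, outside any loop */
    return x * scale == ldexp(x, -max_e);
}
```

Because multiplying by a power of two only adjusts the exponent, the library call can be hoisted out of the loop and replaced by one multiply per element.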
Here's the generated code for the loop:
L284:
movsd (%r12,%rax,8), %xmm0
addq $1, %rax
cmpq %rax, %rbp
mulsd %xmm4, %xmm0 <-- x *= scale
movapd %xmm0, %xmm1
mulsd %xmm0, %xmm1 <-- x *= x
movapd %xmm2, %xmm0
addsd %xmm1, %xmm2 <-- csum += x
subsd %xmm2, %xmm0
addsd %xmm1, %xmm0
addsd %xmm0, %xmm3
jg L284
subsd %xmm
https://bugs.python.org/issue41513