bpo-41513: Improve speed and accuracy of math.hypot() #21803

Merged 10 commits on Aug 16, 2020

Conversation

rhettinger
Contributor

@rhettinger rhettinger commented Aug 10, 2020

@rhettinger rhettinger added the performance Performance or resource usage label Aug 10, 2020
@rhettinger rhettinger changed the title bpo-41613: Improve speed and accuracy of math.hypot() bpo-41513: Improve speed and accuracy of math.hypot() Aug 10, 2020
Member

@serhiy-storchaka serhiy-storchaka left a comment

If this change requires making the tests less strict, it is a potentially breaking change and should therefore be mentioned in What's New.

for (i=0 ; i < n ; i++) {
    x = vec[i];
    assert(Py_IS_FINITE(x) && fabs(x) <= max);
    x *= scale;
Member

Which is faster: x *= scale or x = ldexp(x, -max_e)?
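
The two forms in this question produce the same value whenever the multiplier is an exact power of two, so the choice between them is about speed rather than accuracy. A minimal sketch in C, assuming `scale` is built once as `ldexp(1.0, -max_e)` and that `max_e` is in a range where that power of two is finite and nonzero. The helper names `scale_by_multiply` and `scale_by_ldexp` are hypothetical and do not appear in the patch:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: scale x by 2**(-max_e) with one multiplication.
   Assumes -1023 <= max_e <= 1024 so that the scale factor is an exactly
   representable, finite, nonzero power of two. */
static double scale_by_multiply(double x, int max_e)
{
    assert(max_e >= -1023 && max_e <= 1024);
    double scale = ldexp(1.0, -max_e);  /* in a real loop, computed once */
    return x * scale;
}

/* Hypothetical helper: the same scaling done with a library call. */
static double scale_by_ldexp(double x, int max_e)
{
    return ldexp(x, -max_e);
}
```

For example, `scale_by_multiply(3.0, 2)` and `scale_by_ldexp(3.0, 2)` both divide 3.0 by 4. Because a power-of-two factor only changes the exponent, neither form introduces rounding error unless the result underflows.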

Contributor Author

  • The test as written was over-specified. It should have been written this way from the start.

  • x *= scale is faster than x = ldexp(x, -max_e). The former compiles to a single, fast inline instruction; the latter is an external library call.

Here's the generated code for the loop:

L284:
    movsd   (%r12,%rax,8), %xmm0
    addq    $1, %rax
    cmpq    %rax, %rbp
    mulsd   %xmm4, %xmm0           <-- x *= scale
    movapd  %xmm0, %xmm1
    mulsd   %xmm0, %xmm1           <-- x *= x
    movapd  %xmm2, %xmm0
    addsd   %xmm1, %xmm2            <-- csum += x
    subsd   %xmm2, %xmm0
    addsd   %xmm1, %xmm0
    addsd   %xmm0, %xmm3
    jg  L284
    subsd   %xmm

@rhettinger rhettinger merged commit fff3c28 into python:master Aug 16, 2020
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Aug 20, 2020
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
Labels
performance Performance or resource usage
5 participants