
@larsmans
Author

Simple recursive implementation with unrolled base case. Also fixed signed/unsigned issues by making all indices signed.

Added a unit test based on your example.

Performance seems unchanged: still about a third faster than before.
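
In outline, the shape of it (simplified to unit stride; the function name, the 128-element cutoff, and the 8-way unroll here are illustrative, not the exact diff):

    static double
    pairwise_sum(const double *a, long n)
    {
        if (n <= 128) {
            /* Unrolled base case: 8 independent partial sums break the
             * addition dependency chain and give the compiler a hint. */
            double r[8] = {0., 0., 0., 0., 0., 0., 0., 0.};
            long i;
            for (i = 0; i + 8 <= n; i += 8) {
                r[0] += a[i];     r[1] += a[i + 1];
                r[2] += a[i + 2]; r[3] += a[i + 3];
                r[4] += a[i + 4]; r[5] += a[i + 5];
                r[6] += a[i + 6]; r[7] += a[i + 7];
            }
            /* leftovers land in the first n % 8 members of r */
            for (; i < n; i++) {
                r[i % 8] += a[i];
            }
            return ((r[0] + r[1]) + (r[2] + r[3])) +
                   ((r[4] + r[5]) + (r[6] + r[7]));
        }
        else {
            long n2 = n / 2;
            n2 -= n2 % 8;   /* keep the split a multiple of 8 */
            return pairwise_sum(a, n2) + pairwise_sum(a + n2, n - n2);
        }
    }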

@larsmans
Author

Note that I'm adding to the first three members of r here, instead of r[0].

@juliantaylor
Owner

Thanks, it is a little simpler. I want to have a look at this again after 1.8 is out. The big issue is the blocking done by the reduction iterator, which reduces the usefulness of this change.
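
Schematically the problem is this (buffer size and structure are illustrative, not the actual iterator code): the inner sum only ever sees one buffered chunk at a time, and the per-chunk results still combine sequentially.

    #define BUFSIZE 8192   /* illustrative; not the real buffer size */

    double pairwise_sum(const double *a, long n);   /* as sketched above */

    static double
    blocked_reduce(const double *a, long n)
    {
        double res = 0.;
        for (long i = 0; i < n; i += BUFSIZE) {
            long chunk = (n - i < BUFSIZE) ? (n - i) : BUFSIZE;
            /* pairwise only within each chunk; chunks add up sequentially */
            res += pairwise_sum(a + i, chunk);
        }
        return res;
    }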

@larsmans
Author

You mean np.add.reduce? I do have a lot of code that calls plain old np.sum and could do with a speed boost :)

But take your time.

@juliantaylor
Owner

np.sum goes through np.add.reduce too.
For speed I also plan to vectorize this (which was very easy with the iterative code; I need to check how to fit it into the recursive one).
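
A vectorized base case might look roughly like this (a sketch with SSE2 intrinsics; the name, unit stride, and unaligned loads are assumptions, not what this will end up as):

    #include <emmintrin.h>   /* SSE2 */

    static double
    sum_sse2(const double *a, long n)
    {
        /* two vector accumulators of two doubles each give four
         * independent partial sums */
        __m128d v0 = _mm_setzero_pd();
        __m128d v1 = _mm_setzero_pd();
        long i;
        for (i = 0; i + 4 <= n; i += 4) {
            v0 = _mm_add_pd(v0, _mm_loadu_pd(a + i));       /* addpd */
            v1 = _mm_add_pd(v1, _mm_loadu_pd(a + i + 2));
        }
        double r[4];
        _mm_storeu_pd(r, v0);
        _mm_storeu_pd(r + 2, v1);
        double res = (r[0] + r[1]) + (r[2] + r[3]);
        for (; i < n; i++) {    /* scalar tail */
            res += a[i];
        }
        return res;
    }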

@larsmans
Author

Ok. The NumPy 1.7.1 I benchmarked against is the Ubuntu binary package, and I'm not sure whether that uses SIMD (probably not). I'll check whether I can get the SIMD loop into the base case of this algorithm.

@larsmans
Author

I just checked the assembler output for this, and my GCC 4.7.3 is able to vectorize the loop itself at -O2 on x86-64. The unrolling is enough of a hint for it to do that. (I'm surprised, as I thought it didn't do that except at -O3.)

@juliantaylor
Owner

gcc will only vectorize at -O3 or with -ftree-vectorize. But it doesn't want to vectorize it for me, GCC 4.7 at -O3:

  5,96 │ 80:   add    $0x4,%rax
  6,00 │       addsd  (%r8),%xmm3
  6,11 │       add    %r10,%r8
  6,11 │       addsd  (%rcx),%xmm2
  6,04 │       mov    %rax,%rsi
  5,70 │       addsd  (%rcx,%r9,1),%xmm1
  9,83 │       addsd  (%rcx,%r9,2),%xmm0
  7,21 │       add    %r10,%rcx
  5,39 │       cmp    %r11,%rax
  6,85 │     ↑ jb     80

only scalar adds :(

@larsmans
Author

Ah, I had mistaken addsd for a vector op since it's an SSE instruction; it's actually a scalar double add (addpd is the packed version). Must brush up on my asm skills.

@juliantaylor
Owner

It can probably vectorize it if you set -funsafe-math-optimizations; gcc is very conservative about changing the floating-point semantics (which reordering and vectorizing does).
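
The reordering issue is easy to demonstrate with a standalone example (values mine, chosen to make the difference visible):

    #include <stdio.h>

    /* Floating-point addition is not associative, which is why gcc
     * will not reorder it into vector lanes by default. */
    int main(void)
    {
        double big = 1e16, small = 1.0;
        printf("%.1f\n", big - big + small);    /* 1.0 */
        printf("%.1f\n", big + small - big);    /* 0.0: small is absorbed */
        return 0;
    }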

@juliantaylor
Owner

Added the commit to the original PR, thanks.
Note that I fixed a stride bug in the recursive call.
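
For illustration of the bug class (my reconstruction, not the actual diff): with a non-unit element stride, the second recursive call has to advance by n2 * stride elements, not by n2.

    /* stride is in elements here; NumPy's strides are in bytes */
    static double
    pairwise_sum_strided(const double *a, long n, long stride)
    {
        if (n <= 8) {
            double res = 0.;
            for (long i = 0; i < n; i++) {
                res += a[i * stride];
            }
            return res;
        }
        long n2 = n / 2;
        return pairwise_sum_strided(a, n2, stride) +
               /* the bug class: advancing by n2 instead of n2 * stride */
               pairwise_sum_strided(a + n2 * stride, n - n2, stride);
    }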

juliantaylor pushed a commit that referenced this pull request Nov 28, 2016
Adds a regression test that demonstrates the issue.
juliantaylor pushed a commit that referenced this pull request May 12, 2019
BUG: non-uint-aligned arrays were counted as uint-aligned