SIMD optimization for AABB::intersects_segment #2351

caospacifico · 2015-08-09T23:04:56Z

"AABB::intersects_segment" is optimized with Intel SSE instructions and runs roughly 1.9x as fast as the original in the callgrind measurements below (971474 vs. 515142 estimated cycles).

Running:
valgrind --tool=callgrind --toggle-collect=TestAABB::test_aabb* --cache-sim=yes --branch-sim=yes ./bin/godot.x11.64 -test aabb

then:
kcachegrind callgrind.out.*

The normal binary shows a cycle estimate of 971474.

The SIMD-optimized binary shows a cycle estimate of 515142.

NOTE: This is only optimized for GCC, Intel SSE, and single-precision floating point! It will compile on other platforms, but it won't be optimized there.
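A fallback of the kind the note describes is commonly written as a preprocessor guard around the SSE path. The following is a minimal sketch under stated assumptions, not the code from this PR: `dot3` and `SKETCH_USE_SSE` are hypothetical names, and checking `__GNUC__` together with `__SSE__` (both predefined by GCC when applicable) is only one plausible way to write the guard.

```cpp
#include <cassert>

// Hypothetical guard: use SSE only on GCC-compatible compilers with SSE enabled.
#if defined(__GNUC__) && defined(__SSE__)
#include <xmmintrin.h>
#define SKETCH_USE_SSE 1
#else
#define SKETCH_USE_SSE 0
#endif

// Dot product of two 3-component vectors stored as 4 floats each.
// The 4th float is padding and never contributes to the result.
inline float dot3(const float *a, const float *b) {
    assert(a != nullptr && b != nullptr);
#if SKETCH_USE_SSE
    // One multiply instruction handles all four lanes at once.
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    float lanes[4];
    _mm_storeu_ps(lanes, prod);
    return lanes[0] + lanes[1] + lanes[2];
#else
    // Portable scalar path: compiles everywhere, just not vectorized by hand.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
#endif
}
```

Both branches return the same value for the same input, so callers do not need to know which path was compiled in.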

reduz · 2015-08-09T23:14:02Z

This is pretty cool. I was planning to do a whole overhaul of the math
code using SIMD after the next version, so I will most likely use this PR as a
base.

On Sun, Aug 9, 2015 at 8:05 PM, jjdicharry notifications@github.com wrote:

Normal binary output shows a cycle estimation of 971474:

[image: kcachgrind_aabb_nosimd]
https://cloud.githubusercontent.com/assets/12058304/9157352/4c2e72e8-3eaf-11e5-9e41-9d7ed4dfa429.jpg

SIMD optimized binary shows a cycle estimation of 515142:

[image: kcachegrind_aabb_simd]
https://cloud.githubusercontent.com/assets/12058304/9157365/b08a69fe-3eaf-11e5-913a-e0c420ead69d.png

Commit Summary

Fixed resource binary reading for quad vector3.

Optimizing Vector3 and add test.

Added vector optimization for AABB::intersects_segment.

Will handle optimization for double precision later.

File Changes

A bin/tests/test_aabb.cpp
https://github.com/okamstudio/godot/pull/2351/files#diff-0 (94)

M bin/tests/test_main.cpp
https://github.com/okamstudio/godot/pull/2351/files#diff-1 (6)

M core/io/resource_format_binary.cpp
https://github.com/okamstudio/godot/pull/2351/files#diff-2 (40)

M core/math/aabb.cpp
https://github.com/okamstudio/godot/pull/2351/files#diff-3 (87)

A core/math/vector3_simd.cpp
https://github.com/okamstudio/godot/pull/2351/files#diff-4 (187)

A core/math/vector3_simd.h
https://github.com/okamstudio/godot/pull/2351/files#diff-5 (530)

Patch Links:

https://github.com/okamstudio/godot/pull/2351.patch

https://github.com/okamstudio/godot/pull/2351.diff

caospacifico · 2015-08-09T23:30:38Z

I tried to make the Vector3 class itself SIMD-optimized, but that created more problems, because aligned SSE loads and stores require addresses that are multiples of 16 bytes. So I created a Vector3_simd class that I use only in AABB::intersects_segment, where I can control its alignment.
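To illustrate the alignment requirement, here is a minimal sketch of a 16-byte aligned vector type. It is not the PR's `Vector3_simd`: `Vec3Aligned`, `is_aligned16`, and `add` are hypothetical names, and the sketch assumes a C++11 compiler for `alignas` (a GCC-only build could use `__attribute__((aligned(16)))` instead).

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical stand-in: three floats plus one padding float, 16-byte aligned.
struct alignas(16) Vec3Aligned {
    float x, y, z, pad;
};

// True when the address is a multiple of 16.
inline bool is_aligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

inline Vec3Aligned add(const Vec3Aligned &a, const Vec3Aligned &b) {
    Vec3Aligned r;
#if defined(__SSE__)
    // _mm_load_ps / _mm_store_ps are the aligned forms (typically movaps);
    // they fault if the address is not a multiple of 16.
    assert(is_aligned16(&a) && is_aligned16(&b) && is_aligned16(&r));
    _mm_store_ps(&r.x, _mm_add_ps(_mm_load_ps(&a.x), _mm_load_ps(&b.x)));
#else
    // Scalar fallback for targets without SSE.
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    r.pad = a.pad + b.pad;
#endif
    return r;
}
```

Because the alignment is part of the type, objects with automatic or static storage are placed correctly by the compiler. Objects obtained from an allocator that ignores the type's alignment are a separate concern.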

reduz · 2015-08-09T23:34:08Z

It should be OK to add an extra float. I think I wrote all the code in the
engine with the expectation that an extra one would eventually be added.

caospacifico · 2015-08-09T23:56:07Z

Thanks for thinking my contribution is cool! It did speed things up quite a bit! Also, I know about the extra float in Vector3. The problem is more complicated than that, though. "new" and "malloc" aren't guaranteed to return 16-byte aligned memory. A Vector3 could also be a member of a struct or class, e.g. "struct { int a; Vector3 v; };", which throws the alignment off by 4 bytes. These issues are not impossible to fix, but there are a lot of fixes to be made. Another thing is that future contributors would have to consider the alignment whenever they want to use these SIMD objects. I just wanted to commit to the project without introducing a lot of bugs!
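The struct-member problem can be checked directly with `offsetof`. This is a minimal sketch, not engine code: `Vec3`, `Vec4`, `Packed`, `Padded`, `is_aligned16`, and `align16` are hypothetical names, and the offsets in the comments assume a 4-byte `int`. The helper at the end shows one standard way (`std::align`, C++11) to find an aligned address inside raw memory whose alignment is unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

struct Vec3 { float x, y, z; };                 // plain 12-byte vector, 4-byte alignment
struct alignas(16) Vec4 { float x, y, z, w; };  // padded and 16-byte aligned

struct Packed { int a; Vec3 v; };  // v lands at offset 4: unusable for aligned SSE loads
struct Padded { int a; Vec4 v; };  // compiler inserts 12 bytes of padding, v at offset 16

// True when the address is a multiple of 16.
inline bool is_aligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Returns the first 16-byte aligned address in [raw, raw + space) that can
// hold a Vec4, or nullptr if the buffer is too small.
inline void *align16(void *raw, std::size_t space) {
    return std::align(16, sizeof(Vec4), raw, space);
}
```

Declaring the alignment on the type fixes the member case at the cost of padding, which is part of why changing Vector3 engine-wide touches many places.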

caospacifico · 2015-08-11T00:45:14Z

I may have a solution to the alignment issue. The SSE "movups" instruction
can move unaligned data into the "xmm" registers, which can then be used for
each math operation. That way the SSE instructions can be used without
alignment issues. I don't know, though, whether the math operations would
still be faster once the extra "movups" instructions are added.
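In intrinsics form, the unaligned approach uses `_mm_loadu_ps` and `_mm_storeu_ps`, which compilers typically emit as "movups". The sketch below is an illustration under assumptions rather than the proposed change: `add4_unaligned` is a hypothetical name, and it presumes each vector occupies four floats (the extra float discussed above), because a four-lane load from a 12-byte vector would read past its end.

```cpp
#include <cassert>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Adds two 4-float vectors stored at arbitrary (possibly unaligned) addresses.
inline void add4_unaligned(const float *a, const float *b, float *out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(__SSE__)
    // Unaligned load/store: no 16-byte alignment requirement on the pointers.
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Whether this beats scalar code depends on the CPU and on how much work is done per load, so it would need to be measured, for example with the same callgrind run used for the original comparison.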

caospacifico closed this Aug 14, 2015
