Stream functions should be optimized for SSE/ARM-NEON #25

walbourn · 2016-06-09T18:23:02Z

The existing Stream implementations in DirectXMath for SSE/SSE2 are basic loops that call the non-stream version of each function. This affects:

XMVector2TransformStream
XMVector2TransformCoordStream
XMVector2TransformNormalStream
XMVector3TransformStream
XMVector3TransformCoordStream
XMVector3TransformNormalStream
XMVector3ProjectStream
XMVector3UnprojectStream
XMVector4TransformStream
XMPlaneTransformStream


walbourn · 2016-06-09T18:23:55Z

Fixed for DirectXMath 3.04

walbourn · 2016-06-09T18:24:58Z

For SSE, these use _mm_stream_ps where possible, so that results are written with 'non-temporal' stores that avoid polluting the cache. You can disable this behavior by defining:

#define _XM_NO_MOVNT_

walbourn · 2016-06-09T18:25:50Z

The original implementation failed to include a call to _mm_sfence at the end of the Stream methods. This was fixed in DirectXMath 3.08.
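To illustrate the pattern discussed in the comments above, here is a minimal sketch of a loop that writes its results with _mm_stream_ps and ends with _mm_sfence. It is not the DirectXMath implementation: ScaleStream and SKETCH_NO_MOVNT are hypothetical names made up for this example, with SKETCH_NO_MOVNT only standing in for the role that _XM_NO_MOVNT_ plays in the library. The sketch assumes the output must be 16-byte aligned for the streaming path (a requirement of _mm_stream_ps) and falls back to ordinary stores otherwise.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical opt-out for this sketch only (plays the role of _XM_NO_MOVNT_):
// #define SKETCH_NO_MOVNT

// Hypothetical helper, not a DirectXMath function.
// Multiplies `count` groups of four floats from `in` by `scale` into `out`.
inline void ScaleStream(float* out, const float* in, std::size_t count, float scale)
{
#if defined(SKETCH_HAS_SSE)
    const __m128 s = _mm_set1_ps(scale);

#if !defined(SKETCH_NO_MOVNT)
    // _mm_stream_ps requires a 16-byte aligned destination.
    if ((reinterpret_cast<std::uintptr_t>(out) & 0xF) == 0)
    {
        for (std::size_t i = 0; i < count; ++i)
        {
            const __m128 v = _mm_mul_ps(_mm_loadu_ps(in + 4 * i), s);
            _mm_stream_ps(out + 4 * i, v); // non-temporal store
        }
        // Order the non-temporal stores before any later stores.
        _mm_sfence();
        return;
    }
#endif

    // Regular (cached) stores: unaligned output or streaming disabled.
    for (std::size_t i = 0; i < count; ++i)
    {
        const __m128 v = _mm_mul_ps(_mm_loadu_ps(in + 4 * i), s);
        _mm_storeu_ps(out + 4 * i, v);
    }
#else
    // Scalar fallback for targets without SSE.
    for (std::size_t i = 0; i < 4 * count; ++i)
    {
        out[i] = in[i] * scale;
    }
#endif
}
```

Both paths produce the same values; the streaming path only changes how the stores reach memory. The fence is issued once after the loop rather than after every store.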

walbourn · 2016-06-09T18:32:50Z

Additional optimizations made in DirectXMath 3.05

walbourn added enhancement optimization and removed enhancement labels Jun 9, 2016

walbourn closed this as completed Jun 9, 2016

walbourn self-assigned this Jun 9, 2016
