SIMD Benchmarks

Testing the relative performance of basic matrix and vector operations against their SIMD counterparts. Each operation's time is averaged over 10000 runs on 4x4 matrices or 4D vectors of random single-precision floats. Separate measurements are taken for AVX 128-bit (XMM) and 256-bit (YMM) registers. The AVX2/FMA3 (128-bit) variant uses fused multiply-add instructions, which require an Intel Haswell or later CPU. All operations are fairly heavily optimized. SIMD matrix multiplication uses the linear combination method.
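Below is a minimal sketch of the linear combination method using SSE intrinsics. It assumes a hypothetical row-major, 16-byte-aligned `Mat4` type, and the project's own `Matrix` class may lay out its data differently. Each row of the result is built as a weighted sum of the four rows of the right-hand matrix, with the weights broadcast from the matching row of the left-hand matrix. A plain triple-loop `mulReference` is included for comparison. Neither function is the repository's actual code.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (baseline on x86-64)
#include <cassert>
#include <cmath>

// Hypothetical row-major 4x4 matrix, 16-byte aligned for _mm_load_ps/_mm_store_ps.
struct alignas(16) Mat4 {
    float m[16];  // element (row, col) lives at m[row * 4 + col]
};

// Scalar reference: C[i][j] = sum over k of A[i][k] * B[k][j].
Mat4 mulReference(const Mat4& a, const Mat4& b) {
    Mat4 c;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[i * 4 + k] * b.m[k * 4 + j];
            c.m[i * 4 + j] = sum;
        }
    return c;
}

// Linear combination method: row i of C = A[i][0]*B.row0 + A[i][1]*B.row1
//                                       + A[i][2]*B.row2 + A[i][3]*B.row3.
Mat4 mulSSE(const Mat4& a, const Mat4& b) {
    // Load the four rows of B once; they are reused for every output row.
    const __m128 b0 = _mm_load_ps(&b.m[0]);
    const __m128 b1 = _mm_load_ps(&b.m[4]);
    const __m128 b2 = _mm_load_ps(&b.m[8]);
    const __m128 b3 = _mm_load_ps(&b.m[12]);
    Mat4 c;
    for (int i = 0; i < 4; ++i) {
        const float* row = &a.m[i * 4];
        // Broadcast each weight from row i of A and accumulate the scaled rows of B.
        __m128 r = _mm_mul_ps(_mm_set1_ps(row[0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[3]), b3));
        _mm_store_ps(&c.m[i * 4], r);
    }
    return c;
}

// True when every element of x and y differs by at most eps.
bool nearlyEqual(const Mat4& x, const Mat4& y, float eps = 1e-4f) {
    for (int i = 0; i < 16; ++i)
        if (std::fabs(x.m[i] - y.m[i]) > eps) return false;
    return true;
}
```

Each output row costs four broadcasts, four multiplies and three adds. The whole product therefore needs 16 `_mm_mul_ps` and 12 `_mm_add_ps` calls instead of 64 scalar multiply-adds. On an FMA3-capable CPU, each multiply/add pair could be fused into a single `_mm_fmadd_ps`.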

Tested on a 2014 rMBP with an Intel i5-4278U 2.6 GHz dual-core Haswell CPU.  
OS: Windows 8.1 running on VMware Fusion 7.  
Compiled with MSVC++ 2012: x64 mode, `/arch:AVX`, `/fp:fast`.  
Timing is measured in CPU clock cycles via `__rdtsc()`. (The QueryPerformanceCounter API could be used instead to measure elapsed time independently of clock frequency.)
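Below is a minimal sketch of a timing harness along these lines. It assumes a hypothetical `averageCycles` helper that calls an operation `runs` times and averages the `__rdtsc()` deltas. It does not serialize the instruction stream (for example with `cpuid` or `__rdtscp`), so each sample also includes the cost of reading the counter and the result is only a rough estimate. The timed operation must produce an observable result, such as a write to a `volatile` variable. Otherwise the compiler may optimize it away entirely.

```cpp
#include <cstdint>
#include <cassert>
#ifdef _MSC_VER
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif

// Hypothetical helper: average time-stamp-counter ticks per call of `op`.
template <typename Op>
double averageCycles(Op op, int runs = 10000) {
    assert(runs > 0);
    std::uint64_t total = 0;
    for (int i = 0; i < runs; ++i) {
        const std::uint64_t start = __rdtsc();
        op();                               // operation under test
        total += __rdtsc() - start;         // includes counter-read overhead
    }
    return static_cast<double>(total) / runs;
}
```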

Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/

#### Matrix * Matrix

Reference: 262.00 cycles  
      SSE:  37.00 cycles  
  AVX_128:  
  AVX_256:  
AVX2/FMA3:  62.00 cycles

#### Vector Normalization

Reference: 34.00 cycles  
      SSE: 55.00 cycles  
  AVX_128:  
  AVX_256:  
AVX2/FMA3:
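Below is a sketch of SSE vector normalization, assuming a hypothetical 16-byte-aligned `Vec4` type. It uses only baseline SSE, not the SSE4.1 `_mm_dp_ps`. A multiply followed by two shuffle/add steps places the squared length in every lane, and the vector is then divided by its square root. The scalar `normalizeReference` is included for comparison only. Neither function is the project's actual implementation.

```cpp
#include <xmmintrin.h>  // SSE intrinsics and _MM_SHUFFLE
#include <cmath>
#include <cassert>

// Hypothetical 4D vector, 16-byte aligned for _mm_load_ps/_mm_store_ps.
struct alignas(16) Vec4 {
    float c[4];  // x, y, z, w
};

// Scalar reference: divide each component by the Euclidean length.
Vec4 normalizeReference(const Vec4& v) {
    float sumSq = 0.0f;
    for (float x : v.c) sumSq += x * x;
    const float len = std::sqrt(sumSq);
    Vec4 out;
    for (int i = 0; i < 4; ++i) out.c[i] = v.c[i] / len;
    return out;
}

Vec4 normalizeSSE(const Vec4& v) {
    const __m128 a = _mm_load_ps(v.c);
    __m128 sq = _mm_mul_ps(a, a);  // (x*x, y*y, z*z, w*w)
    // Swap neighbours and add: (x2+y2, y2+x2, z2+w2, w2+z2).
    sq = _mm_add_ps(sq, _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap halves and add: every lane now holds the full squared length.
    sq = _mm_add_ps(sq, _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(1, 0, 3, 2)));
    const __m128 result = _mm_div_ps(a, _mm_sqrt_ps(sq));
    Vec4 out;
    _mm_store_ps(out.c, result);
    return out;
}
```

A common faster variant replaces the `_mm_sqrt_ps`/`_mm_div_ps` pair with `_mm_rsqrt_ps` and a multiply. This trades accuracy for speed, because the reciprocal square root estimate is only good to about 12 bits.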
