
Some tests on doing vector operations serially and in parallel on my x86_64 CPU with AVX2 and in parallel on my GPU with a compute shader.


pin-lee/vector-tests


The Plan

Compare the performance of the following when adding sets of numbers:

  • Serial addition on the CPU.
  • Vectorized addition on the CPU with SIMD.
  • Multithreaded serial addition on the CPU with pthread.
  • Multithreaded vectorized addition on the CPU with pthread and SIMD.
  • Parallelized vector addition on the GPU via compute shader.
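As a reference point for the other variants, the serial case could look roughly like the sketch below. The function name `serial_add`, the `int32_t` element type, and the choice to write the sum back into `x_pos` are assumptions for illustration, not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serial baseline (hypothetical name): add each velocity to its position,
   one element at a time. The result overwrites x_pos. */
static inline void serial_add(int32_t *x_pos, const int32_t *x_vel, size_t n)
{
    assert(x_pos != NULL && x_vel != NULL);
    for (size_t i = 0; i < n; i++) {
        x_pos[i] += x_vel[i];
    }
}
```

Every other variant in the list should produce the same output as this loop, which makes it a convenient correctness check as well as a timing baseline.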

Some notes

  • To keep the threaded tests fair, the thread pool is not reused between runs. It is intentionally allowed to fall out of scope so that the cost of initializing it is counted in every measurement.
  • All functions are marked inline to get inlined performance while keeping the code reasonably clean. Note that inline is only a hint, and the compiler may still choose not to inline a call.
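The two notes above can be illustrated with a minimal pthread sketch in which the threads are created and joined inside every call, so their startup cost is part of whatever is being timed. The names `threaded_add`, `add_worker`, `add_range`, `add_job`, and the fixed `NUM_THREADS` value are hypothetical and not taken from the repository; this sketch also spawns plain threads rather than a pool abstraction.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4 /* assumed thread count, not from the README */

/* One contiguous slice of the arrays, handed to a single thread. */
typedef struct {
    int32_t *x_pos;
    const int32_t *x_vel;
    size_t begin;
    size_t end;
} add_job;

/* Inline helper so the per-element work carries no call overhead. */
static inline void add_range(int32_t *x_pos, const int32_t *x_vel,
                             size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++) {
        x_pos[i] += x_vel[i];
    }
}

static void *add_worker(void *arg)
{
    const add_job *job = arg;
    add_range(job->x_pos, job->x_vel, job->begin, job->end);
    return NULL;
}

/* Creates and joins the threads on every call, so thread startup is part
   of the measured time. Returns 0 on success or the pthread_create error
   code; on failure, slices whose thread never started are left untouched. */
static int threaded_add(int32_t *x_pos, const int32_t *x_vel, size_t n)
{
    assert(x_pos != NULL && x_vel != NULL);
    pthread_t threads[NUM_THREADS];
    add_job jobs[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS; /* round up */
    int started = 0;
    int rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = (size_t)t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n; /* clamp empty trailing slices */
        if (end > n) end = n;
        jobs[t] = (add_job){ x_pos, x_vel, begin, end };
        rc = pthread_create(&threads[t], NULL, add_worker, &jobs[t]);
        if (rc != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }
    return rc;
}
```

Depending on the toolchain, this may need to be compiled with `-pthread`. For small inputs the thread creation cost will likely dominate, which is exactly the overhead the first note says should be counted.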

On simd_add

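The SSE (XMM) registers are 128 bits wide, so each one holds four 4-byte ints.
First we load our registers, moving 4 x_pos values into one and 4 x_vel values into the other.
Then we add the registers together. For 4-byte lanes the matching intrinsic is _mm_add_epi32; _mm_add_epi64 treats the register as two 8-byte lanes, so a carry can spill from one int into its neighbor.
We then store the result back to memory with an _mm_store intrinsic (_mm_store_si128 for integer data), overwriting the previous data.
Each iteration processes 4 elements, so the loop index advances by 4.
In theory this gives up to a 4x speedup over the serial loop, though memory bandwidth and compiler auto-vectorization of the serial loop can narrow the gap in practice.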
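The steps above can be sketched as follows. This is a minimal illustration, not the repository's actual `simd_add`: the signature, the `int32_t` element type, and the decision to write the sum back into `x_pos` are assumptions. It uses the unaligned load and store intrinsics so that it works on any pointer; the aligned `_mm_load_si128` / `_mm_store_si128` forms require 16-byte-aligned addresses.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86_64 */
#include <stddef.h>
#include <stdint.h>

/* Adds x_vel into x_pos four 32-bit ints at a time (assumed signature). */
static inline void simd_add(int32_t *x_pos, const int32_t *x_vel, size_t n)
{
    assert(x_pos != NULL && x_vel != NULL);
    size_t i = 0;

    /* Main loop: one 128-bit register holds 4 ints, so step by 4. */
    for (; i + 4 <= n; i += 4) {
        __m128i pos = _mm_loadu_si128((const __m128i *)(x_pos + i));
        __m128i vel = _mm_loadu_si128((const __m128i *)(x_vel + i));
        __m128i sum = _mm_add_epi32(pos, vel); /* four independent 32-bit adds */
        _mm_storeu_si128((__m128i *)(x_pos + i), sum);
    }

    /* Scalar tail for the 0 to 3 elements left when n is not a multiple of 4. */
    for (; i < n; i++) {
        x_pos[i] += x_vel[i];
    }
}
```

The scalar tail is a design choice that keeps the function correct for any length; padding the arrays to a multiple of 4 would be an alternative that removes it.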
