GitHub - pin-lee/vector-tests: Some tests on doing vector operations serially and in parallel on my x86_64 CPU with AVX2 and in parallel on my GPU with a compute shader.

The Plan

Compare the performance of the following when adding sets of numbers:

Serial addition on the CPU.
Vectorized addition on the CPU with SIMD.
Multithreaded serial addition on the CPU with pthread.
Multithreaded vectorized addition on the CPU with pthread and SIMD.
Parallelized vector addition on the GPU via compute shader.
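
As a rough illustration of the serial baseline and how each variant could be timed, here is a minimal sketch. The names `serial_add` and `time_us` are hypothetical and not taken from `main.cpp`; the array names `x_pos` and `x_vel` follow the notes on `simd_add`.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Serial baseline (hypothetical name): add each velocity to the matching
// position, in place, one element per iteration.
inline void serial_add(std::int32_t* x_pos, const std::int32_t* x_vel, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x_pos[i] += x_vel[i];
    }
}

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in microseconds, measured with a monotonic clock.
template <typename F>
double time_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Each of the other variants could be wrapped in the same `time_us` call so that all five are measured the same way.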

Some notes

To keep the threaded tests fair, the thread pool is not reused between runs. It is intentionally allowed to fall out of scope, so the cost of initializing it is counted in every measurement.
All functions are marked inline to get inlined performance while keeping the code reasonably clean.
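
The idea of paying the thread setup cost on every run can be sketched as follows. This is an assumption-laden illustration, not the repository's code: the pool type used in `main.cpp` is not shown here, so the sketch creates and joins raw pthreads inside the call. `AddTask`, `add_worker`, and `threaded_add` are hypothetical names.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous slice of the arrays, handed to a single worker thread.
struct AddTask {
    std::int32_t* x_pos;
    const std::int32_t* x_vel;
    std::size_t count;
};

// pthread entry point: serially add the slice described by the task.
static void* add_worker(void* arg) {
    const AddTask* task = static_cast<const AddTask*>(arg);
    for (std::size_t i = 0; i < task->count; ++i) {
        task->x_pos[i] += task->x_vel[i];
    }
    return nullptr;
}

// Split the range into num_threads slices, spawn one thread per slice, and
// join them all. Threads are created inside the call, so their startup cost
// is included in whatever timing wraps this function.
inline void threaded_add(std::int32_t* x_pos, const std::int32_t* x_vel,
                         std::size_t n, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<pthread_t> threads(num_threads);
    std::vector<AddTask> tasks(num_threads);

    const std::size_t chunk = n / num_threads;
    const std::size_t extra = n % num_threads;  // first `extra` slices get one more element
    std::size_t offset = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t count = chunk + (t < extra ? 1 : 0);
        tasks[t] = AddTask{x_pos + offset, x_vel + offset, count};
        offset += count;
    }

    std::size_t started = 0;
    for (; started < num_threads; ++started) {
        if (pthread_create(&threads[started], nullptr, add_worker, &tasks[started]) != 0) {
            break;
        }
    }
    // If thread creation failed partway, process the remaining slices here.
    for (std::size_t t = started; t < num_threads; ++t) {
        add_worker(&tasks[t]);
    }
    for (std::size_t t = 0; t < started; ++t) {
        pthread_join(threads[t], nullptr);
    }
}
```

On most toolchains this needs the `-pthread` flag when compiling and linking.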

On `simd_add`

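A 128-bit SSE register holds four 4-byte ints. That width comes from the SIMD register, not from the CPU being 64-bit; AVX2's 256-bit registers would hold eight.
So first we load our registers, moving 4 x_pos into one and 4 x_vel into the other. Then we add the registers together. Since the elements are 4-byte ints, the matching intrinsic is `_mm_add_epi32`; `_mm_add_epi64` treats the register as two 8-byte lanes, so a carry out of one int would spill into its neighbour.
We'll move the output back into memory, overwriting the previous data, with a `_mm_store` intrinsic.
We're processing 4 elements at a time, so the loop index advances by 4.
In theory this gives up to a 4x speedup over the serial loop. In practice a simple add like this tends to be limited by memory bandwidth, so the measured gain may be smaller.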
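
The steps above can be sketched with SSE2 intrinsics as follows. This is a minimal illustration under a few assumptions, not necessarily what `main.cpp` does: the elements are 32-bit ints, so it uses the 32-bit-lane add `_mm_add_epi32`; it uses the unaligned load and store variants so the arrays need no special alignment; and it adds a scalar tail loop for lengths that are not a multiple of 4.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86_64)
#include <cstddef>
#include <cstdint>

// Add x_vel into x_pos in place, four 32-bit ints per iteration.
inline void simd_add(std::int32_t* x_pos, const std::int32_t* x_vel, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Load four positions and four velocities into 128-bit registers.
        const __m128i pos = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x_pos + i));
        const __m128i vel = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x_vel + i));
        // Lane-wise 32-bit add: four independent sums.
        const __m128i sum = _mm_add_epi32(pos, vel);
        // Write the four sums back over the old positions.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(x_pos + i), sum);
    }
    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i) {
        x_pos[i] += x_vel[i];
    }
}
```

If the arrays are known to be 16-byte aligned, the aligned variants `_mm_load_si128` and `_mm_store_si128` could be used instead.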
