Add SIMD versions of scrambler and vector multiplication #361
Adds x86 SIMD implementations of `scrambler` and `vectorcf_mul`. It also consolidates the configuration file so that it is ready for SIMD implementations of the other `vector*` modules.

For now, I have only implemented these two, since they are what currently speeds up my application, given the modules I am using.
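To illustrate why a scrambler of this kind maps well onto SIMD, here is a minimal sketch of XOR-ing a buffer with a repeating 4-byte mask, 32 bytes at a time when AVX2 is available. It is not the code from this PR: the names `sketch_mask`, `sketch_scramble`, and `sketch_scramble_portable` are hypothetical, and the mask bytes are placeholder values chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical repeating 4-byte mask; placeholder values for illustration. */
static const uint8_t sketch_mask[4] = {0xb4, 0x6a, 0x8b, 0xc5};

/* Portable reference: XOR each byte with the mask byte at the same phase. */
static void sketch_scramble_portable(uint8_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] ^= sketch_mask[i % 4];
}

/* Same operation, processing 32 bytes per iteration when built with AVX2. */
static void sketch_scramble(uint8_t *x, size_t n)
{
    size_t i = 0;
#if defined(__AVX2__)
    /* Replicate the 4 mask bytes across a 256-bit register. */
    int32_t m32;
    memcpy(&m32, sketch_mask, sizeof m32);
    const __m256i mask = _mm256_set1_epi32(m32);
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(x + i));
        v = _mm256_xor_si256(v, mask);
        _mm256_storeu_si256((__m256i *)(x + i), v);
    }
#endif
    /* Tail (or whole buffer without AVX2). i is a multiple of 32 here,
       so i % 4 keeps the mask phase aligned with the vector loop. */
    for (; i < n; i++)
        x[i] ^= sketch_mask[i % 4];
}
```

Because XOR is its own inverse, calling `sketch_scramble` twice on the same buffer restores the original data, which makes the SIMD path easy to check against the portable one.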
Attached is a text file with the results of the included benchmark, where the performance improvement can be seen. For the `scrambler` module, the `packetizer` results are used to compare implementations. For the `vectorcf_mul` module, `qdetector` is used. In the case of `qdetector`, the improvement is negligible, since the vector multiplication is not where most of the time is spent. The `packetizer`, however, benefits greatly from the `scrambler` SIMD implementation, showing about a 4x to 5x speedup when comparing the AVX2 version with the portable version.

As for AVX512, the performance increase is within the margin of error, and sometimes there is a degradation. This is somewhat expected given past experience with it, and may be why Intel dropped it from consumer CPUs. Nevertheless it is there, should anyone find it useful.
The benchmark was run on an Intel 11th gen 11950H CPU:
liquid_perf.txt
As a side note, I have also experimented with Manchester encoding using x86 SIMD instructions, which I think would fit in this library:
https://github.com/vankxr/manchester-simd