v4.0

@r-devulap r-devulap released this 31 Oct 16:17
· 193 commits to main since this release
7559f70

v4.0 is a significant release with new features and performance improvements. The AVX-512 sorting methods are up to 2x faster, and we have added AVX2 sorting methods to support a wider range of x86 processors. In addition to being usable as a header-only library, x86-simd-sort can now be installed as a shared library that provides API access to the sorting methods, with automatic runtime dispatch to select the fastest version for the processor. Here is a quick summary of all the changes:

  • Added AVX2 implementations of avx2_qsort, avx2_qselect and avx2_partial_qsort for 32-bit and 64-bit data types. When compared to std::sort, these are up to 12x faster for 32-bit data and up to 7x faster for 64-bit data.
  • x86-simd-sort can now be built and installed as a shared library. The library provides runtime dispatch and automatically picks the fastest version among AVX-512, AVX2 and scalar, depending on the processor it runs on. Starting with Clear Linux v40270, you can install x86-simd-sort with swupd bundle-add x86-simd-sort.
  • Perf improvements to avx512_qsort: 2x speed up for 32-bit data, 1.5x speed up for 64-bit data and 1.25x speed up for 16-bit data.
  • Perf improvements to avx512_argsort and avx512_argselect, intended to mitigate the effect of a vulnerability in the gather instruction.
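
The runtime dispatch mentioned above can be pictured with a small sketch. This is not the x86-simd-sort source: every name here (SortKernel, detect_kernel, select_sort, dispatched_sort and the three sort_* stand-ins) is hypothetical, and the three kernels simply forward to std::sort so the example stays self-contained. It assumes GCC or Clang on x86 for the CPU feature query and falls back to the scalar path elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; an illustration of the idea only.
enum class SortKernel { Scalar, Avx2, Avx512 };

// Query the CPU once for the widest supported instruction set.
// __builtin_cpu_supports is a GCC/Clang builtin available on x86 targets;
// on any other compiler or architecture we report Scalar.
inline SortKernel detect_kernel() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return SortKernel::Avx512;
    if (__builtin_cpu_supports("avx2")) return SortKernel::Avx2;
#endif
    return SortKernel::Scalar;
}

using SortFn = void (*)(float*, std::size_t);

// Stand-in kernels. A real build would compile the AVX-512 and AVX2
// versions separately with the matching compiler flags; here all three
// forward to std::sort so the sketch runs on any machine.
inline void sort_scalar(float* a, std::size_t n) { std::sort(a, a + n); }
inline void sort_avx2(float* a, std::size_t n) { std::sort(a, a + n); }
inline void sort_avx512(float* a, std::size_t n) { std::sort(a, a + n); }

// Map the detected feature level to the kernel that should run.
inline SortFn select_sort(SortKernel k) {
    switch (k) {
        case SortKernel::Avx512: return sort_avx512;
        case SortKernel::Avx2:   return sort_avx2;
        default:                 return sort_scalar;
    }
}

// Public entry point: the kernel is chosen on the first call and cached,
// so later calls pay only for an indirect function call.
inline void dispatched_sort(float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    static const SortFn fn = select_sort(detect_kernel());
    fn(a, n);
}
```

A caller would only ever use dispatched_sort(data.data(), data.size()); the choice between AVX-512, AVX2 and scalar stays hidden behind that single entry point, which is the property the shared library is described as providing.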

Full Changelog: v3.0...4.0.rc