Releases: kimwalisch/primesieve

primesieve-12.5

25 Oct 16:37

This release improves thread load balancing on CPUs with a large number of cores. The worker threads now process smaller sieve intervals, which improves the performance of short computations (≤ 10 seconds). On a 4th Gen AMD EPYC 9R14 CPU with 192 threads, counting the primes up to 10^12 now runs 10% faster (in 1.187 secs) and counting the primes up to 10^11 runs 70% faster (in 0.115 secs).

ChangeLog

  • ParallelSieve.cpp: Tune thread load balancing.
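
The idea of handing out small intervals can be sketched as follows. This is only an illustration, not the code from ParallelSieve.cpp: the function name count_primes_balanced, the interval_size parameter and the naive primality test are all assumptions made for the example. Each worker repeatedly claims the next small interval from a shared atomic counter, so a thread that finishes early simply picks up more work.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Naive trial division; a stand-in for the real segmented sieve.
static bool is_prime_naive(uint64_t n) {
    if (n < 2) return false;
    for (uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Count the primes in [start, stop] using dynamic scheduling.
// interval_size is a hypothetical tuning knob (not a primesieve option):
// smaller intervals give better balance, larger ones less overhead.
// Sketch only: assumes stop is well below UINT64_MAX.
uint64_t count_primes_balanced(uint64_t start, uint64_t stop,
                               unsigned threads, uint64_t interval_size) {
    if (start > stop) return 0;
    threads = std::max(1u, threads);
    interval_size = std::max<uint64_t>(1, interval_size);

    const uint64_t total = stop - start + 1;
    const uint64_t intervals =
        total / interval_size + (total % interval_size != 0);

    std::atomic<uint64_t> next{0};   // index of the next unclaimed interval
    std::atomic<uint64_t> count{0};  // global prime count

    auto worker = [&]() {
        uint64_t local = 0;
        for (;;) {
            const uint64_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= intervals) break;
            const uint64_t lo = start + i * interval_size;
            const uint64_t hi = std::min(stop, lo + interval_size - 1);
            for (uint64_t n = lo; n <= hi; ++n)
                local += is_prime_naive(n);
        }
        count.fetch_add(local, std::memory_order_relaxed);
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return count.load();
}
```

With many threads and a short computation, a large interval_size leaves some threads idle while the last intervals finish; shrinking it trades a little scheduling overhead for a more even finish time.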

primesieve-12.4

01 Aug 19:13

This is a maintenance release; the C/C++ API and ABI are fully backwards compatible with primesieve-12.*

ChangeLog

  • Move x86 CPUID code from cpuid.hpp to src/x86/cpuid.cpp.
  • multiarch_x86_popcnt.cmake: Detect x86 POPCNT support.
  • CMakeLists.txt: Use CMake list for all compile time definitions.
  • CMakeLists.txt: Use CMake list for all link libraries.

primesieve-12.3

18 Apr 17:50

This release adds runtime dispatching to AVX512 (for x64 CPUs that support it) for MinGW. For x64 CPUs, AVX512 runtime dispatching is now enabled by default when compiling using GCC and Clang on all operating systems.

  • Improve Windows multiarch support (now works with MinGW64).
  • Add runtime POPCNT detection using CPUID for x86 CPUs.
  • Improve GCC/Clang multiarch preprocessor logic.
  • CMakeLists.txt: Remove POPCNT/BMI check for x86 CPUs.
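
Runtime dispatching of this kind can be sketched as below. Note the assumptions: primesieve queries CPUID itself, whereas this sketch uses the GCC/Clang builtin __builtin_cpu_supports() for brevity, and the names popcount_portable, popcount_hw and popcount64 are hypothetical. On non-x86 targets or other compilers the sketch falls back to the portable algorithm.

```cpp
#include <cstdint>

// Portable fallback: clears the lowest set bit once per iteration.
inline uint64_t popcount_portable(uint64_t x) {
    uint64_t n = 0;
    for (; x != 0; x &= x - 1) ++n;
    return n;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))

// POPCNT is enabled for this one function only, so the rest of the
// binary still runs on x86 CPUs that lack the instruction.
__attribute__((target("popcnt")))
inline uint64_t popcount_hw(uint64_t x) {
    return static_cast<uint64_t>(__builtin_popcountll(x));
}

// Runtime check (compiler builtin; primesieve uses its own CPUID code).
inline bool cpu_has_popcnt() {
    return __builtin_cpu_supports("popcnt") != 0;
}

#else

// No runtime detection in this sketch: always use the portable code.
inline uint64_t popcount_hw(uint64_t x) { return popcount_portable(x); }
inline bool cpu_has_popcnt() { return false; }

#endif

// Dispatch once at first use, then reuse the cached decision.
inline uint64_t popcount64(uint64_t x) {
    static const bool has_popcnt = cpu_has_popcnt();
    return has_popcnt ? popcount_hw(x) : popcount_portable(x);
}
```

Because the decision is made at runtime, the same binary can be shipped to all x86 CPUs without a compile-time POPCNT check.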

primesieve-12.1

10 Mar 10:22

This is a new maintenance release; it is fully backwards compatible with the previous release.

  • CMakeLists.txt: Fix undefined reference to pthread_create #146.
  • test/Riemann_R.cpp: Fix musl libc issue #147.
  • src/app/test.cpp: Fix -ffast-math failure.
  • test/count_primes2.cpp: Fix -ffast-math failure.
  • PrimeSieve.cpp: Improve status output.

primesieve-12.0

19 Feb 14:46

The C/C++ API and ABI of primesieve-12.0 are fully backwards compatible with primesieve-11.*

The stress test functionality is the main new feature of primesieve-12.0. It can be launched using the --stress-test[=MODE] option of the primesieve command-line application. The stress test option supports two modes: CPU (default) and RAM. The CPU mode uses little memory (< 5 MiB per thread) and puts the highest load on the CPU. The RAM mode uses much more memory than the CPU mode (each thread uses about 1.16 GiB), but the CPU usually won't get as hot. Due to primesieve's function multi-versioning support, on x64 CPUs the stress test will run an AVX512 algorithm if your CPU supports it.

  • stressTest.cpp: New -S[=MODE] and --stress-test[=MODE] command-line options.
  • RiemannR.cpp: Faster Riemann R function implementation #144.
  • CmdOptions.cpp: New -R and --RiemannR command line options.
  • CmdOptions.cpp: New --RiemannR-inverse command line option.
  • CmdOptions.cpp: Add new --timeout option for stress testing.
  • main.cpp: Improve command-line option handling.

primesieve-11.2

10 Jan 17:13

This is a new maintenance release; it is fully backwards compatible with the previous release. It contains one CMake bug fix and documentation improvements, the tests have been ported to GitHub Actions, and the nth prime code has been cleaned up.

  • nthPrime.cpp: Rewritten using more accurate nth prime approximation.
  • nthPrimeApprox.cpp: Added logarithmic integral and Riemann R function implementations.
  • cmake/libatomic.cmake: Fix failed to find libatomic #141.
  • .github/workflows/ci.yml: Port AppVeyor CI tests to GitHub Actions.
  • doc/C_API.md: Fix off by 1 error in OpenMP example #137.
  • doc/CPP_API.md: Fix off by 1 error in OpenMP example #137.
  • Vector.hpp: Rename pod_vector to Vector and pod_array to Array.
  • iterator.h: Improve documentation.
  • iterator.hpp: Improve documentation.
  • C_API.md: Add SIMD (vectorization) section.
  • CPP_API.md: Add SIMD (vectorization) section.
  • README.md: Add C & C++ API badges.
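
To give an idea of how a Riemann R based nth prime approximation can work, here is a minimal sketch. It is not the code from nthPrimeApprox.cpp: it evaluates R(x) with the Gram series R(x) = 1 + Σ_{k≥1} (ln x)^k / (k · k! · ζ(k+1)) and inverts it with a few Newton steps, using R'(x) ≈ 1/ln x. The function names zeta_approx, riemann_R and nth_prime_approx, and the fixed cutoff in the zeta sum, are assumptions made for the example.

```cpp
#include <cmath>
#include <cstdint>

// zeta(s) for real s >= 2: partial sum plus Euler-Maclaurin tail terms.
// Accurate to roughly 1e-8, which is enough for this sketch.
inline double zeta_approx(double s) {
    const int N = 20;
    double sum = 0.0;
    for (int n = 1; n < N; ++n) sum += std::pow(n, -s);
    sum += std::pow(N, 1.0 - s) / (s - 1.0);
    sum += 0.5 * std::pow(N, -s);
    sum += s / 12.0 * std::pow(N, -s - 1.0);
    return sum;
}

// Riemann R function via the Gram series, for x >= 1.
inline double riemann_R(double x) {
    if (x < 1.0) return 0.0;  // outside the range this sketch handles
    const double logx = std::log(x);
    double sum = 1.0;
    double term = 1.0;  // holds (ln x)^k / k!
    for (int k = 1; k < 1000; ++k) {
        term *= logx / k;
        const double add = term / (k * zeta_approx(k + 1.0));
        sum += add;
        if (add < 1e-15 * sum) break;  // series has converged
    }
    return sum;
}

// Approximate the nth prime by solving R(x) = n with Newton's method.
inline double nth_prime_approx(uint64_t n) {
    if (n < 2) return 2.0;
    const double target = static_cast<double>(n);
    double x = target * std::log(target);  // classic n*ln(n) starting guess
    if (x < 2.0) x = 2.0;
    for (int i = 0; i < 50; ++i) {
        const double delta = (target - riemann_R(x)) * std::log(x);
        x += delta;
        if (x < 2.0) x = 2.0;
        if (std::fabs(delta) < 0.5) break;
    }
    return x;
}
```

A better approximation means the sieve only has to cover a short interval around the estimate to locate the exact nth prime.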

Thanks to @sethtroisi and Sven S. for being primesieve sponsors in this release cycle!

primesieve-11.1

13 May 07:47

When primesieve is distributed via distro package managers, it is often not compiled using the highest optimization level -O3. Because of this, primesieve's pre-sieving algorithm was not auto-vectorized in many cases. As a workaround for this issue, I have now manually vectorized the pre-sieving algorithm for x64 CPUs (using portable SSE2) and for ARM64 CPUs (using portable ARM NEON). This can improve performance by up to 40%.

  • PreSieve.cpp: Vectorize loop using x64 SSE2 & ARM NEON.
  • popcount.cpp: Add POPCNT algorithm for x64 & AArch64.
  • primesieve.h: Fix -Wstrict-prototypes warning.
  • examples/c/*.c: Fix -Wstrict-prototypes warning.
  • test/*.c: Fix -Wstrict-prototypes warning.
  • CMakeLists.txt: New WITH_AUTO_VECTORIZATION option (with default ON).
  • cmake/auto_vectorize.cmake: Enable auto-vectorization if the compiler supports it.
  • scripts/build_mingw64_x64.sh: Build primesieve x64 release binary.
  • scripts/build_mingw64_arm64.sh: Build primesieve arm64 release binary.
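
A manually vectorized loop of this kind might look like the sketch below. It assumes that the hot loop combines precomputed buffers with a bitwise AND; that assumption, the function name and_buffers and its signature are illustrative and not taken from PreSieve.cpp. SSE2 is used on x64, NEON on ARM64, and a scalar loop handles the tail and all other targets.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
  #include <emmintrin.h>
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
#endif

// out[i] = a[i] & b[i] for i in [0, n), 16 bytes per iteration
// when SSE2 or NEON is available at compile time.
inline void and_buffers(const uint8_t* a, const uint8_t* b,
                        uint8_t* out, std::size_t n) {
    std::size_t i = 0;

#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        // Unaligned loads/stores: no alignment requirement on the buffers.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_and_si128(va, vb));
    }
#elif defined(__ARM_NEON)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vandq_u8(va, vb));
    }
#endif

    // Scalar tail (and the full loop on targets without SSE2/NEON).
    for (; i < n; ++i)
        out[i] = static_cast<uint8_t>(a[i] & b[i]);
}
```

Since SSE2 is part of the x64 baseline and NEON is part of the ARM64 baseline, such code does not depend on the optimization level or on extra compiler flags.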

primesieve-11.0

06 Dec 17:56

This version fixes two annoying libprimesieve issues. Firstly, from now on the shared libprimesieve version (.so version) will match the primesieve version. This makes it easier to depend on libprimesieve and to update to the latest libprimesieve. Secondly, primesieve_jump_to() has been added to libprimesieve's API. The new primesieve_jump_to(iter, start, stop) includes the start number (generates primes ≥ start), whereas the old primesieve_skipto(iter, start, stop) excludes the start number (generates primes > start). In practice, using primesieve_jump_to() requires up to 2x fewer start number corrections (e.g. start-1) than primesieve_skipto().

C API deprecations

The libprimesieve C API and ABI are backwards compatible with libprimesieve ≥ 10.0. However, the primesieve_skipto() function from the libprimesieve C API has been marked as deprecated; please use the new primesieve_jump_to() instead.

C++ API breaking changes

Unlike the C API, in the C++ API the primesieve::iterator::skipto() method has been replaced by primesieve::iterator::jump_to(). The new method includes the start number whereas the old method excluded the start number. The primesieve::iterator constructors now also include the start number while they previously excluded the start number. Please read the documentation for more information.
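
The difference between the two semantics can be illustrated with a toy model. The ToyPrimeIterator class below is a hypothetical stand-in written for this example (it uses naive trial division and is not the libprimesieve implementation); only the inclusive vs. exclusive handling of the start number mirrors what is described above.

```cpp
#include <cstdint>

// Naive trial division, good enough for a small demonstration.
static bool is_prime_naive(uint64_t n) {
    if (n < 2) return false;
    for (uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Toy stand-in for a prime iterator; NOT the libprimesieve class.
class ToyPrimeIterator {
public:
    // Old semantics: the first prime returned is > start.
    void skipto(uint64_t start) { next_ = start + 1; }

    // New semantics: the first prime returned is >= start.
    void jump_to(uint64_t start) { next_ = start; }

    // Return the next prime and advance past it.
    uint64_t next_prime() {
        while (!is_prime_naive(next_)) ++next_;
        return next_++;
    }

private:
    uint64_t next_ = 0;
};
```

With the old semantics, a caller who wants the primes ≥ 13 has to write skipto(12); with the new semantics, jump_to(13) expresses the same intent directly.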

ChangeLog

  • CMakeLists.txt: Improve Emscripten WebAssembly support.
  • iterator.cpp: Add new primesieve::iterator::jump_to().
  • iterator.cpp: Fix use after free in primesieve::iterator::clear().
  • iterator-c.cpp: Add new primesieve_jump_to().
  • iterator-c.cpp: Mark primesieve_skipto() as deprecated.
  • iterator-c.cpp: Fix use after free in primesieve_iterator_clear().
  • pod_vector.hpp: Added support for types with destructors.
  • malloc_vector.hpp: Fix potential memory leak.
  • api.cpp: Support non power of 2 sieve sizes.
  • PrimeSieve.cpp: Support non power of 2 sieve sizes.
  • PreSieve.cpp: Use std::initializer_list instead of std::vector.
  • Erat.cpp: Improve documentation.
  • C_API.md: Improve next_prime() and prev_prime() documentation.
  • CPP_API.md: Improve next_prime() and prev_prime() documentation.

Acknowledgements

I would like to thank Philip Vetter for his detailed feedback on the libprimesieve API, which caused me to create the new primesieve_jump_to().

primesieve-8.0

05 Jul 13:46

This is a new major release. The API of libprimesieve is backwards compatible, but the ABI (Application Binary Interface) of libprimesieve is not. This means that if your program uses the C/C++ libprimesieve API, you can simply recompile it against the latest libprimesieve without modifying your code. If, on the other hand, you have e.g. written libprimesieve bindings for another programming language, you will have to migrate your code to the new libprimesieve ABI.

Highlights of primesieve-8.0

  • libprimesieve now has multiarch support for x64 CPUs. At runtime, libprimesieve now dispatches to the latest supported CPU instruction set, such as POPCNT, BMI2 or AVX512 #116.
  • libprimesieve now generates an array (or vector) of primes up to 20% faster #123.

ChangeLog

  • primesieve::iterator's ABI has been modified in both the C & C++ API.
    primesieve::iterator's API remains backwards compatible.
  • CPP_API.md: Renamed doc/CPP_Examples.md to doc/CPP_API.md.
  • C_API.md: Renamed doc/C_Examples.md to doc/C_API.md.
  • Fix undefined behavior (g++-12 issue) caused by resizeUninitialized.hpp, use new pod_vector<uint64_t> from pod_vector.hpp instead.
  • iterator.cpp: Enable pre-sieving for primesieve::iterator.prev_prime().
  • iterator-c.cpp: Enable pre-sieving for primesieve::iterator.prev_prime().
  • PreSieve.cpp: Detect if the user sieves many consecutive intervals.
  • PrimeGenerator.cpp: Improve AVX512 of fillNextPrimes().
  • PrimeGenerator.cpp: Reduce memory usage for tiny stop numbers.
  • PrimeGenerator.hpp: Add GCC/Clang's function multiversioning for AVX512.
  • Erat.cpp: Dynamically grow the sieve size: use a small sieve size for small stop numbers and a large sieve size for large stop numbers.
  • Erat.cpp: Reduce memory usage, allocate the minimum required memory to store all sieving primes.
  • CpuInfo.cpp: Detect AVX512 using CPUID.
  • pmath.hpp: Use compiler intrinsics for ilog2() & floorPow2().
  • StorePrimes.hpp: Use vector::insert() instead of vector::push_back(), see: #123.
  • CMakeLists.txt: Automatically enable expensive debug assertions in debug mode (if CMAKE_BUILD_TYPE=Debug).
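
The vector::insert() change listed above can be pictured with the following sketch. The batch buffer and both function names are hypothetical and only serve to contrast the two ways of appending; the actual code lives in StorePrimes.hpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append primes one by one: a capacity check on every element.
inline void append_push_back(std::vector<uint64_t>& out,
                             const uint64_t* batch, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(batch[i]);
}

// Append a whole batch at once: at most one reallocation per call,
// and the elements can be copied in bulk.
inline void append_insert(std::vector<uint64_t>& out,
                          const uint64_t* batch, std::size_t n) {
    out.insert(out.end(), batch, batch + n);
}
```

Both functions produce the same vector; the batched variant simply gives the standard library the chance to grow the storage once and copy the range in one go.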

primesieve-7.9

03 May 10:46

This is a new minor release; the API and ABI of libprimesieve are backwards compatible.

The focus of this release has been to reduce the memory usage of libprimesieve and to reduce its initialization overhead. I have also added support for big.LITTLE CPU detection on Linux which provides a significant speedup on Intel's latest consumer CPUs. Many of the improvements in this release originated from Jason's patch set, thank you Jason!

ChangeLog

  • intrinsics.hpp: Improved x64 BSF assembly.
  • iterator.cpp: Reduce memory allocations in generate_prev_primes().
  • iterator-c.cpp: Reduce memory allocations.
  • CpuInfo.cpp: Improve hybrid CPU detection on Linux.
  • Erat.cpp: Reduce memory usage when sieving a single segment.
  • EratBig.cpp: Improve instruction level parallelism.
  • EratBig.cpp: Improve next wheel index code.
  • EratBig.cpp: Use std::copy() instead of std::rotate().
  • SievingPrimes.cpp: Reduce branch mispredictions.
  • PreSieve.cpp: Hardcode buffersDist.
  • MemoryPool.cpp: Reduce memory usage.
  • StorePrimes.hpp: Improve nth prime approximation.
  • config.hpp: Tune FACTOR_ERATMEDIUM constant.
  • Use a single MemoryPool per thread (previously 2).
  • Increase max sieve array size to 8 KiB.