This is a new major release. The API of libprimesieve is backwards compatible, but its ABI (Application Binary Interface) is not. If your program uses the C/C++ libprimesieve API, you can simply recompile it against the latest libprimesieve without modifying your code. If, on the other hand, you have written libprimesieve bindings for another programming language, you will have to migrate your code to the new libprimesieve ABI.

Highlights of primesieve-8.0

  • libprimesieve now has multiarch support for x64 CPUs. At runtime, libprimesieve now dispatches to the most recent instruction set that the CPU supports, e.g. POPCNT, BMI2 or AVX512 #116.
  • libprimesieve now generates an array (or vector) of primes up to 20% faster #123.

ChangeLog

  • primesieve::iterator's ABI has been modified in both the C & C++ API.
    primesieve::iterator's API remains backwards compatible.
  • CPP_API.md: Renamed doc/CPP_Examples.md to doc/CPP_API.md.
  • C_API.md: Renamed doc/C_Examples.md to doc/C_API.md.
  • Fix undefined behavior (g++-12 issue) caused by resizeUninitialized.hpp, use new pod_vector<uint64_t> from pod_vector.hpp instead.
  • iterator.cpp: Enable pre-sieving for primesieve::iterator.prev_prime().
  • iterator-c.cpp: Enable pre-sieving for primesieve::iterator.prev_prime().
  • PreSieve.cpp: Detect if the user sieves many consecutive intervals.
  • PrimeGenerator.cpp: Improve AVX512 of fillNextPrimes().
  • PrimeGenerator.cpp: Reduce memory usage for tiny stop numbers.
  • PrimeGenerator.hpp: Add GCC/Clang's function multiversioning for AVX512.
  • Erat.cpp: Dynamically grow the sieve size: use a small sieve size for small stop numbers and a large sieve size for large stop numbers.
  • Erat.cpp: Reduce memory usage, allocate the minimum required memory to store all sieving primes.
  • CpuInfo.cpp: Detect AVX512 using CPUID.
  • pmath.hpp: Use compiler intrinsics for ilog2() & floorPow2().
  • StorePrimes.hpp: Use vector::insert() instead of vector::push_back(), see: #123.
  • CMakeLists.txt: Automatically enable expensive debug assertions in debug mode (if CMAKE_BUILD_TYPE=Debug).
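
The ilog2() and floorPow2() entry above can be illustrated with a minimal sketch. It assumes GCC/Clang's `__builtin_clzll` intrinsic and falls back to a plain loop on other compilers. The function bodies are illustrative assumptions and are not taken from pmath.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Integer base-2 logarithm, rounded down. Requires n > 0.
// Illustrative body, not primesieve's actual implementation.
inline int ilog2(std::uint64_t n)
{
  assert(n > 0);
#if defined(__GNUC__) || defined(__clang__)
  // Counting leading zeros usually compiles to a single CPU instruction.
  return 63 - __builtin_clzll(n);
#else
  // Portable fallback: count how often n can be halved.
  int log2 = 0;
  while (n >>= 1)
    log2++;
  return log2;
#endif
}

// Largest power of 2 that is <= n. Requires n > 0.
inline std::uint64_t floorPow2(std::uint64_t n)
{
  return std::uint64_t(1) << ilog2(n);
}
```

For example, `ilog2(1000)` yields 9 and `floorPow2(1000)` yields 512.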

This is a new minor release. The API and ABI of libprimesieve are backwards compatible.

The focus of this release has been to reduce the memory usage of libprimesieve and to reduce its initialization overhead. I have also added support for big.LITTLE CPU detection on Linux which provides a significant speedup on Intel's latest consumer CPUs. Many of the improvements in this release originated from Jason's patch set, thank you Jason!

ChangeLog

  • intrinsics.hpp: Improved x64 BSF assembly.
  • iterator.cpp: Reduce memory allocations in generate_prev_primes().
  • iterator-c.cpp: Reduce memory allocations.
  • CpuInfo.cpp: Improve hybrid CPU detection on Linux.
  • Erat.cpp: Reduce memory usage when sieving a single segment.
  • EratBig.cpp: Improve instruction level parallelism.
  • EratBig.cpp: Improve next wheel index code.
  • EratBig.cpp: Use std::copy() instead of std::rotate().
  • SievingPrimes.cpp: Reduce branch mispredictions.
  • PreSieve.cpp: Hardcode buffersDist.
  • MemoryPool.cpp: Reduce memory usage.
  • StorePrimes.hpp: Improve nth prime approximation.
  • config.hpp: Tune FACTOR_ERATMEDIUM constant.
  • Use a single MemoryPool per thread (previously 2).
  • Increase max sieve array size to 8 MiB.

This is a new minor release. The API and ABI of libprimesieve are backwards compatible.

The primesieve command-line program runs up to 10% faster due to improved pre-sieving. libprimesieve's primesieve::iterator runs up to 15% faster due to improved pre-sieving, reduced branch mispredictions and increased instruction level parallelism. primesieve now pre-sieves the multiples of small primes < 100 (previously ≤ 19) using only half as much memory as before. Instead of using a single large pre-sieved buffer, primesieve now uses 8 smaller pre-sieved buffers which are bitwise ANDed together before being copied into the sieve array. Thanks to @zielaj for this amazing work!
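
The idea of combining several small pre-sieved buffers can be sketched as follows. This is a simplified illustration and not the code from PreSieve.cpp: it uses one byte per integer instead of a packed bit array, and the helper names `makeBuffer` and `preSieve` as well as the grouping of primes are assumptions. The pattern of a buffer repeats with a period equal to the product of its primes, so splitting the primes across several buffers keeps each period small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build one pre-sieved buffer for a group of small primes.
// buffer[i] is 0 if i is divisible by any prime of the group, else 1.
// The pattern repeats with period = product of the primes.
std::vector<std::uint8_t> makeBuffer(const std::vector<std::uint64_t>& primes)
{
  std::uint64_t period = 1;
  for (std::uint64_t p : primes)
    period *= p;

  std::vector<std::uint8_t> buffer(period, 1);
  for (std::uint64_t p : primes)
    for (std::uint64_t m = 0; m < period; m += p)
      buffer[m] = 0;

  return buffer;
}

// Pre-sieve the interval [low, low + size) by ANDing all buffers.
// Note: the small primes themselves are crossed off too, a real
// sieve has to restore or handle them separately.
std::vector<std::uint8_t> preSieve(const std::vector<std::vector<std::uint8_t>>& buffers,
                                   std::uint64_t low,
                                   std::size_t size)
{
  std::vector<std::uint8_t> sieve(size, 1);

  for (const auto& buffer : buffers)
  {
    // Position inside the periodic buffer that corresponds to low.
    std::size_t pos = static_cast<std::size_t>(low % buffer.size());
    for (std::size_t i = 0; i < size; i++)
    {
      sieve[i] &= buffer[pos];
      if (++pos == buffer.size())
        pos = 0;
    }
  }

  return sieve;
}
```

With the groups {7, 11} and {13}, the two buffers need 77 + 13 entries, whereas a single buffer for all three primes would need 7 · 11 · 13 = 1001 entries.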

ChangeLog

  • PreSieve.cpp: Add multiple pre-sieve buffers #110.
  • PrimeGenerator.cpp: Reduce branch mispredictions #109.
  • PrimeGenerator.cpp: Add AVX512 algorithm #109.
  • iterator.cpp: Avoid default initialization of primes vector.
  • iterator-c.cpp: Avoid default initialization of primes vector.
  • ParallelSieve.cpp: Initialize PreSieve.
  • ALGORITHMS.md: Update documentation.

This is a new minor release. The API and ABI of libprimesieve are backwards compatible.

The CPU cache size detection has been improved on big.LITTLE CPUs such as Intel Alder Lake. The code now also handles situations better in which CPU cache information is only partially available: it then uses a more conservative approach (i.e. a smaller sieve array size) to prevent potential scaling issues.

Backwards incompatible change in primesieve command-line application

The behavior of the -q/--quiet option in the primesieve command-line application has been modified. This option now prints the result without any additional text, e.g. "1229" instead of the previous "Primes: 1229". This is a backwards incompatible change in the primesieve command-line application. However, I didn't increase primesieve's major version since this change does not affect libprimesieve's API/ABI.

ChangeLog

  • CpuInfo.cpp: Fix issues with big.LITTLE CPUs #105.
  • api.cpp: Simplify private L2 cache size detection #103.
  • config.cpp: Add fallback sieve size & L1 data cache size.
  • Erat.cpp: If runtime CPU cache detection fails use config::L1D_CACHE_BYTES.
  • main.cpp: Improve -q/--quiet option #102.
  • api-c.cpp: Print error messages to stderr.
  • iterator-c.cpp: Print error messages to stderr.
  • doc/primesieve.1: Update man page.
  • CMakeLists.txt: Add WITH_MSVC_CRT_STATIC option to force static linking.
  • C_Examples.md: Add CMake build instructions.
  • CPP_Examples.md: Add CMake build instructions.

This is a new minor release. The API and ABI (Application Binary Interface) are backwards compatible.

The CPU cache size detection has been improved on Linux and on the new Apple Silicon CPUs. I have also finally found a way to get rid of goto without deteriorating performance, namely by annotating branches as likely or unlikely.
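
Branch annotations of this kind can be sketched as shown here. The example assumes GCC/Clang's `__builtin_expect` and hypothetical `LIKELY`/`UNLIKELY` macros. The loop is a simplified byte-array sieve, not the actual EratSmall.cpp or EratMedium.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper macros, no-ops on compilers without __builtin_expect.
#if defined(__GNUC__) || defined(__clang__)
  #define LIKELY(x)   __builtin_expect(!!(x), 1)
  #define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
  #define LIKELY(x)   (x)
  #define UNLIKELY(x) (x)
#endif

// Cross off the multiples of prime inside the current segment.
// Returns the offset of the next multiple relative to the start
// of the following segment.
std::size_t crossOff(std::vector<std::uint8_t>& sieve,
                     std::size_t multiple,
                     std::size_t prime)
{
  assert(prime > 0);
  const std::size_t size = sieve.size();

  for (;;)
  {
    // Leaving the segment happens once per call, so this branch is
    // marked as unlikely and the hot path falls straight through.
    if (UNLIKELY(multiple >= size))
      return multiple - size;

    sieve[multiple] = 0;
    multiple += prime;
  }
}
```

The annotation only hints at the expected outcome, so the function behaves identically with or without it.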

ChangeLog

  • The primesieve GUI application has been deprecated/removed. It only works with Qt4 which has reached end-of-life.
  • Get rid of Travis-CI because it is not free anymore.
  • CpuInfo.cpp: Linux kernel CPU detection has been updated.
  • CpuInfo.cpp: Add workaround for sysctl bug (macOS & iOS).
  • Erat.hpp: Use CTZ instruction on x64 and ARM64 CPUs.
  • config.hpp: Tune FACTOR_ERATSMALL factor.
  • EratSmall.cpp: Get rid of goto.
  • EratSmall.cpp: Optimize switch statement.
  • EratSmall.cpp: Annotate switch cases with fallthrough.
  • EratMedium.cpp: Get rid of goto.
  • EratMedium.cpp: Optimize switch statements.
  • EratMedium.cpp: Annotate switch cases with fallthrough.
  • EratBig.cpp: Simplify main sieving loop.
  • doc/C_Examples.md: libprimesieve C code examples.
  • doc/CPP_Examples.md: libprimesieve C++ code examples.
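
The use of a CTZ (count trailing zeros) instruction mentioned above can be illustrated with a small sketch. It assumes a 64-bit sieve word in which each set bit marks a remaining candidate, and it assumes GCC/Clang's `__builtin_ctzll`. The names `ctz64` and `setBitPositions` are hypothetical and not taken from Erat.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the lowest set bit. Requires x != 0.
inline int ctz64(std::uint64_t x)
{
  assert(x != 0);
#if defined(__GNUC__) || defined(__clang__)
  // Compiles to a count trailing zeros style instruction
  // on x64 and ARM64.
  return __builtin_ctzll(x);
#else
  // Portable fallback: shift right until the lowest bit is set.
  int n = 0;
  while ((x & 1) == 0)
  {
    x >>= 1;
    n++;
  }
  return n;
#endif
}

// Return the positions of all set bits in ascending order.
std::vector<int> setBitPositions(std::uint64_t bits)
{
  std::vector<int> positions;
  while (bits != 0)
  {
    positions.push_back(ctz64(bits));
    bits &= bits - 1; // clear the lowest set bit
  }
  return positions;
}
```

Each loop iteration handles exactly one set bit, so the cost depends on the number of remaining candidates rather than on the word size.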

This is a new minor release. The API and ABI (Application Binary Interface) are backwards compatible.

Highlights

primesieve::iterator::next_prime() has been sped up by about 10%. Since primesieve::iterator is also used under the hood for generating an array (or vector) of primes, this should also run slightly faster.

I have also removed the https://primesieve.org website because it simply took too much effort to maintain it and make it look nice across all devices, operating systems and browsers. Hence primesieve's main homepage is now its GitHub repo: https://github.com/kimwalisch/primesieve

ChangeLog

  • Erat.cpp: Silence MSVC debug warning.
  • StorePrimes.hpp: Add workaround for windows.h max/min macros.
  • PrimeGenerator.cpp: Cache more primes.
  • SievingPrimes.cpp: Cache more primes.
  • cmdoptions.cpp: Support options of type: --option VALUE.
  • help.cpp: Improve help menu.
  • CMakeLists.txt: Require CMake 3.4 instead of 3.9.
  • primesieve.pc.in: Fix libdir and includedir.
  • README.md: Add libprimesieve multi-threading section.
  • BUILD.md: Add detailed build instructions.
  • doc/ALGORITHMS.md: Info from https://primesieve.org.
  • doc/primesieve.txt: New AsciiDoc man page.

For Linux package maintainers

primesieve-7.5 includes a new man page written using AsciiDoc. Previously primesieve's man page was generated using the help2man program. primesieve includes a ready to distribute man page at primesieve/doc/primesieve.1. If your Linux distribution requires that man pages must be generated from source you can find the related build instructions here: BUILD.md#man-page-regeneration.

This release fixes 2 bugs, improves caching of small primes and adds a new --test option to the primesieve console application.

  • CpuInfo.cpp: Fix MinGW CPU detection.
  • CMakeLists.txt: Fix cross compilation bug.
  • Add --test option: Runs self tests.
  • IteratorHelper.cpp: Improve caching of small primes.
  • ParallelSieve.cpp: Non-blocking status updates.
  • PrimeSieve.cpp: Simplify status update.
  • travis.yml: Test using GCC 5, 6, 7, 8, Clang 7 and MinGW.

primesieve-7.3 improves the cache efficiency of the sieving algorithm for large sieving primes. By using aligned memory it is possible to reduce the number of pointer indirections, which reduces cache pollution. I have measured a speedup of 15% near 1e18 and a speedup of 25% near 1e19.

  • EratBig.cpp: Improve cache efficiency.
  • MemoryPool.cpp: Allocate buckets aligned by sizeof(Bucket).
  • Bucket.hpp: sizeof(Bucket) is now a power of 2.
  • primesieve::iterator: Support C++ move semantics.
  • cmdoptions.cpp: Fix array out of bounds bug.
  • CpuInfo.cpp: Fix MinGW/MSYS2 -Wcast-function-type warning.
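
One way such alignment can save a pointer indirection is sketched below: if every bucket starts at an address that is a multiple of sizeof(Bucket), and sizeof(Bucket) is a power of 2, then the bucket that owns any interior pointer can be found by masking off the low address bits. The Bucket layout, the 8 KiB size and the helper names are assumptions for illustration and not the actual Bucket.hpp or MemoryPool.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed bucket size, must be a power of 2.
constexpr std::size_t kBucketBytes = 8192;

// Hypothetical bucket layout: a small header followed by data.
struct Bucket
{
  Bucket* next;
  std::uint32_t* end;
  std::uint32_t primes[(kBucketBytes - 2 * sizeof(void*)) / sizeof(std::uint32_t)];
};

static_assert(sizeof(Bucket) == kBucketBytes, "sizeof(Bucket) must be a power of 2");

// Reserve raw memory for the given number of buckets plus
// one extra bucket of slack for alignment.
std::vector<unsigned char> makeArena(std::size_t buckets)
{
  return std::vector<unsigned char>((buckets + 1) * sizeof(Bucket));
}

// Round a pointer up to the next multiple of sizeof(Bucket).
inline Bucket* alignUp(void* memory)
{
  std::uintptr_t address = reinterpret_cast<std::uintptr_t>(memory);
  std::uintptr_t mask = std::uintptr_t(sizeof(Bucket)) - 1;
  return reinterpret_cast<Bucket*>((address + mask) & ~mask);
}

// Find the bucket that contains ptr by clearing the low address bits.
// ptr must point inside a bucket (not one past its end).
inline Bucket* bucketOf(void* ptr)
{
  std::uintptr_t address = reinterpret_cast<std::uintptr_t>(ptr);
  std::uintptr_t mask = std::uintptr_t(sizeof(Bucket)) - 1;
  return reinterpret_cast<Bucket*>(address & ~mask);
}
```

With this scheme no separate back pointer to the bucket header has to be stored or loaded.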

primesieve-7.2 features a new algorithm for medium sieving primes that improves the CPU's branch prediction rate by sorting the sieving primes (before using them). On AMD EPYC CPUs I have measured a speedup of up to 20% and on Intel Skylake CPUs I have measured a speedup of up to 10%. Ever since primesieve was created in 2010 its algorithm for medium sieving primes has been slower than yafu's algorithm for medium sieving primes. This performance issue has now been fixed!

Benchmark primesieve 7.1 vs 7.2

Start    primesieve 7.1    primesieve 7.2    primesieve 7.1      primesieve 7.2
         AMD EPYC 7401P    AMD EPYC 7401P    Intel Xeon 8124M    Intel Xeon 8124M
10^10    2.73 sec          2.27 sec          1.97 sec            1.83 sec
10^11    3.18 sec          2.65 sec          2.25 sec            2.06 sec
10^12    4.00 sec          3.25 sec          2.64 sec            2.39 sec
10^13    5.24 sec          4.11 sec          3.22 sec            2.92 sec
10^14    6.44 sec          5.03 sec          3.82 sec            3.53 sec
10^15    7.67 sec          6.04 sec          4.40 sec            4.08 sec
10^16    8.81 sec          7.22 sec          5.03 sec            4.72 sec
10^17    10.53 sec         8.51 sec          5.84 sec            5.60 sec

At each start offset primesieve counted the primes inside an interval of size 10^10 using a single thread.

primesieve-7.1 runs up to 30% faster on Intel Skylake-X CPUs!

The default sieve size is now (L2 cache size / 2). Using a sieve size that is slightly smaller than the L2 cache size allows other important data structures to also fit into the L2 cache. This reduces the number of L2 cache misses which improves performance on CPUs with slow L3 caches. primesieve-7.1 will also run slightly faster (< 3%) on most other Intel CPUs.

  • api.cpp: Default sieve size = (L2 cache size / 2).
  • CpuInfo.cpp: Improved CPU info detection.
  • Erat.cpp: Lazy PreSieve initialization.
  • EratSmall.cpp: Fix too large sieve size.
  • help.cpp: Update help menu (--help).
  • ParallelSieve.cpp: Improved load balancing.
  • --cpu-info: New option, prints CPU information.
  • Rename kilobytes to KiB because it is more accurate.
  • Faster Windows binary built using clang-cl.
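
The default sieve size rule described above (L2 cache size / 2) can be written as a short sketch. The function name and the fallback value used when the L2 cache size is unknown are hypothetical and not taken from api.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fallback if the L2 cache size could not be detected.
constexpr std::size_t kFallbackSieveKiB = 256;

// Returns the default sieve size in KiB for a given L2 cache size
// in KiB. Pass 0 if the L2 cache size is unknown.
inline std::size_t defaultSieveSizeKiB(std::size_t l2CacheKiB)
{
  if (l2CacheKiB == 0)
    return kFallbackSieveKiB;

  // Use only half of the L2 cache so that other data
  // structures also fit into the cache.
  return l2CacheKiB / 2;
}
```

For a CPU with 1 MiB of L2 cache per core, `defaultSieveSizeKiB(1024)` yields a sieve size of 512 KiB.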