GPU Mersenne primality test.
Branch: master
Clone or download
Latest commit 9e9a70e Feb 14, 2019
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore add scons build Nov 3, 2018
Args.cpp args -D multiple of 210 Jan 24, 2019
FFTConfig.cpp v6.2 : drop FFT middle 3, 5; add 6 Jan 27, 2019
FFTConfig.h add FFTConfig; display FFT configs on -h Jan 11, 2019
GmpUtil.cpp in work P-1 second stage Jan 23, 2019
GmpUtil.h in work P-1 second stage Jan 23, 2019
Gpu.cpp "fix" some msys2 warnings Feb 9, 2019
Gpu.h rm multiplySub Feb 2, 2019
LICENSE Add GPL license. Apr 19, 2017
Makefile Gpu: make temporary buffers explicit Jan 24, 2019
Pm1Plan.cpp Pm1Plan : do not push everything up anymore Jan 25, 2019
Pm1Plan.h P-1 stage2 ETA Jan 24, 2019
Primes.cpp faster K-set selection Oct 23, 2018
Primes.h cppcheck Dec 7, 2018 Update Feb 14, 2019
SConstruct P-1 show plan early; primenet script add P-1 Jan 24, 2019
Signal.cpp drop possibly excessive asserts in Signal.cpp for failing on Windows Oct 31, 2018
Signal.h add Signal Oct 23, 2018
Task.cpp enable write P-1 result Jan 24, 2019
Task.h obliterate Result.h .cpp Jan 14, 2019
Worktodo.cpp P-1 first stage Jan 14, 2019
args.h add P-1 second stage plan Pm1Plan Jan 18, 2019
checkpoint.cpp "fix" some msys2 warnings Feb 9, 2019
checkpoint.h drop bufBase Jan 11, 2019
clwrap.cpp reserve 400MB GPU mem in P-1 stage2 Feb 5, 2019
clwrap.h reserve 400MB GPU mem in P-1 stage2 Feb 5, 2019
common.cpp file openRead/openWrite Sep 14, 2018
common.h v6.2 : drop FFT middle 3, 5; add 6 Jan 27, 2019 fix M61, initial timing not optimized. Oct 29, 2017
file.h file openRead/openWrite Sep 14, 2018 reserve 400MB GPU mem in P-1 stage2 Feb 5, 2019
gpuowl.cpp obliterate Result.h .cpp Jan 14, 2019
k.cpp faster K-set selection Oct 23, 2018
kernel.h rename Stats to TimeInfo, move to kernel.h Jan 12, 2019 primenet more work cachen on P-1 Jan 26, 2019
selftest.h simplify sieve Aug 8, 2018
shared.h add shared.h; start add offset tracking to Gpu Jun 12, 2018
state.cpp rm OpenGpu (merged into Gpu) Oct 20, 2018
state.h rm OpenGpu (merged into Gpu) Oct 20, 2018
timeutil.cpp add timeutil.cpp Sep 18, 2018
timeutil.h Gpu: make temporary buffers explicit Jan 24, 2019
tinycl.h query gpu allocable blocks Jan 16, 2019
worktodo.h split implem for Worktodo & Result Sep 14, 2018


GpuOwl is a Mersenne primality tester for AMD GPUs.

Mersenne primes

Mersenne numbers are numbers of the form 2^p -1. Some of these are prime numbers, called Mersenne primes.

The largest known Mersenne primes are huge numbers. They are extremely difficult to find, and discovering a new Mersenne prime is a noteworthy achievement. A long-standing distributed compting project named the Great Internet Mersenne Prime Search (GIMPS) has been searching for Mersenne primes for the last 30 years.

While traditionally the algorithms involved were implemented targeting CPUs, the GPUs have seen increased usage in computing recently because of their impressive power and wide memory bandwidth, which are advantages relative to CPUs.

GpuOwl is an implementation of some of the algorithms involved in searching for Mersenne primes in the OpenCL language for execution on modern AMD GPUs. GpuOwl runs best on top of the ROCm OpenCL stack.

Mersenne primality tests

These are the main test involved in Mersenne prime search:

  • TF, Trial Factoring
  • P-1, Pollard's p-1 factoring
  • LL, Lucas-Lehmer primality test
  • PRP, probable prime test

Trial Factoring (TF)

In this test prime factors of increasingly larger magnitude are tried, checking if they divide the Mersenne candidate M(p). TF is good as a first line of attack, representing a cheap filter that removes some Mersenne candidates by finding a factor (thus deciding that the M(p) is not prime). The limitation of TF is that the checking effort grows exponentially with the size of the factors that are trialed, thus TF remains just a "first line of attack" approach.

Pollard's P-1 factoring (P-1)

This is a very ingenious, beautiful algorithm for finding factors of Mersenne candidates. It detects a special class of factors F where F-1 is higly composite (has many factors). P-1 is used as a preliminary filter (much like TF), that removes some Mersenne candidates, proving them composite by finding a factor.

Lucas-Lehmer (LL)

This is a test that proves whether a Mersenne number is prime or not, but without providing a factor in the case where it's not prime. The Lucas-Lehmer test is very simple to describe: iterate the function f(x)=(x^2 - 2) modulo M(p) starting with the number 4. If after p-1 iterations the result is 0, then M(p) is certainly prime, otherwise M(p) is certainly not prime.

Lucas-Lehmer, while a very efficient primality test, still takes a rather long time for large Mersenne numbers (on the order of weeks of intense compute), thus it is only applied to the Mersenne candidates that survived the cheaper preliminary filters TF and P-1.

PRP ("the new LL")

The probable prime test can prove that a candidate is composite (without providing a factor), but does not prove that a candidate is prime (only stating that it probably is prime) -- although in practice the difference between probable prime and proved prime is extremely small for large mersenne candidates.

The PRP test is very similar computationally to LL: PRP iterates f(x) = x^2 modulo M(p) starting from 3, for p iterations. The cost of PRP is exacly the same as LL.

In practice PRP is preferred over LL because PRP does have a very strong and useful error-checking technique, which protects effectivelly against computation errors (which are sometimes common on GPUs).

GpuOwl: OpenCL GPU Mersenne primality testing

GpuOwl implements the PRP and P-1 tests. It also implemented, at various points in the past, LL and TF but these are not active now in GpuOwl.

Let's consider the PRP test, to get an idea of what GpuOwl does under the hood.

PRP uses what is called a modular squaring, computing f(x) = x^2 modulo M(p), starting from 3 (where x is an integer).

The problem is in the size of the integer x that is to be squared, which is on the order of 100 million bits in size.

How do we compute efficiently the square of a 100 million bits integer? It turns out that one of the fastest multiplication algorithms for huge numbers consists in doing a convolution, which involves a direct and an inverse FFT transform, with a simple element-wise multiplication in the FFT domain.

And this is exacly what GpuOwl does: it implements, as building blocks, efficient huge FFT transforms. Many algorithmic tricks are also used to speed up computation, e.g. the "Irrational Base Discrete Weighted Transform" (IBDWT) described by Richard Crandall.

Files used by gpuOwl

  • worktodo.txt : contains exponents to test, one entry per line
  • results.txt : contains the results
  • N.owl : the most recent checkpoint for exponent ; will resume from here
  • N-prev.owl : the previous checkpoint, to be used if N.ll is lost or corrupted
  • N.iteration.owl : a persistent checkpoint at the given iteration


The lines in worktodo.txt must be of one of these forms:

  • 70100200
  • PRP=FCECE568118E4626AB85ED36A9CC8D4F,1,2,77936867,-1,75,0

The first form indicates just the exponent to test, while the form starting with PRP indicates both the exponent and the assignment ID (AID) from PrimeNet.


  • Make sure that the file is in the same folder as the executable
  • Get "PRP smallest available first time tests" assignments from GIMPS Manual Testing ( ).
  • Copy the assignment lines from GIMPS to a file named 'worktodo.txt'
  • Run gpuowl. It prints progress report on stdout and in gpuowl.log, and writes result lines to results.txt
  • Submit the result lines from results.txt to manual testing.


To build simply invoke "make" (or look inside the Makefile for a manual build).

  • the library libgmp-dev
  • a C++ compiler (e.g. gcc, clang)
  • an OpenCL implementation (which provides the libOpenCL library). Recommended: an AMD GPU with ROCm 1.7.

See "gpuowl -h" for the command line options.


Simply start GpuOwl with any valid exponent, and the built-in error checking kicks in, validating the computation. If you start seeing output lines with "OK", than it's working correctly. "EE" lines indicate computation errors.