Added Travis CI and GitHub Actions CI. Updated README.
tdulcet authored and preda committed Mar 2, 2021
1 parent e47361b commit f5bb7c83e4438dc87398ad904582d08ad12120f0
Showing with 176 additions and 59 deletions.
  1. +54 −0 .github/workflows/ci.yml
  2. +54 −0 .travis.yml
  3. +2 −2 Makefile
  4. +61 −56 README.md
  5. +1 −1 codestyle.md
  6. +4 −0 tools/README.md
@@ -0,0 +1,54 @@
name: CI

on:
  push:
  pull_request:
  schedule:
    - cron: '0 0 1 * *'

jobs:
  Linux:
    name: Linux

    runs-on: ${{ matrix.os }}
    continue-on-error: ${{ (matrix.os == 'ubuntu-18.04' && matrix.cxx == 'g++') || matrix.cxx == 'clang++' }}
    strategy:
      matrix:
        os: [ubuntu-18.04, ubuntu-20.04]
        cxx: [g++, clang++]
      fail-fast: false
    env:
      CXX: ${{ matrix.cxx }}
    steps:
    - uses: actions/checkout@v1
    - name: Install
      run: |
        sudo apt-get -yqq update
        sudo apt-get -yqq install cppcheck ocl-icd-* opencl-headers
        $CXX --version
    - name: Before
      if: ${{ matrix.os == 'ubuntu-18.04' && matrix.cxx == 'g++' }}
      run: |
        sed -i 's/<filesystem>/<experimental\/filesystem>/' *.h *.cpp
        sed -i 's/std::filesystem/std::experimental::filesystem/' *.h *.cpp
        sed -i 's/assert(false);/abort();/' Pm1Plan.cpp
    - name: Script
      run: |
        make -j "$(nproc)"
        ./gpuowl -h
    - name: Cppcheck
      run: cppcheck --enable=all .
    - name: ShellCheck
      run: bash -c 'shopt -s globstar; shellcheck -s bash **/*.sh || true'

  Windows:
    name: Windows

    runs-on: windows-latest
    continue-on-error: true
    steps:
    - uses: actions/checkout@v1
    - name: Script
      run: |
        make gpuowl-win.exe
        ./gpuowl-win.exe -h
@@ -0,0 +1,54 @@
language: cpp

matrix:
  include:
    - name: "Ubuntu 18.04 (gcc)"
      os: linux
      dist: bionic
      compiler: gcc
      virt: vm
      before_script:
        - sed -i 's/<filesystem>/<experimental\/filesystem>/' *.h *.cpp
        - sed -i 's/std::filesystem/std::experimental::filesystem/' *.h *.cpp
        - sed -i 's/assert(false);/abort();/' Pm1Plan.cpp
    - name: "Ubuntu 18.04 (clang)"
      os: linux
      dist: bionic
      compiler: clang
      virt: vm
      before_script:
        - sed -i 's/<filesystem>/<experimental\/filesystem>/' *.h *.cpp
        - sed -i 's/std::filesystem/std::experimental::filesystem/' *.h *.cpp
    - name: "Ubuntu 20.04 (gcc)"
      os: linux
      dist: focal
      compiler: gcc
      virt: vm
    - name: "Ubuntu 20.04 (clang)"
      os: linux
      dist: focal
      compiler: clang
      virt: vm
    - name: "Windows"
      os: windows
      install: choco install python3 --version=3.8.8
      env: PATH=/c/Python38:/c/Python38/Scripts:$PATH
      script:
        - mingw32-make gpuowl-win.exe
        - ./gpuowl-win.exe -h
  allow_failures:
    - compiler: gcc
      os: linux
      dist: bionic
    - compiler: clang
    - os: windows

install:
  - sudo apt-get -yqq update
  - sudo apt-get -yqq install cppcheck ocl-icd-* opencl-headers
script:
  - make -j "$(nproc)"
  - ./gpuowl -h
  - cppcheck --enable=all .
  - bash -c 'shopt -s globstar; shellcheck -s bash **/*.sh || true'

@@ -1,10 +1,10 @@
CXXFLAGS = -Wall -O2 -std=gnu++17
CXXFLAGS = -Wall -g -O3 -std=gnu++17

LIBPATH = -L/opt/rocm-4.0.0/opencl/lib -L/opt/rocm-3.3.0/opencl/lib/x86_64 -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L.

LDFLAGS = -lstdc++fs -lOpenCL -lgmp -pthread -lquadmath ${LIBPATH}

LINK = $(CXX) -o $@ ${OBJS} ${LDFLAGS}
LINK = $(CXX) $(CXXFLAGS) -o $@ ${OBJS} ${LDFLAGS}

SRCS = ProofCache.cpp Proof.cpp Pm1Plan.cpp B1Accumulator.cpp Memlock.cpp log.cpp GmpUtil.cpp Worktodo.cpp common.cpp main.cpp Gpu.cpp clwrap.cpp Task.cpp Saver.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp AllocTrac.cpp gpuowl-wrap.cpp sha3.cpp md5.cpp
OBJS = $(SRCS:%.cpp=%.o)
117 README.md
@@ -1,21 +1,24 @@
[![Build Status](https://travis-ci.org/preda/gpuowl.svg?branch=master)](https://travis-ci.org/preda/gpuowl)
[![Actions Status](https://github.com/preda/gpuowl/workflows/CI/badge.svg?branch=master)](https://github.com/preda/gpuowl/actions)

# GpuOwl

GpuOwl is a Mersenne primality tester for AMD GPUs.
GpuOwl is a Mersenne primality tester for AMD, Nvidia and Intel GPUs supporting OpenCL.

If you are making source code changes to GpuOwl, please read the [code style](codestyle.md)

## Mersenne primes
Mersenne numbers are numbers of the form 2^p -1. Some of these are prime numbers, called <em>Mersenne primes</em>.
Mersenne numbers are numbers of the form 2<sup>p</sup> -1. Some of these are prime numbers, called _Mersenne primes_.

The largest known Mersenne primes are huge numbers. They are extremely difficult to find, and discovering a new Mersenne prime
is a noteworthy achievement. A long-standing distributed compting project named the Great Internet Mersenne Prime Search (GIMPS)
is a noteworthy achievement. A long-standing distributed computing project named the Great Internet Mersenne Prime Search (GIMPS)
has been searching for Mersenne primes for the last 30 years.

While traditionally the algorithms involved were implemented targeting CPUs, the GPUs have seen increased usage in computing recently
because of their impressive power and wide memory bandwidth, which are advantages relative to CPUs.

GpuOwl is an implementation of some of the algorithms involved in searching for Mersenne primes in the OpenCL language for execution
on modern AMD GPUs. GpuOwl runs best on top of the ROCm OpenCL stack.
on modern AMD, Nvidia and Intel GPUs. GpuOwl runs best on top of the ROCm OpenCL stack.

## Mersenne primality tests
These are the main tests involved in Mersenne prime search:
@@ -25,86 +28,86 @@ These are the main test involved in Mersenne prime search:
* PRP, probable prime test

### Trial Factoring (TF)
In this test prime factors of increasingly larger magnitude are tried, checking if they divide the Mersenne candidate M(p).
In this test, prime factors of increasingly larger magnitude are tried, checking if they divide the Mersenne candidate M(p).
TF is good as a first line of attack, representing a cheap filter that removes some Mersenne candidates by finding a factor (thus
deciding that the M(p) is not prime). The limitation of TF is that the checking effort grows exponentially with the size of the
factors that are trialed, thus TF remains just a "first line of attack" approach.
factors that are trialed, thus TF remains just a first line of attack approach.
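
To make the TF description concrete, here is a minimal Python sketch. It is not GpuOwl code; the function name `trial_factor` and the `max_k` limit are made up for illustration. It relies on a standard number-theory fact that the README does not state: for an odd prime p, every factor q of M(p) has the form q = 2kp + 1 and satisfies q ≡ ±1 (mod 8).

```python
def trial_factor(p, max_k):
    """Return a non-trivial factor of M(p) = 2**p - 1, or None if none is found.

    Hypothetical helper for illustration; assumes p is an odd prime.
    Only candidates q = 2*k*p + 1 with q % 8 in (1, 7) are tried.
    """
    m = (1 << p) - 1
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q >= m:
            break  # only proper divisors are interesting
        if q % 8 not in (1, 7):
            continue  # cannot divide M(p)
        # q divides 2**p - 1 exactly when 2**p is congruent to 1 modulo q
        if pow(2, p, q) == 1:
            return q
    return None


print(trial_factor(11, 10))   # 23, since M(11) = 2047 = 23 * 89
print(trial_factor(29, 10))   # 233
print(trial_factor(13, 100))  # None, M(13) = 8191 is prime
```

Each candidate costs one modular exponentiation, but the number of candidates below a given bit length doubles with every extra bit, which is the exponential growth mentioned above.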

### Pollard's P-1 factoring (P-1)
This is a very ingenious, beautiful algorithm for finding factors of Mersenne candidates. It detects a special class of factors
F where F-1 is higly composite (has many factors). P-1 is used as a preliminary filter (much like TF), that removes some Mersenne
F where F-1 is highly composite (has many factors). P-1 is used as a preliminary filter (much like TF), that removes some Mersenne
candidates, proving them composite by finding a factor.
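
As a rough illustration of the idea, the following Python sketch implements only a textbook stage 1 of P-1, with no stage 2. It is an assumption-laden toy, not GpuOwl's implementation: the names `pm1_stage1` and `primes_up_to`, the base 3, and the bound `b1` are chosen here for the example. Because factors of M(p) have the form 2kp + 1, the exponent also includes 2p.

```python
from math import gcd


def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def pm1_stage1(p, b1):
    """Try to find a factor F of M(p) where F-1 is 2*p times a b1-smooth number.

    Hypothetical helper; assumes p is an odd prime.
    Returns a non-trivial factor, or None if stage 1 finds nothing.
    """
    m = (1 << p) - 1
    x = pow(3, 2 * p, m)  # fold the known 2*p part of F-1 into the exponent
    for q in primes_up_to(b1):
        pk = q
        while pk * q <= b1:  # largest power of q not exceeding b1
            pk *= q
        x = pow(x, pk, m)
    g = gcd(x - 1, m)
    return g if 1 < g < m else None


f = pm1_stage1(29, 10)
print(f, ((1 << 29) - 1) % f == 0)
```

The returned value may be a product of several prime factors when more than one of them has a smooth F-1.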

### Lucas-Lehmer (LL)
This is a test that proves whether a Mersenne number is prime or not, but without providing a factor in the case where it's not prime.
This is a test that proves whether a Mersenne number is prime or not, but without providing a factor in the case where it is not prime.
The Lucas-Lehmer test is very simple to describe: iterate the function f(x)=(x^2 - 2) modulo M(p) starting with the number 4. If
after p-2 iterations the result is 0, then M(p) is certainly prime, otherwise M(p) is certainly not prime.
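
The iteration described above fits in a few lines of Python. This is only a sketch using Python's built-in big integers (the function name `lucas_lehmer` is illustrative), and it assumes p is an odd prime; it says nothing about how GpuOwl performs the squaring.

```python
def lucas_lehmer(p):
    """Return True if M(p) = 2**p - 1 is prime. Assumes p is an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # f(x) = x^2 - 2 modulo M(p)
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19]
```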

Lucas-Lehmer, while a very efficient primality test, still takes a rather long time for large Mersenne numbers
(on the order of weeks of intense compute), thus it is only applied to the Mersenne candidates that survived the cheaper preliminary
filters TF and P-1.

### PRP ("the new LL")
### PRP (the new LL)
The probable prime test can prove that a candidate is composite (without providing a factor), but does not prove that a candidate
is prime (only stating that it <em>probably</em> is prime) -- although in practice the difference between probable prime and proved
prime is extremely small for large mersenne candidates.
is prime (only stating that it _probably_ is prime) -- although in practice the difference between probable prime and proved
prime is extremely small for large Mersenne candidates.

The PRP test is very similar computationally to LL: PRP iterates f(x) = x^2 modulo M(p) starting from 3, for p iterations. The cost
of PRP is exacly the same as LL.
of PRP is exactly the same as LL.
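
A small Python sketch of the PRP iteration follows. The README does not say what the final value is compared against; the check used here is derived from Fermat's little theorem and should be read as an assumption of this example: if M(p) is prime then 3^(M(p)-1) ≡ 1, so after p squarings the value 3^(2^p) = 3^(M(p)+1) must equal 9 modulo M(p). The name `is_prp` is illustrative, and p is assumed to be an odd prime.

```python
def is_prp(p):
    """Return True if M(p) = 2**p - 1 passes the base-3 probable prime test.

    Assumes p is an odd prime. A False result proves M(p) composite;
    a True result only means M(p) is probably prime.
    """
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = (x * x) % m  # f(x) = x^2 modulo M(p)
    return x == 9 % m


print([p for p in (3, 5, 7, 11, 13) if is_prp(p)])
# [3, 5, 7, 13]
```

The loop body is a single modular squaring, just as in LL, which is why the two tests cost about the same.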

In practice PRP is preferred over LL because PRP does have a very strong and useful error-checking technique, which protects effectivelly against computation errors (which are sometimes common on GPUs).
In practice, PRP is preferred over LL because PRP does have a very strong and useful error-checking technique, which protects effectively against computation errors (which are sometimes common on GPUs).

## GpuOwl: OpenCL GPU Mersenne primality testing
GpuOwl implements the PRP and P-1 tests. It also implemented, at various points in the past, LL and TF but these are not active now
in GpuOwl.
in GpuOwl. For double check (DC) LL tests, see [version 6.11](https://github.com/preda/gpuowl/releases/tag/v6.11) and for first time LL tests, see the [LL branch](https://github.com/preda/gpuowl/tree/LL) (version 0.6).

Let's consider the PRP test, to get an idea of what GpuOwl does under the hood.
Let us consider the PRP test, to get an idea of what GpuOwl does under the hood.

PRP uses what is called a <em>modular squaring</em>, computing f(x) = x^2 modulo M(p), starting from 3 (where x is an integer).
PRP uses what is called a _modular squaring_, computing f(x) = x^2 modulo M(p), starting from 3 (where x is an integer).

The problem is in the size of the integer x that is to be squared, which is on the order of 100 million bits in size.
The problem is in the size of the integer x that is to be squared, which is about 100 million bits in size.

How do we compute efficiently the square of a 100 million bits integer? It turns out that one of the fastest multiplication algorithms
for huge numbers consists in doing a convolution, which involves a direct and an inverse FFT transform, with a simple element-wise
multiplication in the FFT domain.
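
The convolution idea can be sketched in pure Python with a small recursive FFT. This is a toy under stated assumptions: it splits the number into 8-bit digits, uses double-precision complex arithmetic that is only accurate for modest sizes, and does not implement the IBDWT or anything GPU-specific. The names `fft` and `fft_square` are made up for the example.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft_square(x, base_bits=8):
    """Square a non-negative integer through a forward and an inverse FFT."""
    mask = (1 << base_bits) - 1
    digits = []
    while x:
        digits.append(x & mask)
        x >>= base_bits
    if not digits:
        return 0
    n = 1
    while n < 2 * len(digits):  # room for the full, non-cyclic convolution
        n *= 2
    a = [complex(d) for d in digits] + [0j] * (n - len(digits))
    fa = fft(a)
    fa = [v * v for v in fa]       # element-wise multiplication in the FFT domain
    conv = fft(fa, invert=True)    # unnormalized inverse transform
    result = 0
    for i, v in enumerate(conv):
        # dividing by n normalizes; the shifted sum also propagates carries
        result += round(v.real / n) << (i * base_bits)
    return result


x = 123456789123456789
print(fft_square(x) == x * x)
```

A modular squaring for a given p would then reduce the result with `fft_square(x) % ((1 << p) - 1)`; a production implementation avoids that separate reduction step.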

And this is exacly what GpuOwl does: it implements, as building blocks, efficient huge FFT transforms. Many algorithmic tricks
are also used to speed up computation, e.g. the "Irrational Base Discrete Weighted Transform" (IBDWT) described by Richard Crandall.
And this is exactly what GpuOwl does: it implements, as building blocks, efficient huge FFT transforms. Many algorithmic tricks
are also used to speed up computation, e.g. the Irrational Base Discrete Weighted Transform (IBDWT) described by Richard Crandall.



## Files used by gpuOwl
* worktodo.txt : contains exponents to test, one entry per line
* results.txt : contains the results
* N.owl : the most recent checkpoint for exponent <N>; will resume from here
* N-prev.owl : the previous checkpoint, to be used if N.ll is lost or corrupted
* N.iteration.owl : a persistent checkpoint at the given iteration
* `worktodo.txt` : contains exponents to test, one entry per line
* `results.txt` : contains the results
* `N.owl` : the most recent checkpoint for exponent <N>; will resume from here
* `N-prev.owl` : the previous checkpoint, to be used if `N.owl` is lost or corrupted
* `N.iteration.owl` : a persistent checkpoint at the given iteration

## worktodo.txt
The lines in worktodo.txt must be of one of these forms:
* 70100200
* PRP=FCECE568118E4626AB85ED36A9CC8D4F,1,2,77936867,-1,75,0
## `worktodo.txt`
The lines in `worktodo.txt` must be of one of these forms:
* `70100200`
* `PRP=FCECE568118E4626AB85ED36A9CC8D4F,1,2,77936867,-1,75,0`

The first form indicates just the exponent to test, while the form starting with PRP indicates both the
exponent and the assignment ID (AID) from PrimeNet.
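
The two line forms can be told apart with a few lines of Python. This sketch is not part of GpuOwl or `Primenet.py`; the function name `parse_worktodo_line` is hypothetical, and the field positions (AID first, exponent fourth) are inferred only from the single example line above.

```python
def parse_worktodo_line(line):
    """Parse one worktodo.txt line into a dict with 'aid' and 'exponent'.

    Hypothetical helper; field positions are assumed from the example
    'PRP=<AID>,1,2,<exponent>,-1,75,0'.
    """
    line = line.strip()
    if line.isdigit():
        # bare exponent, no assignment ID
        return {"aid": None, "exponent": int(line)}
    if line.startswith("PRP="):
        fields = line[len("PRP="):].split(",")
        return {"aid": fields[0], "exponent": int(fields[3])}
    raise ValueError("unrecognized worktodo line: " + line)


print(parse_worktodo_line("70100200"))
print(parse_worktodo_line("PRP=FCECE568118E4626AB85ED36A9CC8D4F,1,2,77936867,-1,75,0"))
```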

## Usage
* Get "PRP smallest available first time tests" assignments from GIMPS Manual Testing ( http://mersenne.org/ ).
* Copy the assignment lines from GIMPS to a file named 'worktodo.txt'
* Run gpuowl. It prints progress report on stdout and in gpuowl.log, and writes result lines to results.txt
* Submit the result lines from results.txt to http://mersenne.org/ manual testing.
* Copy the assignment lines from GIMPS to a file named '`worktodo.txt`'
* Run `gpuowl`. It prints progress report on stdout and in `gpuowl.log`, and writes result lines to `results.txt`
* Submit the result lines from `results.txt` to http://mersenne.org/ manual testing.

## Build
To build simply invoke "make" (or look inside the Makefile for a manual build).
To build simply invoke "`make`" (or look inside the Makefile for a manual build).

* the library libgmp-dev
* a C++ compiler (e.g. gcc, clang)
* the GNU Multiple Precision (GMP) library `libgmp-dev`
* a C++20 compiler (e.g. GCC, Clang)
* an OpenCL implementation (which provides the **libOpenCL** library). Recommended: an AMD GPU with ROCm 1.7.

## See \"gpuowl -h\" for the command line options.
## See \"`gpuowl -h`\" for the command line options.

## Self-test
Simply start GpuOwl with any valid exponent, and the built-in error checking kicks in, validating the computation. If you start seeing output lines with "OK", than it's working correctly. "EE" lines indicate computation errors.
Simply start GpuOwl with any valid exponent, and the built-in error checking kicks in, validating the computation. If you start seeing output lines with "OK", then it is working correctly. "EE" lines indicate computation errors.

## Command-line Arguments
```
@@ -117,40 +120,42 @@ Simply start GpuOwl with any valid exponent, and the built-in error checking kic
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <spec> : specify FFT e.g.: 1152K, 5M, 5.5M, 256:10:1K
-block <value> : PRP GEC block size, or LL iteration-block size. Must divide 10'000.
-block <value> : PRP error-check block size. Must divide 10'000.
-log <step> : log every <step> iterations. Multiple of 10'000.
-jacobi <step> : (LL-only): do Jacobi check every <step> iterations. Default 1'000'000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 1000000
-B2 : P-1 B2 bound, default B1 * 30
-B1 : P-1 B1 bound
-B2 : P-1 B2 bound
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-cleanup : delete save files at end of run
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-ll <exponent> : run a single LL test and exit, ignoring worktodo.txt
-verify <file>|<exponent> : verify PRP-proof contained in <file> or in the folder <exponent>/
-proof <power> : Valid <power> values are 6 to 9.
By default a proof of power 8 is generated, using 3GB of temporary disk space for a 100M exponent.
-verify <file> : verify PRP-proof contained in <file>
-proof <power> : By default a proof of power 8 is generated, using 3GB of temporary disk space for a 100M exponent.
A lower power reduces disk space requirements but increases the verification cost.
A proof of power 9 uses 6GB of disk space for a 100M exponent and enables faster verification.
-autoverify <power> : Self-verify proofs generated with at least this power. Default 9.
-tmpDir <dir> : specify a folder with plenty of disk space where temporary proof checkpoints will be stored.
-results <file> : name of results file, default 'results.txt'
-iters <N> : run next PRP test for <N> iterations and exit. Multiple of 10000.
-maxAlloc : limit GPU memory usage to this value in MB (needed on non-AMD GPUs)
-yield : enable work-around for CUDA busy wait taking up one CPU core
-maxAlloc <size> : limit GPU memory usage to size, which is a value with suffix M for MB and G for GB.
e.g. -maxAlloc 2048M or -maxAlloc 3.5G
-save <N> : specify the number of savefiles to keep (default 12).
-noclean : do not delete data after the test is complete.
-from <iteration> : start at the given iteration instead of the most recent saved iteration
-yield : enable work-around for Nvidia GPUs busy wait. Do not use on AMD GPUs!
-nospin : disable progress spinner
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning)
-safeMath : do not use -cl-unsafe-math-optimizations (OpenCL)
-unsafeMath : use OpenCL -cl-unsafe-math-optimizations (use at your own risk)
-binary <file> : specify a file containing the compiled kernels binary
-device <N> : select a specific device:
```
Device numbers start at zero.

## Primenet.py Arguments
-h, --help show this help message and exit\
-u USERNAME Primenet user name\
-p PASSWORD Primenet password\
-t TIMEOUT Seconds to sleep between updates\
--dirs DIR \[DIR ...\] GpuOwl directories to scan\
--tasks NTASKS Number of tasks to fetch ahead\
## `Primenet.py` Arguments
```
-h, --help show this help message and exit
-u USERNAME Primenet user name
-p PASSWORD Primenet password
-t TIMEOUT Seconds to sleep between updates
--dirs DIR [DIR ...] GpuOwl directories to scan
--tasks NTASKS Number of tasks to fetch ahead
-w {PRP,PM1,LL_DC,PRP_DC,PRP_WORLD_RECORD,PRP_100M} GIMPS work type
```
@@ -5,7 +5,7 @@ GpuOwl code style, C++ and OpenCL
- open bracket on the same line
- always curly-braces {} after if and else
- no space between function name and open parens (e.g. in a function call)
- one space between between if/while/for and open parens
- one space between if/while/for and open parens

Example:
```C++
@@ -1,10 +1,14 @@
Simple ISA instruction-counts diff tool:
if you dumped ISA to two folders A and B, to see the simple diff use:
```sh
./tools/delta.sh A/*.s B/*.s
```

Example output:

```
~/gpuowl$ ./tools/delta.sh tmp4/5M_0_gfx906.s tmp6/5M_0_gfx906.s
tailFused : s_mov_b32 119 | tailFused : s_mov_b32 113
tailFused : v_add_f64 443 | tailFused : v_add_f64 437
tailFused : v_mul_f64 176 | tailFused : v_mul_f64 170
```
