rob147147/CUDA-Riesel-Sieve: A CUDA-based sieve for numbers of the form k*b^n-1. This project is heavily based on SR2Sieve.

This is a sieve for eliminating composite candidates of the form k*b^n-1, the forms typically searched by the Conjectures 'R Us (CRUS) project at mersenneforum.org.

Executables are available for download for Linux and Win64 (the Win64 build is outdated). Use the -h option to list the valid command-line flags.

The CPU reads the ABCD file and generates arrays of prime numbers to send to the GPU.
The GPU receives each array of primes, any of which may be a factor of numbers of the form k*b^n-1, together with the list of k-values and the n-min and n-max values from the ABCD file.
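Written as a congruence, the test applied to each prime reduces to a standard reformulation. Assuming p is a prime that divides neither k nor b (otherwise p can never divide k*b^n-1 for n >= 1), p is a factor of k*b^n-1 exactly when:

```latex
\begin{aligned}
p \mid k \cdot b^{n} - 1
  &\iff k \cdot b^{n} \equiv 1 \pmod{p} \\
  &\iff b^{n} \equiv k^{-1} \pmod{p}, \qquad n_{\min} \le n \le n_{\max}
\end{aligned}
```

For each k this is a discrete logarithm problem: find the exponents n in [n-min, n-max] with b^n congruent to k^-1 modulo p.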

The core algorithm run on the GPU is the Baby-Step Giant-Step (BSGS) algorithm for solving the discrete logarithm problem.
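A minimal CPU-side sketch of textbook BSGS for the condition b^n ≡ k^-1 (mod p) is shown below. This is not the project's CUDA kernel. The function name `bsgs_riesel` is hypothetical, and the sketch assumes that p is prime and below 2^32 (so the product of two residues fits in 64 bits), that n-min >= 1, and that only the smallest matching n is needed. It writes n = n_min + i*m + j with m = ceil(sqrt(range)), stores the baby steps b^j, and walks the giant steps k^-1 * b^-n_min * (b^-m)^i until one matches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One baby step: the residue b^j mod p and its exponent j. */
typedef struct { uint64_t val; uint32_t j; } baby_t;

/* Assumes p < 2^32, so the product of two residues fits in 64 bits. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p) { return (a * b) % p; }

static uint64_t powmod(uint64_t base, uint64_t e, uint64_t p) {
    uint64_t r = 1 % p;
    base %= p;
    while (e) {
        if (e & 1) r = mulmod(r, base, p);
        base = mulmod(base, base, p);
        e >>= 1;
    }
    return r;
}

/* Modular inverse via Fermat's little theorem; requires p prime, a % p != 0. */
static uint64_t invmod(uint64_t a, uint64_t p) { return powmod(a, p - 2, p); }

/* Order baby steps by residue, then by exponent, so the smallest j comes first. */
static int cmp_baby(const void *x, const void *y) {
    const baby_t *a = x, *b = y;
    if (a->val != b->val) return a->val < b->val ? -1 : 1;
    return (a->j > b->j) - (a->j < b->j);
}

/* Hypothetical helper: smallest n in [nmin, nmax] with p | k*b^n - 1, or -1.
   Assumes p prime, p < 2^32 and nmin >= 1. */
long long bsgs_riesel(uint64_t k, uint64_t b, uint64_t p, uint32_t nmin, uint32_t nmax) {
    assert(p >= 2 && p < (1ULL << 32) && nmin >= 1);
    if (nmax < nmin || k % p == 0 || b % p == 0) return -1; /* p never divides k*b^n-1 here */

    uint64_t range = (uint64_t)nmax - nmin + 1;
    uint32_t m = 1;
    while ((uint64_t)m * m < range) m++;           /* m = ceil(sqrt(range)) */

    /* Baby steps: b^j mod p for j = 0 .. m-1, sorted for binary search. */
    baby_t *tab = malloc(m * sizeof *tab);
    if (!tab) return -1;
    uint64_t x = 1;
    for (uint32_t j = 0; j < m; j++) {
        tab[j].val = x;
        tab[j].j = j;
        x = mulmod(x, b % p, p);
    }
    qsort(tab, m, sizeof *tab, cmp_baby);

    /* Giant steps: y_i = k^-1 * b^-nmin * (b^-m)^i; a match b^j == y_i
       means b^(nmin + i*m + j) == k^-1 (mod p). */
    uint64_t binv = invmod(b % p, p);
    uint64_t y = mulmod(invmod(k % p, p), powmod(binv, nmin, p), p);
    uint64_t giant = powmod(binv, m, p);
    long long result = -1;
    for (uint64_t i = 0; i * m < range; i++) {
        size_t lo = 0, hi = m;                     /* lower bound of y in tab */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (tab[mid].val < y) lo = mid + 1; else hi = mid;
        }
        if (lo < m && tab[lo].val == y) {
            uint64_t n = nmin + i * m + tab[lo].j;
            if (n <= nmax) result = (long long)n;  /* later i only give larger n */
            break;
        }
        y = mulmod(y, giant, p);
    }
    free(tab);
    return result;
}
```

For example, 3*2^8-1 = 767 = 13*59, and `bsgs_riesel(3, 2, 13, 1, 100)` returns 8. The sorted array with binary search keeps the sketch short. A real sieve would more likely use a hash table and report every matching n rather than only the first.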

This is very much a work in progress, but here are some current performance numbers from CUDA-enabled Nvidia GPUs to which I have access.
The test file is R745.ABCD, which contains 22 k-values with n-min = 180,000 and n-max = 250,000, giving an n-range of 70,000.
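For scale, the following shows the textbook BSGS step counts for this test file's range. It is an illustration only, and the actual kernel may choose its step sizes differently:

```latex
\begin{aligned}
N &= n_{\max} - n_{\min} + 1 = 250{,}000 - 180{,}000 + 1 = 70{,}001 \\
m &= \lceil \sqrt{N} \rceil = 265 \qquad (264^2 = 69{,}696 < N \le 265^2 = 70{,}225) \\
\text{giant steps} &= \lceil N / m \rceil = \lceil 70{,}001 / 265 \rceil = 265
\end{aligned}
```

That is roughly 265 + 265 = 530 modular multiplications per k-value and prime, plus table lookups, compared with up to 70,001 for testing every n directly. The baby steps b^j mod p depend only on b and p, so in principle one table per prime can be shared by all 22 k-values.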

CPU (1 core of an i5-4440 @ 3.1 GHz, using SR2Sieve) - 6,000,000 p/sec

Latest update as of 29/06/24:
Nvidia GeForce RTX 2080 Ti (11750 GFLOPS, 11 GB RAM, 5632 KB L2 cache, 250 W) - 125,000,000 p/sec (with params -b 5 -m 4 -s 256 -Q 18) using the Linux executable
This appears to be GPU-bound (CPU usage ~35%) and runs about 2-3 times faster than srsieve2cl on the same GPU.

Nvidia A100 80GB PCIe (19500 GFLOPS, 80 GB HBM2 RAM, 40960 KB L2 cache, 250 W) - 360,000,000 p/sec (with params -b 5 -m 8 -s 192 -Q 18) using Linux executable v0.24.6
Changing the -s argument shifts the bottleneck between the CPU and the GPU. This configuration also runs 2-3 times faster than srsieve2cl on the same GPU.

Older speed results for other GPUs:
Nvidia GeForce GTX 1060 (3855 GFLOPS, 6 GB RAM, 1536 KB L2 cache, 120 W) - ~23,000,000 p/sec at 90% TDP, ~108 W (with params -b 7 -m 5 -s 256 -Q 18)
Nvidia MX150 (1177 GFLOPS, 2 GB RAM, 512 KB L2 cache, 25 W) - 6,500,000 p/sec (with params -b 5 -m 5)
Nvidia GeForce RTX 2070 Max-Q (5460 GFLOPS, 8 GB RAM, 4096 KB L2 cache, 80 W) - 54,800,000 p/sec (with params -b 9 -m 4)
Nvidia GeForce RTX 2080 (8920 GFLOPS, 8 GB RAM, 4096 KB L2 cache, 215 W) - 80,000,000 p/sec (with params -b 7 -m 4 -s 256 -Q 18)
Nvidia GeForce RTX 2080 Ti (11750 GFLOPS, 11 GB RAM, 5632 KB L2 cache, 250 W) - 123,000,000 p/sec (with params -b 8 -m 4 -s 256 -Q 18)

Files in the repository:
CUDA Riesel Sieve
.gitattributes
.gitignore
Activity1.nvact
Activity2.nvact
CUDA Riesel Sieve.sln
CUDARieselSieve_v0.21.5_win64.exe
CUDARieselSieve_v0.23.6_linux_CUDA12
CUDARieselSieve_v0.24.6_linux_CUDA12
LICENSE
README.md
libcudasieve.a