full Zcash miner, multi-GPU, Stratum
mbevand committed Nov 4, 2016
1 parent a69c562 commit 79aecc2
Showing 8 changed files with 1,236 additions and 97 deletions.
11 changes: 8 additions & 3 deletions
@@ -1,8 +1,13 @@
# Current
# Version 3 (04 Nov 2016)

* SILENTARMY is now a full miner, not just a solver; the solver binary was
renamed "sa-solver" and the miner is the script "silentarmy"
* Multi-GPU support
* Stratum support for pool mining
* Reduce GPU memory usage to 671 MB (NR_ROWS_LOG=19) or 1208 MB
(NR_ROWS_LOG=20, default, ~10% faster than 19) per SILENTARMY instance
* Add support for multiple OpenCL platforms: --list-gpu now scans all available
(NR_ROWS_LOG=20, default, ~10% faster than 19) per Equihash instance
* Rename --list-gpu to --list and list all OpenCL devices (not just GPUs)
* Add support for multiple OpenCL platforms: --list now scans all available
platforms, numbering devices using globally unique IDs.
* Improve correctness: find ~0.09% more solutions

29 changes: 15 additions & 14 deletions Makefile
@@ -1,36 +1,37 @@
# Change this path if the SDK was installed in a non-standard location
AMDAPPSDK = "/opt/AMDAPPSDK-3.0/include"
# Change this path if your library is located elsewhere
LIBOPENCL = "/opt/amdgpu-pro/lib/x86_64-linux-gnu/"
OPENCL_HEADERS = "/opt/AMDAPPSDK-3.0/include"
# By default the OpenCL library is searched for in default system locations;
# this path lets you add one more directory to the search path.
LIBOPENCL = "/opt/amdgpu-pro/lib/x86_64-linux-gnu"

CC = gcc
CPPFLAGS = -std=gnu99 -pedantic -Wextra -Wall -ggdb \
-Wno-deprecated-declarations \
-Wno-overlength-strings \
LDFLAGS = -rdynamic
LDFLAGS = -rdynamic -L${LIBOPENCL}
LDLIBS = -lOpenCL -lsodium
OBJ = main.o blake.o
INCLUDES = blake.h param.h _kernel.h

all : silentarmy
all : sa-solver

silentarmy : ${OBJ}
${CC} -o silentarmy ${OBJ} ${LDFLAGS} ${LDLIBS}
sa-solver : ${OBJ}
${CC} -o sa-solver ${OBJ} ${LDFLAGS} ${LDLIBS}


_kernel.h: param.h
_kernel.h : param.h
echo 'const char *ocl_code = R"_mrb_(' >$@
cpp $< >>$@
echo ')_mrb_";' >>$@

test: silentarmy
./silentarmy --nonces 100 -v -v 2>&1 | grep Soln: | \
diff -u sols-100 - | cut -c 1-75
test : sa-solver
./sa-solver --nonces 100 -v -v 2>&1 | grep Soln: | \
diff -u testing/sols-100 - | cut -c 1-75

clean :
rm -f silentarmy _kernel.h *.o _temp_*
rm -f sa-solver _kernel.h *.o _temp_*

re : clean all

131 changes: 97 additions & 34 deletions
@@ -1,50 +1,101 @@

SILENTARMY is an OpenCL GPU Zcash Equihash solver. It runs best on AMD GPUs
and implements the CLI API described in the
SILENTARMY is a [Zcash]( miner for Linux written in OpenCL with
multi-GPU support. The
[Stratum]( protocol is implemented for connecting to mining pools. It runs
best on AMD GPUs but has also been reported to work on other OpenCL devices such
as Xeon Phi, Intel GPUs, and through OpenCL CPU drivers. (Nvidia GPUs are not
currently supported due to an

After compiling SILENTARMY, list the available OpenCL devices:

`$ silentarmy --list`

Start mining with two GPUs (ID 2 and ID 5) on a pool:

`$ silentarmy --use 2,5 -c stratum+tcp:// -u t1cVviFvgJinQ4w3C2m2CfRxgP5DnHYaoFC`


$ silentarmy --help
Usage: silentarmy [options]
-h, --help show this help message and exit
-v, --verbose verbose mode (may be repeated for more verbosity)
--debug enable debug mode (for developers only)
--list list available OpenCL devices by ID (GPUs...)
--use=LIST use specified GPU device IDs to mine, for example to
use the first three: 0,1,2 (default: 0)
--instances=N run N instances of Equihash per GPU (default: 2)
-c POOL, --connect=POOL
connect to POOL, for example:
-u USER, --user=USER username for connecting to the pool
-p PWD, --pwd=PWD password for connecting to the pool

# Equihash solver

SILENTARMY also provides a command line Equihash solver (`sa-solver`)
implementing the CLI API described in the
[Zcash open source miner challenge](

To solve a specific block header and print the encoded solution on stdout, run
the following command (this header is from
[testnet block #2680](
and should result in 3 solutions):
[mainnet block #3400](
and should result in 1 Equihash solution):

$ silentarmy -i 0400000052a6a17bb3cf95c62ec140d22f4fe96cfbc192ff288251282174481312040000b9711b4850b4b89598e16103148a8a368f74e472fa919ac7d0dbb57b1090f6c80000000000000000000000000000000000000000000000000000000000000000667211581e1b071e4302000000000000020000000000000000000000000000000000000000000000
`$ sa-solver -i 04000000e54c27544050668f272ec3b460e1cde745c6b21239a81dae637fde4704000000844bc0c55696ef9920eeda11c1eb41b0c2e7324b46cc2e7aa0c2aa7736448d7a000000000000000000000000000000000000000000000000000000000000000068241a587e7e061d250e000000000000010000000000000000000000000000000000000000000000`

If the option `-i` is not specified, SILENTARMY solves a 140-byte header of all
zero bytes. The option `--nonces <nr>` instructs SILENTARMY to try multiple
If the option `-i` is not specified, `sa-solver` solves a 140-byte header of all
zero bytes. The option `--nonces <nr>` instructs the program to try multiple
nonces, each time incrementing the nonce by 1. So a convenient way to run a
benchmark is simply:
quick test/benchmark is simply:

$ silentarmy --nonces 100
`$ sa-solver --nonces 100`

Note: due to BLAKE2b optimizations in my implementation, if the header is
specified it must be 140 bytes and its last 12 bytes **must** be zero. For
convenience, `-i` can also specify a 108-byte nonceless header to which
SILENTARMY adds an implicit nonce of 32 zero bytes.
`sa-solver` adds an implicit nonce of 32 zero bytes.
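
The header rules in the note above can be sketched in Python as a pre-check before calling the solver. This is only an illustration: `normalize_header` is a hypothetical helper, not part of SILENTARMY, and it assumes the header is passed as a hex string as in the `-i` example.

```python
def normalize_header(hex_header: str) -> bytes:
    """Return a 140-byte header that satisfies the constraints described above.

    Hypothetical helper for illustration; sa-solver does its own validation.
    """
    header = bytes.fromhex(hex_header)
    if len(header) == 108:
        # Nonceless header: append an implicit nonce of 32 zero bytes.
        header += bytes(32)
    if len(header) != 140:
        raise ValueError("header must be 108 or 140 bytes, got %d" % len(header))
    if header[-12:] != bytes(12):
        # The BLAKE2b optimization requires the last 12 bytes to be zero.
        raise ValueError("last 12 bytes of the header must be zero")
    return header


if __name__ == "__main__":
    full = normalize_header("00" * 108)
    print(len(full))
```

A header that fails either check is rejected up front instead of being handed to the solver.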

Use the verbose (`-v`) and very verbose (`-v -v`) options to show the solutions
and statistics in progressively more detail.

# Performance

* 45.7 Sol/s with one R9 Nano
* 39.6 Sol/s with one RX 480 8GB

Note: run 2 instances of SILENTARMY in parallel (eg. in 2 terminal consoles)
on the same GPU to reach these performance numbers. The code is currently very
poorly optimized; it makes zero attempts to keep the queue of OpenCL commands
full, therefore 2 instances are needed to keep the GPU fully utilized.
* 47.5 Sol/s with one R9 Nano
* 41.0 Sol/s with one RX 480 8GB

Note: the `silentarmy` **miner** automatically achieves this performance level.
However, the `sa-solver` **command-line solver** by design runs only 1 instance
of the Equihash proof-of-work algorithm, causing it to underperform. One must
manually run 2 instances of `sa-solver` (e.g. in 2 terminal consoles) to
achieve the same performance level as the `silentarmy` **miner**.

Troubleshooting performance issues:
* By default SILENTARMY mines with only one device/GPU; make sure to specify
all the GPUs in the `--use` option, for example `silentarmy --use 0,1,2`
if the host has three devices with IDs 0, 1, and 2.
* If some GPUs have less than ~2.4 GB of GPU memory, run
`silentarmy --instances 1 --use ...` (2 instances use ~2.4 GB of GPU memory,
1 instance uses ~1.2 GB of GPU memory.)
* If 1 instance still requires too much memory, edit `param.h` and set
`NR_ROWS_LOG` to `19` (this reduces the per-instance memory usage to ~670 MB)
and run with `--instances 1`.
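
The memory checklist above can be expressed as a small decision helper. This is a rough sketch, not SILENTARMY code: `suggest_settings` is a hypothetical name, the per-instance figures are the 1208 MB and 671 MB values from the changelog, and it assumes the whole GPU memory is available to the miner, which is optimistic on a real system.

```python
# Per-instance GPU memory in MB, keyed by NR_ROWS_LOG (figures from the changelog).
MEM_PER_INSTANCE_MB = {20: 1208, 19: 671}

# Configurations in the order the troubleshooting list suggests trying them.
CANDIDATES = [(20, 2), (20, 1), (19, 1)]


def suggest_settings(gpu_mem_mb):
    """Return (nr_rows_log, instances) for a GPU, or None if nothing fits.

    Assumption: all of gpu_mem_mb is free for the miner.
    """
    for nr_rows_log, instances in CANDIDATES:
        needed = MEM_PER_INSTANCE_MB[nr_rows_log] * instances
        if needed <= gpu_mem_mb:
            return nr_rows_log, instances
    return None


if __name__ == "__main__":
    for mem in (8192, 2048, 1024):
        print(mem, suggest_settings(mem))
```

In practice some headroom should be left for the driver and display, so treat the thresholds as lower bounds.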

# Dependencies

SILENTARMY has primarily been tested with AMD GPUs on 64-bit Linux with the
**AMDGPU-PRO** driver (amdgpu.ko) or the **Radeon Software Crimson Edition**
driver (fglrx.ko). Its only build dependency is the OpenCL C headers from the
SILENTARMY has primarily been tested with AMD GPUs on 64-bit Linux with either
the **AMDGPU-PRO** driver (amdgpu.ko, for newer GPUs) or the **Radeon Software
Crimson Edition** driver (fglrx.ko, for older GPUs). Its only build
dependencies are an OpenCL implementation and the libsodium library.

Installation of the drivers and SDK can be error-prone, so below are
step-by-step instructions for Ubuntu 16.04 as well as Ubuntu 14.04.
step-by-step instructions for the AMD OpenCL implementation (**AMD APP SDK**),
for Ubuntu 16.04 as well as Ubuntu 14.04.

## Ubuntu 16.04

@@ -71,8 +122,8 @@ step-by-step instructions for Ubuntu 16.04 as well as Ubuntu 14.04.
8. Install system-wide by running as root (accept all the default options):
`$ sudo ./`

9. Install the compiler tools which you will need to compile SILENTARMY:
`$ sudo apt-get install build-essential`
9. Install compiler dependencies which you will need to compile SILENTARMY:
`$ sudo apt-get install build-essential libsodium-dev`

## Ubuntu 14.04

@@ -88,25 +139,35 @@ Compiling SILENTARMY is easy:

`$ make`

You may need to edit the `Makefile` and change the path
`/opt/AMDAPPSDK-3.0/include` if **AMD APP SDK** was installed in a non-default
location. Also if you are not using the **AMDGPU-PRO Driver** you may need
to edit the `Makefile` and change the path to ``.
You may need to specify the paths to the locations of your OpenCL C headers
and OpenCL library if the Makefile does not find them:

`$ make OPENCL_HEADERS=/path/here LIBOPENCL=/path/there`

Self-testing (solves 100 all-zero 140-byte blocks with their nonces varying
from 0 to 99):
Self-testing the command-line solver (solves 100 all-zero 140-byte blocks with
their nonces varying from 0 to 99):

`$ make test`

For more testing run `silentarmy --nonces 10000`. It should find 18681
For more testing run `sa-solver --nonces 10000`. It should find 18681
solutions which is less than 1% off the theoretical expected average number of
solutions of 1.88 per Equihash run at (n,k)=(200,9).
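
The "less than 1%" figure follows directly from the numbers quoted above; a short check of the arithmetic:

```python
runs = 10000             # value passed to --nonces
found = 18681            # solutions reported for that run
expected_per_run = 1.88  # theoretical average at (n,k)=(200,9), as quoted above

observed_per_run = found / runs
relative_gap = (expected_per_run - observed_per_run) / expected_per_run

print("%.4f solutions per run, %.2f%% below the expected average"
      % (observed_per_run, relative_gap * 100))
```

The observed rate is about 1.868 solutions per run, roughly 0.6% below the theoretical average.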

For installing, just copy `silentarmy` wherever.
For installing, just copy `silentarmy` and `sa-solver` to the same directory.

# Implementation details

SILENTARMY uses two hash tables to avoid having to sort the (Xi,i) pairs:
The `silentarmy` Python script is actually mostly a lightweight Stratum
implementation and job dispatcher that sends Equihash work items to 1 or more
instances of `sa-solver --mining`, which initializes the solver in a special
"mining mode" so it can be controlled via stdin/stdout. By default 2 instances
of `sa-solver` are launched for each GPU (this can be changed with the
`silentarmy --instances N` option). 2 instances per GPU usually results in the
best performance.
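
To make the instance count concrete, here is a minimal sketch of how a `--use` list and an `--instances` value expand into solver workers. It is not the actual `silentarmy` code: `plan_workers` is a hypothetical helper, and the stdin/stdout protocol of mining mode is deliberately left out because it is not described here.

```python
def plan_workers(use, instances=2):
    """Expand a --use style list such as "2,5" into (device_id, instance) pairs.

    Hypothetical illustration: one pair per sa-solver process to launch.
    """
    device_ids = [int(tok) for tok in use.split(",") if tok.strip()]
    return [(dev, n) for dev in device_ids for n in range(instances)]


if __name__ == "__main__":
    for device_id, instance in plan_workers("2,5"):
        print("device %d, instance %d" % (device_id, instance))
```

With `--use 2,5` and the default of 2 instances per GPU, this yields four workers, two per device.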

The `sa-solver` binary invokes the OpenCL kernel which contains the core of the
Equihash algorithm. My implementation uses two hash tables to avoid having to
sort the (Xi,i) pairs:

* Round 0 (BLAKE2b) fills up table #0
* Round 1 reads table #0, identifies collisions, XORs the Xi's, stores
@@ -157,6 +218,8 @@ supports Equihash parameters 200,9.

Marc Bevand -- [](

Donations welcome: t1cVviFvgJinQ4w3C2m2CfRxgP5DnHYaoFC

# License

The MIT License (MIT)