improve perf by 5% (OPTIM_FOR_FGLRX)
mbevand committed Nov 5, 2016
1 parent 68e5aa8 commit a33a906
Showing 3 changed files with 19 additions and 1 deletion.
README.md: 7 additions, 1 deletion
@@ -102,8 +102,14 @@ Troubleshooting performance issues:
 all the GPUs in the `--use` option, for example `silentarmy --use 0,1,2`
 if the host has three devices with IDs 0, 1, and 2.
 * If some GPUs have less than ~2.4 GB of GPU memory, run
-`silentarmy --instances 1 --use ...` (2 instances use ~2.4 GB of GPU memory,
+`silentarmy --instances 1` (2 instances use ~2.4 GB of GPU memory,
 1 instance uses ~1.2 GB of GPU memory.)
+* If you are using an AMD GPU with the **Radeon Software Crimson Edition**
+driver, as opposed to the **AMDGPU-PRO** driver, then edit param.h and set
+`OPTIM_FOR_FGLRX` to 1. This will improve performance by +5% and reduce
+GPU memory usage from 1.2 GB per instance to 805 MB per instance. But do
+**not** set it if you are using the AMDGPU-PRO driver or else it will
+degrade performance by -15% or more.
 * If 1 instance still requires too much memory, edit `param.h` and set
 `NR_ROWS_LOG` to `19` (this reduces the per-instance memory usage to ~670 MB)
 and run with `--instances 1`.
input.cl: 6 additions, 0 deletions
@@ -532,12 +532,15 @@ void equihash_round(uint round, __global char *ht_src, __global char *ht_dst,
 if (!cnt)
 // no elements in row, no collisions
 return ;
+#if NR_ROWS_LOG != 20 || !OPTIM_FOR_FGLRX
 p += xi_offset;
 for (i = 0; i < cnt; i++, p += SLOT_LEN)
 first_words[i] = *(__global uchar *)p;
+#endif
 // find collisions
 for (i = 0; i < cnt; i++)
 for (j = i + 1; j < cnt; j++)
+#if NR_ROWS_LOG != 20 || !OPTIM_FOR_FGLRX
 if ((first_words[i] & mask) ==
 (first_words[j] & mask))
 {
@@ -558,6 +561,9 @@ void equihash_round(uint round, __global char *ht_src, __global char *ht_dst,
 {
 i = collisions[n] & 0xff;
 j = collisions[n] >> 8;
+#else
+{
+#endif
 a = (__global ulong *)
 (ht_src + tid * NR_SLOTS * SLOT_LEN + i * SLOT_LEN + xi_offset);
 b = (__global ulong *)
param.h: 6 additions, 0 deletions
@@ -8,6 +8,10 @@
 // but occasionally misses ~1% of solutions.
 #define NR_ROWS_LOG 20
 
+// Set this to 1 if you are using an AMD GPU with the Radeon Software Crimson
+// Edition driver (fglrx.ko), see README.md.
+#define OPTIM_FOR_FGLRX 0
+
 // Make hash tables OVERHEAD times larger than necessary to store the average
 // number of elements per row. The ideal value is as small as possible to
 // reduce memory usage, but not too small or else elements are dropped from the
@@ -25,6 +29,8 @@
 #define OVERHEAD 3
 #elif NR_ROWS_LOG == 19
 #define OVERHEAD 5
+#elif NR_ROWS_LOG == 20 && OPTIM_FOR_FGLRX
+#define OVERHEAD 6
 #elif NR_ROWS_LOG == 20
 #define OVERHEAD 9
 #endif