[WIP] allow one thread for vega gpus #624

psychocrypt · 2017-12-19T22:04:08Z

split the scratchpad buffer into two buffers
This pull request allows using more than 2k threads within one GPU thread. With this PR there should be no need to spawn two CPU threads per Vega GPU.

Note: this is only a hacked-together version; please do not look at the implementation.
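A rough sketch of why splitting the scratchpad into two buffers raises the thread limit. It assumes CryptoNight's 2 MiB scratchpad per hash and treats one unit of intensity as one concurrent hash. The 4 GiB per-buffer cap is a hypothetical limit chosen to match the "2k threads" figure above (2048 × 2 MiB = 4 GiB); it is not a value taken from the PR or queried from an OpenCL driver.

```python
MIB = 1024 * 1024
SCRATCHPAD_BYTES = 2 * MIB  # CryptoNight scratchpad size per hash

# Hypothetical per-allocation cap (assumption, not read from any driver).
MAX_ALLOC_BYTES = 4 * 1024 * MIB


def scratchpad_bytes(intensity: int) -> int:
    """Total scratchpad memory for `intensity` concurrent hashes."""
    return intensity * SCRATCHPAD_BYTES


def split_buffers(intensity: int, parts: int = 2) -> list[int]:
    """Spread the hashes over `parts` buffers as evenly as possible and
    return the size in bytes of each buffer."""
    base, extra = divmod(intensity, parts)
    counts = [base + (1 if i < extra else 0) for i in range(parts)]
    return [c * SCRATCHPAD_BYTES for c in counts]


def fits(buffers: list[int], cap: int = MAX_ALLOC_BYTES) -> bool:
    """True if every buffer stays within the assumed per-allocation cap."""
    return all(size <= cap for size in buffers)


if __name__ == "__main__":
    intensity = 3864
    single = [scratchpad_bytes(intensity)]
    halves = split_buffers(intensity)
    print(single[0] // MIB, fits(single))           # 7728 False
    print([b // MIB for b in halves], fits(halves))  # [3864, 3864] True
```

Under these assumptions one 7728 MiB buffer exceeds the cap, while two 3864 MiB halves each stay below it.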

HowTo

1. Remove (rename) your old amd.txt config.
2. Start the miner.
3. Stop the miner after amd.txt is created.
4. Increase the intensity up to the point where the miner crashes.
   - A good starting point could be the intensity of thread 1 + the intensity of thread 2 from the old amd.txt config.
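The suggested starting point in the last step can be sketched as a helper that merges the per-thread entries of an old config into one entry per GPU, with the intensities summed. The entries are modeled as plain dicts. The key names `intensity`, `worksize` and `strided_index` appear in this thread, but the `index` key and the overall amd.txt layout are assumptions, and `merge_threads` is a hypothetical helper, not part of xmr-stak.

```python
from collections import defaultdict


def merge_threads(threads: list[dict]) -> list[dict]:
    """Collapse all thread entries that target the same GPU `index` into one
    entry whose intensity is the sum of the old intensities.

    Other settings are copied from the first entry seen for each GPU;
    the input list is not modified.
    """
    merged: dict = {}
    totals: defaultdict = defaultdict(int)
    for entry in threads:
        gpu = entry["index"]  # assumed key naming the target GPU
        merged.setdefault(gpu, dict(entry))  # copy first entry's settings
        totals[gpu] += entry["intensity"]
    return [dict(merged[gpu], intensity=totals[gpu]) for gpu in sorted(merged)]


old_config = [
    {"index": 0, "intensity": 1932, "worksize": 8, "strided_index": True},
    {"index": 0, "intensity": 1932, "worksize": 8, "strided_index": True},
]
print(merge_threads(old_config))
# [{'index': 0, 'intensity': 3864, 'worksize': 8, 'strided_index': True}]
```

With the common 2x1932 Vega setup this yields 3864, the value asked for below. It is only a starting point; the maximum stable intensity still has to be found by increasing it step by step.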

@JerichoJones @davidpesce Could you please report the hash rate, e.g. with intensity 3864 or higher?

Njeroe · 2017-12-21T08:48:21Z

With PR #637 I got some results, but this PR (#624) instantly crashes Windows with the generated intensity of 1536. 2x Vega 64; here is the log:

[2017-12-21 08:42:11] : WARNING: backend NVIDIA disabled.
[2017-12-21 08:42:11] : Found AMD platform index id = 0, name = Advanced Micro Devices, Inc.
[2017-12-21 08:42:11] : Found OpenCL GPU .
[2017-12-21 08:42:11] : Found OpenCL GPU .
[2017-12-21 08:42:11] : AMD: GPU configuration stored in file 'amd.txt'
[2017-12-21 08:42:11] : Compiling code and initializing GPUs. This will take a while...
[2017-12-21 08:42:11] : Device 0 work size 8 / 32.
[2017-12-21 08:42:17] : Device 1 work size 8 / 32.
[2017-12-21 08:42:24] : Starting AMD GPU thread 0, no affinity.
[2017-12-21 08:42:24] : Starting AMD GPU thread 1, no affinity.
[2017-12-21 08:42:24] : Starting 1x thread, affinity: 0.

I've tried it with intensity 3816; Windows crashes too:

[2017-12-21 08:58:23] : WARNING: backend NVIDIA disabled.
[2017-12-21 08:58:23] : Compiling code and initializing GPUs. This will take a while...
[2017-12-21 08:58:26] : Device 0 work size 8 / 32.

With intensity 1008 I got some results, but anything above that crashes:

[2017-12-21 09:28:16] : WARNING: backend NVIDIA disabled.
[2017-12-21 09:28:16] : Compiling code and initializing GPUs. This will take a while...
[2017-12-21 09:28:17] : Device 0 work size 8 / 32.
[2017-12-21 09:28:23] : Device 1 work size 8 / 32.
[2017-12-21 09:28:29] : Starting AMD GPU thread 0, no affinity.
[2017-12-21 09:28:29] : Starting AMD GPU thread 1, no affinity.
[2017-12-21 09:28:29] : Fast-connecting to pool.supportxmr.com:7777 pool ...
[2017-12-21 09:28:30] : Pool pool.supportxmr.com:7777 connected. Logging in...
[2017-12-21 09:28:30] : Difficulty changed. Now: 25000.
[2017-12-21 09:28:30] : Pool logged in.
[2017-12-21 09:28:31] : Result accepted by the pool.
HASHRATE REPORT - AMD
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |  937.8 |   (na) |   (na) |  1 |  948.3 |   (na) |   (na) |

psychocrypt · 2017-12-21T10:19:23Z

@Njeroe big thanks for your test. It could be that Windows kills the miner if one thread uses the GPU for too long. Nevertheless, your feedback is very helpful.

JoKeRz42o · 2017-12-28T11:17:03Z

@psychocrypt I know this was already tested, but I also gave it a try and I can confirm the results. After compiling and running with the auto-generated amd.txt file (intensity 1562), Windows 10 seems to lock up and crash. The BSOD reports a different reason for the crash each time.

Mining rig: Xeon E2650 v2 server with 5x Vega 56 (BIOS modded to 64), 3x Vega 64, and an optimized PowerPlay table.

psychocrypt · 2017-12-28T11:36:46Z

Thanks for the feedback; I will work on this next year. I tested it on NVIDIA and it works, but I currently have no AMD card.

taisel · 2018-01-08T06:15:31Z

Using those 8 GB in one allocation might require setting the 64-bit device flag:

"set GPU_FORCE_64BIT_PTR=1"
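On Windows the line above is typed in the console before starting the miner. A minimal sketch of the same idea for a scripted launch builds a child-process environment with the flag set. The executable path `./xmr-stak` is a placeholder assumption, and whether the flag actually helps depends on the AMD driver in use.

```python
import os
import subprocess
from typing import Dict, Optional


def miner_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: the current environment) with
    GPU_FORCE_64BIT_PTR=1 set; `base` itself is left unchanged."""
    env = dict(os.environ if base is None else base)
    env["GPU_FORCE_64BIT_PTR"] = "1"
    return env


def start_miner(path: str = "./xmr-stak") -> subprocess.Popen:
    """Launch the miner (placeholder path) with the 64-bit pointer flag."""
    return subprocess.Popen([path], env=miner_env())
```

Setting the variable only in the child's environment keeps it from leaking into the rest of the session.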

uentity · 2018-01-19T05:45:24Z

Tested this PR with one RX 580 and two RX 480 GPUs on Linux with AMD drivers 17.50 + ROCm kernel 1.6.

Result without any modification of amd.txt (I use intensity: 2016 and worksize: 8): the miner shows an unrealistically high hashrate (e.g. an RX 480 shows >1200 H/s versus ~950 H/s without this PR). It isn't real, because the mining pool lowers the difficulty as if performance were at least two times lower than normal (though I didn't test it long enough to collect statistics on the real hashrate).
The system also becomes very unresponsive and the UI lags heavily.

psychocrypt · 2018-01-19T06:36:14Z

There are still bugs in this PR that I need to fix first. I am currently setting up my Linux system so that I can test it myself too.

Nuke33 · 2018-02-03T14:03:40Z

I can get as high as intensity: 4016 on a Vega FE, but only if I disable strided_index. With it enabled I get an instant BSOD.

psychocrypt · 2018-02-03T14:19:04Z

There is a bug in my implementation, which is why the miner crashes. I think I will close this PR because I don't expect any benefit from it; two independent threads per GPU will have more advantages.

robertarnesson · 2018-02-05T11:11:36Z

What would be the benefit of running 1 thread instead of 2?

uentity · 2018-02-05T12:37:26Z

The benefit could have been having to optimize just one thread's intensity instead of two.
Even if we assume that one thread has no performance benefit, I would prefer to set up this single thread and its intensity rather than balancing values for two threads.

But in this context we need to know what advantages a multi-thread setup gives us.

Nuke33 · 2018-02-05T12:53:13Z

Even with 2 threads it is sometimes beneficial to be able to set an intensity higher than 2024.

robertarnesson · 2018-02-08T13:34:27Z

Instant crash for me, regardless of intensity or strided_index. Testing on 6x Vega 56.
I think running on 1 thread could help with stability, but it's just a hunch.


psychocrypt · 2018-02-12T21:50:50Z

I fixed the bugs and tested this PR on my RX 570: I got a small hash rate increase (~2% more hashes).

Download: https://github.com/psychocrypt/xmr-stak/archive/topic-vegaOneThread.zip

@Njeroe @JoKeRz42o @taisel @uentity @Nuke33 could you please test whether a single thread per GPU has an advantage over two threads? Please try to increase the intensity up to the maximum allowed by your device.

JerichoJones · 2018-02-12T23:29:57Z

vega64

Any higher settings caused hangs and blue screens with the PR.

psychocrypt · 2018-02-13T05:28:03Z

Thanks. Could you please also test the hash rate of the new PR with the old config (2x1932)?

JerichoJones · 2018-02-13T12:34:14Z

uentity · 2018-02-14T04:52:50Z

This PR is definitely worth merging.

For the RX 480 I found the sweet spot at intensity = 2304, which raised the hashrate by ~30 H/s. On this GPU I get the following results:
- Upstream: ~940 H/s, intensity = 2016.
- Upstream + extra intensity PR (discontinued): ~959 H/s, intensity = 2016, extra_intensity = 124.
- This PR: ~970 H/s, intensity = 2304.
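The RX 480 figures above translate into relative gains as follows. This is plain arithmetic on the approximate (~) values quoted, so the percentages are equally approximate.

```python
BASELINE = 940.0  # upstream, intensity 2016 (H/s, as reported above)

results = {
    "upstream + extra intensity PR": 959.0,
    "this PR (intensity 2304)": 970.0,
}


def gain_percent(rate: float, baseline: float = BASELINE) -> float:
    """Relative hashrate gain over the baseline, in percent."""
    return (rate - baseline) / baseline * 100.0


for name, rate in results.items():
    print(f"{name}: +{rate - BASELINE:.0f} H/s ({gain_percent(rate):.1f}%)")
# upstream + extra intensity PR: +19 H/s (2.0%)
# this PR (intensity 2304): +30 H/s (3.2%)
```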

Actually, my Vega 64 also mines successfully with this PR, but the setup is far from optimal, so the absolute hashrate value is somewhat meaningless (I'm getting ~1208 H/s at best). This is because my CPU (AMD FX) doesn't support the PCIe atomics required by the AMD ROCm OpenCL stack, and ROCm is (for now) the only option that provides decent OpenCL support for Vega on Linux.
I discovered a great project, mesa3d-comp-bridge, that allows starting xmr-stak on Vega using Mesa's Clover OpenCL driver. But it adds an extra translation layer and slows down mining performance.
That's why I have to wait until late March, when AMD promised to release AMDGPU-PRO 18.10, which will hopefully enable Vega to run with their "legacy" OpenCL driver (or emulate atomics for ROCm; I'm not sure about the details). Or upgrade the CPU :-)

Vega 64 TL;DR: best performance is ~1208 H/s with intensity = 3840. Splitting into two threads reduces hashrate.

Nuke33 · 2018-02-19T12:23:41Z

I can confirm that the latest version of this PR increases hashrates by around 5% on Vega GPUs.
Using a single thread still results in fewer hashes, though.
With the 2.2.0 release I could set at most 2 threads with intensity 1972, resulting in ~2045 H/s on a Vega 64.
With the latest version of this PR I can set the intensity on 2 threads to 2012, resulting in ~2105 H/s.
A single thread with intensity 4048 nets only ~2000 H/s.

Interestingly, I noticed that you can now set much higher intensities if HBCC can use more system RAM. For example, with 8 GB RAM I could only set the Vega 64 to intensity 1982 for 2 threads, but with 32 GB RAM and the HBCC slider maxed out it was possible to set the intensity to 2012.
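Assuming CryptoNight's 2 MiB scratchpad per hash and one hash per unit of intensity, the configurations in this comment need roughly the following scratchpad memory, compared with the Vega 64's 8 GiB of HBM2. The scratchpad is only a lower bound: driver and other buffers come on top, which may explain why the extra headroom from HBCC matters. That last point is a guess, not something measured in this thread.

```python
SCRATCHPAD_MIB = 2          # CryptoNight scratchpad per hash
VEGA64_VRAM_MIB = 8 * 1024  # 8 GiB HBM2

configs = {
    "2 x 1982 (8 GB system RAM)": [1982, 1982],
    "2 x 2012 (32 GB RAM, HBCC maxed)": [2012, 2012],
    "1 x 4048 (single thread)": [4048],
}


def scratchpad_mib(intensities: list[int]) -> int:
    """Scratchpad memory in MiB for all threads on one GPU."""
    return sum(intensities) * SCRATCHPAD_MIB


for name, intensities in configs.items():
    need = scratchpad_mib(intensities)
    left = VEGA64_VRAM_MIB - need
    print(f"{name}: {need} MiB scratchpad, {left} MiB left of 8 GiB")
# 2 x 1982 (8 GB system RAM): 7928 MiB scratchpad, 264 MiB left of 8 GiB
# 2 x 2012 (32 GB RAM, HBCC maxed): 8048 MiB scratchpad, 144 MiB left of 8 GiB
# 1 x 4048 (single thread): 8096 MiB scratchpad, 96 MiB left of 8 GiB
```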

GabrielKesler · 2018-03-04T14:01:11Z

I built this PR, but I am unable to get it to work properly with one Vega 64 Liquid. On one thread with intensity 3864 the miner hangs.
With 2 threads at intensity 1932 each, the miner cannot get more than 400 hashes per second.

Let me know if you wish me to retest at any point.

aicastell · 2018-03-17T15:02:29Z

Here is one user who has just tested xmr-stak with PR #624 on the dev branch. With two threads at intensity=500 each, this is my hashrate report with a single GPU (AMD ATI HD 6990):

HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   41.6 |   (na) |   (na) |  1 |   41.6 |   (na) |   (na) |
|  2 |   41.6 |   (na) |   (na) |  3 |   41.8 |   (na) |   (na) |
Totals (CPU):   166.5    0.0    0.0 H/s
-----------------------------------------------------------------
HASHRATE REPORT - AMD
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |  240.3 |   (na) |   (na) |  1 |  239.8 |   (na) |   (na) |
Totals (AMD):   480.1    0.0    0.0 H/s
-----------------------------------------------------------------
Totals (ALL):    646.6    0.0    0.0 H/s
Highest:   644.7 H/s
-----------------------------------------------------------------

Is this the maximum expected?

psychocrypt · 2018-03-17T16:42:06Z

For the 6990, please check thread #472. You need to use the correct driver.

uentity · 2018-04-19T18:13:06Z

@psychocrypt when will we finally see this PR merged? :-)

psychocrypt · 2018-04-19T19:04:07Z

I think never. It shows no effect and is counterproductive with the strided index option. Do you need it?

uentity · 2018-04-20T05:59:43Z

Hmm... It was effective on my setup (see my comment above from 14 Feb). But I didn't track recent development and only just discovered the new strided_index option, which gives about the same (but slightly smaller) hashrate increase.

I think I should wait for the upcoming AMDGPU-PRO 18.10 release and check whether it finally allows me to run two threads effectively on Vega 64 (as everybody does).

psychocrypt added the backend amd and enhancement labels Dec 19, 2017

psychocrypt mentioned this pull request Dec 21, 2017: [WIP] unify local memory #637 (Closed)

Commit 675134c: split the scratchpad buffer into two buffer

psychocrypt force-pushed the topic-vegaOneThread branch from f0979ec to 675134c February 12, 2018 21:48

uentity mentioned this pull request Feb 14, 2018: How to fix xmr-stak? matszpk/mesa3d-comp-bridge#1 (Closed)

psychocrypt mentioned this pull request Mar 12, 2018: Miner not using AMD 6990 GPU #472 (Open)
