[WIP] allow one thread for vega gpus #624

Open
wants to merge 1 commit into
base: dev

psychocrypt
Collaborator

@psychocrypt psychocrypt commented Dec 19, 2017

  • split the scratchpad buffer into two buffers
    This pull request will allow using more than 2k threads within one GPU thread. With this PR there should be no need to spawn two CPU threads per Vega GPU.
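
As an illustration of the idea only (the PR's actual implementation is not described in this thread), the sketch below maps a GPU work item to one of two scratchpad buffers. It assumes the 2 MiB per-hash scratchpad of CryptoNight, under which 2048 work items fill exactly 4 GiB, which would explain a limit of roughly 2k threads per allocation. The function name and the contiguous split are hypothetical.

```python
SCRATCHPAD_BYTES = 2 * 1024 * 1024  # assumed CryptoNight scratchpad size per hash (2 MiB)


def locate_scratchpad(thread_id, intensity, num_buffers=2):
    """Return (buffer_index, byte_offset) for one GPU work item.

    Hypothetical layout: work items are split into equal contiguous
    ranges, one range per buffer, so each buffer stays below 4 GiB.
    """
    if not 0 <= thread_id < intensity:
        raise ValueError("thread_id out of range")
    per_buffer = -(-intensity // num_buffers)  # ceiling division
    buffer_index = thread_id // per_buffer
    byte_offset = (thread_id % per_buffer) * SCRATCHPAD_BYTES
    return buffer_index, byte_offset
```

With intensity 3864, each buffer would hold 1932 scratchpads (3864 MiB), so every byte offset still fits in 32 bits.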

Note: This is only a hacked version, please do not look at the implementation.

HowTo

  • remove (rename) your old amd.txt config
  • start the miner
  • stop miner after amd.txt is created
  • increase the intensity up to the point where the miner is crashing
    • a good starting point could be the intensity of thread 1 + intensity of thread 2 of the old amd.txt config
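
The suggested starting point can be sketched as a sum of the old per-thread intensities for each GPU. The entries below are illustrative values, not a real config, and the dictionary keys (`index`, `intensity`, `worksize`) are an assumption about how the old amd.txt thread entries are laid out.

```python
from collections import defaultdict


def starting_intensity(old_threads):
    """Sum the old per-thread intensities for each GPU index."""
    totals = defaultdict(int)
    for thread in old_threads:
        totals[thread["index"]] += thread["intensity"]
    return dict(totals)


# Hypothetical old config: two threads on GPU 0.
old_threads = [
    {"index": 0, "intensity": 1932, "worksize": 8},
    {"index": 0, "intensity": 1932, "worksize": 8},
]
print(starting_intensity(old_threads))  # {0: 3864}
```

From that value, the intensity is then raised step by step until the miner becomes unstable.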

@JerichoJones @davidpesce Could you please report the hash rate, e.g. with intensity 3864 or higher?

@Njeroe

Njeroe commented Dec 21, 2017

With PR637 I got some results, but this PR624 instantly crashes Windows with the generated intensity of 1536 on 2x Vega64. Here is the log:

[2017-12-21 08:42:11] : WARNING: backend NVIDIA disabled.
[2017-12-21 08:42:11] : Found AMD platform index id = 0, name = Advanced Micro Devices, Inc.
[2017-12-21 08:42:11] : Found OpenCL GPU .
[2017-12-21 08:42:11] : Found OpenCL GPU .
[2017-12-21 08:42:11] : AMD: GPU configuration stored in file 'amd.txt'
[2017-12-21 08:42:11] : Compiling code and initializing GPUs. This will take a while...
[2017-12-21 08:42:11] : Device 0 work size 8 / 32.
[2017-12-21 08:42:17] : Device 1 work size 8 / 32.
[2017-12-21 08:42:24] : Starting AMD GPU thread 0, no affinity.
[2017-12-21 08:42:24] : Starting AMD GPU thread 1, no affinity.
[2017-12-21 08:42:24] : Starting 1x thread, affinity: 0.

I've tried it at intensity 3816; Windows crashes too.

[2017-12-21 08:58:23] : WARNING: backend NVIDIA disabled.
[2017-12-21 08:58:23] : Compiling code and initializing GPUs. This will take a while...
[2017-12-21 08:58:26] : Device 0 work size 8 / 32.

At intensity 1008 I got some results, but anything above that crashes.

[2017-12-21 09:28:16] : WARNING: backend NVIDIA disabled.
[2017-12-21 09:28:16] : Compiling code and initializing GPUs. This will take a while...
[2017-12-21 09:28:17] : Device 0 work size 8 / 32.
[2017-12-21 09:28:23] : Device 1 work size 8 / 32.
[2017-12-21 09:28:29] : Starting AMD GPU thread 0, no affinity.
[2017-12-21 09:28:29] : Starting AMD GPU thread 1, no affinity.
[2017-12-21 09:28:29] : Fast-connecting to pool.supportxmr.com:7777 pool ...
[2017-12-21 09:28:30] : Pool pool.supportxmr.com:7777 connected. Logging in...
[2017-12-21 09:28:30] : Difficulty changed. Now: 25000.
[2017-12-21 09:28:30] : Pool logged in.
[2017-12-21 09:28:31] : Result accepted by the pool.
HASHRATE REPORT - AMD
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |  937.8 |   (na) |   (na) |  1 |  948.3 |   (na) |   (na) |

@psychocrypt
Collaborator Author

@Njeroe big thanks for your test. It could be that Windows is killing the miner if one thread is using the GPU for too long. Nevertheless, your feedback is very helpful.

@JoKeRz42o

@psychocrypt I know this was already tested but I also gave it a try and I can confirm the results... After compiling and running with the auto generated amd.txt file (with intensity at 1562) Windows 10 seems to lock up and crash... BSOD ends with different reasons for the crash each time.

Mining Rig on a Xeon E2650 v2 Server with 5x Vega 56(bios modded to 64) and 3x Vega 64 and optimized Power Play table.

@psychocrypt
Collaborator Author

psychocrypt commented Dec 28, 2017 via email

@taisel
Contributor

taisel commented Jan 8, 2018

Using those 8GBs in one alloc might necessitate setting the 64 bit device flag.

"set GPU_FORCE_64BIT_PTR=1"

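The "8GBs in one alloc" figure can be checked with a little arithmetic. The sketch below assumes a 2 MiB scratchpad per work item (the CryptoNight size) and treats 4 GiB as the most that 32-bit byte offsets can address. Whether a given driver actually needs the flag at that point is not confirmed in this thread.

```python
SCRATCHPAD_BYTES = 2 * 1024 * 1024  # assumed 2 MiB scratchpad per work item
MAX_32BIT_BYTES = 2**32             # 4 GiB addressable with 32-bit offsets


def scratchpad_gib(intensity):
    """Total scratchpad memory for one GPU thread, in GiB."""
    return intensity * SCRATCHPAD_BYTES / 2**30


def needs_64bit_pointers(intensity):
    """True if a single allocation would exceed the 32-bit address range."""
    return intensity * SCRATCHPAD_BYTES > MAX_32BIT_BYTES


print(scratchpad_gib(3864))        # 7.546875
print(needs_64bit_pointers(2048))  # False
print(needs_64bit_pointers(3864))  # True
```
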
@uentity

uentity commented Jan 19, 2018

Tested this PR with one RX 580 and two RX 480 GPUs in Linux with AMD drivers 17.50 + ROCm kernel 1.6.

Result without any modification of amd.txt (I use intensity: 2016 and worksize: 8): the miner shows an unrealistically high hashrate (e.g. the RX 480 shows >1200 H/s versus ~950 without this PR), and it isn't "real", because the mining pool drops the difficulty as if performance were at least two times lower than normal (though I didn't test for long enough to collect statistics on the real hashrate).
Also system becomes very unresponsive, UI is lagging heavily.
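
One way to cross-check a reported hashrate against what the pool sees is to estimate it from accepted shares: a share at difficulty D represents D hashes on average. This is a minimal sketch with a hypothetical function name. The estimate is noisy over short windows, so it only becomes meaningful after many shares.

```python
def effective_hashrate(accepted_shares, share_difficulty, elapsed_seconds):
    """Estimate H/s from accepted pool shares.

    Each accepted share at a given difficulty corresponds, on average,
    to that many hashes computed by the miner.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return accepted_shares * share_difficulty / elapsed_seconds


# Example: 36 accepted shares at difficulty 25000 over 15 minutes.
print(effective_hashrate(36, 25000, 900))  # 1000.0
```

If this estimate stays well below the figure the miner prints, the extra hashes are probably invalid or never submitted.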

@psychocrypt
Collaborator Author

psychocrypt commented Jan 19, 2018 via email

@Nuke33

Nuke33 commented Feb 3, 2018

I can get as high as intensity: 4016 on a Vega FE but only if I disable strided_index. With it enabled I get instant BSOD

@psychocrypt
Collaborator Author

psychocrypt commented Feb 3, 2018 via email

@robertarnesson

What would be the benefit of running 1 thread instead of 2?

@uentity

uentity commented Feb 5, 2018

The benefit could have been to optimize just one thread intensity instead of two.
Even if we assume that one thread has no performance benefit, I would still prefer to set up a single thread and its intensity rather than balancing the values for two threads.

But in this context we need to know what kind of advantages multiple-threads setup gives us.

@Nuke33

Nuke33 commented Feb 5, 2018

Even with 2 threads it is sometimes beneficial to be able to set an intensity higher than 2024.

@robertarnesson

Instant crash for me, regardless of intensity or strided_index. Testing on 6x vega 56.
I'm thinking running on 1 thread could help with stability, but it's just a hunch.

This pull request will be allow to use more than 2k threads within one gpu thread.
With this PR there should be no need to spawn to cpu threads per vega gpu.
@psychocrypt
Collaborator Author

psychocrypt commented Feb 12, 2018

I fixed the bugs and tested this PR on my RX570, where I got a small hash rate increase (~2% more hashes).

Download: https://github.com/psychocrypt/xmr-stak/archive/topic-vegaOneThread.zip

@Njeroe @JoKeRz42o @taisel @uentity @Nuke33 could you please test whether a single thread per GPU has an advantage over two threads? Please try to increase the intensity up to the maximum allowed by your device.

@JerichoJones

JerichoJones commented Feb 12, 2018

vega64

image

Any higher settings caused hangs and blue screens with the PR.

@psychocrypt
Collaborator Author

psychocrypt commented Feb 13, 2018 via email

@JerichoJones

image

@uentity

uentity commented Feb 14, 2018

This PR is definitely worth merging.

For the RX 480 I found the sweet spot at intensity = 2304, which raised the hashrate by ~30 H/s. On this GPU I get the following results:
Upstream: ~940 H/s, intensity = 2016.
Upstream + extra intensity PR (discontinued): ~959 H/s, intensity = 2016, extra_intensity = 124.
This PR: ~970 H/s, intensity = 2304.
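
For comparison with the percentages quoted elsewhere in this thread, the RX 480 figures above can be turned into relative gains over upstream. This is only arithmetic on the reported numbers; the helper name is made up for the example.

```python
def relative_gain(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


upstream = 940  # H/s, intensity = 2016
results = {
    "extra intensity PR": 959,  # H/s
    "this PR": 970,             # H/s, intensity = 2304
}
for name, hashrate in results.items():
    print(f"{name}: +{relative_gain(upstream, hashrate):.1f}%")
# extra intensity PR: +2.0%
# this PR: +3.2%
```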

Actually my Vega 64 also mines successfully with this PR, but the setup is far from optimal, so the absolute hashrate value is somewhat meaningless (I'm getting ~1208 H/s at best). This is because my CPU (AMD FX) doesn't support the PCIe atomics required by the AMD ROCm OpenCL stack (and ROCm is, for now, the only option that provides decent OpenCL support for Vega on Linux).
I discovered a great project, mesa3d-comp-bridge, that allows starting xmr-stak on Vega using Mesa's Clover OpenCL driver. But it adds an extra translation layer and slows down mining performance.
That's why I have to wait until late March, when AMD has promised to release AMDGPU-PRO 18.10, which will hopefully enable Vega to run with their "legacy" OpenCL driver (or emulate atomics for ROCm, not sure about the details). Or upgrade the CPU :-)

Vega 64 TL;DR: best performance is ~1208 H/s with intensity = 3840. Splitting into two threads reduces hashrate.

@Nuke33

Nuke33 commented Feb 19, 2018

I can confirm that the last PR is increasing hashrates by around 5% on Vega GPUs.
Using a single thread still results in less hashes though.
With 2.2.0 release I could set max. 2 threads with 1972 intensity, resulting in ~2045 h/s on a Vega64.
With this last PR I can set the intensity on 2 threads to 2012, resulting in ~2105 h/s.
Single thread with intensity 4048 nets only ~2000 h/s.

Interestingly I noticed that you can now set much higher intensities if HBCC can utilize more RAM. For example I could only set the Vega64 to intensity 1982 for 2 threads with 8GB RAM, but with 32GB RAM and maxed out HBCC slider it was possible to set intensity to 2012.

@GabrielKesler

GabrielKesler commented Mar 4, 2018

I built this PR, but I am unable to get it to work properly, with one Vega 64 Liquid. On one thread with intensity 3864 the miner hangs.
Setting to 2 threads, 1932 intensity each, the miner can not get more than 400 hashes per second.

Let me know if you wish me to retest at any point.

@aicastell

aicastell commented Mar 17, 2018

One more user here who has just tested xmr-stak with PR 624 on the dev branch. With two threads at intensity=500 each, this is my hashrate report with a single GPU (AMD ATI HD 6990):

HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   41.6 |   (na) |   (na) |  1 |   41.6 |   (na) |   (na) |
|  2 |   41.6 |   (na) |   (na) |  3 |   41.8 |   (na) |   (na) |
Totals (CPU):   166.5    0.0    0.0 H/s
-----------------------------------------------------------------
HASHRATE REPORT - AMD
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |  240.3 |   (na) |   (na) |  1 |  239.8 |   (na) |   (na) |
Totals (AMD):   480.1    0.0    0.0 H/s
-----------------------------------------------------------------
Totals (ALL):    646.6    0.0    0.0 H/s
Highest:   644.7 H/s
-----------------------------------------------------------------

Is this the maximum expected?

@psychocrypt
Collaborator Author

psychocrypt commented Mar 17, 2018 via email

@uentity

uentity commented Apr 19, 2018

@psychocrypt when will we finally see this PR merged? :-)

@psychocrypt
Collaborator Author

psychocrypt commented Apr 19, 2018 via email

@uentity

uentity commented Apr 20, 2018

Hmm.. It was effective on my setup (see my comment above from 14 Feb). But I didn't track recent development and just discovered the new strided_index option which results in about the same (but slightly less) hashrate increase.

I think I should wait for the upcoming AMDGPU-PRO 18.10 release and check whether it finally allows me to run two threads effectively on Vega64 (as everybody else does).

10 participants