This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

Fixed DAG chunking #203

Closed
wants to merge 2 commits into from

Equinox-

Fixed support for chunking the DAG on GPUs that cannot allocate a contiguous block of memory large enough for the entire DAG.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer mapping.

@ry60003333

Does anyone have an ETA on when this would be merged in? I would love to run ethminer on a bunch of "old" cards that I have that have the required amount of memory but cannot allocate it all at once. See my comments in issue #2761 in cpp-ethereum.

@bobsummerwill
Contributor

Hey @ry60003333,
The automated build steps will need to be green before we can do anything here.
I'm not sure offhand whether the breaks currently showing indicate problems in our builds generally, or something specific to your changes. Could you please check? Thanks!

@LefterisJP
Contributor

This seems to have changes in a lot of places including the openCL kernel code. Apart from having all tests green, before merging please try to test on as many different GPUs as possible. We have no automated tests for the openCL mining code so any merge is a risk.

@bobsummerwill
Contributor

Yes, @LefterisJP. The range of "touch points" scares me too. CC @chriseth.

@ry60003333 If this code is working for you, then perhaps you are best off just keeping it running as a private fork for your own benefit for the time being.

Many miners are already using the Genoil fork, rather than the "official" ethminer, so we're not even in a position where we have something in a particularly healthy state.

@Equinox-
Author

The error is in ethereum/mix.
https://github.com/ethereum/mix/blob/c7b0854a450ef6199d6aa0a2d1e35c7c52063f57/src/MixClient.cpp#L67
This has since been patched on mix, however webthree-umbrella's submodule still points to this commit.

There actually aren't many changes in the OpenCL kernel code. The main ones are my unfolding the ternary operator used to decide which chunk to sample, and changing the offsets to use a define passed from ethash_cl_miner.
3b4c4ac#diff-1e4374038c6165d38201750ea44eae82L271

In ethash_cl_miner it adds another define with the chunk size, then allocates and uploads those chunks. The behavior on cards that can allocate a single chunk has not changed (beyond adding the extra define).
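
As an illustration of the offset calculation described above, here is a minimal Python sketch of mapping a global DAG node index to a chunk and an index inside that chunk. It is not the PR's OpenCL code. It assumes equally sized chunks with any remainder living in the last chunk; `NUM_CHUNKS`, `CHUNK_NODES` and `locate` are hypothetical names, with `CHUNK_NODES` standing in for the define passed from ethash_cl_miner.

```python
from typing import Tuple

NUM_CHUNKS = 4      # assumed chunk count
CHUNK_NODES = 1000  # assumed nodes per chunk (the last chunk may hold more)


def locate(node_index: int,
           chunk_nodes: int = CHUNK_NODES,
           num_chunks: int = NUM_CHUNKS) -> Tuple[int, int]:
    """Return (chunk number, index inside that chunk) for a global node index."""
    if node_index < 0:
        raise ValueError("node index must be non-negative")
    # Clamp to the last chunk so that indices past the nominal end of the
    # equal-sized chunks land in the chunk that absorbs the remainder.
    chunk = min(node_index // chunk_nodes, num_chunks - 1)
    local = node_index - chunk * chunk_nodes
    return chunk, local
```

An if/else chain over the chunk number, as opposed to a nested ternary, would then select the buffer to read `local` from.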

@ry60003333

@bobsummerwill Thanks for the reply! It looks like @Equinox- is correct about ethereum/mix causing the builds to fail. I'm not sure what the process would be to get that fix so the tests would pass, but it would be a start.

I'll attempt to build from source with this code and test the resulting miner on both the R9 270s that require chunking and some R9 280X cards that can fit the DAG in one chunk to ensure that the OpenCL kernel code still works.

Sadly it seems like the Genoil fork doesn't support chunking the DAG either, so it would be nice to have this in the "official" version.

@bobsummerwill
Contributor

I refreshed webthree-umbrella, so if you use "develop" it should have the latest and greatest of everything now.

Yes - please do test away!

I would also recommend that you start a dialog with @Genoil about this DAG chunking functionality too.

I am hoping that we can upstream the Genoil changes and "heal the breach", but that may or may not be possible. At the time of writing the Genoil branch is the best miner to use, and we may or may not ever get back to an official miner which is worth people's while. I hope we will, but don't bet everything on it!

@Genoil
Contributor

Genoil commented Mar 21, 2016

I recently removed the chunking parts because I was refactoring the kernel in an attempt to squeeze a bit of extra performance out of it. It didn't work anyway and it didn't look very pretty either. Nice fix though.

Most cards that are used for mining, including Pitcairn (78x0/270/370), work fine without chunking once the right ENV vars are set. So I'm not really considering bringing it back.

A while ago I tried a different chunking method that only had host-side chunking and no chunks kernel-side. Unfortunately it didn't work out well on AMD hardware.

@ry60003333

@bobsummerwill I'll give it a try from that branch! It would be nice to merge the changes back and "heal" the branch, but I agree that the need for it will likely determine if that happens.

@Genoil Thanks for explaining the situation; do you happen to know what ENV variables will work on Pitcairn MSI R9 270 cards? I'm running Ubuntu 14.04 with the fglrx-updates drivers, and unfortunately haven't had any success with ENV variables. I really do appreciate your input though!

@Genoil
Contributor

Genoil commented Mar 22, 2016

@ry60003333 I don't own any Pitcairns, but apparently the R7 370 is now seen as one of the most efficient cards for Ethereum. By ENV vars I meant environment variables. Recently the list has grown to 5 of these to satisfy most modern AMD cards (use export NAME=VALUE on Linux, setx NAME VALUE on Windows):

export GPU_FORCE_64BIT_PTR=0
export GPU_MAX_HEAP_SIZE=100
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100

@ry60003333

@Genoil Thank you for the reply! It looks like the last environment variable was the one that I was missing. I had tried all the others, but GPU_SINGLE_ALLOC_PERCENT looks like it was the one that did the trick on Ubuntu. I really appreciate the assistance!

@Genoil
Contributor

Genoil commented Mar 23, 2016

@ry60003333 you're welcome. It's actually quite a recent requirement for some AMD cards, since the DAG has grown to about 1.4GB.

@otaku160

Hi @ry60003333 and @Genoil
By setting GPU_SINGLE_ALLOC_PERCENT 100 I've lost 4 MH/s -_-
I did this because I had an issue with chunks...
My GPU is an R9 380 2GB ITX with a Tonga chip.

@chriseth
Contributor

Could you rebase this, please? There are a lot of unrelated commits in this PR.

@Genoil
Contributor

Genoil commented Mar 23, 2016

@otaku160 but can you mine without the setting?

@cgladue

cgladue commented Mar 23, 2016

@Genoil i downloaded your latest miner (1.0.6) and tried this, but still failing to mine.

i tried setting all the ENV vars and still cannot allocate the DAG in a single chunk ... i am using ethminer, is there a different miner i should use ?

[0] Pitcairn
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1408867653
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

My card is a 2GB R9 270 and it shows 1.4 GB as the max memory allocation when running --list-devices.

@fussler

fussler commented Mar 23, 2016

Yesterday I could set export GPU_SINGLE_ALLOC_PERCENT=100 and it worked with ethminer and the stratum proxy. Today my miner on Linux Mint was doing nothing when I woke up.
I restarted it, and today I can't get the DAG to load; it just ends up with -61 again. :(
I'm using a Radeon 78** card.

fuss@fussy ~/Downloads $ export GPU_MAX_ALLOC_PERCENT=100
fuss@fussy ~/Downloads $ ethminer --list-devices
[OPENCL]:
Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Pitcairn
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 1944059904
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1031798784
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

@Sharapoff

Hi guys. Yesterday I added a new ASUS R9 380 4GB to my rig of Asus R9 280X 3GB cards. After installing the drivers I get the following error when I start eth-proxy.py:
Traceback (most recent call last):
Failure: stratum.custom_exceptions.TransportException: SocketTransportClientFactory connection timed out.
After that it starts showing the usual normal lines, like this:
2016-03-23 15:33:47,963 INFO proxy # NEW_JOB MAIN_POOL
But that stops when I start ethminer: peers are disconnected and the proxy stops working. Before that, ethminer shows this:
Creating one big buffer for the DAG
Loading single big chunk kernels
Mapping one big chunk.
DAG 15:50:56| Generating DAG file. Progress: 0 %
DAG 15:51:06| Generating DAG file. Progress: 1 %
DAG 15:51:18| Generating DAG file. Progress: 2 %
After that the miner shows a hashrate for some time (a few hours or so) and then it drops. I disconnected the R9 380, and deleted and regenerated the DAGs twice. I deleted all the files and set everything up again, first from the precompiled version and then from source, and reinstalled Python and all required dependencies. None of that helped. I also changed ports and my firewall settings (turning off my antivirus and firewall did not help either).
My specification:
OS : Windows 7 Ultimate (64 bit) fully updated on that moment.
Ethereum stratum proxy version: 0.0.5
My bat file like that :
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
ethminer.exe --farm-recheck 400 -G -F http://127.0.0.1:8080/ --cl-local-work 256 --cl-global-work 16384
Open_CL version : 1.2
GPU Kernel version; Tahiti
AMD Drivers version: 16.03
Cards: currently only a few Asus R9 280x DC2T 3GDD5.
AMD Crimson 16 and "GPU Tweak 2" from the official vendor sites are also installed on Windows. Maybe someone here can help? Thank you.

@fussler

fussler commented Mar 23, 2016

had the same problem today too on windows.
its very fishy...

@BobDoe

BobDoe commented Mar 23, 2016

Yesterday on Windows 10, Windows Defender reported a virus: it was eth proxy.exe. Windows Defender deleted eth proxy.exe, so I downloaded it again, and Windows Defender tagged and deleted the file at download.

found a copy in a zip file and checked it, it was not infected so using it now

@fussler

fussler commented Mar 23, 2016

I only wanted to see if windows also did not work.

u use windows without the bullshit defender and wall

@Mayhemz

Mayhemz commented Mar 23, 2016

@cgladue
guys having issues with 270x with 2gb do this
add --cl-extragpu-mem 0 when you're not connecting a display on these cards.
this fixed it for me
genoil mentioned this in https://forum.ethereum.org/discussion/2227/cuda-miner

@cgladue

cgladue commented Mar 23, 2016

@Mayhemz

Thanks for the suggestion. I tried it and it didn't help (I use resistors in a dummy plug anyway). It just appears that Sapphire R9 270 cards have CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1408867653, and as of last night the memory needed to load the full DAG is just over that.

i dont think this is going to be fixed unless there is a way to not load the complete dag in one huge file, perhaps chunk the DAG in 2 smaller files or something ? or is there a way to increase the value of CL_DEVICE_MAX_MEM_ALLOC_SIZE ?

@cgladue

cgladue commented Mar 23, 2016

@Equinox-

i see at the top you said you:

Fixed support of chunking the DAG on GPUs unable to allocate a continuous block of memory large enough for the entire DAG.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer mapping.

but seems like still having the issue where my card has 2GB of RAM, but can only alloc 1.4GB max at once, wasnt your fix supposed to allow me to keep mining ? perhaps i dont have the right binaries, where can i download the fixed binaries ?

@Mayhemz

Mayhemz commented Mar 23, 2016

@cgladue
i have 2x r9 270's and this is the command im running as we speak on a win 8 64bit sys
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
setx GPU_MAX_ALLOC_PERCENT 100
ethminer.exe -G -F address --cl-local-work 256 --cl-global-work 16384 --cl-extragpu-mem 0

and they both report MEM_ALLOC_SIZE: 1408867653, like yours.

@cgladue

cgladue commented Mar 23, 2016

@Mayhemz

What version of ethminer are you using, i am using 0.9.41-genoil-1.0.6

@Equinox-
Author

@Genoil I also tried to reduce the number of registers; couldn't get it down below 67. I'll look at what you did and see if I can join the two to get it down to 64.
@cgladue I never published binaries with this fix; you would have to compile them yourself.

@AMPER228

AMPER228 commented Apr 2, 2016

(MSI R9 380 2Gb) Driv 15.7.1, Win7 x64
image

My (simple) command line that does not work; it gives error messages (-61) and (-38):
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
SET GPU_SINGLE_ALLOC_PERCENT = 100
ethminer.exe -G -F http://eth-eu.dwarfpool.com:80/wallet --opencl-platform 0.

My NEW command line does NOT give the (-61) and (-38) errors, BUT it mines nothing ("air") at a reported 13 MH/s:
ethminer.exe -G --farm http://eth-Ru.dwarfpool.com:80/wallet/AMPER228 --cl-global-work 16384 -t 3 --cl-local-work 256 --farm-recheck 400 --opencl-platform 0
If I replace (-t 3) with (-t 2) or (-t 1), it still shows 13 MH/s.

image

WTF????!!!! HELP!!!

@AMPER228

AMPER228 commented Apr 2, 2016

My statistics on " http://dwarfpool.com " equal to 0!!!!!
image

This rate is the result of the laptop's work.
I assumed that my wallet was not working, but statistics appeared for the laptop. By the way, the laptop runs Windows 10.
Am I doing something wrong?

@Equinox-
Author

Equinox- commented Apr 2, 2016

I'm honestly lost here, since there are now multiple AMD cards this appears to not work on.
@AMPER228 Are you sure your card is even using chunking?
I suppose I could release a binary that I know works on my 7770 for testing on other GPUs. Would this be helpful?

@gogolplus

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
SET GPU_SINGLE_ALLOC_PERCENT = 100

For me this solved it on an R9 270X 2GB.

@AMPER228

AMPER228 commented Apr 3, 2016

@Equinox- "Would this be helpful?"
Yes, it would be useful if you explain what steps I have to perform (for testing).
And if it works, it will be one of the solutions.
And I'm not sure my card is even using chunking.

@AMPER228

AMPER228 commented Apr 3, 2016

(MSI R9 380 2Gb) Drivers 15.7.1, Win7 x64
RAM 12Gb. AMD FX (tm)-4100 Quad-Core Processor3.60 GHz
Everything works now.
I installed the graphics card drivers (at first the 15.7.1 series).
I installed a newer version of the Microsoft .NET Framework (4.5.3).
I disabled Windows Firewall.
image

My statistic
image

@Equinox-
Author

Equinox- commented Apr 3, 2016

I've built some binaries that force chunking here, both the debugging and the release binary. If you're having troubles with this code feel free to try these.
https://github.com/Equinox-/libethereum/releases/tag/0.1_debug

@neil-jones

I just tried the new binary and it's still not committing new work. Ran the debug binary, and the CPU hash doesn't end with GPU hash.

GPU lid=44, nonce = ecaec71b49aa6679, hash = 4565752f
CPU nonce=ecaec71b49aa6679, hash=920a07baf47232b4
GPU lid=45, nonce = ecaec71b49aa667a, hash = bd30ba33
CPU nonce=ecaec71b49aa667a, hash=4ede7c0613894c2d
GPU lid=46, nonce = ecaec71b49aa667b, hash = 60196cfc
CPU nonce=ecaec71b49aa667b, hash=8fb30fc4535ccdff
GPU lid=47, nonce = ecaec71b49aa667c, hash = 3f76fc8e
CPU nonce=ecaec71b49aa667c, hash=b1f7297999a6898
GPU lid=48, nonce = ecaec71b49aa667d, hash = 775ba8f3
CPU nonce=ecaec71b49aa667d, hash=fa6e9ac6c169471a
GPU lid=49, nonce = ecaec71b49aa667e, hash = 70863fbb
CPU nonce=ecaec71b49aa667e, hash=ccaa5f3760c64f72
GPU lid=50, nonce = ecaec71b49aa667f, hash = 977983ef
CPU nonce=ecaec71b49aa667f, hash=d3259ee2ceb0b24a
GPU lid=51, nonce = ecaec71b49aa6680, hash = 74d7092c
CPU nonce=ecaec71b49aa6680, hash=b04ecb3beb0b44dd
GPU lid=52, nonce = ecaec71b49aa6681, hash = fde5c5ee
CPU nonce=ecaec71b49aa6681, hash=332c338bcae1fe81
GPU lid=53, nonce = ecaec71b49aa6682, hash = a9596c09
CPU nonce=ecaec71b49aa6682, hash=190d8a3cbaad09dd

What does this mean? I'm running an AMD 7570 with 2GB RAM

@Equinox-
Author

Equinox- commented Apr 3, 2016

Not sure. Both those binaries work on my 7770s, so I'm unsure why they don't work on your card. What version of the AMD drivers do you have, what arguments are you using to launch, and could I get some more info on your exact card?

@Equinox-
Author

Equinox- commented Apr 3, 2016

Interesting. I just used the --opencl-device flag with ethminer and it failed to work; I'll try to figure that out.

@neil-jones

Driver version is 15.200.1045.0, and launch args are "--cl-local-work 64 --cl-global-work 4096". This is my card, except mine is the 2GB version. https://www.techpowerup.com/gpudb/b692/pegatron-hd-7570.html

@Equinox-
Author

Equinox- commented Apr 3, 2016

What does ethminer --list-devices show, and what are the first 20 or so lines of output by ethminer when you run it?

@isghe

isghe commented Apr 3, 2016

The risk that some GPUs are burning power working for nothing is too high. I think we should concentrate on creating a unit test that verifies the GPU algorithm is working correctly before GPU mining starts, for both the full DAG and the chunked DAG.

@neil-jones

ethminer --list-devices returns:

Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Turks
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536870912
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
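
For anyone who wants to check a card before starting the miner, the listing above can be parsed and compared against the DAG size. This is only a sketch: `parse_devices` and `fits_in_one_allocation` are hypothetical helpers, the sample text is copied from the listing above, and the DAG size used in the usage note (1426062464 bytes) is simply the sum of the four chunk sizes in the miner log that follows.

```python
import re

SAMPLE = """\
Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Turks
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536870912
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
"""

HEADER_RE = re.compile(r"^\[(\d+)\]\s+(.+)$")
FIELD_RE = re.compile(r"^(CL_DEVICE_[A-Z_]+):\s+(\S+)$")


def parse_devices(text):
    """Turn `ethminer --list-devices` style output into a list of dicts."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {"id": int(header.group(1)), "name": header.group(2)}
            devices.append(current)
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            key, value = field.groups()
            # Sizes are plain decimal integers; other fields stay as strings.
            current[key] = int(value) if value.isdigit() else value
    return devices


def fits_in_one_allocation(device, dag_bytes):
    """True if the whole DAG fits in a single OpenCL buffer on this device."""
    return dag_bytes <= device["CL_DEVICE_MAX_MEM_ALLOC_SIZE"]
```

With the values above, `fits_in_one_allocation(parse_devices(SAMPLE)[0], 1426062464)` is false, which matches the miner falling back to chunks.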

Here's the first bit of output:

Found suitable OpenCL device [Turks] with 2147483648 bytes of GPU memory
miner 19:41:18.318|main Getting work package...
miner 19:41:18.739|main Grabbing DAG for #63ca6f54…
miner 19:41:20.360|main Got work package:
i 19:41:20.360| Loading full DAG of seedhash: #67f3589a…
miner 19:41:20.362|main Header-hash: d118a1852b2d2e6800ad6fe232b025bff118485e82491ef3776f2c0a71c80fbd
miner 19:41:20.381|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:20.389|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 19:41:20.400|gpuminer0 workLoop 0 #00000000… #63ca6f54…
i 19:41:20.405|gpuminer0 Initialising miner...
miner 19:41:20.901|main Mining on PoWhash #d118a185… : 0 H/s = 0 hashes / 0.5 s
Using platform: AMD Accelerated Parallel Processing
i 19:41:21.969| Full DAG loaded
Using device: Turks(OpenCL 1.2 AMD-APP (1800.5))
miner 19:41:21.994|main Got work package:
miner 19:41:21.997|main Header-hash: 4667da13b8e87f49946646e5ff7b422eefa6b379343a7f1fb81b1d8120be5d0d
miner 19:41:22.004|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:22.010|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
Printing program log
"C:\Users\neil\AppData\Local\Temp\OCL39D.tmp.cl", line 117: warning: variable
"b4_0" was declared but never referenced
uint4* b4_0 = (uint4*) b;
^

Failed to allocate 1 big chunk. Max allocateable memory is 536870912. Trying to allocate 4 chunks.
Creating buffer for chunk 0 size=356515584
Creating buffer for chunk 1 size=356515584
Creating buffer for chunk 2 size=356515584
Creating buffer for chunk 3 size=356515712
Loading chunk kernels
Mapping chunk 0 with size=356515584 and offset=0
Mapping chunk 1 with size=356515584 and offset=356515584
Mapping chunk 2 with size=356515584 and offset=713031168
Mapping chunk 3 with size=356515712 and offset=1069546752
Creating buffer for header.
Creating mining buffer 0
Creating mining buffer 1
i 19:41:36.124|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 19:41:36.630|main Mining on PoWhash #4667da13… : 520126 H/s = 262144 hashes / 0.504 s
miner 19:41:37.353|main Got work package:
miner 19:41:37.355|main Header-hash: d33c13624abc788baf82f18712733c337a564169a0814441ae6e8beffbe0a398
miner 19:41:37.361|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:37.368|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 19:41:37.567|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 19:41:38.082|main Mining on PoWhash #d33c1362… : 513001 H/s = 262144 hashes / 0.511 s
miner 19:41:39.141|main Mining on PoWhash #d33c1362… : 1238865 H/s = 1310720 hashes / 1.058 s
miner 19:41:41.241|main Mining on PoWhash #d33c1362… : 1373135 H/s = 2883584 hashes / 2.1 s
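
The chunk sizes and offsets in the log above can be reproduced with a few lines of Python. This is a reconstruction from the logged numbers, not the miner's actual code: it assumes a fixed count of 4 chunks, a chunk size rounded down to a multiple of 128 bytes, and a last chunk that takes the remainder. `chunk_layout` is a hypothetical name, and the 128-byte alignment is a guess that happens to match the log.

```python
def chunk_layout(dag_bytes, num_chunks=4, align=128):
    """Return a list of (chunk index, size in bytes, byte offset) tuples."""
    if dag_bytes <= 0 or num_chunks <= 0 or align <= 0:
        raise ValueError("all arguments must be positive")
    # Equal share per chunk, rounded down to the assumed alignment.
    base = (dag_bytes // num_chunks) // align * align
    layout = []
    offset = 0
    for index in range(num_chunks):
        last = index == num_chunks - 1
        # The last chunk absorbs whatever the rounding left over.
        size = dag_bytes - offset if last else base
        layout.append((index, size, offset))
        offset += size
    return layout


if __name__ == "__main__":
    for index, size, offset in chunk_layout(1426062464):
        print("Mapping chunk %d with size=%d and offset=%d" % (index, size, offset))
```

Every chunk stays below the reported maximum allocation of 536870912 bytes, which is the point of splitting the DAG.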

@Equinox-
Author

Equinox- commented Apr 3, 2016

Mind trying it with the following environmental variables (setx name val or export name=val)

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_MAX_HEAP_SIZE 100
setx GPU_SINGLE_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
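
If you start the miner from a script, the same five variables can be passed to the child process directly instead of relying on the shell. A minimal sketch, with `AMD_ENV`, `miner_environment` and `export_lines` as hypothetical names; only the variable names and values come from this thread.

```python
import os

# The five variables listed in this thread, as strings for the environment.
AMD_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
}


def miner_environment(base=None):
    """Return a copy of the base environment with the AMD variables set."""
    env = dict(os.environ if base is None else base)
    env.update(AMD_ENV)
    return env


def export_lines():
    """Render the variables as `export NAME=VALUE` lines for a Linux shell."""
    return ["export %s=%s" % (name, value) for name, value in AMD_ENV.items()]
```

The result can be handed to `subprocess.Popen(..., env=miner_environment())`. Note that on Windows `setx` only affects command windows opened afterwards, so a batch file that runs `setx` and then the miner in the same window will not see the new values on its first run.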

@neil-jones

Don't mind at all. I ran the setx commands and fired up ethminer, but it seems to be doing the same thing.

`Found suitable OpenCL device [Turks] with 2147483648 bytes of GPU memory
miner 20:00:19.871|main Getting work package...
miner 20:00:22.248|main Grabbing DAG for #63ca6f54…
i 20:00:23.769| Loading full DAG of seedhash: #67f3589a…
miner 20:00:23.769|main Got work package:
miner 20:00:23.788|main Header-hash: 58e28b07486745c0096cd73815b4abf5ec80fccb18d172bcd7d77240af9b0c08
miner 20:00:23.794|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:23.805|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:23.835|gpuminer0 workLoop 0 #00000000… #63ca6f54…
i 20:00:23.839|gpuminer0 Initialising miner...
miner 20:00:24.336|main Mining on PoWhash #58e28b07… : 0 H/s = 0 hashes / 0.5 s
Using platform: AMD Accelerated Parallel Processing
i 20:00:25.452| Full DAG loaded
Using device: Turks(OpenCL 1.2 AMD-APP (1800.5))
miner 20:00:26.439|main Mining on PoWhash #58e28b07… : 0 H/s = 0 hashes / 2.102 s
miner 20:00:27.368|main Got work package:
miner 20:00:27.371|main Header-hash: 02b47f7332c6ceee16f3f53ef0c0e4c9ac4307be8cc9aff12a735a6fe5ed1c3e
miner 20:00:27.379|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:27.385|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
Printing program log
"C:\Users\neil\AppData\Local\Temp\OCL765A.tmp.cl", line 117: warning: variable
"b4_0" was declared but never referenced
uint4* b4_0 = (uint4*) b;
^

Failed to allocate 1 big chunk. Max allocateable memory is 536870912. Trying to allocate 4 chunks.
Creating buffer for chunk 0 size=356515584
Creating buffer for chunk 1 size=356515584
Creating buffer for chunk 2 size=356515584
Creating buffer for chunk 3 size=356515712
Loading chunk kernels
Mapping chunk 0 with size=356515584 and offset=0
Mapping chunk 1 with size=356515584 and offset=356515584
Mapping chunk 2 with size=356515584 and offset=713031168
Mapping chunk 3 with size=356515712 and offset=1069546752
Creating buffer for header.
Creating mining buffer 0
Creating mining buffer 1
i 20:00:39.552|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:40.065|main Mining on PoWhash #02b47f73… : 513001 H/s = 262144 hashes / 0.511 s
miner 20:00:40.460|main Got work package:
miner 20:00:40.462|main Header-hash: e75e0252ef1b666754e581320719b4759f66b427d37563f8218a794e7434c96e
miner 20:00:40.468|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:40.473|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:40.590|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:41.092|main Mining on PoWhash #e75e0252… : 524288 H/s = 262144 hashes / 0.5 s
miner 20:00:41.985|main Mining on PoWhash #e75e0252… : 1174217 H/s = 1048576 hashes / 0.893 s
miner 20:00:42.893|main Mining on PoWhash #e75e0252… : 1445115 H/s = 1310720 hashes / 0.907 s
miner 20:00:43.381|main Got work package:
miner 20:00:43.383|main Header-hash: c4bae4e848a8abdb2eafeb9a54b0a83f0191bbcd9ee9ae8e6a18bc1f96289262
miner 20:00:43.389|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:43.396|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:43.625|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:44.125|main Mining on PoWhash #c4bae4e8… : 524288 H/s = 262144 hashes / 0.5 s`

@Equinox-
Author

Equinox- commented Apr 4, 2016

I'm assuming that means the ethminer_debug outputs are invalid again (CPU hash doesn't end with GPU hash)

@neil-jones

Correct; debug's CPU hash doesn't end with GPU hash.
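
The "CPU hash ends with GPU hash" check can be automated over a saved debug log. The sketch below assumes the line format shown earlier in this thread and treats the GPU value as the low 32 bits of the CPU value, which is what "ends with" amounts to in hex; `check_pairs` is a hypothetical helper and the third pair in the sample is synthetic, added only to show a passing case.

```python
import re

GPU_RE = re.compile(r"GPU lid=(\d+), nonce = ([0-9a-f]+), hash = ([0-9a-f]+)")
CPU_RE = re.compile(r"CPU nonce=([0-9a-f]+), hash=([0-9a-f]+)")

SAMPLE_LOG = """\
GPU lid=44, nonce = ecaec71b49aa6679, hash = 4565752f
CPU nonce=ecaec71b49aa6679, hash=920a07baf47232b4
GPU lid=45, nonce = ecaec71b49aa667a, hash = bd30ba33
CPU nonce=ecaec71b49aa667a, hash=4ede7c0613894c2d
GPU lid=0, nonce = 01, hash = f47232b4
CPU nonce=01, hash=920a07baf47232b4
"""


def check_pairs(log_text):
    """Return (nonce, matches) for each GPU line followed by its CPU line."""
    results = []
    pending = None  # (nonce, gpu hash) waiting for the matching CPU line
    for line in log_text.splitlines():
        gpu = GPU_RE.search(line)
        if gpu:
            pending = (gpu.group(2), gpu.group(3))
            continue
        cpu = CPU_RE.search(line)
        if cpu and pending and cpu.group(1) == pending[0]:
            # Compare numerically so dropped leading zeros do not matter.
            low32 = int(cpu.group(2), 16) & 0xFFFFFFFF
            results.append((pending[0], low32 == int(pending[1], 16)))
            pending = None
    return results
```

A log in which every pair reports `False`, as in the output posted above, suggests the GPU is reading wrong DAG data and not merely producing an occasional bad hash.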

@Equinox-
Author

Equinox- commented Apr 4, 2016

If you are having trouble with chunked mining you can try running the chunked DAG debugger. This won't actually mine anything; it uploads the DAG, runs through it to ensure integrity, then outputs CPU/GPU hash pairs. If it fails before the CPU/GPU hash pairs get printed (DAG verification fails) post the log.
https://github.com/Equinox-/libethereum/releases/tag/0.1.1

@Genoil
Contributor

Genoil commented Apr 18, 2016

BTW i looked into my broken chunks implementation and fixed it. It does work, but it doesn't seem to be very useful since for the majority of cards it's more a matter of setting the right environment variables to fix the allocation issues. It's also slower on AMD cards and really doesn't do anything useful on the Nvidia platform.

I also managed to get the VGPRS usage down to 56, but I got 108 scratch registers back in return, which totally kills the added value of an extra wavefront

@Equinox-
Author

How low did you get it before scratch registers started appearing? I've got a GCN3 disassembler/assembler I might be able to use to cut out a few more.

@Genoil
Contributor

Genoil commented Apr 18, 2016

It was either 78 or 80 (the 80 one works real nice on CUDA-CL with maxrregs compiler option), or 56.

You use this for Theta:

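    // Column parities: t[i] is the XOR of the five lanes in column i.
    for(int i = 0; i < 5; i++)
        t[i] = a[i] ^ a[i+5] ^ a[i+10] ^ a[i+15] ^ a[i+20];

    // #pragma unroll (enable to get speed back, but also +24 VGPRS)
    for(int j = 0; j < 5; j++)
    {
        // Combine the parities of the two neighbouring columns for column j.
        u = t[(j+4)%5] ^ ROL2(t[(j+1)%5], 1);

        for(int i = 0; i < 5; i++)
            a[i*5+j] ^= u;
    }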

The dynamic indexing of a (the 1600-bit keccak state) forces the compiler to move it out of the registers. Same happens for t (the temporary 1600-bit keccak state). 25 * 2 * 2 = 100 is about 108 scratch regs. Why it then saves just 24 VGPRS is still a bit of a mystery, but it really doesn't matter.
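
When restructuring the kernel like this, a host-side reference for the Theta step helps confirm that a variant still computes the same thing. Below is a minimal Python model of the loop above on 25 lanes of 64 bits; it says nothing about register usage, which only the GPU compiler decides. `rol64` and `theta` are names chosen for this sketch, and each lane is modelled as one 64-bit integer instead of a pair of 32-bit values.

```python
MASK64 = (1 << 64) - 1


def rol64(x, n):
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def theta(state):
    """Apply the Keccak Theta step to a list of 25 lanes (64 bits each)."""
    if len(state) != 25:
        raise ValueError("Keccak-f[1600] state has 25 lanes")
    a = list(state)
    # Column parities, as in the first loop of the kernel fragment.
    t = [a[i] ^ a[i + 5] ^ a[i + 10] ^ a[i + 15] ^ a[i + 20] for i in range(5)]
    for j in range(5):
        u = t[(j + 4) % 5] ^ rol64(t[(j + 1) % 5], 1)
        for i in range(5):
            a[i * 5 + j] ^= u
    return a
```

Feeding the same input state to the kernel variant and to `theta` and comparing the 25 lanes is one simple way to catch an indexing mistake.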

@Genoil
Contributor

Genoil commented Apr 18, 2016

Ah finally getting a bit of grip on that dreaded GCN compiler. Down to 23 VGPRS with an occupancy of 100%. Dramatic hashrate though. Good example why occupancy isn't everything :)
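
For readers following the VGPR numbers in this thread, the relation between register count and occupancy can be sketched as below. It assumes the commonly documented GCN limits of 256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD, and allocation in blocks of 4 registers. It ignores scratch, LDS and SGPR limits, so treat it as a rough guide, not a description of any specific card.

```python
VGPRS_PER_LANE = 256     # assumed VGPR budget per SIMD lane on GCN
MAX_WAVES_PER_SIMD = 10  # assumed wavefront slots per SIMD on GCN
VGPR_GRANULARITY = 4     # assumed allocation block size


def waves_per_simd(vgprs):
    """Wavefronts per SIMD that fit when a kernel uses `vgprs` registers."""
    if vgprs <= 0:
        raise ValueError("vgprs must be positive")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs // VGPR_GRANULARITY) * VGPR_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


def occupancy(vgprs):
    """Fraction of the wavefront slots that can be filled."""
    return waves_per_simd(vgprs) / MAX_WAVES_PER_SIMD
```

Under these assumptions 67 VGPRs allow 3 wavefronts while 64 or 56 allow 4, which is the "extra wavefront" discussed above, and 23 VGPRs reach the full 10.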

@Equinox-
Author

Equinox- commented May 5, 2016

I'm going to close this. At this point the DAG size has increased even further and the few edge cases that the environmental variables don't solve don't seem to work with this either.

@dan-da

dan-da commented Jun 27, 2016

I'm not sure why this patch was closed.

Recently I've been unable to mine with stock ethminer using either a 7970 3 Gb or an R9 270 2 Gb card due to the DAG alloc issue. ( note: for some reason one 7970 works fine and another doesn't. )

I've tried all env variable hacks in this thread and elsewhere to no avail.

Applying this patch fixes the problem for both cards, and all is well.

@dan-da

dan-da commented Jun 29, 2016

For anyone interested, I created a fork of Genoil's ethminer that includes this patch. The chunking works great with R9 270 and HD 7970 and is automatic if allocating a full DAG fails.
