Fixed DAG chunking #203

Closed
wants to merge 2 commits into
from

@Equinox-

Fixed support of chunking the DAG on GPUs unable to allocate a contiguous block of memory large enough for the entire DAG.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer mapping.

@ry60003333 ry60003333 referenced this pull request in ethereum/cpp-ethereum Mar 21, 2016
Open

ethminer fails after 1st workloop, Frontier.help! #2761

@ry60003333

Does anyone have an ETA on when this would be merged in? I would love to run ethminer on a bunch of "old" cards that I have that have the required amount of memory but cannot allocate it all at once. See my comments in issue #2761 in cpp-ethereum.

@bobsummerwill
Contributor

Hey @ry60003333,
The automated build steps will need to be green before we can do anything here.
I'm not sure offhand if the breaks currently showing are indicative of problems in our builds generally, or something specific to your changes. Please could you check? Thanks!

@LefterisJP
Contributor

This seems to have changes in a lot of places including the openCL kernel code. Apart from having all tests green, before merging please try to test on as many different GPUs as possible. We have no automated tests for the openCL mining code so any merge is a risk.

@bobsummerwill
Contributor

Yes, @LefterisJP. The range of "touch points" scares me too. CC @chriseth.

@ry60003333 If this code is working for you, then perhaps you are best just keeping it running as a private fork for your own benefit for the time being.

Many miners are already using the Genoil fork, rather than the "official" ethminer, so we're not even in a position where we have something in a particularly healthy state.

@Equinox-

The error is in ethereum/mix.
https://github.com/ethereum/mix/blob/c7b0854a450ef6199d6aa0a2d1e35c7c52063f57/src/MixClient.cpp#L67
This has since been patched on mix, however webthree-umbrella's submodule still points to this commit.

There actually aren't many changes in the OpenCL kernel code. The main ones are my unfolding the ternary operator used to decide which chunk to sample, and changing the offsets to use a define passed from ethash_cl_miner.
3b4c4ac#diff-1e4374038c6165d38201750ea44eae82L271

In ethash_cl_miner it adds another define with the chunk size, then allocates and uploads those chunks. The behavior on cards that can allocate a single chunk has not changed (beyond adding the extra define)
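
As a rough illustration of the chunked lookup described above (not the PR's kernel code): a flat DAG index is translated into a chunk number and an index within that chunk, with the chunk size supplied as a constant, much like the define passed in from ethash_cl_miner. The names ITEMS_PER_CHUNK, NUM_CHUNKS and locate are made up for this sketch, the values are illustrative, and Python is used only for readability. It also assumes the last chunk absorbs any remainder.

```python
# Hypothetical sketch: map a flat DAG item index to (chunk, index in chunk).
# ITEMS_PER_CHUNK plays the role of the chunk-size define; the real kernel
# is OpenCL C and is not reproduced here.
NUM_CHUNKS = 4
ITEMS_PER_CHUNK = 1000  # illustrative value, not taken from the PR


def locate(index, items_per_chunk=ITEMS_PER_CHUNK, num_chunks=NUM_CHUNKS):
    """Return (chunk, offset) for a flat item index.

    Assumption: the last chunk absorbs any remainder, so the chunk
    number is clamped to num_chunks - 1.
    """
    chunk = min(index // items_per_chunk, num_chunks - 1)
    offset = index - chunk * items_per_chunk
    return chunk, offset


print(locate(0))     # first item of chunk 0
print(locate(2500))  # falls in chunk 2
print(locate(4003))  # past 4 * 1000, so it lands in the oversized last chunk
```

If the offset is computed against the wrong chunk size, every sample from chunks other than the first reads the wrong DAG item, which is the kind of error the description above attributes to the old offsets.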

@ry60003333

@bobsummerwill Thanks for the reply! It looks like @Equinox- is correct about ethereum/mix causing the builds to fail. I'm not sure what the process would be to get that fix so the tests would pass, but it would be a start.

I'll attempt to build from source with this code and test the resulting miner on both the R9 270s that require chunking and some R9 280X cards that can fit the DAG in one chunk to ensure that the OpenCL kernel code still works.

Sadly it seems like the Genoil fork doesn't support chunking the DAG either, so it would be nice to have this in the "official" version.

@bobsummerwill
Contributor

I refreshed webthree-umbrella, so if you use "develop" it should have the latest and greatest everything now.

Yes - please do test away!

I would also recommend that you start a dialog with @Genoil about this DAG chunking functionality too.

I am hoping that we can upstream the Genoil changes and "heal the breach", but that may or may not be possible. At the time of writing the Genoil branch is the best miner to use, and we may or may not ever get back to an official miner which is worth people's while. I hope we will, but don't bet everything on it!

@Genoil
Contributor
Genoil commented Mar 21, 2016

I recently removed the chunking parts because I was refactoring the kernel in an attempt to squeeze a bit of extra performance out of it. It didn't work anyway and it didn't look very pretty either. Nice fix though.

Most cards that are used for mining (including Pitcairn (78x0/270/370)) work fine without chunking, by setting the right ENV vars. So I'm not really considering bringing it back.

A while ago I tried a different chunking method that only had host-side chunking and no chunks kernel-side. Unfortunately it didn't work out well on AMD hardware.

@ry60003333

@bobsummerwill I'll give it a try from that branch! It would be nice to merge the changes back and "heal" the branch, but I agree that the need for it will likely determine if that happens.

@Genoil Thanks for explaining the situation; do you happen to know what ENV variables will work on Pitcairn MSI R9 270 cards? I'm running Ubuntu 14.04 with the fglrx-updates drivers, and unfortunately haven't had any success with ENV variables. I really do appreciate your input though!

@Genoil
Contributor
Genoil commented Mar 22, 2016

@ry60003333 I don't own any Pitcairns, but apparently the R7 370 is now seen as one of the most efficient cards for Ethereum. With ENV vars I meant environment variables. Recently the list has grown to 5 of these to satisfy most modern AMD cards (on Windows the equivalent is setx NAME value):

export GPU_FORCE_64BIT_PTR=0
export GPU_MAX_HEAP_SIZE=100
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100

@ry60003333

@Genoil Thank you for the reply! It looks like the last environment variable was the one that I was missing. I had tried all the others, but GPU_SINGLE_ALLOC_PERCENT looks like it was the one that did the trick on Ubuntu. I really appreciate the assistance!

@Genoil
Contributor
Genoil commented Mar 23, 2016

@ry60003333 you're welcome. It's actually quite a recent requirement for some AMD cards, since the DAG has grown to about 1.4GB.

@otaku160

Hi @ry60003333 and @Genoil
by setting GPU_SINGLE_ALLOC_PERCENT 100 i've lost 4Mh/s -_-
i did this because i had an issue with chunking...
my GPU is an R9 380 2GB itx with Tonga chip

@chriseth
Contributor

Could you rebase this, please? There are a lot of unrelated commits in this PR.

@Genoil
Contributor
Genoil commented Mar 23, 2016

@otaku160 but can you mine without the setting?

@cgladue
cgladue commented Mar 23, 2016

@Genoil i downloaded your latest miner (1.0.6) and tried this, but still failing to mine.

i tried setting all the ENV vars and still cannot allocate the DAG in a single chunk ... i am using ethminer, is there a different miner i should use ?

[0] Pitcairn
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1408867653
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

my card is a 2GB R9 270 and it shows 1.4 GB as the max memory when doing --list-devices.

@fussler
fussler commented Mar 23, 2016

yesterday i could use export GPU_SINGLE_ALLOC_PERCENT=100 and it worked with ethminer and stratum proxy.. Today my miner on linux mint.. was doing nothing when i woke up..
restarted and today i can't get the DAG to load it just ends up with -61 again. :(
im using a radeon 78** card

fuss@fussy ~/Downloads $ export GPU_MAX_ALLOC_PERCENT=100
fuss@fussy ~/Downloads $ ethminer --list-devices
[OPENCL]:
Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Pitcairn
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 1944059904
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1031798784
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

@Sharapoff

Hi guys. Yesterday I added a new ASUS R9 380 4GB to my rig of Asus R9 280X 3GB cards. After installing the drivers I get this error when I start eth-proxy.py:
Traceback (most recent call last):
Failure: stratum.custom_exceptions.TransportException: SocketTransportClientFactory connection timed out.
After that it shows the usual normal lines, like this:
2016-03-23 15:33:47,963 INFO proxy # NEW_JOB MAIN_POOL
But that stops when I start ethminer: the peers are disconnected and the proxy stops working. Before that, ethminer shows:
Creating one big buffer for the DAG
Loading single big chunk kernels
Mapping one big chunk.
DAG 15:50:56| Generating DAG file. Progress: 0 %
DAG 15:51:06| Generating DAG file. Progress: 1 %
DAG 15:51:18| Generating DAG file. Progress: 2 %
The miner then shows a hashrate for some time (a few hours or so) before failing. I disconnected the R9 380, and deleted and regenerated the DAGs twice. I deleted all the files and installed everything again, first from the precompiled version and then from source, and reinstalled Python and all the required packages. None of that helped. I also changed ports and my firewall settings (turning off my antivirus and firewall did not help either).
My specification:
OS: Windows 7 Ultimate (64 bit), fully updated at that moment.
Ethereum stratum proxy version: 0.0.5
My bat file looks like this:
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
ethminer.exe --farm-recheck 400 -G -F http://127.0.0.1:8080/ --cl-local-work 256 --cl-global-work 16384
Open_CL version : 1.2
GPU Kernel version; Tahiti
AMD Drivers version: 16.03
Cards: currently only the Asus R9 280x DC2T 3GDD5 cards
AMD Crimson 16 and "GPU Tweak 2" from the official vendor sites are also installed on Windows. Maybe someone here can help. Thank you.

@fussler
fussler commented Mar 23, 2016

had the same problem today too on windows.
its very fishy...

@BobDoe
BobDoe commented Mar 23, 2016

yesterday on windows 10, windows defender reported a virus; it was eth proxy.exe.. Win Defender deleted the eth proxy.exe so I downloaded it again and windows defender tagged and deleted the file at download.

found a copy in a zip file and checked it, it was not infected so using it now

@fussler
fussler commented Mar 23, 2016

I only wanted to see if windows also did not work.

u use windows without the bullshit defender and wall

@Mayhemz
Mayhemz commented Mar 23, 2016

@cgladue
guys having issues with 270x with 2gb do this
add --cl-extragpu-mem 0 when you're not connecting a display on these cards.
this fixed it for me
genoil mentioned this in https://forum.ethereum.org/discussion/2227/cuda-miner

@cgladue
cgladue commented Mar 23, 2016

@Mayhemz

thanks for the suggestion, i tried it and it didnt help (i use resistors in a dummy plug anyways) it just appears that sapphire R9 270 cards have CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1408867653 which is just (as of last night) over the memory needed to load the full DAG.

i dont think this is going to be fixed unless there is a way to not load the complete dag in one huge file, perhaps chunk the DAG in 2 smaller files or something ? or is there a way to increase the value of CL_DEVICE_MAX_MEM_ALLOC_SIZE ?

@cgladue
cgladue commented Mar 23, 2016

@Equinox-

i see at the top you said you:

Fixed support of chunking the DAG on GPUs unable to allocate a continuous block of memory large enough for the entire DAG.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer mapping.

but seems like still having the issue where my card has 2GB of RAM, but can only alloc 1.4GB max at once, wasnt your fix supposed to allow me to keep mining ? perhaps i dont have the right binaries, where can i download the fixed binaries ?

@Mayhemz
Mayhemz commented Mar 23, 2016

@cgladue
i have 2x r9 270's and this is the command im running as we speak on a win 8 64bit sys
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
setx GPU_MAX_ALLOC_PERCENT 100
ethminer.exe -G -F address --cl-local-work 256 --cl-global-work 16384 --cl-extragpu-mem 0

and they both MEM_ALLOC_SIZE: 1408867653 like yours.

@cgladue
cgladue commented Mar 23, 2016

@Mayhemz

What version of ethminer are you using, i am using 0.9.41-genoil-1.0.6

@Equinox-

@Genoil I also tried to reduce the number of registers; couldn't get it down below 67. I'll look at what you did and see if I can join the two to get it down to 64.
@cgladue I never published binaries with this fix; you would have to compile them yourself.

@Mayhemz
Mayhemz commented Mar 23, 2016

@cgladue
0.9.41-genoil-1.0.6b the one genoil released recently. Also works with the old one. i deleted my dag files first, then ran that command as above, also added setx GPU_SINGLE_ALLOC_PERCENT 100 first without --cl-extragpu-mem 0 which failed after generating the DAG, then i removed setx GPU_SINGLE_ALLOC_PERCENT 100 and used --cl-extragpu-mem 0 which worked straight away

@cgladue
cgladue commented Mar 23, 2016

@Mayhemz
i copy and pasted your batch file and same result, error -61 & error -38

@Equinox-
i am running windows is there any instructions on how to compile it myself ? :'(

@cgladue
cgladue commented Mar 23, 2016

@Mayhemz
would you happen to know where i can get 1.0.6b google has no results

@Mayhemz
Mayhemz commented Mar 23, 2016

@cgladue
what windows are you running? and are you doing this through a monitor thats connected to the pc?

@cgladue
cgladue commented Mar 23, 2016

@Mayhemz
i am running Windows 8.1 Enterprise 8GB RAM 64-Bit, there are VGA Dummy plugs (with resistors) to trick the card into thinking there is a monitor connected, i am connected via teamviewer.

@cgladue
cgladue commented Mar 23, 2016

@Mayhemz
IT WORKED!!! 1.0.6b worked by just swapping out the binaries !! thank you so much for the help. hopefully it will run ok up till the DAG grows to 2GB

@Sharapoff

Guys, I'm on Win 7 64-bit Ultimate. I mined for more than a month without any problems with Agnitum Outpost Firewall active, on a 320GB HDD with more than 200GB free. Ethminer needed only 1348 MB the whole time (according to the "GPU Tweak Monitor" app), so memory is not the problem in my case (each R9 280x has 3GB). The issues started after the new R9 380 was detected alongside the R9 280x cards on one rig, before I had even started the proxy or the miner. And yes, one of the cards is periodically connected to my PC monitor via the VGA port. But in 3 years of use this is the first time I've had a problem like this. I really can't tell where the main error is: in the OS, in the proxy config, in the miner, or maybe in my Internet settings. Without ethminer running the proxy works fine (although it still shows this error, look):
"Traceback (most recent call last):
Failure: stratum.custom_exceptions.TransportException: SocketTransportClientFactory connection timed out."
After that it starts finding new jobs.

Equinox- added some commits Mar 9, 2016
@Equinox- Equinox- Fixed chunking to work on graphics cards with enough overall memory but not enough in a single block.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer updates.
b76ee91
@Equinox- Equinox- Run convert_uint on the defines to ensure type.
b6443a1
@Genoil
Contributor
Genoil commented Mar 23, 2016

@Equinox- 67! almost there! curious how you got there, too

btw 1.0.6b has a bug with auto DAG file deletion.

@Equinox-

@Genoil
I unrolled the loop into sets of 4 calls to keccak_f1600_round; I found this interfaced better between loops. (Really I just played with powers of two until it worked well)
I used amd_bitalign for rotating, however it seemed that using ulongs and OpenCL's native rotate() method would also work well.
For some reason the beginning of the Theta section (b = a ^ ..etc) seemed to lose almost 4 registers by doing a uint8 and a uint2 instead of 5 uint2. (No idea why)
I moved all the Rho Pi calculations down into Chi. (Your modification where Rho Pi doesn't have to be stored in an auxiliary b[25] might be more efficient than this, since I still used b[6])
I don't recall if splitting a[25] into a_0, a_5, a_10, a_15, a_20 (which are just pointers to a at the given offset) improved usage at all.
My reduced kernel is here; I think one of my changes bumped it back up to 73 or something (and I couldn't figure out what I did to get it to 67)
https://github.com/Equinox-/libethereum/blob/reg_reduce/libethash-cl/ethash_cl_miner_kernel.cl
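
To make the rotation remark above concrete, here is a small model of how a 64-bit rotate can be built from two 32-bit bitalign operations and checked against a plain 64-bit rotate. This is an illustration only, not code from either kernel: the helper names are invented, Python stands in for OpenCL C, and the model of amd_bitalign (the low 32 bits of the 64-bit value hi:lo shifted right) is my reading of the AMD media-ops built-in.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def bitalign(hi, lo, shift):
    """Model of amd_bitalign: low 32 bits of ((hi:lo) >> shift), shift in 0..31."""
    return (((hi << 32) | lo) >> (shift & 31)) & MASK32


def rotl64_halves(hi, lo, n):
    """Rotate the 64-bit value hi:lo left by n (1..31) using two bitaligns."""
    new_hi = bitalign(hi, lo, 32 - n)
    new_lo = bitalign(lo, hi, 32 - n)
    return new_hi, new_lo


def rotl64(x, n):
    """Reference 64-bit rotate, comparable to OpenCL's rotate() on a ulong."""
    return ((x << n) | (x >> (64 - n))) & MASK64


x = 0x0123456789ABCDEF
hi, lo = rotl64_halves(x >> 32, x & MASK32, 7)
print(hex((hi << 32) | lo), hex(rotl64(x, 7)))
```

Both forms compute the same value, so any difference between them in a real kernel would come from how the compiler allocates registers, not from the arithmetic.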

@Genoil
Contributor
Genoil commented Mar 25, 2016

@Equinox- nice! btw Note that the (if out_size >=) statements don't have any effect, because the parameters that go in are 8 resp. 4. It did save some VGPRs in my case.

amd_bitalign is not really needed, the compiler seems to be able to figure that out if I look at the generated ISA code.

@Equinox-

Yeah, the compiler figured it out, but I think explicitly calling bitalign reduced the register count.

@chriseth
Contributor

What is the status about testing on this? Does this change reduce performance on cards that do not need chunking?

@Equinox-

No, it would not. The modification is just to the _chunks methods, which are only referenced when normal allocation fails.

@ufukkiraz

Allocating/mapping single buffer failed with: clCreateBuffer(-61). GPU can't allocate the DAG in a single chunk. Bailing.
clEnqueueWriteBuffer(-38)

I get this error ??

@Equinox-

If you are running this fork of ethminer (Equinox-/libethereum) it should give that error, then attempt chunked allocation, then if that fails it should bail.

@ufukkiraz

So what should I do about this error?

@Equinox-

If you just get that error you probably aren't running my fork; to run my fork you have to build from source.

@ufukkiraz

How can I do that?

@Equinox-

http://www.ethdocs.org/en/latest/ethereum-clients/cpp-ethereum/building-from-source/index.html#building-from-source
You'll also need to point the libethereum submodule to my fork; (cd libethereum; git remote add equinox mygithubcloneurl; git fetch equinox develop; git merge equinox/develop)

@Jeff8800

@Equinox- Thanks very much for your fix. For those of us running GPUs with 2GB of memory, how long can we expect this fix to work (i.e., how much is "enough overall memory", and for how long will that remain valid)? Trying to figure out if I need to invest in GPUs with 4GB of memory. Thanks!

@Equinox-

@Jeff8800 The DAG is expected to grow about 73% per year. It's ~1.4GB right now, so that gives about a half year of mining before 2GB isn't enough. (This could be wrong, if it is someone else feel free to correct)
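
The estimate above can be checked with a couple of lines of arithmetic. This sketch takes the figures from the comment at face value (1.4GB now, a 2GB card, about 73% growth per year) and tries two readings of "73% per year", compound and linear; both are assumptions, and memory the miner needs beyond the DAG itself is ignored.

```python
import math

current_gb = 1.4      # DAG size quoted in the comment
limit_gb = 2.0        # card memory in question
annual_growth = 0.73  # "about 73% per year", as stated above

# Assumption 1: compound growth. Solve current * (1 + g)**t == limit for t.
years_compound = math.log(limit_gb / current_gb) / math.log(1 + annual_growth)

# Assumption 2: the DAG gains a fixed 73% of today's size each year.
years_linear = (limit_gb - current_gb) / (current_gb * annual_growth)

print(f"compound: {years_compound:.2f} years, linear: {years_linear:.2f} years")
```

Under either reading the answer comes out at roughly seven to eight months, in line with the "about a half year" figure.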

@Jeff8800

@Equinox- Thanks, very helpful to have an estimated timeline. In a multi-card rig, do all the cards individually hold the DAG? Or would I just need to upgrade one card to 4GB?

@isghe
isghe commented Mar 30, 2016

@Equinox- I compiled and tested your fix on my private blockchain, but "Never manages to mine a block" persists.

@Equinox-

@Jeff8800 Each card will need space for the entire DAG.
@isghe It's very hard to debug this type of thing; what type of card are you using?

@isghe
isghe commented Mar 30, 2016

@Equinox- I am on a 2015 MBP with Intel Iris Pro 1536 MB.

$ ./ethminer --list-devices
[OPENCL]:
Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Iris Pro
    CL_DEVICE_TYPE: GPU
    CL_DEVICE_GLOBAL_MEM_SIZE: 1610612736
    CL_DEVICE_MAX_MEM_ALLOC_SIZE: 402653184
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 512

Forcing chunking with your fix (though without your fix it does not mine any block either):
isghe@9ece92c

$ ./ethminer -G -F http://127.0.0.1:8545
...
[OPENCL]:Creating buffer for chunk 0 size=268434944
[OPENCL]:Creating buffer for chunk 1 size=268434944
[OPENCL]:Creating buffer for chunk 2 size=268434944
[OPENCL]:Creating buffer for chunk 3 size=268435072
[OPENCL]:Loading chunk kernels
[OPENCL]:Mapping chunk 0 with size=268434944 and offset=0
miner  23:08:21.058|  Mining on PoWhash #d51da089… : 0 H/s = 0 hashes / 0.503 s
[OPENCL]:Mapping chunk 1 with size=268434944 and offset=268434944
[OPENCL]:Mapping chunk 2 with size=268434944 and offset=536869888
[OPENCL]:Mapping chunk 3 with size=268435072 and offset=805304832
[OPENCL]:Creating buffer for header.
[OPENCL]:Creating mining buffer 0
[OPENCL]:Creating mining buffer 1
miner  23:08:21.565|  Mining on PoWhash #d51da089… : 0 H/s = 0 hashes / 0.506 s
miner  23:08:22.067|  Mining on PoWhash #d51da089… : 2610996 H/s = 1310720 hashes / 0.502 s
miner  23:08:22.573|  Mining on PoWhash #d51da089… : 3114582 H/s = 1572864 hashes / 0.505 s
miner  23:08:23.078|  Mining on PoWhash #d51da089… : 2595485 H/s = 1310720 hashes / 0.505 s
miner  23:08:23.583|  Mining on PoWhash #d51da089… : 2595485 H/s = 1310720 hashes / 0.505 s
miner  23:08:24.087|  Mining on PoWhash #d51da089… : 2605805 H/s = 1310720 hashes / 0.503 s
miner  23:08:24.590|  Mining on PoWhash #d51da089… : 3133195 H/s = 1572864 hashes / 0.502 s
miner  23:08:25.092|  Mining on PoWhash #d51da089… : 2610996 H/s = 1310720 hashes / 0.502 s
miner  23:08:25.594|  Mining on PoWhash #d51da089… : 2616207 H/s = 1310720 hashes / 0.501 s
miner  23:08:26.096|  Mining on PoWhash #d51da089… : 3139449 H/s = 1572864 hashes / 0.501 s
miner  23:08:26.602|  Mining on PoWhash #d51da089… : 2590355 H/s = 1310720 hashes / 0.506 s
miner  23:08:27.106|  Mining on PoWhash #d51da089… : 2600634 H/s = 1310720 hashes / 0.504 s
miner  23:08:27.610|  Mining on PoWhash #d51da089… : 2605805 H/s = 1310720 hashes / 0.503 s
miner  23:08:28.111|  Mining on PoWhash #d51da089… : 3139449 H/s = 1572864 hashes / 0.501 s
miner  23:08:28.614|  Mining on PoWhash #d51da089… : 2610996 H/s = 1310720 hashes / 0.502 s
miner  23:08:29.118|  Mining on PoWhash #d51da089… : 2605805 H/s = 1310720 hashes / 0.503 s
miner  23:08:29.620|  Mining on PoWhash #d51da089… : 3139449 H/s = 1572864 hashes / 0.501 s
miner  23:08:30.123|  Mining on PoWhash #d51da089… : 2605805 H/s = 1310720 hashes / 0.503 s

but never mining a block.
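
For reference, the chunk sizes and offsets in the log above follow a simple pattern. The sketch below reproduces them from the total size (the sum of the four logged chunk sizes). It is a reconstruction from the log, not the fork's code: the 128-byte alignment and the function name chunk_layout are assumptions, chosen because they match the logged numbers.

```python
DAG_BYTES = 1073739904   # sum of the four chunk sizes in the log
NUM_CHUNKS = 4           # as in the log
ALIGN = 128              # assumed alignment; it reproduces the logged sizes


def chunk_layout(total, count=NUM_CHUNKS, align=ALIGN):
    """Split total bytes into count chunks; the last chunk takes the remainder."""
    base = (total // count) // align * align
    layout = []
    offset = 0
    for i in range(count):
        size = base if i < count - 1 else total - offset
        layout.append((i, size, offset))
        offset += size
    return layout


for i, size, offset in chunk_layout(DAG_BYTES):
    print(f"chunk {i} size={size} offset={offset}")
```

The layout itself is therefore consistent, which suggests the lack of solutions on this device comes from somewhere other than the host-side split.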

p.s.
Without forcing chunking, it crashes in clEnqueueWriteBuffer:

[OPENCL]:Printing program log
[OPENCL]:
[OPENCL]:Creating one big buffer for the DAG
[OPENCL]:Loading single big chunk kernels
[OPENCL]:Mapping one big chunk.
Abort trap: 6

And finally, mining with the CPU works normally:

$ ./ethminer -C -F http://127.0.0.1:8545
miner  23:14:43.887|  Getting work package...
miner  23:14:44.547|  Got work package:
  ℹ  23:14:44.547|  Loading full DAG of seedhash: #290decd9…
miner  23:14:44.547|    Header-hash: d51da089bb76a6d55a9f1d23677bce760dc5888d204f9b30e98805ef96190dc9
miner  23:14:44.547|    Seedhash: 0000000000000000000000000000000000000000000000000000000000000000
miner  23:14:44.547|    Target: 00002241db62e4e9d99f206cb1745486cef8bb22a0de095930ba30afc09ee87c
miner  23:14:45.053|  Mining on PoWhash #d51da089… : 0 H/s = 0 hashes / 0.504 s
  ℹ  23:14:45.235|  Full DAG loaded
miner  23:14:45.737|  Mining on PoWhash #d51da089… : 166666 H/s = 114000 hashes / 0.684 s
miner  23:14:46.243|  Mining on PoWhash #d51da089… : 300198 H/s = 151600 hashes / 0.505 s
miner  23:14:46.749|  Mining on PoWhash #d51da089… : 290316 H/s = 146900 hashes / 0.506 s
  ℹ  23:14:47.254|  Solution found; Submitting to http://127.0.0.1:8545 ...
  ℹ  23:14:47.255|    Nonce: 2d906fc82e5b733e
  ℹ  23:14:47.255|    Mixhash: b40a64300c5f0748fb98b40e23aab1b0ac5d877461065a1738cda89b5d5d6008
  ℹ  23:14:47.255|    Header-hash: d51da089bb76a6d55a9f1d23677bce760dc5888d204f9b30e98805ef96190dc9
  ℹ  23:14:47.255|    Seedhash: 0000000000000000000000000000000000000000000000000000000000000000
  ℹ  23:14:47.255|    Target: 00002241db62e4e9d99f206cb1745486cef8bb22a0de095930ba30afc09ee87c
  ℹ  23:14:47.255|    Ethash: 00002043253cbeeed46a8fdf5f0db86c789578c8357bc555d06dd6bad01d22d6
  ℹ  23:14:47.256|  B-) Submitted and accepted.
miner  23:14:47.257|  Getting work package...

@Equinox-

@isghe Does it ever find a share on the public blockchain?

@isghe
isghe commented Mar 31, 2016

@Equinox- Yes, I tried: successfully with CPU mining, but with nothing found by the chunked GPU miner, even though the chunked GPU miner reports being almost 10 times faster than the CPU miner. I don't think the problem is related to the private or public blockchain itself, but rather lies in the handling of the chunked GPU algorithm, or in the algorithm itself.
And I am asking myself right now... how many people are running the chunked GPU algorithm for nothing? And how many are running it with success?

@Equinox-

@isghe I just ran it on my AMD GPU with chunking and it managed to find and submit a solution. You can try running it from my reg_reduce branch; that has a debugging utility that allows you to compare the output hash values for the GPU with the CPU. (Only the last 32 bits)

@isghe
isghe commented Mar 31, 2016

@Equinox-
Should the GPU hash output and the CPU hash output be the same?

[OPENCL]:Printing program log
[OPENCL]:<program source>:102:16: warning: unused variable 'b4_0'
        uint4* b4_0 = (uint4*) b;
               ^

[OPENCL]:Failed to allocate 1 big chunk. Max allocateable memory is 402653184. Trying to allocate 4 chunks.
[OPENCL]:Creating buffer for chunk 0 size=268434944
[OPENCL]:Creating buffer for chunk 1 size=268434944
[OPENCL]:Creating buffer for chunk 2 size=268434944
[OPENCL]:Creating buffer for chunk 3 size=268435072
[OPENCL]:Loading chunk kernels
[OPENCL]:Mapping chunk 0 with size=268434944 and offset=0
[OPENCL]:Mapping chunk 1 with size=268434944 and offset=268434944
[OPENCL]:Mapping chunk 2 with size=268434944 and offset=536869888
[OPENCL]:Mapping chunk 3 with size=268435072 and offset=805304832
[OPENCL]:Creating buffer for header.
[OPENCL]:Creating mining buffer 0
[OPENCL]:Creating mining buffer 1
miner  03:46:17.000|  Mining on PoWhash #37480358… : 0 H/s = 0 hashes / 0.501 s
GPU lid=0, nonce = 1ff62c4ed0b06174, hash = d89467f3
CPU nonce=1ff62c4ed0b06174, hash=548dc4e6278a433b
GPU lid=1, nonce = 1ff62c4ed0b06175, hash = d418455f
CPU nonce=1ff62c4ed0b06175, hash=579d1ad53253ca7f
GPU lid=2, nonce = 1ff62c4ed0b06176, hash = 164de5d3
CPU nonce=1ff62c4ed0b06176, hash=358168a9a74591f
GPU lid=3, nonce = 1ff62c4ed0b06177, hash = 2fc9e59
CPU nonce=1ff62c4ed0b06177, hash=31dc602173223c39
...
GPU lid=58, nonce = 1ff62c4ed0d861ae, hash = c877a69
CPU nonce=1ff62c4ed0d861ae, hash=aea1114508de4c7f
GPU lid=59, nonce = 1ff62c4ed0d861af, hash = 5da69d25
CPU nonce=1ff62c4ed0d861af, hash=25987aa897198029
GPU lid=60, nonce = 1ff62c4ed0d861b0, hash = bd081470
CPU nonce=1ff62c4ed0d861b0, hash=f006eda3929e2144
GPU lid=61, nonce = 1ff62c4ed0d861b1, hash = ed4154bc
CPU nonce=1ff62c4ed0d861b1, hash=c26fb893031010a
GPU lid=62, nonce = 1ff62c4ed0d861b2, hash = c42de7fb
CPU nonce=1ff62c4ed0d861b2, hash=14e8c9731d570027
miner  03:46:18.509|  Mining on PoWhash #37480358… : 2092966 H/s = 1048576 hashes / 0.501 s

@Equinox-

The CPU hash should end with the GPU hash. I know the kernel you're using
works on AMD cards so I'm at a loss to explain it. It's possible the
kernel still uses amd_bitalign but I thought I had disabled that.
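
The check described here can be written down directly. The sketch below assumes "end with" means the low 32 bits of the printed CPU value should equal the printed GPU value, and uses the first two nonce pairs from the log; the function name low32_matches is invented for the example.

```python
def low32_matches(cpu_hash_hex, gpu_hash_hex):
    """True if the low 32 bits of the CPU hash equal the GPU value."""
    cpu = int(cpu_hash_hex, 16)
    gpu = int(gpu_hash_hex, 16)
    return (cpu & 0xFFFFFFFF) == gpu


# First (nonce, GPU hash, CPU hash) entries from the log.
pairs = [
    ("1ff62c4ed0b06174", "d89467f3", "548dc4e6278a433b"),
    ("1ff62c4ed0b06175", "d418455f", "579d1ad53253ca7f"),
]
for nonce, gpu, cpu in pairs:
    print(nonce, "match" if low32_matches(cpu, gpu) else "MISMATCH")
```

For these entries the low 32 bits differ (278a433b against d89467f3, for example), so on this device the GPU kernel is producing different hashes from the CPU reference.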

@clintljohnson

I have two Sapphire 7870's (2GB RAM each) and have this same problem.

@Genoil
Contributor
Genoil commented Mar 31, 2016

@Equinox- finally found some time to fire up ye olde 7950 to see if something could be done about that reg count. I started changing bits of code by looking at yours and ended up with something similar but totally different :). Down from 78 to 70 regs now, although I made a mistake somewhere causing no valid solutions. But that should be fixable. Strangely enough, when I disable the bitalign in your kernel, the regcount goes from 76 to 78 (where's 67?), in my code it does nothing. Also, the unrolling works differently.

Will share when I get the bug fixed...

@Equinox-

@Genoil I think when I got it to 67 it had stopped working and I had no idea where the mistake was so I had to go back in history quite a bit to figure out where the mistake was.
I have your kernel down at 69 but it also isn't working; I'll have to see if I can get that one working.
I'm honestly unsure if it is possible to even get it down to 64 registers, since 50 registers are required to store the current state, and a minimum of 11 are required during theta (by my count). I could probably get it down to 64 by writing all the ISA for keccak by hand but this means the compiler can only make 3 mistakes.
@clintljohnson I have no idea why it isn't working on your cards. I have 7770s, which to my knowledge use the same (southern islands) ISA as the 78xx series.

@Genoil
Contributor
Genoil commented Apr 1, 2016

@Equinox- yeah I got to 57 at some point, but it doesn't count if it doesn't work :). i started over today and discovered the mistake, which you probably also made. Integrating rho/pi and chi to reduce the size of the b array just doesn't work.

On the upside, i managed to tune the opencl kernel for NVidia in such a way, that it is as fast as my CUDA kernel. @bobsummerwill this opens an opportunity for me to switch over to webthree-umbrella, get rid of CUDA and move the extra command-line goodies from 1.0.6 into the official ethminer. I'll sleep on that one :)

@TheDeafMute

So... What do I do to make my 7870 work...

@Equinox-
Equinox- commented Apr 2, 2016

Without a 7870 I can't say why it doesn't work. As it is, I'd suggest
building it from each of my branches (reg_reduce and develop) to see if
either works. If neither works, I don't have any ideas.

@AMPER228
AMPER228 commented Apr 2, 2016

(MSI R9 380 2Gb) Driv 15.7.1, Win7 x64
image

My first (simple) command line does not work; it gives error messages (-61) and (-38):
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
SET GPU_SINGLE_ALLOC_PERCENT = 100
ethminer.exe -G -F http://eth-eu.dwarfpool.com:80/wallet --opencl-platform 0.

My NEW command line does NOT give error messages (-61) and (-38), BUT it mines "air" (nothing) at a reported speed of 13 MH/s:
ethminer.exe -G --farm http://eth-Ru.dwarfpool.com:80/wallet/AMPER228 --cl-global-work 16384 -t 3 --cl-local-work 256 --farm-recheck 400 --opencl-platform 0
If I replace (-t 3) with (-t 2) or (-t 1), it still shows 13 MH/s.

image

WTF????!!!! HELP!!!

@AMPER228
AMPER228 commented Apr 2, 2016

My statistics on http://dwarfpool.com are equal to 0!!!!!
image

This rate is the result of the laptop's work.
I assumed that my wallet was not working, but statistics did appear for the laptop. By the way, the laptop runs Windows 10.
Am I doing something wrong???

@Equinox-
Equinox- commented Apr 2, 2016

I'm honestly lost here, since there are now multiple AMD cards this appears to not work on.
@AMPER228 Are you sure your card is even using chunking?
I suppose I could release a binary that I know works on my 7770 for testing on other GPUs. Would this be helpful?

@gogolplus

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
SET GPU_SINGLE_ALLOC_PERCENT = 100

For me it's solved (R9 270X 2 GB).

@AMPER228
AMPER228 commented Apr 3, 2016

@Equinox- "Would this be helpful?"
Yes, it will be useful if you explain which actions I have to perform (for testing).
And if it works, it will be one of the solutions.
And I'm not sure whether my card is even using chunking.

@AMPER228
AMPER228 commented Apr 3, 2016

(MSI R9 380 2 GB) Drivers 15.7.1, Win7 x64
RAM 12 GB. AMD FX(tm)-4100 Quad-Core Processor 3.60 GHz
Everything works now.
I installed the graphics card drivers (in the first series of 15.7.1)
I installed a newer version of the Microsoft .NET Framework, 4.5.3
I disabled Windows Firewall
image

My statistic
image

@Equinox-
Equinox- commented Apr 3, 2016

I've built some binaries that force chunking here, both the debugging and the release binary. If you're having troubles with this code feel free to try these.
https://github.com/Equinox-/libethereum/releases/tag/0.1_debug

@neil-jones

I just tried the new binary and it's still not committing new work. Ran the debug binary, and the CPU hash doesn't end with GPU hash.

GPU lid=44, nonce = ecaec71b49aa6679, hash = 4565752f
CPU nonce=ecaec71b49aa6679, hash=920a07baf47232b4
GPU lid=45, nonce = ecaec71b49aa667a, hash = bd30ba33
CPU nonce=ecaec71b49aa667a, hash=4ede7c0613894c2d
GPU lid=46, nonce = ecaec71b49aa667b, hash = 60196cfc
CPU nonce=ecaec71b49aa667b, hash=8fb30fc4535ccdff
GPU lid=47, nonce = ecaec71b49aa667c, hash = 3f76fc8e
CPU nonce=ecaec71b49aa667c, hash=b1f7297999a6898
GPU lid=48, nonce = ecaec71b49aa667d, hash = 775ba8f3
CPU nonce=ecaec71b49aa667d, hash=fa6e9ac6c169471a
GPU lid=49, nonce = ecaec71b49aa667e, hash = 70863fbb
CPU nonce=ecaec71b49aa667e, hash=ccaa5f3760c64f72
GPU lid=50, nonce = ecaec71b49aa667f, hash = 977983ef
CPU nonce=ecaec71b49aa667f, hash=d3259ee2ceb0b24a
GPU lid=51, nonce = ecaec71b49aa6680, hash = 74d7092c
CPU nonce=ecaec71b49aa6680, hash=b04ecb3beb0b44dd
GPU lid=52, nonce = ecaec71b49aa6681, hash = fde5c5ee
CPU nonce=ecaec71b49aa6681, hash=332c338bcae1fe81
GPU lid=53, nonce = ecaec71b49aa6682, hash = a9596c09
CPU nonce=ecaec71b49aa6682, hash=190d8a3cbaad09dd

What does this mean? I'm running an AMD 7570 with 2GB RAM
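
For readers following along: the check being discussed ("the CPU hash ends with the GPU hash") can be sketched in a few lines of C. This is only an illustration, not code from the patch. It assumes that the 8 hex digits printed for the GPU are meant to equal the low 32 bits of the 64-bit value printed for the CPU, and the function name `gpu_matches_cpu` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): returns true when the 32-bit
 * value printed for the GPU equals the low 32 bits of the 64-bit value
 * printed for the CPU, i.e. the CPU hex string "ends with" the GPU one. */
bool gpu_matches_cpu(uint64_t cpu_hash, uint32_t gpu_hash)
{
    return (uint32_t)(cpu_hash & 0xFFFFFFFFu) == gpu_hash;
}
```

Applied to the first pair in the log above (CPU 920a07baf47232b4, GPU 4565752f), the low 32 bits of the CPU value are f47232b4, so the pair does not match under this assumption.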

@Equinox-
Equinox- commented Apr 3, 2016

Not sure. Both those binaries work on my 7770s, so I'm unsure why they don't work on your card. What version of the AMD drivers do you have, what arguments are you using to launch, and could I get some more info on your exact card?

@Equinox-
Equinox- commented Apr 3, 2016

Interesting. I just used the --opencl-device flag with ethminer and it failed to work; I'll try to figure that out.

@neil-jones

Driver version is 15.200.1045.0, and launch args are "--cl-local-work 64 --cl-global-work 4096". This is my card, except mine is the 2GB version. https://www.techpowerup.com/gpudb/b692/pegatron-hd-7570.html

@Equinox-
Equinox- commented Apr 3, 2016

What does ethminer --list-devices show, and what are the first 20 or so lines of output by ethminer when you run it?

@isghe
isghe commented Apr 3, 2016

The risk that some GPUs are burning power while working for nothing is much too high. I think we should concentrate on creating a unit test that ensures the GPU algorithm is working correctly before GPU mining starts, for both the full DAG and the chunked DAG.

@neil-jones

ethminer --list-devices returns:

Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Turks
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536870912
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

Here's the first bit of output:

Found suitable OpenCL device [Turks] with 2147483648 bytes of GPU memory
miner 19:41:18.318|main Getting work package...
miner 19:41:18.739|main Grabbing DAG for #63ca6f54…
miner 19:41:20.360|main Got work package:
i 19:41:20.360| Loading full DAG of seedhash: #67f3589a…
miner 19:41:20.362|main Header-hash: d118a1852b2d2e6800ad6fe232b025bff118485e82491ef3776f2c0a71c80fbd
miner 19:41:20.381|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:20.389|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 19:41:20.400|gpuminer0 workLoop 0 #00000000… #63ca6f54…
i 19:41:20.405|gpuminer0 Initialising miner...
miner 19:41:20.901|main Mining on PoWhash #d118a185… : 0 H/s = 0 hashes / 0.5 s
Using platform: AMD Accelerated Parallel Processing
i 19:41:21.969| Full DAG loaded
Using device: Turks(OpenCL 1.2 AMD-APP (1800.5))
miner 19:41:21.994|main Got work package:
miner 19:41:21.997|main Header-hash: 4667da13b8e87f49946646e5ff7b422eefa6b379343a7f1fb81b1d8120be5d0d
miner 19:41:22.004|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:22.010|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
Printing program log
"C:\Users\neil\AppData\Local\Temp\OCL39D.tmp.cl", line 117: warning: variable
"b4_0" was declared but never referenced
uint4* b4_0 = (uint4*) b;
^

Failed to allocate 1 big chunk. Max allocateable memory is 536870912. Trying to allocate 4 chunks.
Creating buffer for chunk 0 size=356515584
Creating buffer for chunk 1 size=356515584
Creating buffer for chunk 2 size=356515584
Creating buffer for chunk 3 size=356515712
Loading chunk kernels
Mapping chunk 0 with size=356515584 and offset=0
Mapping chunk 1 with size=356515584 and offset=356515584
Mapping chunk 2 with size=356515584 and offset=713031168
Mapping chunk 3 with size=356515712 and offset=1069546752
Creating buffer for header.
Creating mining buffer 0
Creating mining buffer 1
i 19:41:36.124|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 19:41:36.630|main Mining on PoWhash #4667da13… : 520126 H/s = 262144 hashes / 0.504 s
miner 19:41:37.353|main Got work package:
miner 19:41:37.355|main Header-hash: d33c13624abc788baf82f18712733c337a564169a0814441ae6e8beffbe0a398
miner 19:41:37.361|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:37.368|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 19:41:37.567|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 19:41:38.082|main Mining on PoWhash #d33c1362… : 513001 H/s = 262144 hashes / 0.511 s
miner 19:41:39.141|main Mining on PoWhash #d33c1362… : 1238865 H/s = 1310720 hashes / 1.058 s
miner 19:41:41.241|main Mining on PoWhash #d33c1362… : 1373135 H/s = 2883584 hashes / 2.1 s
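
The chunk sizes and offsets in the log above are consistent with a simple layout rule, sketched here in C. This is an inference from the printed numbers, not the patch's actual code: it assumes the DAG is split in units of 128 bytes, that every chunk except the last gets the rounded-down share, and that the last chunk takes the remainder. The chunk count of 4 is taken from the log, and the names `DAG_UNIT` and `chunk_layout` are hypothetical.

```c
#include <stdint.h>

#define DAG_UNIT 128u /* assumed split granularity in bytes */

/* Fill size[] and offset[] (both in bytes) for `count` chunks covering a
 * DAG of `dag_bytes` bytes. All chunks but the last hold floor(units/count)
 * units; the last chunk holds whatever is left over. */
void chunk_layout(uint64_t dag_bytes, unsigned count,
                  uint64_t *size, uint64_t *offset)
{
    uint64_t units = dag_bytes / DAG_UNIT;
    uint64_t per_chunk = units / count;
    uint64_t at = 0;

    for (unsigned i = 0; i < count; i++) {
        uint64_t n = (i + 1 == count)
                         ? units - per_chunk * (uint64_t)(count - 1)
                         : per_chunk;
        size[i] = n * DAG_UNIT;
        offset[i] = at;
        at += size[i];
    }
}
```

With a total of 1426062464 bytes (the sum of the four sizes in the log) and 4 chunks, this yields three chunks of 356515584 bytes and a last chunk of 356515712 bytes at offset 1069546752, which matches the "Mapping chunk" lines.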

@Equinox-
Equinox- commented Apr 3, 2016

Mind trying it with the following environmental variables (setx name val or export name=val)

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_MAX_HEAP_SIZE 100
setx GPU_SINGLE_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1

@neil-jones

Don't mind at all. I ran the setx commands and fired up ethminer, but it seems to be doing the same thing.

Found suitable OpenCL device [Turks] with 2147483648 bytes of GPU memory
miner 20:00:19.871|main Getting work package...
miner 20:00:22.248|main Grabbing DAG for #63ca6f54…
i 20:00:23.769| Loading full DAG of seedhash: #67f3589a…
miner 20:00:23.769|main Got work package:
miner 20:00:23.788|main Header-hash: 58e28b07486745c0096cd73815b4abf5ec80fccb18d172bcd7d77240af9b0c08
miner 20:00:23.794|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:23.805|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:23.835|gpuminer0 workLoop 0 #00000000… #63ca6f54…
i 20:00:23.839|gpuminer0 Initialising miner...
miner 20:00:24.336|main Mining on PoWhash #58e28b07… : 0 H/s = 0 hashes / 0.5 s
Using platform: AMD Accelerated Parallel Processing
i 20:00:25.452| Full DAG loaded
Using device: Turks(OpenCL 1.2 AMD-APP (1800.5))
miner 20:00:26.439|main Mining on PoWhash #58e28b07… : 0 H/s = 0 hashes / 2.102 s
miner 20:00:27.368|main Got work package:
miner 20:00:27.371|main Header-hash: 02b47f7332c6ceee16f3f53ef0c0e4c9ac4307be8cc9aff12a735a6fe5ed1c3e
miner 20:00:27.379|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:27.385|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
Printing program log
"C:\Users\neil\AppData\Local\Temp\OCL765A.tmp.cl", line 117: warning: variable
"b4_0" was declared but never referenced
uint4* b4_0 = (uint4*) b;
^

Failed to allocate 1 big chunk. Max allocateable memory is 536870912. Trying to allocate 4 chunks.
Creating buffer for chunk 0 size=356515584
Creating buffer for chunk 1 size=356515584
Creating buffer for chunk 2 size=356515584
Creating buffer for chunk 3 size=356515712
Loading chunk kernels
Mapping chunk 0 with size=356515584 and offset=0
Mapping chunk 1 with size=356515584 and offset=356515584
Mapping chunk 2 with size=356515584 and offset=713031168
Mapping chunk 3 with size=356515712 and offset=1069546752
Creating buffer for header.
Creating mining buffer 0
Creating mining buffer 1
i 20:00:39.552|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:40.065|main Mining on PoWhash #02b47f73… : 513001 H/s = 262144 hashes / 0.511 s
miner 20:00:40.460|main Got work package:
miner 20:00:40.462|main Header-hash: e75e0252ef1b666754e581320719b4759f66b427d37563f8218a794e7434c96e
miner 20:00:40.468|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:40.473|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:40.590|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:41.092|main Mining on PoWhash #e75e0252… : 524288 H/s = 262144 hashes / 0.5 s
miner 20:00:41.985|main Mining on PoWhash #e75e0252… : 1174217 H/s = 1048576 hashes / 0.893 s
miner 20:00:42.893|main Mining on PoWhash #e75e0252… : 1445115 H/s = 1310720 hashes / 0.907 s
miner 20:00:43.381|main Got work package:
miner 20:00:43.383|main Header-hash: c4bae4e848a8abdb2eafeb9a54b0a83f0191bbcd9ee9ae8e6a18bc1f96289262
miner 20:00:43.389|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:43.396|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:43.625|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:44.125|main Mining on PoWhash #c4bae4e8… : 524288 H/s = 262144 hashes / 0.5 s

@Equinox-
Equinox- commented Apr 4, 2016

I'm assuming that means the ethminer_debug outputs are invalid again (CPU hash doesn't end with GPU hash)

@neil-jones

Correct; debug's CPU hash doesn't end with GPU hash.

@Equinox-
Equinox- commented Apr 4, 2016

If you are having trouble with chunked mining you can try running the chunked DAG debugger. This won't actually mine anything; it uploads the DAG, runs through it to ensure integrity, then outputs CPU/GPU hash pairs. If it fails before the CPU/GPU hash pairs get printed (DAG verification fails) post the log.
https://github.com/Equinox-/libethereum/releases/tag/0.1.1
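
As a rough picture of what "running through the DAG to ensure integrity" could involve, here is a host-side C sketch. It is not the debugger's code (which runs on the GPU) and makes several assumptions: nodes are 128 bytes, a chunked copy is compared against a flat reference copy, and all names (`dag_chunk`, `chunk_node`, `verify_chunks`) are hypothetical. The point it illustrates is the index-to-chunk offset calculation, which is where this PR's description says the original bug was.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_BYTES 128u /* assumed node size in bytes */

typedef struct {
    const uint8_t *data; /* start of this chunk's contents */
    uint64_t nodes;      /* number of nodes stored in this chunk */
} dag_chunk;

/* Map a global node index to its address inside the chunk that holds it.
 * Returns NULL when the index lies past the last chunk. */
const uint8_t *chunk_node(const dag_chunk *chunks, unsigned count,
                          uint64_t index)
{
    for (unsigned i = 0; i < count; i++) {
        if (index < chunks[i].nodes)
            return chunks[i].data + index * NODE_BYTES;
        index -= chunks[i].nodes; /* make the index chunk-local */
    }
    return NULL;
}

/* Compare every node reachable through the chunks with a flat reference
 * copy. Returns the index of the first mismatch, or `total` if all match. */
uint64_t verify_chunks(const uint8_t *flat, uint64_t total,
                       const dag_chunk *chunks, unsigned count)
{
    for (uint64_t n = 0; n < total; n++) {
        const uint8_t *p = chunk_node(chunks, count, n);
        if (p == NULL || memcmp(p, flat + n * NODE_BYTES, NODE_BYTES) != 0)
            return n;
    }
    return total;
}
```

A wrong base offset for any chunk shows up as a mismatch at the first node of that chunk, which is the kind of failure the log request above is meant to surface.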

@Genoil
Contributor
Genoil commented Apr 18, 2016

BTW I looked into my broken chunks implementation and fixed it. It does work, but it doesn't seem to be very useful, since for the majority of cards it's more a matter of setting the right environment variables to fix the allocation issues. It's also slower on AMD cards and really doesn't do anything useful on the Nvidia platform.

I also managed to get the VGPR usage down to 56, but I got 108 scratch registers back in return, which totally kills the added value of an extra wavefront.

@Equinox-

How low did you get it before scratch registers started appearing? I've got a GCN3 disassembler/assembler I might be able to use to cut out a few more.

@Genoil
Contributor
Genoil commented Apr 18, 2016 edited

It was either 78 or 80 (the 80 one works real nice on CUDA-CL with maxrregs compiler option), or 56.

You use this for Theta:

    for(int i = 0; i < 5; i++)
        t[i] = a[i] ^ a[i+5] ^ a[i+10] ^ a[i+15] ^ a[i+20];

    // #pragma unroll (enable to get speed back, but also +24 VGPRS)
    for(int j = 0; j < 5; j++)
    {
        u = t[(j+4)%5] ^ ROL2(t[(j+1)%5], 1);

        for(int i = 0; i < 5; i++)
            a[i*5+j] ^= u;
    }

The dynamic indexing of a (the 1600-bit keccak state) forces the compiler to move it out of the registers. The same happens for t (the temporary 1600-bit keccak state). 25 * 2 * 2 = 100, which is close to the 108 scratch regs. Why it then saves just 24 VGPRs is still a bit of a mystery, but it really doesn't matter.
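
To make the point about dynamic indexing concrete, here is a plain C sketch that writes theta twice: once with loops, as in the snippet above, and once with every array index spelled out as a literal. It is an illustration under assumptions, not the kernel's code: it uses `uint64_t` lanes and a hypothetical `rol64` helper in place of OpenCL's `uint2` and `ROL2`, and the function and macro names are made up. It shows only that the two forms compute the same result; how a particular GCN compiler allocates registers for either form is not something this sketch can demonstrate.

```c
#include <stdint.h>

/* Rotate left by n bits; n must be in 1..63. Stand-in for ROL2. */
static inline uint64_t rol64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64u - n));
}

/* Loop form, mirroring the quoted snippet (indices computed at run time
 * unless the compiler unrolls the loops). */
void theta_loop(uint64_t a[25])
{
    uint64_t t[5];
    for (int i = 0; i < 5; i++)
        t[i] = a[i] ^ a[i + 5] ^ a[i + 10] ^ a[i + 15] ^ a[i + 20];
    for (int j = 0; j < 5; j++) {
        uint64_t u = t[(j + 4) % 5] ^ rol64(t[(j + 1) % 5], 1);
        for (int i = 0; i < 5; i++)
            a[i * 5 + j] ^= u;
    }
}

/* One column of theta with literal indices only: j is the column,
 * prev = (j + 4) % 5 and next = (j + 1) % 5 are written out by hand. */
#define THETA_COL(a, t, j, prev, next)                    \
    do {                                                  \
        uint64_t u = (t)[prev] ^ rol64((t)[next], 1);     \
        (a)[(j)] ^= u;                                    \
        (a)[(j) + 5] ^= u;                                \
        (a)[(j) + 10] ^= u;                               \
        (a)[(j) + 15] ^= u;                               \
        (a)[(j) + 20] ^= u;                               \
    } while (0)

/* Fully unrolled form: every index is a compile-time constant. */
void theta_unrolled(uint64_t a[25])
{
    uint64_t t[5];
    t[0] = a[0] ^ a[5] ^ a[10] ^ a[15] ^ a[20];
    t[1] = a[1] ^ a[6] ^ a[11] ^ a[16] ^ a[21];
    t[2] = a[2] ^ a[7] ^ a[12] ^ a[17] ^ a[22];
    t[3] = a[3] ^ a[8] ^ a[13] ^ a[18] ^ a[23];
    t[4] = a[4] ^ a[9] ^ a[14] ^ a[19] ^ a[24];

    THETA_COL(a, t, 0, 4, 1);
    THETA_COL(a, t, 1, 0, 2);
    THETA_COL(a, t, 2, 1, 3);
    THETA_COL(a, t, 3, 2, 4);
    THETA_COL(a, t, 4, 3, 0);
}
```

The unrolled form is longer, which is the trade-off mentioned above: literal indices give the compiler the chance to keep each lane in a named register, at the cost of more code.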

@Genoil
Contributor
Genoil commented Apr 18, 2016

Ah, finally getting a bit of a grip on that dreaded GCN compiler. Down to 23 VGPRs with an occupancy of 100%. The hashrate is dreadful, though. Good example of why occupancy isn't everything :)

@Equinox-
Equinox- commented May 5, 2016

I'm going to close this. At this point the DAG size has increased even further and the few edge cases that the environmental variables don't solve don't seem to work with this either.

@Equinox- Equinox- closed this May 5, 2016
@chriseth chriseth removed the in progress label May 5, 2016
@bobsummerwill bobsummerwill referenced this pull request in ethereum/webthree-umbrella May 12, 2016
Closed

OpenCL chunker is disabled #227

@dan-da
dan-da commented Jun 27, 2016

I'm not sure why this patch was closed.

Recently I've been unable to mine with stock ethminer using either a 7970 3 GB or an R9 270 2 GB card due to the DAG alloc issue. (Note: for some reason one 7970 works fine and another doesn't.)

I've tried all env variable hacks in this thread and elsewhere to no avail.

Applying this patch fixes the problem for both cards, and all is well.

@dan-da
dan-da commented Jun 29, 2016 edited

For anyone interested, I created a fork of Genoil's ethminer that includes this patch. The chunking works great with R9 270 and HD 7970 and is automatic if allocating a full DAG fails.
