This repository has been archived by the owner. It is now read-only.

Fixed DAG chunking #203

Closed
wants to merge 2 commits into
from

@Equinox-

Fixed support of chunking the DAG on GPUs unable to allocate a continuous block of memory large enough for the entire DAG.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer mapping.

ry60003333 Mar 21, 2016

Does anyone have an ETA on when this would be merged in? I would love to run ethminer on a bunch of "old" cards that I have that have the required amount of memory but cannot allocate it all at once. See my comments in issue #2761 in cpp-ethereum.

bobsummerwill Mar 21, 2016

Contributor

Hey @ry60003333,
The automated build steps will need to be green before we can do anything here.
I'm not sure offhand if the breaks currently showing are indicative of problems in our builds generally, or something specific to your changes. Could you please check? Thanks!

LefterisJP Mar 21, 2016

Contributor

This seems to have changes in a lot of places, including the OpenCL kernel code. Apart from having all tests green, before merging please try to test on as many different GPUs as possible. We have no automated tests for the OpenCL mining code, so any merge is a risk.

bobsummerwill Mar 21, 2016

Contributor

Yes, @LefterisJP. The range of "touch points" scares me too. CC @chriseth.

@ry60003333 If this code is working for you, then perhaps you are best just keeping it running as a private fork for your own benefit for the time being.

Many miners are already using the Genoil fork, rather than the "official" ethminer, so we're not even in a position where we have something in a particularly healthy state.

Equinox- Mar 21, 2016

The error is in ethereum/mix.
https://github.com/ethereum/mix/blob/c7b0854a450ef6199d6aa0a2d1e35c7c52063f57/src/MixClient.cpp#L67
This has since been patched in mix; however, webthree-umbrella's submodule still points to this commit.

There actually aren't many changes in the OpenCL kernel code. The main ones are my unfolding the ternary operator used to decide which chunk to sample, and changing the offsets to use a define passed from ethash_cl_miner.
3b4c4ac#diff-1e4374038c6165d38201750ea44eae82L271

In ethash_cl_miner, the change adds another define with the chunk size, then allocates and uploads those chunks. The behavior on cards that can allocate a single chunk has not changed (beyond adding the extra define).
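
The kernel-side change described above can be sketched as follows. This is a minimal Python model, not the PR's OpenCL code: the names `DAG_CHUNK_SIZE`, `locate` and `read_node`, the chunk count of four, and the use of node indices rather than byte offsets are all assumptions made for illustration. It shows the chunk selection written as explicit branches instead of a nested ternary, with every offset derived from one chunk-size constant.

```python
# Hypothetical model of chunked DAG lookup; names and sizes are illustrative.
DAG_CHUNK_SIZE = 4  # nodes per chunk; assumed to be supplied as a define


def locate(index):
    """Map a global node index to (chunk number, offset inside that chunk)."""
    # Unfolded form of a nested ternary: one explicit branch per chunk.
    if index < DAG_CHUNK_SIZE:
        return 0, index
    if index < 2 * DAG_CHUNK_SIZE:
        return 1, index - DAG_CHUNK_SIZE
    if index < 3 * DAG_CHUNK_SIZE:
        return 2, index - 2 * DAG_CHUNK_SIZE
    # The last chunk also holds any remainder, so it has no upper bound here.
    return 3, index - 3 * DAG_CHUNK_SIZE


def read_node(chunks, index):
    """Fetch a node from a list of four chunk buffers."""
    chunk, offset = locate(index)
    return chunks[chunk][offset]


# A 17-node stand-in for the DAG, split into chunks of 4, 4, 4 and 5 nodes.
dag = list(range(100, 117))
chunks = [dag[0:4], dag[4:8], dag[8:12], dag[12:]]
assert all(read_node(chunks, i) == dag[i] for i in range(len(dag)))
```

Deriving every offset from the same constant that sized the buffers is the point of the sketch: if the host and the kernel disagree on the chunk size, lookups near chunk boundaries read the wrong node.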

ry60003333 Mar 21, 2016

@bobsummerwill Thanks for the reply! It looks like @Equinox- is correct about ethereum/mix causing the builds to fail. I'm not sure what the process would be to get that fix so the tests would pass, but it would be a start.

I'll attempt to build from source with this code and test the resulting miner on both the R9 270s that require chunking and some R9 280X cards that can fit the DAG in one chunk to ensure that the OpenCL kernel code still works.

Sadly it seems like the Genoil fork doesn't support chunking the DAG either, so it would be nice to have this in the "official" version.

bobsummerwill Mar 21, 2016

Contributor

I refreshed webthree-umbrella, so if you use "develop" it should have the latest and greatest of everything now.

Yes - please do test away!

I would also recommend that you start a dialog with @Genoil about this DAG chunking functionality too.

I am hoping that we can upstream the Genoil changes and "heal the breach", but that may or may not be possible. At the time of writing the Genoil branch is the best miner to use, and we may or may not ever get back to an official miner which is worth people's while. I hope we will, but don't bet everything on it!

Genoil Mar 21, 2016

Contributor

I recently removed the chunking parts because I was refactoring the kernel in an attempt to squeeze a bit of extra performance out of it. It didn't work anyway and it didn't look very pretty either. Nice fix though.

Most cards that are used for mining (including Pitcairn: 78x0/270/370) work fine without chunking, provided the right ENV vars are set. So I'm not really considering bringing it back.

A while ago I tried a different chunking method that only had host-side chunking and no chunks kernel-side. Unfortunately it didn't work out well on AMD hardware.

ry60003333 Mar 22, 2016

@bobsummerwill I'll give it a try from that branch! It would be nice to merge the changes back and "heal" the branch, but I agree that the need for it will likely determine if that happens.

@Genoil Thanks for explaining the situation; do you happen to know what ENV variables will work on Pitcairn MSI R9 270 cards? I'm running Ubuntu 14.04 with the fglrx-updates drivers, and unfortunately haven't had any success with ENV variables. I really do appreciate your input though!

Genoil Mar 22, 2016

Contributor

@ry60003333 I don't own any Pitcairns, but apparently the R7 370 is now seen as one of the most efficient cards for Ethereum. By ENV vars I meant environment variables. Recently the list has grown to 5 of these to satisfy most modern AMD cards (shown with export for Linux; on Windows use setx NAME value instead):

export GPU_FORCE_64BIT_PTR=0
export GPU_MAX_HEAP_SIZE=100
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
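
When the miner is started from a script rather than an interactive shell, the same five settings can be applied to the child process only. The following is a minimal Python sketch under stated assumptions: the helper names `miner_env` and `run_miner` are hypothetical, the miner executable is assumed to be on the PATH, and the values are simply the ones listed above.

```python
import os
import subprocess

# The five AMD driver variables and values listed in the comment above.
AMD_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
}


def miner_env(base=None):
    """Return a copy of the environment with the AMD variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(AMD_ENV)
    return env


def run_miner(args):
    """Start the miner with the variables set for that process only.

    `args` is the full command line, for example
    ["ethminer", "--list-devices"]; the executable name is an assumption.
    """
    return subprocess.run(args, env=miner_env(), check=False)
```

Passing the variables through `env` keeps them out of the user's global environment, which makes it easier to test the effect of a single variable.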

ry60003333 Mar 23, 2016

@Genoil Thank you for the reply! It looks like the last environment variable was the one that I was missing. I had tried all the others, but GPU_SINGLE_ALLOC_PERCENT looks like it was the one that did the trick on Ubuntu. I really appreciate the assistance!

Genoil Mar 23, 2016

Contributor

@ry60003333 you're welcome. It's actually quite a recent requirement for some AMD cards, since the DAG has grown to about 1.4GB.

otaku160 Mar 23, 2016

Hi @ry60003333 and @Genoil,
by setting GPU_SINGLE_ALLOC_PERCENT 100 I've lost 4 MH/s -_-
I did this because I had an issue with chunking...
My GPU is an R9 380 2GB ITX with a Tonga chip.

chriseth Mar 23, 2016

Contributor

Could you rebase this, please? There are a lot of unrelated commits in this PR.

Genoil Mar 23, 2016

Contributor

@otaku160 but can you mine without the setting?

cgladue Mar 23, 2016

@Genoil i downloaded your latest miner (1.0.6) and tried this, but still failing to mine.

i tried setting all the ENV vars and still cannot allocate the DAG in a single chunk ... i am using ethminer, is there a different miner i should use ?

[0] Pitcairn
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1408867653
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

My card is a 2GB R9 270 and it shows 1.4 GB as the max memory when running --list-devices.

fussler Mar 23, 2016

Yesterday I could set export GPU_SINGLE_ALLOC_PERCENT=100 and it worked with ethminer and the stratum proxy. Today my miner on Linux Mint was doing nothing when I woke up.
I restarted, and today I can't get the DAG to load; it just ends up with error -61 again. :(
I'm using a Radeon 78** card.

fuss@fussy ~/Downloads $ export GPU_MAX_ALLOC_PERCENT=100
fuss@fussy ~/Downloads $ ethminer --list-devices
[OPENCL]:
Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Pitcairn
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 1944059904
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1031798784
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

Sharapoff Mar 23, 2016

Hi guys. Yesterday I added a new ASUS R9 380 4GB to my rig of Asus R9 280X 3GB cards. After installing the drivers, I get this error when I start eth-proxy.py:
Traceback (most recent call last):
Failure: stratum.custom_exceptions.TransportException: SocketTransportClientFactory connection timed out.
After that it usually starts showing normal lines, like this:
2016-03-23 15:33:47,963 INFO proxy # NEW_JOB MAIN_POOL
But that stops when I start ethminer. Peers are disconnected and the proxy stops working. Before that, ethminer shows:
Creating one big buffer for the DAG
Loading single big chunk kernels
Mapping one big chunk.
DAG 15:50:56| Generating DAG file. Progress: 0 %
DAG 15:51:06| Generating DAG file. Progress: 1 %
DAG 15:51:18| Generating DAG file. Progress: 2 %
Then the miner shows a hashrate for some time (around a few hours or so) and after that it fails. I disconnected the R9 380, and deleted and regenerated the DAGs twice. I deleted all files and set everything up again, first from the precompiled version and then from source, and reinstalled Python and all the required packages. None of that helped. I changed ports and my firewall settings (turning off my antivirus and firewall did not help either).
My setup:
OS: Windows 7 Ultimate (64-bit), fully updated at the time.
Ethereum stratum proxy version: 0.0.5
My bat file:
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
ethminer.exe --farm-recheck 400 -G -F http://127.0.0.1:8080/ --cl-local-work 256 --cl-global-work 16384
OpenCL version: 1.2
GPU kernel version: Tahiti
AMD driver version: 16.03
Cards: now only a few Asus R9 280X DC2T 3GDD5
On Windows I also installed AMD Crimson 16 and "GPU Tweak 2" from the official vendor sites. Maybe someone here can help. Thank you.

fussler Mar 23, 2016

I had the same problem today on Windows too.
It's very fishy...

BobDoe Mar 23, 2016

Yesterday on Windows 10, Windows Defender reported a virus; it was eth proxy.exe. Windows Defender deleted eth proxy.exe, so I downloaded it again, and Windows Defender tagged and deleted the file at download.

I found a copy in a zip file and checked it; it was not infected, so I'm using it now.

fussler Mar 23, 2016

I only wanted to see if windows also did not work.

u use windows without the bullshit defender and wall

Mayhemz Mar 23, 2016

@cgladue
guys having issues with 270x with 2gb do this
add --cl-extragpu-mem 0 when you're not connecting a display on these cards.
this fixed it for me
genoil mentioned this in https://forum.ethereum.org/discussion/2227/cuda-miner

cgladue Mar 23, 2016

@Mayhemz

Thanks for the suggestion. I tried it and it didn't help (I use resistors in a dummy plug anyway). It just appears that Sapphire R9 270 cards have CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1408867653, which is just (as of last night) over the memory needed to load the full DAG.

i dont think this is going to be fixed unless there is a way to not load the complete dag in one huge file, perhaps chunk the DAG in 2 smaller files or something ? or is there a way to increase the value of CL_DEVICE_MAX_MEM_ALLOC_SIZE ?

cgladue Mar 23, 2016

@Equinox-

i see at the top you said you:

Fixed support of chunking the DAG on GPUs unable to allocate a continuous block of memory large enough for the entire DAG.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer mapping.

but seems like still having the issue where my card has 2GB of RAM, but can only alloc 1.4GB max at once, wasnt your fix supposed to allow me to keep mining ? perhaps i dont have the right binaries, where can i download the fixed binaries ?

Mayhemz Mar 23, 2016

@cgladue
i have 2x r9 270's and this is the command im running as we speak on a win 8 64bit sys
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
setx GPU_MAX_ALLOC_PERCENT 100
ethminer.exe -G -F address --cl-local-work 256 --cl-global-work 16384 --cl-extragpu-mem 0

and they both show MEM_ALLOC_SIZE: 1408867653 like yours.

cgladue Mar 23, 2016

@Mayhemz

What version of ethminer are you using, i am using 0.9.41-genoil-1.0.6

Equinox- Mar 23, 2016

@Genoil I also tried to reduce the number of registers; couldn't get it down below 67. I'll look at what you did and see if I can join the two to get it down to 64.
@cgladue I never published binaries with this fix; you would have to compile them yourself.

Mayhemz Mar 23, 2016

@cgladue
0.9.41-genoil-1.0.6b, the one Genoil released recently. It also works with the old one. I deleted my DAG files first, then ran the command above. I first added setx GPU_SINGLE_ALLOC_PERCENT 100 without --cl-extragpu-mem 0, which failed after downloading the DAG; then I removed setx GPU_SINGLE_ALLOC_PERCENT 100 and used --cl-extragpu-mem 0, which worked straight away.

cgladue Mar 23, 2016

@Mayhemz
i copy and pasted your batch file and same result, error -61 & error -38

@Equinox-
i am running windows is there any instructions on how to compile it myself ? :'(

cgladue Mar 23, 2016

@Mayhemz
would you happen to know where i can get 1.0.6b google has no results

Mayhemz Mar 23, 2016

@cgladue
what windows are you running? and are you doing this through a monitor thats connected to the pc?

cgladue Mar 23, 2016

@Mayhemz
i am running Windows 8.1 Enterprise 8GB RAM 64-Bit, there are VGA Dummy plugs (with resistors) to trick the card into thinking there is a monitor connected, i am connected via teamviewer.

cgladue Mar 23, 2016

@Mayhemz
IT WORKED!!! 1.0.6b worked by just swapping out the binaries !! thank you so much for the help. hopefully it will run ok up till the DAG grows to 2GB

Sharapoff Mar 23, 2016

Guys, I'm on Win 7 64-bit Ultimate. I mined for more than a month without any problems with Agnitum Outpost Firewall active, on a 320GB HDD with more than 200GB free. Ethminer needed only 1348 MB (according to the "GPU Tweak Monitor" app) the whole time, so there is no memory problem in my case (each R9 280X has 3GB). The issues started after the new R9 380 was detected alongside the R9 280X on one rig, before I had even started the proxy or the miner. And yes, one of the cards is periodically connected to my monitor via the VGA port. But in 3 years of operation this is the first time I have had a problem like this. I can't work out where the main error is: in the OS, in the proxy config, in the miner, or maybe in my Internet settings. Without ethminer running, the proxy works fine (although it still shows this error, look):
"Traceback (most recent call last):
Failure: stratum.custom_exceptions.TransportException: SocketTransportClientFactory connection timed out."
After that it starts finding new jobs.

Equinox- added some commits Mar 9, 2016

Fixed chunking to work on graphics cards with enough overall memory but not enough in a single block.

The problem was in the offsets calculated in the CL kernel.
Also removed DAG duplication on chunked buffer updates.

Equinox- Mar 30, 2016

@isghe Does it ever find a share on the public blockchain?

isghe Mar 31, 2016

@Equinox- Yes, I did: successfully with CPU mining, but with nothing to show for it from the CHUNK GPU miner, even though the CHUNK GPU miner reports being almost 10 times faster than the CPU miner. I don't think the problem is related to the private or public blockchain itself, but rather lies in the handling of the CHUNK GPU algorithm, or in the algorithm itself.
And I am asking myself right now... how many people are running the CHUNK GPU algorithm for nothing right now? And how many people are running CHUNK GPU with success?

Equinox- Mar 31, 2016

@isghe I just ran it on my AMD GPU with chunking and it managed to find and submit a solution. You can try running it from my reg_reduce branch; that has a debugging utility that allows you to compare the output hash values for the GPU with the CPU. (Only the last 32 bits)
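
Reading that debug output amounts to checking, for each nonce, whether the GPU's 32-bit value equals the low 32 bits of the CPU's 64-bit value. A small Python sketch of that check follows. It assumes that "last 32 bits" means the least significant 32 bits of the printed hexadecimal number; the function name is hypothetical and the sample values are made up.

```python
def low32_matches(gpu_hash_hex, cpu_hash_hex):
    """Compare a GPU hash with the low 32 bits of the CPU hash.

    Both arguments are hexadecimal strings without a 0x prefix.
    Assumes "last 32 bits" means the least significant 32 bits.
    """
    gpu = int(gpu_hash_hex, 16)
    cpu = int(cpu_hash_hex, 16)
    return gpu == (cpu & 0xFFFFFFFF)


# Made-up values: the low 32 bits of the second argument are 0x89abcdef.
print(low32_matches("89abcdef", "0123456789abcdef"))  # True
print(low32_matches("89abcdee", "0123456789abcdef"))  # False
```

If the two sides use a different byte order, a correct kernel would also fail this check, so a mismatch is a prompt to look closer rather than proof of a bug.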

isghe Mar 31, 2016

@Equinox-
Should the GPU hash output and the CPU hash output be the same?

[OPENCL]:Printing program log
[OPENCL]:<program source>:102:16: warning: unused variable 'b4_0'
        uint4* b4_0 = (uint4*) b;
               ^

[OPENCL]:Failed to allocate 1 big chunk. Max allocateable memory is 402653184. Trying to allocate 4 chunks.
[OPENCL]:Creating buffer for chunk 0 size=268434944
[OPENCL]:Creating buffer for chunk 1 size=268434944
[OPENCL]:Creating buffer for chunk 2 size=268434944
[OPENCL]:Creating buffer for chunk 3 size=268435072
[OPENCL]:Loading chunk kernels
[OPENCL]:Mapping chunk 0 with size=268434944 and offset=0
[OPENCL]:Mapping chunk 1 with size=268434944 and offset=268434944
[OPENCL]:Mapping chunk 2 with size=268434944 and offset=536869888
[OPENCL]:Mapping chunk 3 with size=268435072 and offset=805304832
[OPENCL]:Creating buffer for header.
[OPENCL]:Creating mining buffer 0
[OPENCL]:Creating mining buffer 1
miner  03:46:17.000|  Mining on PoWhash #37480358… : 0 H/s = 0 hashes / 0.501 s
GPU lid=0, nonce = 1ff62c4ed0b06174, hash = d89467f3
CPU nonce=1ff62c4ed0b06174, hash=548dc4e6278a433b
GPU lid=1, nonce = 1ff62c4ed0b06175, hash = d418455f
CPU nonce=1ff62c4ed0b06175, hash=579d1ad53253ca7f
GPU lid=2, nonce = 1ff62c4ed0b06176, hash = 164de5d3
CPU nonce=1ff62c4ed0b06176, hash=358168a9a74591f
GPU lid=3, nonce = 1ff62c4ed0b06177, hash = 2fc9e59
CPU nonce=1ff62c4ed0b06177, hash=31dc602173223c39
...
GPU lid=58, nonce = 1ff62c4ed0d861ae, hash = c877a69
CPU nonce=1ff62c4ed0d861ae, hash=aea1114508de4c7f
GPU lid=59, nonce = 1ff62c4ed0d861af, hash = 5da69d25
CPU nonce=1ff62c4ed0d861af, hash=25987aa897198029
GPU lid=60, nonce = 1ff62c4ed0d861b0, hash = bd081470
CPU nonce=1ff62c4ed0d861b0, hash=f006eda3929e2144
GPU lid=61, nonce = 1ff62c4ed0d861b1, hash = ed4154bc
CPU nonce=1ff62c4ed0d861b1, hash=c26fb893031010a
GPU lid=62, nonce = 1ff62c4ed0d861b2, hash = c42de7fb
CPU nonce=1ff62c4ed0d861b2, hash=14e8c9731d570027
miner  03:46:18.509|  Mining on PoWhash #37480358… : 2092966 H/s = 1048576 hashes / 0.501 s
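
The chunk sizes and offsets in the log above are consistent with splitting the DAG into four equal chunks rounded down to a multiple of 128 bytes, with the last chunk absorbing the remainder. A minimal sketch under that assumption follows; it is not the actual ethminer code, and `chunk_layout`, `locate` and the 128-byte alignment are illustrative names and guesses rather than anything confirmed in this thread.

```python
ALIGN = 128  # assumed alignment: one Ethash mix page is 128 bytes


def chunk_layout(dag_size, n_chunks, align=ALIGN):
    """Return a list of (offset, size) pairs covering dag_size bytes."""
    base = (dag_size // n_chunks) // align * align
    layout = []
    offset = 0
    for i in range(n_chunks):
        # The last chunk takes whatever is left over.
        size = base if i < n_chunks - 1 else dag_size - offset
        layout.append((offset, size))
        offset += size
    return layout


def locate(byte_offset, layout):
    """Map a global DAG byte offset to (chunk index, offset inside that chunk)."""
    for index, (start, size) in enumerate(layout):
        if start <= byte_offset < start + size:
            return index, byte_offset - start
    raise ValueError("offset outside the DAG")


# DAG size taken from the log: the four chunk sizes add up to 1073739904 bytes.
layout = chunk_layout(1073739904, 4)
for i, (offset, size) in enumerate(layout):
    print(f"chunk {i}: size={size} offset={offset}")
```

A kernel that reads from chunked buffers has to do the equivalent of `locate` for every DAG access, which is where an offset mistake would produce wrong hashes without any visible error.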

Equinox- Mar 31, 2016

The CPU hash should end with the GPU hash. I know the kernel you're using
works on AMD cards so I'm at a loss to explain it. It's possible the
kernel still uses amd_bitalign but I thought I had disabled that.
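
To make that check concrete: the debug build prints a 32-bit GPU value and a 64-bit CPU hash per nonce, so "ends with" means the low 32 bits of the CPU hash equal the GPU value. Below is a small sketch of that comparison over pasted log lines, assuming the `GPU lid=...` / `CPU nonce=...` line format shown in this thread. The helper name `find_mismatches` is made up for illustration. The hex values are printed without zero padding, so the sketch compares numbers rather than strings.

```python
import re

GPU_RE = re.compile(r"GPU lid=\d+, nonce = ([0-9a-f]+), hash = ([0-9a-f]+)")
CPU_RE = re.compile(r"CPU nonce=([0-9a-f]+), hash=([0-9a-f]+)")


def find_mismatches(log_text):
    """Return nonces whose GPU value differs from the low 32 bits of the CPU hash."""
    gpu = {}
    cpu = {}
    for line in log_text.splitlines():
        m = GPU_RE.search(line)
        if m:
            gpu[m.group(1)] = int(m.group(2), 16)
            continue
        m = CPU_RE.search(line)
        if m:
            cpu[m.group(1)] = int(m.group(2), 16)
    return [n for n in gpu if n in cpu and (cpu[n] & 0xFFFFFFFF) != gpu[n]]


# Two nonce pairs copied from the log in this thread.
sample = """\
GPU lid=0, nonce = 1ff62c4ed0b06174, hash = d89467f3
CPU nonce=1ff62c4ed0b06174, hash=548dc4e6278a433b
GPU lid=3, nonce = 1ff62c4ed0b06177, hash = 2fc9e59
CPU nonce=1ff62c4ed0b06177, hash=31dc602173223c39
"""
print(find_mismatches(sample))
```

An empty list would mean every GPU value matched its CPU counterpart; for the sample above both nonces are reported as mismatches.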

clintljohnson Mar 31, 2016

I have two Sapphire 7870's (2GB RAM each) and have this same problem.

Genoil Mar 31, 2016

Contributor

@Equinox- finally found some time to fire up ye olde 7950 to see if something could be done about that reg count. I started changing bits of code by looking at yours and ended up with something similar but totally different :). Down from 78 to 70 regs now, although I made a mistake somewhere causing no valid solutions. But that should be fixable. Strangely enough, when I disable the bitalign in your kernel, the regcount goes from 76 to 78 (where's 67?), in my code it does nothing. Also, the unrolling works differently.

Will share when I get the bug fixed...

Equinox- Mar 31, 2016

@Genoil I think when I got it to 67 it had stopped working, and I had no idea where the mistake was, so I had to go back in history quite a bit to find it.
I have your kernel down at 69 but it also isn't working; I'll have to see if I can get that one working.
I'm honestly unsure if it is even possible to get it down to 64 registers, since 50 registers are required to store the current state, and a minimum of 11 are required during theta (by my count). I could probably get it down to 64 by writing all the ISA for keccak by hand, but this means the compiler can only make 3 mistakes.
@clintljohnson I have no idea why it isn't working on your cards. I have 7770s, which to my knowledge use the same (Southern Islands) ISA as the 78xx series.
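
The register arithmetic behind the 64-register estimate, as a small sketch. The 25-lane, 64-bit Keccak-f[1600] state is standard; the 11 temporaries for theta and the target of 64 are the figures quoted in the comment above; the assumption that one register holds 32 bits is an addition made here to reproduce the count of 50.

```python
LANES = 25            # Keccak-f[1600] state: 5 x 5 lanes
LANE_BITS = 64
REGISTER_BITS = 32    # assumed width of one GPU register

state_registers = LANES * LANE_BITS // REGISTER_BITS
theta_temporaries = 11   # minimum quoted in the comment above
target = 64              # register count being aimed for

minimum = state_registers + theta_temporaries
slack = target - minimum

print(f"state={state_registers} minimum={minimum} slack={slack}")
```

The slack of three registers is what "the compiler can only make 3 mistakes" refers to.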

Genoil Apr 1, 2016

Contributor

@Equinox- yeah I got to 57 at some point, but it doesn't count if it doesn't work :). I started over today and discovered the mistake, which you probably also made. Integrating rho/pi and chi to reduce the size of the b array just doesn't work.

On the upside, I managed to tune the OpenCL kernel for NVIDIA in such a way that it is as fast as my CUDA kernel. @bobsummerwill this opens an opportunity for me to switch over to webthree-umbrella, get rid of CUDA and move the extra command-line goodies from 1.0.6 into the official ethminer. I'll sleep on that one :)

TheDeafMute Apr 2, 2016

So... What do I do to make my 7870 work...

Equinox- Apr 2, 2016

Without a 7870 I can't say why it doesn't work. As it is I'd suggest
building it from each of my branches to see if either works (reg_reduce and
develop). And if neither works I don't have any ideas.

AMPER228 Apr 2, 2016

(MSI R9 380 2Gb) Driv 15.7.1, Win7 x64
image

My first (simple) command line does not work; it gives error messages (-61) and (-38):
setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
SET GPU_SINGLE_ALLOC_PERCENT = 100
ethminer.exe -G -F http://eth-eu.dwarfpool.com:80/wallet --opencl-platform 0.

My new command line does not give the error messages (-61) and (-38), but it mines nothing while reporting a speed of 13 MH/s:
ethminer.exe -G --farm http://eth-Ru.dwarfpool.com:80/wallet/AMPER228 --cl-global-work 16384 -t 3 --cl-local-work 256 --farm-recheck 400 --opencl-platform 0
If I replace (-t 3) with (-t 2) or (-t 1), it still shows 13 MH/s.

image

WTF????!!!! HELP!!!

AMPER228 Apr 2, 2016

My statistics on http://dwarfpool.com are equal to 0!
image

The result shown here comes from the laptop.
I assumed that my wallet was not working, but statistics appeared for the laptop. By the way, the laptop runs Windows 10.
Am I doing something wrong?

Equinox- Apr 2, 2016

I'm honestly lost here, since there are now multiple AMD cards this appears to not work on.
@AMPER228 Are you sure your card is even using chunking?
I suppose I could release a binary that I know works on my 7770 for testing on other GPUs. Would this be helpful?

gogolplus Apr 3, 2016

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
SET GPU_SINGLE_ALLOC_PERCENT = 100

For me it's solved (270X, 2GB).

AMPER228 Apr 3, 2016

@Equinox- "Would this be helpful?"
Yes, it would be useful if you explain which steps I have to perform for testing.
If it works, it will be one of the solutions.
And I'm not sure whether my card is even using chunking.

AMPER228 Apr 3, 2016

(MSI R9 380 2Gb) Drivers 15.7.1, Win7 x64
RAM 12Gb. AMD FX(tm)-4100 Quad-Core Processor 3.60 GHz
Everything works now.
I installed the graphics card drivers (from the first 15.7.1 series).
I installed a newer version of the Microsoft .NET Framework (4.5.3).
I disabled Windows Firewall.
image

My statistic
image

Equinox- Apr 3, 2016

I've built some binaries that force chunking here, both the debugging and the release binary. If you're having troubles with this code feel free to try these.
https://github.com/Equinox-/libethereum/releases/tag/0.1_debug

neil-jones Apr 3, 2016

I just tried the new binary and it's still not committing new work. Ran the debug binary, and the CPU hash doesn't end with GPU hash.

GPU lid=44, nonce = ecaec71b49aa6679, hash = 4565752f
CPU nonce=ecaec71b49aa6679, hash=920a07baf47232b4
GPU lid=45, nonce = ecaec71b49aa667a, hash = bd30ba33
CPU nonce=ecaec71b49aa667a, hash=4ede7c0613894c2d
GPU lid=46, nonce = ecaec71b49aa667b, hash = 60196cfc
CPU nonce=ecaec71b49aa667b, hash=8fb30fc4535ccdff
GPU lid=47, nonce = ecaec71b49aa667c, hash = 3f76fc8e
CPU nonce=ecaec71b49aa667c, hash=b1f7297999a6898
GPU lid=48, nonce = ecaec71b49aa667d, hash = 775ba8f3
CPU nonce=ecaec71b49aa667d, hash=fa6e9ac6c169471a
GPU lid=49, nonce = ecaec71b49aa667e, hash = 70863fbb
CPU nonce=ecaec71b49aa667e, hash=ccaa5f3760c64f72
GPU lid=50, nonce = ecaec71b49aa667f, hash = 977983ef
CPU nonce=ecaec71b49aa667f, hash=d3259ee2ceb0b24a
GPU lid=51, nonce = ecaec71b49aa6680, hash = 74d7092c
CPU nonce=ecaec71b49aa6680, hash=b04ecb3beb0b44dd
GPU lid=52, nonce = ecaec71b49aa6681, hash = fde5c5ee
CPU nonce=ecaec71b49aa6681, hash=332c338bcae1fe81
GPU lid=53, nonce = ecaec71b49aa6682, hash = a9596c09
CPU nonce=ecaec71b49aa6682, hash=190d8a3cbaad09dd

What does this mean? I'm running an AMD 7570 with 2GB RAM

Equinox- Apr 3, 2016

Not sure. Both those binaries work on my 7770s, so I'm unsure why they don't work on your card. What version of the AMD drivers do you have, what arguments are you using to launch, and could I get some more info on your exact card?

Equinox- Apr 3, 2016

Interesting. I just used the --opencl-device flag with ethminer and it failed to work; I'll try to figure that out.

neil-jones Apr 3, 2016

Driver version is 15.200.1045.0, and launch args are "--cl-local-work 64 --cl-global-work 4096". This is my card, except mine is the 2GB version. https://www.techpowerup.com/gpudb/b692/pegatron-hd-7570.html

Equinox- Apr 3, 2016

What does ethminer --list-devices show, and what are the first 20 or so lines of output by ethminer when you run it?

isghe Apr 3, 2016

The risk that some GPUs are burning power working for nothing is much too high. I think we should concentrate on creating a unit test which ensures that the GPU algorithm is working correctly before GPU mining starts, for both the full DAG and the chunked DAG.
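
One possible shape for such a check, as a sketch: hash a handful of nonces on both paths before mining and refuse to start on any mismatch. The two hash functions here are stand-ins built on `hashlib` so that the example runs; in a real miner they would be the CPU reference implementation and the GPU kernel. All names below are hypothetical and none of this is taken from ethminer.

```python
import hashlib


def reference_hash(header, nonce):
    """Stand-in for the trusted CPU implementation (not the real Ethash)."""
    return hashlib.sha3_256(header + nonce.to_bytes(8, "big")).digest()


def broken_device_hash(header, nonce):
    """Stand-in for a faulty GPU path: it hashes the wrong nonce."""
    return reference_hash(header, nonce + 1)


def self_test(device_hash, header=b"\x00" * 32, nonces=range(16)):
    """Return the nonces on which device_hash disagrees with the reference."""
    return [n for n in nonces if device_hash(header, n) != reference_hash(header, n)]


def start_mining(device_hash):
    """Run the self-test first and only report 'mining' if it passes."""
    bad = self_test(device_hash)
    if bad:
        raise RuntimeError(f"GPU self-test failed on {len(bad)} of 16 nonces")
    return "mining"


print(start_mining(reference_hash))
```

Running the same check once with a single buffer and once with chunked buffers would cover both the full DAG and the chunked DAG paths.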

neil-jones Apr 3, 2016

ethminer --list-devices returns:

Listing OpenCL devices.
FORMAT: [deviceID] deviceName
[0] Turks
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536870912
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

Here's the first bit of output:

Found suitable OpenCL device [Turks] with 2147483648 bytes of GPU memory
miner 19:41:18.318|main Getting work package...
miner 19:41:18.739|main Grabbing DAG for #63ca6f54…
miner 19:41:20.360|main Got work package:
i 19:41:20.360| Loading full DAG of seedhash: #67f3589a…
miner 19:41:20.362|main Header-hash: d118a1852b2d2e6800ad6fe232b025bff118485e82491ef3776f2c0a71c80fbd
miner 19:41:20.381|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:20.389|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 19:41:20.400|gpuminer0 workLoop 0 #00000000… #63ca6f54…
i 19:41:20.405|gpuminer0 Initialising miner...
miner 19:41:20.901|main Mining on PoWhash #d118a185… : 0 H/s = 0 hashes / 0.5 s
Using platform: AMD Accelerated Parallel Processing
i 19:41:21.969| Full DAG loaded
Using device: Turks(OpenCL 1.2 AMD-APP (1800.5))
miner 19:41:21.994|main Got work package:
miner 19:41:21.997|main Header-hash: 4667da13b8e87f49946646e5ff7b422eefa6b379343a7f1fb81b1d8120be5d0d
miner 19:41:22.004|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:22.010|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
Printing program log
"C:\Users\neil\AppData\Local\Temp\OCL39D.tmp.cl", line 117: warning: variable
"b4_0" was declared but never referenced
uint4* b4_0 = (uint4*) b;
^

Failed to allocate 1 big chunk. Max allocateable memory is 536870912. Trying to allocate 4 chunks.
Creating buffer for chunk 0 size=356515584
Creating buffer for chunk 1 size=356515584
Creating buffer for chunk 2 size=356515584
Creating buffer for chunk 3 size=356515712
Loading chunk kernels
Mapping chunk 0 with size=356515584 and offset=0
Mapping chunk 1 with size=356515584 and offset=356515584
Mapping chunk 2 with size=356515584 and offset=713031168
Mapping chunk 3 with size=356515712 and offset=1069546752
Creating buffer for header.
Creating mining buffer 0
Creating mining buffer 1
i 19:41:36.124|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 19:41:36.630|main Mining on PoWhash #4667da13… : 520126 H/s = 262144 hashes / 0.504 s
miner 19:41:37.353|main Got work package:
miner 19:41:37.355|main Header-hash: d33c13624abc788baf82f18712733c337a564169a0814441ae6e8beffbe0a398
miner 19:41:37.361|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 19:41:37.368|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 19:41:37.567|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 19:41:38.082|main Mining on PoWhash #d33c1362… : 513001 H/s = 262144 hashes / 0.511 s
miner 19:41:39.141|main Mining on PoWhash #d33c1362… : 1238865 H/s = 1310720 hashes / 1.058 s
miner 19:41:41.241|main Mining on PoWhash #d33c1362… : 1373135 H/s = 2883584 hashes / 2.1 s

Equinox- Apr 3, 2016

Mind trying it with the following environment variables (setx name val or export name=val)?

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_MAX_HEAP_SIZE 100
setx GPU_SINGLE_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
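
Since `setx` only affects consoles opened afterwards, another option is to pass the same variables to a single ethminer process. A sketch in Python, assuming `ethminer` is on the PATH; the helper names are illustrative and the values are the ones listed above.

```python
import os
import subprocess

AMD_OPENCL_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
}


def miner_environment(base=None):
    """Return a copy of the environment with the AMD variables added."""
    env = dict(os.environ if base is None else base)
    env.update(AMD_OPENCL_ENV)
    return env


def run_miner(args):
    """Launch ethminer (assumed to be on PATH) with the variables set."""
    return subprocess.run(["ethminer", *args], env=miner_environment())


# Example call, left commented out so the sketch has no side effects:
# run_miner(["--list-devices"])
```

Keys and values are written without surrounding spaces on purpose: in a Windows console, `SET NAME = 100` defines a variable whose name ends in a space, which the driver will not look up.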

@neil-jones

neil-jones Apr 4, 2016

Don't mind at all. I ran the setx commands and fired up ethminer, but it seems to be doing the same thing.

`Found suitable OpenCL device [Turks] with 2147483648 bytes of GPU memory
miner 20:00:19.871|main Getting work package...
miner 20:00:22.248|main Grabbing DAG for #63ca6f54…
i 20:00:23.769| Loading full DAG of seedhash: #67f3589a…
miner 20:00:23.769|main Got work package:
miner 20:00:23.788|main Header-hash: 58e28b07486745c0096cd73815b4abf5ec80fccb18d172bcd7d77240af9b0c08
miner 20:00:23.794|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:23.805|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:23.835|gpuminer0 workLoop 0 #00000000… #63ca6f54…
i 20:00:23.839|gpuminer0 Initialising miner...
miner 20:00:24.336|main Mining on PoWhash #58e28b07… : 0 H/s = 0 hashes / 0.5 s
Using platform: AMD Accelerated Parallel Processing
i 20:00:25.452| Full DAG loaded
Using device: Turks(OpenCL 1.2 AMD-APP (1800.5))
miner 20:00:26.439|main Mining on PoWhash #58e28b07… : 0 H/s = 0 hashes / 2.102 s
miner 20:00:27.368|main Got work package:
miner 20:00:27.371|main Header-hash: 02b47f7332c6ceee16f3f53ef0c0e4c9ac4307be8cc9aff12a735a6fe5ed1c3e
miner 20:00:27.379|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:27.385|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
Printing program log
"C:\Users\neil\AppData\Local\Temp\OCL765A.tmp.cl", line 117: warning: variable
"b4_0" was declared but never referenced
uint4* b4_0 = (uint4*) b;
^

Failed to allocate 1 big chunk. Max allocateable memory is 536870912. Trying to allocate 4 chunks.
Creating buffer for chunk 0 size=356515584
Creating buffer for chunk 1 size=356515584
Creating buffer for chunk 2 size=356515584
Creating buffer for chunk 3 size=356515712
Loading chunk kernels
Mapping chunk 0 with size=356515584 and offset=0
Mapping chunk 1 with size=356515584 and offset=356515584
Mapping chunk 2 with size=356515584 and offset=713031168
Mapping chunk 3 with size=356515712 and offset=1069546752
Creating buffer for header.
Creating mining buffer 0
Creating mining buffer 1
i 20:00:39.552|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:40.065|main Mining on PoWhash #02b47f73… : 513001 H/s = 262144 hashes / 0.511 s
miner 20:00:40.460|main Got work package:
miner 20:00:40.462|main Header-hash: e75e0252ef1b666754e581320719b4759f66b427d37563f8218a794e7434c96e
miner 20:00:40.468|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:40.473|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:40.590|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:41.092|main Mining on PoWhash #e75e0252… : 524288 H/s = 262144 hashes / 0.5 s
miner 20:00:41.985|main Mining on PoWhash #e75e0252… : 1174217 H/s = 1048576 hashes / 0.893 s
miner 20:00:42.893|main Mining on PoWhash #e75e0252… : 1445115 H/s = 1310720 hashes / 0.907 s
miner 20:00:43.381|main Got work package:
miner 20:00:43.383|main Header-hash: c4bae4e848a8abdb2eafeb9a54b0a83f0191bbcd9ee9ae8e6a18bc1f96289262
miner 20:00:43.389|main Seedhash: 63ca6f54b1af76dd4df3b908cee464ff1f212f08352cbe7eb4422806bb0c7885
miner 20:00:43.396|main Target: 0000000225c17d04dad2965cc5a02a23e254c0c3f75d9178046aeb27ce1ca574
i 20:00:43.625|gpuminer0 workLoop 1 #63ca6f54… #63ca6f54…
miner 20:00:44.125|main Mining on PoWhash #c4bae4e8… : 524288 H/s = 262144 hashes / 0.5 s`
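For anyone reading the chunk lines in the log above, here is a minimal sketch of arithmetic that reproduces those sizes and offsets. It is not taken from the patch: the helper name `chunk_layout`, the 128-byte alignment, and the total of 1426062464 bytes (simply the sum of the four logged chunk sizes) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the patch's code: split `total` bytes into `n`
   chunks whose sizes are multiples of `align`. The last chunk absorbs the
   remainder. Sizes and offsets are written to caller-provided arrays. */
static void chunk_layout(uint64_t total, unsigned n, uint64_t align,
                         uint64_t *size, uint64_t *offset)
{
    assert(n > 0 && align > 0);
    /* Even share per chunk, rounded down to the assumed alignment. */
    uint64_t base = (total / n) / align * align;
    for (unsigned i = 0; i < n; i++) {
        offset[i] = (uint64_t)i * base;
        size[i] = (i + 1 < n) ? base : total - offset[i];
    }
}
```

Calling `chunk_layout(1426062464ULL, 4, 128, size, offset)` gives three chunks of 356515584 bytes and a final chunk of 356515712 bytes, at offsets 0, 356515584, 713031168 and 1069546752, which matches the "Creating buffer" and "Mapping chunk" lines. Every chunk stays under the reported maximum allocation of 536870912 bytes.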

@Equinox-

Equinox- Apr 4, 2016

I'm assuming that means the ethminer_debug outputs are invalid again (the CPU hash doesn't end with the GPU hash)?

@neil-jones

neil-jones Apr 4, 2016

Correct; debug's CPU hash doesn't end with GPU hash.

@Equinox-

Equinox- Apr 4, 2016

If you are having trouble with chunked mining, you can try running the chunked DAG debugger. It won't actually mine anything; it uploads the DAG, runs through it to check its integrity, then outputs CPU/GPU hash pairs. If it fails before the CPU/GPU hash pairs get printed (that is, DAG verification fails), please post the log.
https://github.com/Equinox-/libethereum/releases/tag/0.1.1
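The validity check mentioned in this thread ("CPU hash ends with GPU hash") amounts to a suffix comparison. A minimal sketch follows, assuming both hashes are available as hex strings. The function name `ends_with` is hypothetical and this is not the debugger's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when the string `full` ends with the string `tail`.
   Illustrative only: the debugger's real output format is not shown here. */
static bool ends_with(const char *full, const char *tail)
{
    assert(full != NULL && tail != NULL);
    size_t nf = strlen(full);
    size_t nt = strlen(tail);
    /* A tail longer than the full string can never be a suffix. */
    return nt <= nf && memcmp(full + nf - nt, tail, nt) == 0;
}
```

With made-up values, `ends_with("00112233aabbccdd", "aabbccdd")` is true, while `ends_with("00112233aabbccdd", "aabbccde")` is false and would correspond to the failing case reported above.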

@Genoil

Genoil Apr 18, 2016

Contributor

BTW, I looked into my broken chunks implementation and fixed it. It does work, but it doesn't seem to be very useful, since for the majority of cards fixing the allocation issues is more a matter of setting the right environment variables. It's also slower on AMD cards and really doesn't do anything useful on the Nvidia platform.

I also managed to get the VGPR usage down to 56, but I got 108 scratch registers back in return, which totally kills the added value of an extra wavefront.

@Equinox-

Equinox- Apr 18, 2016

How low did you get it before scratch registers started appearing? I've got a GCN3 disassembler/assembler I might be able to use to cut out a few more.

@Genoil

Genoil Apr 18, 2016

Contributor

It was either 78 or 80 (the 80 one works real nice on CUDA-CL with maxrregs compiler option), or 56.

You use this for Theta:

    for(int i = 0; i < 5; i++)
        t[i] = a[i] ^ a[i+5] ^ a[i+10] ^ a[i+15] ^ a[i+20];

    // #pragma unroll (enable to get speed back, but also +24 VGPRs)
    for(int j = 0; j < 5; j++)
    {
        u = t[(j+4)%5] ^ ROL2(t[(j+1)%5], 1);

        for(int i = 0; i < 5; i++)
            a[i*5+j] ^= u;
    }

The dynamic indexing of `a` (the 1600-bit Keccak state) forces the compiler to move it out of registers. The same happens for `t` (the temporary 1600-bit Keccak state). 25 * 2 * 2 = 100, which is close to the 108 scratch registers. Why it then saves just 24 VGPRs is still a bit of a mystery, but it really doesn't matter.
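To make the "dynamic indexing" point concrete, here is a plain C sketch that puts the looped Theta next to a version where every array index is a compile-time constant. That second form is roughly what full unrolling hands to the compiler. Assumptions: lanes are modelled as `uint64_t` rather than the kernel's two-word type, `rotl64` stands in for `ROL2`, and the names `theta_loop`, `theta_unrolled` and `THETA_COL` are made up for this example. On a CPU this only shows that the two forms compute the same thing; it says nothing about VGPR or scratch usage.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; n must be in 1..63. Stand-in for the kernel's ROL2. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Theta with indices computed at run time, as in the loop quoted above. */
static void theta_loop(uint64_t a[25])
{
    uint64_t t[5];
    for (int i = 0; i < 5; i++)
        t[i] = a[i] ^ a[i + 5] ^ a[i + 10] ^ a[i + 15] ^ a[i + 20];
    for (int j = 0; j < 5; j++) {
        uint64_t u = t[(j + 4) % 5] ^ rotl64(t[(j + 1) % 5], 1);
        for (int i = 0; i < 5; i++)
            a[i * 5 + j] ^= u;
    }
}

/* One column of Theta with constant indices only. */
#define THETA_COL(j, prev, next) do {                      \
        uint64_t u = t[(prev)] ^ rotl64(t[(next)], 1);     \
        a[0 + (j)] ^= u;  a[5 + (j)] ^= u;                 \
        a[10 + (j)] ^= u; a[15 + (j)] ^= u;                \
        a[20 + (j)] ^= u;                                  \
    } while (0)

/* Same step, written out so that no index depends on a loop variable. */
static void theta_unrolled(uint64_t a[25])
{
    uint64_t t[5];
    t[0] = a[0] ^ a[5] ^ a[10] ^ a[15] ^ a[20];
    t[1] = a[1] ^ a[6] ^ a[11] ^ a[16] ^ a[21];
    t[2] = a[2] ^ a[7] ^ a[12] ^ a[17] ^ a[22];
    t[3] = a[3] ^ a[8] ^ a[13] ^ a[18] ^ a[23];
    t[4] = a[4] ^ a[9] ^ a[14] ^ a[19] ^ a[24];
    THETA_COL(0, 4, 1);
    THETA_COL(1, 0, 2);
    THETA_COL(2, 1, 3);
    THETA_COL(3, 2, 4);
    THETA_COL(4, 3, 0);
}
```

Both functions leave the state in the same condition for any input. The difference that matters on the GPU is only whether the compiler can resolve each `a[...]` and `t[...]` to a fixed register.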

@Genoil

Genoil Apr 18, 2016

Contributor

Ah, finally getting a bit of a grip on that dreaded GCN compiler. Down to 23 VGPRs with an occupancy of 100%. The hashrate is dramatically worse, though. Good example of why occupancy isn't everything :)
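The VGPR counts quoted in this thread map to wavefront occupancy roughly as sketched below. The limits used are assumptions based on commonly documented GCN figures and are not stated in the thread: 256 VGPRs per SIMD lane, allocation in granules of 4, and at most 10 wavefronts per SIMD. The function name `waves_per_simd` is made up, and the model ignores SGPR, LDS and scratch limits.

```c
#include <assert.h>

/* Assumed GCN limits (not from the thread). */
enum { GCN_VGPR_BUDGET = 256, GCN_VGPR_GRANULE = 4, GCN_MAX_WAVES = 10 };

/* Estimate how many wavefronts fit on one SIMD when each work-item
   needs `vgprs` vector registers. VGPR pressure only. */
static int waves_per_simd(int vgprs)
{
    assert(vgprs > 0 && vgprs <= GCN_VGPR_BUDGET);
    /* Round the request up to the allocation granule. */
    int allocated = (vgprs + GCN_VGPR_GRANULE - 1)
                    / GCN_VGPR_GRANULE * GCN_VGPR_GRANULE;
    int waves = GCN_VGPR_BUDGET / allocated;
    return waves < GCN_MAX_WAVES ? waves : GCN_MAX_WAVES;
}
```

Under these assumptions, 78 or 80 VGPRs allow 3 wavefronts, 56 allow 4 (the "extra wavefront" mentioned earlier), and 23 reach the cap of 10, i.e. 100% occupancy. As the comment above notes, the higher occupancy figure did not translate into a better hashrate.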

@Equinox-

Equinox- May 5, 2016

I'm going to close this. At this point the DAG size has increased even further, and the few edge cases that the environment variables don't solve don't seem to work with this patch either.

@dan-da

dan-da Jun 27, 2016

I'm not sure why this patch was closed.

Recently I've been unable to mine with stock ethminer using either a 7970 3 GB or an R9 270 2 GB card due to the DAG alloc issue. (Note: for some reason one 7970 works fine and another doesn't.)

I've tried all env variable hacks in this thread and elsewhere to no avail.

Applying this patch fixes the problem for both cards, and all is well.

@dan-da

dan-da Jun 29, 2016

For anyone interested, I created a fork of Genoil's ethminer that includes this patch. The chunking works great with R9 270 and HD 7970 and is automatic if allocating a full DAG fails.
