
Significant drop in x11 hashrate performance as compared to X15GPU miner #347

Closed
platinum4 opened this issue Jul 13, 2014 · 26 comments

@platinum4
Contributor

Comparing the most recent builds of sgminer5 (07-12, in my case) with previous builds (06-25), I have noticed a significant drop in hashrate on the X11 algorithm, which can also be seen in the possibly related issue #330.

Looking at the two pictures posted: the newer sgminer5 build compiles and hashes okay, but performance drops across the cards, and the cards are by no means synchronized. The second picture is a run from the 06-25 build, and its hashrates do not drop.

@mrbrdo, can you please chime in and see whether this is a pthread-related issue? I've already exhausted @ystarnaud, who has compared almost everything else and cannot spot what is going on.

@kenshirothefist, the source for 06-25 was pulled from your sgminer multi-algo software page. Could you supply a link to that source once more, so we can compare across commits and see what has changed in the last 3 weeks?

@platinum4
Contributor Author

Deleting bins, rebuilding bins, cycling through every available darkcoin-mod.cl file: nothing restores the hashrates on the newer builds. This is NOT driver-related, as I can reproduce the maximum hashrates on the 06-25 prerelease build by deleting the bins and rebuilding them with 14.7 RC.

@troky
Contributor

troky commented Jul 13, 2014

@platinum4 Have you tried using bins compiled on another (good) rig?

I have one rig that just can't compile good bins (ones that give a normal hashrate), so I always use bins from my other rigs there.

@platinum4
Contributor Author

@troky Yep, @ystarnaud and I ran through pretty much every swap and substitution imaginable over on IRC; no dice.

Even pasting the bins directly into the newest build's directory and running from there doesn't bring the hashrates back up. I'm on 290Xs, which should be getting a minimum of 5.4 MH/s; even with pasted [good] bins and .cl files, two of the cards don't get above 5 MH/s.

@platinum4
Contributor Author

This issue seems to affect the other X[n] algorithms as well, since it carries on up the chain.

Can we please take a wholehearted look at this issue? We would not want performance loss to get buried under all the other future developments. Sometimes it is essential to look at things from a Square One perspective.

It is not driver-related, not .cl-file-related, and not .bin-file-related; it has to do with how sgminer now handles threads. Builds from 06-25 did not show this behavior, and I must have overlooked this issue when we decided to start a develop tree and a feature-lock tree.

Restore the hashrate to all devices during a mining instance; this is what must happen.

@badman74
Contributor

Have you checked whether the clocks on all of the cards are actually the same while running?
I have run into times when the memory clock did not change or, even worse, crashed to 150 MHz on a single card.

@platinum4
Contributor Author

Yeah, the top two cards are at a LOWER clock rate than the bottom card, which does NOT account for the drop in hashrate...

@platinum4
Contributor Author

I think this may be a readout lag in ncurses, now that I'm observing closely; I will report back.

@platinum4
Contributor Author

@badman74, now you see what I was talking about after my conversation with bullus over on bitcointalk, right? Way different hashrates with X15GPU/X15_AMD than with this one, huh? ;D

@platinum4
Contributor Author

@badman74, I found a few culprits; help me out and see if you can replicate similar hashrates:

https://bitcointalk.org/index.php?topic=632503.msg7889004#msg7889004

@badman74
Contributor

What I found was that replacing

#define PERM_BIG_P(a)   do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      ROUND_BIG_P(a, r + 0); \
      ROUND_BIG_P(a, r + 1); \
    } \
  } while (0)

#define PERM_BIG_Q(a)   do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      ROUND_BIG_Q(a, r + 0); \
      ROUND_BIG_Q(a, r + 1); \
    } \
  } while (0)

in groestl.cl with

#define PERM_BIG_P(a)   do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      ROUND_BIG_P(a, r); \
    } \
  } while (0)

#define PERM_BIG_Q(a)   do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      ROUND_BIG_Q(a, r); \
    } \
  } while (0)

Then changing
#define SPH_LUFFA_PARALLEL 0
and
//#include "aes_helper.cl"
to
#define SPH_LUFFA_PARALLEL 1
and
#include "aes_helper.cl"
gives me 6.04 MH/s on a Sapphire 290X with a 1040/1500 overclock and 15% PowerTune, up from 5.5 MH/s normally.
Unfortunately, I really have no idea what that does; it was just pulled out of the darkcoin-mod.cl from https://github.com/aznboy84/sgminer/tree/v5_0-x15

Edit: I just put my 7750s back in service, and it seems that #define SPH_LUFFA_PARALLEL 1 doesn't do anything for them.

@ystarnaud
Contributor

Interesting. Maybe this is 290-specific, as I haven't really noticed a performance drop on R9 270/270X or 78XX/79XX cards.

aes_helper.cl was commented out because nothing uses it; it just adds bloat to the kernel.

SPH_LUFFA_PARALLEL 1 uses a different way of calculating the Luffa hash, and whether it is faster depends on how the GPU processes the instructions. It may not work well on all GPUs. Setting it to 0 may have worked better on the GPUs that were tested (other than the R9 290), which is why it was changed.

From what I know of OpenCL, your changes to groestl would undo a GPU optimization. Again, maybe that optimization only pays off on non-Hawaii cards.

According to the AMD and OpenCL specifications, all Southern Islands cards (7XXX and R9 series) should behave the same, but again and again the Hawaii chipset (R9 290/290X) seems to behave very differently from the rest of the series...

I'll run some tests on my 7XXX/R9 270 to see if the above changes really make a difference. We might end up needing to specify extra kernel compiler options to get this to work out for everybody.

Thanks for your work testing these various parts of X11 on the 290s. I would have done it myself, but I never bought those cards because of their lower cost-effectiveness and their issues.

@platinum4
Contributor Author

@ystarnaud, can you make luffa_parallel a configurable option for us, like you did with hamsi_expand_big?

Also, I noticed that the groestl.cl from @aznboy84's X15_AMD miner is smaller (65 KB) than ours (67 KB); I don't know whether that matters, but it seems to.

Can you check this for a sample of changes and configs? https://bitcointalk.org/index.php?topic=632503.msg7913153#msg7913153

Also check the previous two posts in that thread for screenshots of some all-star hashing performance at insane overclocks.

What we are ALL wondering is... WHY does @aznboy84's darkcoin-modHawaiigw64l4.bin reliably yield +200 KH/s, and why can't we ever seem to build a comparable one? I can get a steady 5.75 MH/s with ours (2.0 MB), but if I want to shoot for the moon, I must go back and use his (1.96 MB). You and I have already compared the darkcoin-mod.cl files, all five or six versions of them...

@badman74
Contributor

For me, SPH_LUFFA_PARALLEL 1 gave about +100 KH/s, and after checking again I see that you are correct: aes_helper.cl isn't doing anything. That leaves the change I made at the end of groestl.cl as the cause of the speed difference.

@troky
Contributor

troky commented Jul 20, 2014

I can confirm +100kh/s with #define SPH_LUFFA_PARALLEL 1 on 290

@ystarnaud
Contributor

@platinum4 yeah that's what I was thinking. I'll work something out later today when I have some free time.

ystarnaud self-assigned this Jul 20, 2014
@ystarnaud
Contributor

Oh, and I'm guessing this will also affect X13/14/15, since they use these algorithms as part of their hash.

@badman74
Contributor

After looking at this, couldn't we use the SPH_KECCAK_UNROLL, SPH_LUFFA_PARALLEL, and SPH_HAMSI_EXPAND_BIG settings instead of the separate *-mod.cl and *-modold.cl kernels?
Or am I missing some other optimizations in them?

@mrbrdo
Contributor

mrbrdo commented Jul 24, 2014

@badman74 Not sure why; from my limited knowledge of the kernels, the *-modold ones have the last 3 OpenCL kernels combined into a single one. I don't know exactly what the reason is, but it's not just a matter of changing those constants. Unless you are saying that the reason the "-mod" kernels don't work on some cards is those 3 constants' values.

@badman74
Contributor

The main thing I used to recover the lost hashrate was the change in groestl.cl.
I don't know if this is the same across all cards.

@ystarnaud
Contributor

See c603cec and #358.

@ystarnaud
Contributor

@mrbrdo I don't know the specifics, but I believe lower-end GPUs in the 6xxx line or older didn't have enough compute units to process more than 10 kernel objects (if not fewer). The extra algorithm rounds are packed into that last kernel so that at least some of those GPUs are able to compute the hash.

Also, the values suggested above aren't just constants. Depending on their values, the kernel .cl files take different code paths or unroll loops differently, using #ifndef/#else/#endif preprocessor blocks, to optimize better. Setting these values directly in the .cl file might cause problems on some GPUs out there while optimizing others. This is why I added fine-tuning options.

@mrbrdo
Contributor

mrbrdo commented Jul 28, 2014

@ystarnaud Good work. Could you also call append_x11_compiler_options from append_x13_compiler_options instead of duplicating the code? It's always nice not to have to change things in multiple places later.

@ystarnaud
Contributor

Sure... I didn't think about that...

@ystarnaud
Contributor

@mrbrdo done.

@platinum4
Contributor Author

I'd say this was solved by replacing the last two loops in groestl.cl, as found here: https://github.com/sgminer-dev/sgminer/blob/c603cec762454ab9f828f36dc6d67f6a1208768f/kernel/groestl.cl. The other enhancements and additions (luffa-parallel, blake-compact, and keccak-unroll) came about as ancillary improvements and are definitely beneficial for gaining that extra bit of hashrate.

The main objective of this issue has now been effectively solved by c603cec.
