Significant drop in x11 hashrate performance as compared to X15GPU miner #347

platinum4 · 2014-07-13T18:30:07Z

Comparing the most recent builds of sgminer5 (07-12 personally) versus previous builds (06-25), I have noticed a significant drop in hashrates on the x11 algorithm, which can be viewed in this possibly related issue #330

Taking a look at the two pictures posted, the sgminer5 seems to build and hash okay but it drops performance across the cards, and they are by no means synchronized. The second picture is a run from the 06-25 build, and the hashrates do not experience a loss.

@mrbrdo can you please chime in and see if this is a p|thread related issue? I've already exhausted @ystarnaud, who has compared almost everything else and cannot spot what may be going on.

@kenshirothefist, the source for 06-25 was pulled from your sgminer multi-algo software page. Is it possible that you could supply a link to that source once more, so we can compare across commits to see what has happened in the last 3 weeks?

platinum4 · 2014-07-13T18:32:32Z

Deleting bins, rebuilding bins, changing through every possible darkcoin-mod.cl file available, nothing seems to restore hashrates on the newer builds. This is NOT driver related, as I can replicate the maximum hashrates by deleting bins and rebuilding with 14.7RC on the 06-25 prerelease build.

troky · 2014-07-13T18:45:06Z

@platinum4 Have you tried using bins compiled on another (good) rig?

I have one rig that just can't compile good (with normal hashrate) bins so I always use other rigs' bins there.

platinum4 · 2014-07-13T18:51:26Z

@troky yep me and @ystarnaud ran through pretty much every swap and substitution imaginable over here in IRC, no dice.

Even pasting the bins directly into the directory of the newest build and going from there doesn't bring them back up. I'm on 290X cards and should be getting a minimum of 5.4MH; even with pasted [good] bins and .cl files, 2 cards don't get above 5MH.

platinum4 · 2014-07-14T11:04:51Z

This issue seems to affect other x[n] algos as it translates on up the chain.

Can we please take a whole-hearted look at this issue? We would not want to bury a performance loss in the dust of all other future developments. Sometimes it is essential to look at things from a Square One perspective.

Not driver-related, not .cl-file related, not .bin-file related; this has to do with how sgminer is now handling threads. Builds from 06-25 did not exhibit this behavior, and I must have overlooked this issue when we decided to start a develop tree and a feature-lock tree.

Restore the hashrate to all devices during a mining instance; this is what must happen.

platinum4 · 2014-07-15T12:29:02Z

@troky @mrbrdo @ystarnaud

Can we investigate these two sources, which are dated from 06-25-2014?

https://github.com/sgminer-dev/sgminer/archive/78014ab0d53c661c8d3acd4184e3ca81e802896c.zip

https://github.com/sgminer-dev/sgminer/archive/044bf709018d8509cad7bfb758670f087f924980.zip

badman74 · 2014-07-15T15:58:53Z

have you checked to see if the clocks on all of the cards are actually the same while running
i have run into times when the mem clocks did not change or even worse crashed to 150 on a single card

platinum4 · 2014-07-15T19:47:33Z

Yeah, the top two cards are at a LOWER clockrate than the bottom card, which does NOT account for its drop in hashrate...

platinum4 · 2014-07-16T14:45:23Z

I think this may be a readout lag on ncurses, now that I'm observing closely; will report in.

platinum4 · 2014-07-17T05:27:45Z

@badman74 now you see what I was talking about after talking with bullus over in bitcointalk, right? Way different hashrates with X15GPU/X15_AMD than this one, huh? ;D

platinum4 · 2014-07-17T08:32:09Z

@badman74, I found a few culprits; help me out and see if you can replicate similar hashrates:

https://bitcointalk.org/index.php?topic=632503.msg7889004#msg7889004

badman74 · 2014-07-20T00:39:29Z

what i found was replacing

#define PERM_BIG_P(a)   do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      ROUND_BIG_P(a, r + 0); \
      ROUND_BIG_P(a, r + 1); \
    } \
  } while (0)

#define PERM_BIG_Q(a)   do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      ROUND_BIG_Q(a, r + 0); \
      ROUND_BIG_Q(a, r + 1); \
    } \
  } while (0)

in groestl.cl with

#define PERM_BIG_P(a)   do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      ROUND_BIG_P(a, r); \
    } \
  } while (0)

#define PERM_BIG_Q(a)   do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      ROUND_BIG_Q(a, r); \
    } \
  } while (0)

then changing
#define SPH_LUFFA_PARALLEL 0
and
//#include "aes_helper.cl"
to
#define SPH_LUFFA_PARALLEL 1
and
#include "aes_helper.cl"
gives me 6.04mh/s on sapphire 290x with 1040/1500 OC and 15% powertune
when it was 5.5mh/s normally
unfortunately i really have no idea what that does....
it was just pulled out of the darkcoin-mod.cl from https://github.com/aznboy84/sgminer/tree/v5_0-x15
edit: i just put my 7750's back in service and it seems that #define SPH_LUFFA_PARALLEL 1 doesn't do anything for them
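
For reference, the two loop forms above issue exactly the same rounds in the same order, so any hashrate difference presumably comes from how the OpenCL compiler unrolls and schedules each form on a given GPU, not from a change in the hash itself. The sketch below is plain C rather than kernel code: `record_round`, `round_trace`, `PERM_PAIRED`, and `PERM_SINGLE` are hypothetical stand-ins for `ROUND_BIG_P` and the two `PERM_BIG_P` variants, used only to log which round indices each loop visits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROUNDS 32

/* Hypothetical trace buffer: remembers the round indices in call order. */
typedef struct {
    int rounds[MAX_ROUNDS];
    int count;
} round_trace;

/* Hypothetical stand-in for ROUND_BIG_P(a, r): only records r. */
void record_round(round_trace *trace, int r)
{
    assert(trace->count < MAX_ROUNDS);
    trace->rounds[trace->count++] = r;
}

/* Paired form: two rounds per iteration, loop counter steps by 2. */
#define PERM_PAIRED(trace) do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      record_round(trace, r + 0); \
      record_round(trace, r + 1); \
    } \
  } while (0)

/* Single form: one round per iteration, loop counter steps by 1. */
#define PERM_SINGLE(trace) do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      record_round(trace, r); \
    } \
  } while (0)

/* Returns 1 when both forms visit the same rounds in the same order. */
int perm_forms_match(void)
{
    round_trace paired = { {0}, 0 };
    round_trace single = { {0}, 0 };

    PERM_PAIRED(&paired);
    PERM_SINGLE(&single);

    return paired.count == single.count &&
           memcmp(paired.rounds, single.rounds,
                  sizeof(int) * (size_t)paired.count) == 0;
}

/* Returns how many rounds the single-step form issues. */
int single_form_round_count(void)
{
    round_trace single = { {0}, 0 };
    PERM_SINGLE(&single);
    return single.count;
}
```

Calling `perm_forms_match()` from a small `main` is enough to check the equivalence on the host; it says nothing about which form a particular GPU compiler turns into faster code.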

ystarnaud · 2014-07-20T06:06:24Z

Interesting. Maybe this is 290 specific as I haven't really noticed a performance drop on R9 270/x or 78XX/79XX cards.

aes_helper.cl was commented out because nothing touches it. it just adds bloat to the kernel.

SPH_LUFFA_PARALLEL 1 makes use of a different way to calculate the luffa hash that depends on how the GPU processes the instructions. It may not work on all the GPUs. Setting this to 0 may have worked better on GPUs tested other than R9 290 and that is why it was changed.

From what I know of OpenCL, your changes to groestl would undo a GPU optimization. Again, maybe this is only the case for all non-Hawaii cards.

According to AMD specification and OpenCL, all southern island cards (7XXX and R9 series) should behave the same but again and again, the Hawaii chipset (R9 290/290x) seems to behave very differently than the rest of the series...

I'll run some tests on my 7XXX/R9 270 to see if the above changes really make a difference. We might end up needing to specify extra kernel compiler options to get this to work out for everybody.

Thanks for your work in testing these various parts of X11 on the 290s. I would have myself but I never bought those cards based on their lower cost effectiveness and issues.

platinum4 · 2014-07-20T06:10:13Z

@ystarnaud can you make luffa_parallel a definable option for us like you did with hamsi_expand_big ?

Also, I noticed that pulling a groestl.cl from X15_AMD @aznboy84 miner was smaller (65kb) than ours (67kb), and idk if it did anything, but it seems to.

Can you check this for a sample of changes and configs? https://bitcointalk.org/index.php?topic=632503.msg7913153#msg7913153

Also check the previous two postings in that thread from screenies of some all-star performance hashing at insane overclocks.

What we are ALL wondering is... WHY does @aznboy84's darkcoin-modHawaiigw64l4.bin yield a reliable +200Kb, and we can't ever seem to build a comparable one? I can get 5.75 steady with ours (2.0MB), but if I want to shoot for the moon, I must go back and use his (1.96MB). You and I have already compared the darkcoin-mod.cl files, all five or six versions of them...

badman74 · 2014-07-20T06:50:31Z

for me the SPH_LUFFA_PARALLEL 1 gave about 100kh/s, and after checking again i see that you are correct and aes_helper.cl isn't doing anything so that just leaves the change i made at the end of groestl.cl that causes the change in speed

troky · 2014-07-20T07:42:40Z

I can confirm +100kh/s with #define SPH_LUFFA_PARALLEL 1 on 290

ystarnaud · 2014-07-20T14:08:37Z

@platinum4 yeah that's what I was thinking. I'll work something out later today when I have some free time.

ystarnaud · 2014-07-20T14:10:11Z

Oh and I'm guessing this will also affect X13/14/15 since they use these algorithms as part of their hash.

badman74 · 2014-07-20T16:33:06Z

after looking at this couldn't we use the SPH_KECCAK_UNROLL, SPH_LUFFA_PARALLEL, and SPH_HAMSI_EXPAND_BIG instead of the *-mod.cl and *-modold.cl kernels
or am i missing some other optimizations in them

mrbrdo · 2014-07-24T01:45:03Z

@badman74 not sure why, from my limited knowledge of the kernels the *-modold ones have the last 3 opencl kernels combined into a single one. Don't know what the reason is exactly, but it's not just changing those constants. Unless you are saying that the reason the "-mod" kernels don't work on some cards is because of those 3 constants' values.

badman74 · 2014-07-27T20:22:03Z

the main thing i used to recover the lost hashrate was the change in groestl.cl
i don't know if this is the same across all cards

ystarnaud · 2014-07-28T19:22:13Z

See c603cec and #358.

ystarnaud · 2014-07-28T20:47:08Z

@mrbrdo I don't know the specifics but I believe lower end GPUs in the 6xxx line or older didn't have enough compute units to process more than 10 kernel objects (if not less). The extra rounds of algorithm are packed into that last kernel so that they (at least some) will be able to compute the hash.

Another thing is the values suggested above aren't just constants. Depending on the values, the kernel .cl files will process differently or unroll loops differently to offer better optimization with #ifndef #else #endif type programming. Setting these values directly in the .cl file might result in problems with the various GPUs out there while optimizing others. This is why I added fine tuning options.
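
A minimal sketch of the `#ifndef` / `#else` / `#endif` pattern described above, written as host-compilable C rather than real kernel code. `SPH_LUFFA_PARALLEL` is the name from this thread; `mix_lanes` and `luffa_parallel_enabled` are hypothetical, and the XOR is a toy operation, not the Luffa round function. The point is only that a value supplied at build time (for example `-D SPH_LUFFA_PARALLEL=1`) selects which code path gets compiled, while the result stays the same.

```c
#include <assert.h>

/* Default applies only when the build did not pass -D SPH_LUFFA_PARALLEL=... */
#ifndef SPH_LUFFA_PARALLEL
#define SPH_LUFFA_PARALLEL 0
#endif

#if SPH_LUFFA_PARALLEL
/* "Parallel" path: two independent pairs, combined at the end. */
unsigned mix_lanes(const unsigned v[4])
{
    unsigned left = v[0] ^ v[1];
    unsigned right = v[2] ^ v[3];
    return left ^ right;
}
#else
/* Sequential path: one accumulator, each step depends on the previous one. */
unsigned mix_lanes(const unsigned v[4])
{
    unsigned acc = 0;
    int i;
    for (i = 0; i < 4; i++) {
        acc ^= v[i];
    }
    return acc;
}
#endif

/* Reports which path was compiled in (1 = parallel, 0 = sequential). */
int luffa_parallel_enabled(void)
{
    return SPH_LUFFA_PARALLEL ? 1 : 0;
}
```

Because the choice is made by the preprocessor, editing the value inside the .cl file changes it for every GPU at once, whereas passing it as a compiler option lets each setup pick its own value.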

mrbrdo · 2014-07-28T21:08:44Z

@ystarnaud good work. Could you also call append_x11_compiler_options from append_x13_compiler_options instead of current code duplication? It's always nice to not have to change things at multiple places later.
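
A rough sketch of the kind of reuse being suggested, under stated assumptions: the `tuning` struct, `append_opt`, `append_x11_options`, and `append_x13_options` below are hypothetical and simplified, and do not reproduce the signatures of sgminer's real `append_x11_compiler_options` / `append_x13_compiler_options`. Which defines belong to which algorithm is also an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tuning values; the field names are illustrative only. */
typedef struct {
    int keccak_unroll;
    int luffa_parallel;
    int hamsi_expand_big;
} tuning;

/* Appends " -D NAME=VALUE" to buf; returns 0 on success, -1 if it does not fit. */
static int append_opt(char *buf, size_t cap, const char *name, int value)
{
    size_t used = strlen(buf);
    int n = snprintf(buf + used, cap - used, " -D %s=%d", name, value);
    return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
}

/* Options assumed here to be shared by every X11-derived algorithm. */
int append_x11_options(char *buf, size_t cap, const tuning *t)
{
    if (append_opt(buf, cap, "SPH_KECCAK_UNROLL", t->keccak_unroll) != 0)
        return -1;
    return append_opt(buf, cap, "SPH_LUFFA_PARALLEL", t->luffa_parallel);
}

/* X13 reuses the X11 set, then adds only what is specific to it. */
int append_x13_options(char *buf, size_t cap, const tuning *t)
{
    if (append_x11_options(buf, cap, t) != 0)
        return -1;
    return append_opt(buf, cap, "SPH_HAMSI_EXPAND_BIG", t->hamsi_expand_big);
}
```

With this layout, a new X11 tuning define is added in one place and X13 picks it up automatically.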

ystarnaud · 2014-07-29T01:38:34Z

Sure... I didn't think about that...

ystarnaud · 2014-07-29T01:48:17Z

@mrbrdo done.

platinum4 · 2014-07-30T06:20:23Z

I'd say this was solved by the replacement of the last two loops in groestl.cl found here: https://github.com/sgminer-dev/sgminer/blob/c603cec762454ab9f828f36dc6d67f6a1208768f/kernel/groestl.cl. The other enhancements/additions found with luffa-parallel, blake-compact, and keccak-unroll came about as ancillary and are definitely beneficial for gaining that extra bit of hashrate.

The main objective of this issue has now been effectively solved by c603cec.

platinum4 mentioned this issue Jul 20, 2014

Proposal to add --luffa-parallel option for fine tuning #357

Closed

ystarnaud self-assigned this Jul 20, 2014

platinum4 mentioned this issue Jul 28, 2014

x[n] algorithm .cl-file optimizations #371

Closed

platinum4 closed this as completed Jul 30, 2014
