New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible degradation to 0 in random math #19

Closed
SChernykh opened this Issue Dec 1, 2018 · 6 comments

Comments

Projects
None yet
3 participants
@SChernykh
Copy link

SChernykh commented Dec 1, 2018

int src1 = kiss99(prog_rnd) % PROGPOW_REGS;
int src2 = kiss99(prog_rnd) % PROGPOW_REGS;

If src1 == src2 and we do XOR, result will be 0. This 0 will most likely spread because

0 * b = 0, a * 0 = 0
mul_hi(0, b) = 0, mul_hi(a, 0) = 0
ROTL32(0, b) = 0
ROTR32(0, b) = 0
0 & b = 0, a & 0 = 0
min(0, b) = 0, min(a, 0) = 0

The fix is to never do math operations that cancel out both arguments to 0. As far as I can see, it's only XOR currently. ASIC can add optimizations for the case when one of the numbers is 0.

Moreover, the case when src1 == src2 allows many other optimizations for ASIC that OpenCL/GPU won't do. It can use squarer instead of full multiplier for multiplication (more energy efficient), MIN/AND/OR simply become NOP, ADD becomes SHL by 1, CLZ/POPCOUNT become 2 times simpler/energy efficient. OpenCL compiler, on the other hand, is not guaranteed to take advantage of this. Compiler will be able to remove MIN/AND/OR from the generated code if src1 == src2, but it's unlikely to do more.

Again, the fix for all this is simple: never do math on the same register, always use two different registers:

int src_index = kiss99(prog_rnd) % (PROGPOW_REGS * (PROGPOW_REGS - 1));
int src1 = src_index % PROGPOW_REGS; // 0 <= src1 < PROGPOW_REGS
int src2 = src_index / PROGPOW_REGS; // 0 <= src2 < PROGPOW_REGS - 1

// src2 is not the final index yet
// Example: if we have 5 registers and src1 = 1, src2 = 3
// src1: 0 _1_ 2 3 4
// src2 = 3, but it's an index in the list of remaining registers: 0 2 3 _4_
// so the final index for src2 will be 4 = 3 + 1

if (src2 >= src1) ++src2; // 0 <= src2 < PROGPOW_REGS and src2 != src1
@chfast

This comment has been minimized.

Copy link

chfast commented Dec 4, 2018

I like the proposed solution but int src_index = kiss99(prog_rnd) % (PROGPOW_REGS * (PROGPOW_REGS - 1)); is not uniformly distributed - numbers from 0 to 127 have slightly higher probability.

@SChernykh

This comment has been minimized.

Copy link

SChernykh commented Dec 4, 2018

It was just an example of how to generate two different random numbers, I didn't mean to use it as it is. Since this code is only used for generation, straight-forward approach "generate src1, src2 randomly, repeat if they are the same" will also work.

@chfast

This comment has been minimized.

Copy link

chfast commented Dec 4, 2018

I like this solution because it is nicely optimized by a compiler and does not have a branch.

Another proposed one was:

    unsigned r;
    do {
        r = rand();
    }    
    while ((r & 31) == ((r>>16) & 31));

    return {r & 31, (r>>16) & 31};

I don't like it though because of the loop used. We will have to find test cases where the loop is executed more than once and include them in the test vectors.

@SChernykh

This comment has been minimized.

Copy link

SChernykh commented Dec 4, 2018

If kiss99 passes TestU01 as said in documentation, non-uniformness is not really a problem. The bias will be very small, something like (32*31)/2^32 ~= 2.3*10^-7

P.S. First 128 combinations will be chosen with probability 4329605/2^32, the rest 864 combinations will be chosen with probability 4329604/2^32. Not a big deal.

@ifdefelse

This comment has been minimized.

Copy link
Owner

ifdefelse commented Dec 5, 2018

This is a good catch, thought not a significant issue.

First the OpenCL/CUDA compilers are extremely good and definitely do the transforms described above. They definitely do things like min(a,a)=a or popc(a)+popc(a)=popc(a)<<1. The only way an ASIC would have an advantage would be if it had a dedicated squaring unit. However it would only be used for 1/32*1/11=0.3% of random math ops, so doesn't seem worth the area.

Second the case of a^a=0 (or a|b=0xffffffff) is not a problem. The result of math() directly feeds a merge(), which was specifically designed to handle the B input being low entropy by XOR or ADDing in the B input. The mix state will not catastrophically collapse to 0 as in the above example.

That said there's no reason not to fix this. You suggestion for generating src1/src2 from a single random call seems reasonable. I agree that a 1/2^32 bias doesn't make any meaningful difference.

At the same time we should also update the rotate amount in merge() from ((r >> 16) % 32) to (((r >> 16) % 31)+1) to prevent rotating by 0 which is a NOP.

@ifdefelse

This comment has been minimized.

Copy link
Owner

ifdefelse commented Dec 10, 2018

V0.9.2 fixes this both src1==src2 and rotate by 0. Thanks for this suggestion.
commit 824cd79

@ifdefelse ifdefelse closed this Dec 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment