Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How pcg32_random_bounded_divisionless() work ? #3

Closed
achan001 opened this issue May 23, 2018 · 23 comments
Closed

How pcg32_random_bounded_divisionless() work ? #3

achan001 opened this issue May 23, 2018 · 23 comments

Comments

@achan001
Copy link

I peek at the code pcg32_random_bounded_divisionless().
Why variable 'leftover' able to reject unbiased randoms ?

No bias if leftover >= range or leftover(updated) >= threshold.
I don't follow above logic ...

Does this produce unbiased ranged number ?
Or, is it an approximation, using your idea in another blog:

https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

@achan001
Copy link
Author

Why variable 'leftover' able to reject unbiased randoms ?

Uh, I meant to reject biased randoms

random32bit =  pcg32_random(); 
multiresult = random32bit * range; 
leftover = (uint32_t) multiresult; 
return multiresult >> 32;             // [0, range)

Returned result look just like your modulo-reduction trick.

As you mentioned in the blog, above code work very well, especially for small range.
So, the question is how much the added sample rejection code contributed.

variable 'leftover' seems like a random number (MCG) to me ...

@lemire
Copy link
Owner

lemire commented May 23, 2018

So, the question is how much the added sample rejection code contributed.

The question we ask is "given that we have a perfect source of random numbers, can you prove mathematically that the ranged result is unbiased". There is no "how much" question in this context. Results are either biased or unbiased.

A separate question is whether the bias matters. Well. That depends, doesn't it? Some applications might be super sensitive to biases, others might not care.

@achan001
Copy link
Author

So, the question is how much the added sample rejection code contributed

I phrase it that way because the code will generate bias, even with true random source.
Even your comment for pcg32_random_bounded_divisionless() say so. (shuffle.c, line 55)

// map random value to [0,range) with slight bias, redraws to avoid bias if needed
static inline uint32_t pcg32_random_bounded_divisionless(uint32_t range) {
...

So, the code will return random numbers with slight bias.
Sample rejection code contributed to the benefit of redrawing.

I don't follow the criteria of redrawing, and how much it contribute.
The code implied that is a line between acceptable and unacceptable bias
It is unclear where the line is ...

@achan001
Copy link
Author

I made two programs to show that the code seems to work.
This just fills the slots using modulo-reduction trick

#include <stdio.h>              // ranged_slot.c
#include <stdlib.h>       

int main(int argc, char **argv) 
{
  unsigned range = strtoul(argv[1], NULL, 10);
  int slot[range];
  for(int i=0; i<range; i++) slot[i] = 0;
  unsigned rand = 0;
  unsigned long long rand64 = 0;    
  do {
    slot[(int)(rand64 >> 32)]++;
    rand64 += range;            // rand64 = rand * range
  } while (++rand);             // all 32-bits randoms
  for(int i=0; i<range; i++)    // show the slots
    printf("slot[%d] = %d\n", i, slot[i]);
  return 0;
}

This showed what got rejected

#include <stdio.h>              // ranged_reject.c
#include <stdlib.h>       
#define SCALED(x, n)   (unsigned) (((unsigned long long) x * n) >> 32)

int main(int argc, char **argv)
{
  unsigned range = strtoul(argv[1], NULL, 10);
  unsigned threshold = -range % range;
  unsigned rand = 0, leftover = 0;
  printf("leftover < (threshold = %u):\n", threshold);  
  do {
    if (leftover < threshold) 
      printf("0x%08x --> %u\n", rand, SCALED(rand, range));
    leftover += range;          // leftover = (rand * range) & bit32  
  } while (++rand);             // all 32-bits randoms
  return 0;
}

For range = 6, all slots (after sample rejection) are equally likely

C:\tmp>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882

C:\tmps> ranged_reject 6
leftover < (threshold = 4):
0x00000000 --> 0
0x2aaaaaab --> 1
0x80000000 --> 3
0xaaaaaaab --> 4

@lemire
Copy link
Owner

lemire commented May 24, 2018

Given your last comment, I think I now understand your point.

@lemire
Copy link
Owner

lemire commented May 24, 2018

Ok. So you get a biased return...

>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882

Some values are more likely than others. As I point out in my blog post, the difference is at most one between frequencies, so for small ranges, it is a "slight" bias.

Try this program (with rejection)...

#include <stdio.h>              // ranged_slot.c
#include <stdlib.h>

int main(int argc, char **argv)
{
  uint32_t range = strtoul(argv[1], NULL, 10);
  uint32_t slot[range];
  for(int i=0; i<range; i++) slot[i] = 0;
  uint32_t rand = 0;
  uint64_t rand64 = 0;
  uint32_t threshold = -range % range;
  do {
    if((rand64 & 0xFFFFFFFF) >= threshold) slot[(uint32_t)(rand64 >> 32)]++;
    rand64 += range;            // rand64 = rand * range
  } while (++rand);             // all 32-bits randoms
  for(int i=0; i<range; i++)    // show the slots
    printf("slot[%d] = %d\n", i, slot[i]);
  return 0;
}

We get an unbiased result...

$ ./ranged_slot 6
slot[0] = 715827882
slot[1] = 715827882
slot[2] = 715827882
slot[3] = 715827882
slot[4] = 715827882
slot[5] = 715827882

You seem to want me to tell you when a bias is ok. I don't have an answer to this question.

I'll close this, reopen if I have not answered to your satisfaction.

@lemire lemire closed this as completed May 24, 2018
@achan001
Copy link
Author

With your revised slots + rejection program, 6 slots are all equaly likely.

Is it always true ? (all 32-bits range)

Brute force to prove it always true probably take too long.
Besides, it will not explain WHY and HOW it work.

I understand that rejected cases count is same as pcg32_random_bounded()
(leftover = (rand * range) >>32, is a bijection of rand, rejection rate = threshold / 2^32)

But, why the rejected cases happened to be the slots with excess counts ?
It probably is not an accident.

Explanation is still welcome.

@lemire
Copy link
Owner

lemire commented May 24, 2018

With your revised slots + rejection program, 6 slots are all equaly likely.

Yes.

Is it always true ? (all 32-bits range)

Yes.

Brute force to prove it always true probably take too long.

Definitively.

Besides, it will not explain WHY and HOW it work.

Indeed. I am working on a short paper right now. It is not yet ready. I will publish it soon.

I understand that rejected cases count is same as pcg32_random_bounded()
(leftover = (rand * range) >>32, is a bijection of rand, rejection rate = threshold / 2^32)

Yes. All these algorithms rely on the same math.

But, why the rejected cases happened to be the slots with excess counts ? It probably is not an accident.

Indeed, that is no accident. It is the desired result.

Explanation is still welcome.

Email me at lemire@gmail.com and I will share the draft of my paper.

@achan001
Copy link
Author

// map random value to [0,range) with slight bias, redraws to avoid bias if needed

You might consider changing above comment (split.c, line 55)
I had mis-read it to imply the code accept slight bias, redraw if bias unacceptable

This might be better:

// map random value to [0,range) without bias (by redraws if needed)

@lemire
Copy link
Owner

lemire commented May 25, 2018

You are correct. Done.

@achan001
Copy link
Author

I think I can visualize why the code removed all bias. (Note: not a proof)

After threshold values are rejected, remaining values are divisible by range.
So, if reject correctly picked threshold values, the result have no bias.

The reason it reject the slot with excess count is because all rejected cases
are all very very close to edge of 2 slots. It almost seems as if it could fall to
the wrong slots. Edge cases just barely able to squeeze into that particular slot.

So, if we have to reject anything, pick the edge cases.

@lemire
Copy link
Owner

lemire commented May 27, 2018

After threshold values are rejected, remaining values are divisible by range.

That's the gist of the proof.

@achan001
Copy link
Author

Thanks.

Since possible values are evenly spaced, the slot with the edge case
allow most room for other values.

Uh, I think I just proved it ...

@lemire
Copy link
Owner

lemire commented May 27, 2018

I think you did, indeed.

@achan001
Copy link
Author

To confirm my edge case idea valid, I tried top edge instead of bottom

uint32_t unbiased = ~(-range % range);    // unbiased = ~threshold
while (leftover > unbiased) ...           // rejected same slots as leftover < threshold

For unknown reason, top edge rejection sometimes run faster

range_reject.c (post 5) , range = 6:
speedup = 5.393 sec / 3.595 sec = 1.501X, or 50% faster

@lemire
Copy link
Owner

lemire commented May 28, 2018

It seems unlikely that you'd see a 50% speed difference. I will start with the usual warning: never benchmark on a laptop... always benchmark on a server configured for testing.

I cannot reproduce your speed difference. Try this gist:

https://gist.github.com/lemire/0ead15045a4c174799338b231bacf199

$ gcc -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range)                                  	:  4.799 cycles per operation (best) 	4.875 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  4.790 cycles per operation (best) 	4.799 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range)                                  	:  4.798 cycles per operation (best) 	4.874 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  4.793 cycles per operation (best) 	4.799 cycles per operation (avg)
$ clang -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range)                                  	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  6.029 cycles per operation (best) 	6.051 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range)                                  	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)
flippedbase(buffer,len,range)                               	:  6.029 cycles per operation (best) 	6.030 cycles per operation (avg)

@achan001
Copy link
Author

achan001 commented May 28, 2018

ranged_reject.c (post 5) does not have a PRNG (not even slots!)

With cost of PRNG added, the effect of top edge vs bottom edge is just noise.

My goal is to confirm top edge rejected the same slots ... it does !
The speedup is just a curious observation (w/ strawberryperl.com supplied gcc 7.1.0)

@achan001
Copy link
Author

achan001 commented May 28, 2018

Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong

Pre-screened cases must include all the top edge cases.
It should be like this:

if (leftover > ~range) ...     // pre-screening

If range > 0, above can be optimized a bit more:

if (leftover > -range) ...     // pre-screening, range > 0

@lemire
Copy link
Owner

lemire commented May 28, 2018

Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong

Right. It still does not affect the running speed.

With cost of PRNG added, the effect of top edge vs bottom edge is just noise.

Makes sense!

@achan001
Copy link
Author

You might consider adding this top edge case to your unfinished draft.
It showed that the threshold rejection test is just picking the edge cases.

Which edge cases to pick does not matter (as long as it is the same side)

@lemire
Copy link
Owner

lemire commented May 28, 2018

@achan001 Mathematically, this needs to be pointed out, but there is no reason to have two distinct implementations unless one has benefits, somehow.

@lemire
Copy link
Owner

lemire commented May 29, 2018

Sorry for the delay, I posted the reference tonight:

@achan001
Copy link
Author

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants