How does pcg32_random_bounded_divisionless() work? #3
Uh, I meant to reject biased randoms
The returned result looks just like your modulo-reduction trick. As you mentioned in the blog, the above code works very well, especially for small ranges. The variable 'leftover' looks like a random number (MCG) to me ... |
The question we ask is "given that we have a perfect source of random numbers, can you prove mathematically that the ranged result is unbiased". There is no "how much" question in this context. Results are either biased or unbiased. A separate question is whether the bias matters. Well. That depends, doesn't it? Some applications might be super sensitive to biases, others might not care. |
I phrased it that way because the code will generate bias, even with a true random source.
So, the code will return random numbers with a slight bias. I don't follow the criterion for redrawing, or how much it contributes. |
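On the "how much" part of the redraw question: a minimal sketch, assuming the redraw criterion is "leftover below 2^32 mod range", as the `threshold` variable quoted in this thread suggests. The helper names `rejection_threshold` and `rejection_probability` are made up for illustration and are not part of the project.

```c
#include <stdint.h>

/* Number of 32-bit leftover values that trigger a redraw: 2^32 mod range.
   Computed without 64-bit math as (2^32 - range) mod range.
   Requires range > 0. Hypothetical helper, not from the project. */
static uint32_t rejection_threshold(uint32_t range) {
    return (uint32_t)(0u - range) % range;
}

/* Chance that a single draw is redrawn, assuming a perfect 32-bit source. */
static double rejection_probability(uint32_t range) {
    return (double)rejection_threshold(range) / 4294967296.0;
}
```

For range = 6 the threshold is 4, so only 4 of the 2^32 possible inputs are redrawn. The threshold is always smaller than range, so redraws are rare whenever range is small compared to 2^32.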
I made two programs to show that the code seems to work.
This showed what got rejected
For range = 6, all slots (after sample rejection) are equally likely
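The two programs mentioned above did not survive in this copy of the thread. As a stand-in, here is a scaled-down sketch that uses an 8-bit word instead of 32 bits (an assumption made so that every input can be enumerated). `tally_slots` is a hypothetical name.

```c
#include <stdint.h>

#define WORD_BITS 8                  /* scaled-down word size: 8 instead of 32 */
#define WORD_SIZE (1u << WORD_BITS)  /* 256 possible random inputs */

/* Tally every possible 8-bit input x for the given range (1..255).
   accepted[k] counts inputs that land in slot k and are kept;
   rejected[k] counts inputs that land in slot k but would be redrawn.
   Both arrays must hold at least `range` entries. */
static void tally_slots(uint32_t range, uint32_t *accepted, uint32_t *rejected) {
    uint32_t threshold = WORD_SIZE % range;  /* analogue of 2^32 mod range */
    for (uint32_t k = 0; k < range; k++) {
        accepted[k] = 0;
        rejected[k] = 0;
    }
    for (uint32_t x = 0; x < WORD_SIZE; x++) {
        uint32_t product  = x * range;
        uint32_t slot     = product >> WORD_BITS;       /* high part: the result */
        uint32_t leftover = product & (WORD_SIZE - 1);  /* low part */
        if (leftover < threshold) {
            rejected[slot]++;
        } else {
            accepted[slot]++;
        }
    }
}
```

With range = 6 in this 8-bit model, the unfiltered slot counts are 43, 43, 42, 43, 43, 42. The four rejected inputs (0, 43, 128, 171) are the first input of each 43-count slot, which leaves 42 accepted inputs in every slot.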
|
Given your last comment, I think I now understand your point. |
Ok. So you get a biased return...
Some values are more likely than others. As I point out in my blog post, the difference is at most one between frequencies, so for small ranges, it is a "slight" bias. Try this program (with rejection)...
We get an unbiased result...
You seem to want me to tell you when a bias is ok. I don't have an answer to this question. I'll close this, reopen if I have not answered to your satisfaction. |
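The "at most one between frequencies" statement about the version without rejection can be checked exhaustively in a small model. This is a sketch under the assumption of an 8-bit word in place of 32 bits; `frequency_spread_no_rejection` is a hypothetical helper.

```c
#include <stdint.h>

/* Map every 8-bit input with (x * range) >> 8, reject nothing, and return
   the largest slot count minus the smallest one. Valid for range 1..256. */
static uint32_t frequency_spread_no_rejection(uint32_t range) {
    uint32_t counts[256] = {0};
    for (uint32_t x = 0; x < 256; x++) {
        counts[(x * range) >> 8]++;
    }
    uint32_t lo = counts[0], hi = counts[0];
    for (uint32_t k = 1; k < range; k++) {
        if (counts[k] < lo) lo = counts[k];
        if (counts[k] > hi) hi = counts[k];
    }
    return hi - lo;
}
```

A spread of 0 means the range divides the word size and there is no bias at all; a spread of 1 is the slight bias discussed above.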
With your revised slots + rejection program, all 6 slots are equally likely. Is that always true (for every range that fits in 32 bits)? A brute-force proof that it is always true would probably take too long. I understand that the count of rejected cases is the same as in pcg32_random_bounded(). But why do the rejected cases happen to be in the slots with excess counts? An explanation is still welcome. |
Yes.
Yes.
Definitively.
Indeed. I am working on a short paper right now. It is not yet ready. I will publish it soon.
Yes. All these algorithms rely on the same math.
Indeed, that is no accident. It is the desired result.
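Brute force over 32 bits is impractical, but the same question can be settled exhaustively for a smaller word. The sketch below assumes a 10-bit word, so it is evidence for the pattern rather than a proof for 32 bits. The names `slots_are_uniform` and `all_ranges_uniform` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS 10u
#define SIZE (1u << BITS)   /* 1024 possible inputs in this scaled-down model */

/* True if, after rejecting inputs whose leftover is below SIZE mod range,
   every slot in [0, range) keeps exactly the same number of inputs.
   Valid for range 1..SIZE-1. */
static bool slots_are_uniform(uint32_t range) {
    static uint32_t counts[SIZE];
    uint32_t threshold = SIZE % range;
    uint32_t expected  = (SIZE - threshold) / range;
    for (uint32_t k = 0; k < range; k++) counts[k] = 0;
    for (uint32_t x = 0; x < SIZE; x++) {
        uint32_t product = x * range;
        if ((product & (SIZE - 1)) >= threshold) {   /* accepted */
            counts[product >> BITS]++;
        }
    }
    for (uint32_t k = 0; k < range; k++) {
        if (counts[k] != expected) return false;
    }
    return true;
}

/* Check every range that fits in the scaled-down word. */
static bool all_ranges_uniform(void) {
    for (uint32_t range = 1; range < SIZE; range++) {
        if (!slots_are_uniform(range)) return false;
    }
    return true;
}
```

About a million iterations cover every range and every input at this word size, so the check finishes almost instantly.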
Email me at lemire@gmail.com and I will share the draft of my paper. |
You might consider changing the above comment (split.c, line 55). This might be better:
|
You are correct. Done. |
I think I can visualize why the code removes all bias. (Note: not a proof.) After the threshold values are rejected, the number of remaining values is divisible by range. The reason it rejects the slot with excess count is because all rejected cases So, if we have to reject anything, pick the edge cases. |
That's the gist of the proof. |
Thanks. Since possible values are evenly spaced, the slot with the edge case Uh, I think I just proved it ... |
I think you did, indeed. |
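One way to write the edge-case argument down, offered as a sketch rather than as the wording of the paper mentioned earlier. The notation is an assumption introduced here: $L = 2^{32}$, $s$ is the range, $t = L \bmod s$ is the threshold, and $x$ is the 32-bit random input.

```latex
\begin{align*}
&\text{Products: } \{\, x s : 0 \le x < L \,\} \text{ are exactly the multiples of } s \text{ in } [0,\, sL). \\
&\text{Slot } k: \quad x s \in [\,kL,\ (k+1)L\,), \qquad \text{leftover} = x s - kL. \\
&\text{Accepted in slot } k: \quad \text{leftover} \ge t \iff x s \in [\,kL + t,\ (k+1)L\,). \\
&\text{Length of that interval: } L - t = s \left\lfloor L / s \right\rfloor, \text{ a multiple of } s. \\
&\text{A half-open integer interval of length } m s \text{ contains exactly } m \text{ multiples of } s, \\
&\text{so every slot keeps exactly } \left\lfloor L / s \right\rfloor \text{ inputs, wherever the interval starts.}
\end{align*}
```

The last step is what makes the edge matter: trimming $t$ values from one end of each slot leaves an interval whose length is a multiple of $s$, and only a slot that held an extra multiple of $s$ loses one.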
To confirm that my edge-case idea is valid, I tried rejecting the top edge instead of the bottom.
For some unknown reason, top-edge rejection sometimes runs faster. range_reject.c (post 5), range = 6: |
It seems unlikely that you'd see a 50% speed difference. I will start with the usual warning: never benchmark on a laptop... always benchmark on a server configured for testing. I cannot reproduce your speed difference. Try this gist: https://gist.github.com/lemire/0ead15045a4c174799338b231bacf199
|
ranged_reject.c (post 5) does not have a PRNG (not even slots!). With the cost of a PRNG added, the effect of top edge vs. bottom edge is just noise. My goal is to confirm that top-edge rejection rejects the same slots ... it does! |
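The "same slots" observation can be reproduced in a scaled-down model. This sketch assumes an 8-bit word, with the bottom rule rejecting leftover < t and the top rule rejecting leftover >= 256 - t, where t = 256 mod range. `edges_reject_same_slots` is a hypothetical name, and this is not the code from the posts above.

```c
#include <stdbool.h>
#include <stdint.h>

/* For an 8-bit word, count per slot how many inputs each rule rejects and
   report whether the two rules remove the same number from every slot.
   Valid for range 1..255. */
static bool edges_reject_same_slots(uint32_t range) {
    uint32_t t = 256u % range;
    uint32_t bottom[256] = {0};
    uint32_t top[256] = {0};
    for (uint32_t x = 0; x < 256; x++) {
        uint32_t product  = x * range;
        uint32_t slot     = product >> 8;
        uint32_t leftover = product & 255u;
        if (leftover < t) bottom[slot]++;          /* bottom-edge rejection */
        if (leftover >= 256u - t) top[slot]++;     /* top-edge rejection; never fires when t == 0 */
    }
    for (uint32_t k = 0; k < range; k++) {
        if (bottom[k] != top[k]) return false;
    }
    return true;
}
```

For range = 6 the bottom rule drops inputs 0, 43, 128 and 171, while the top rule drops 42, 85, 170 and 213. The inputs differ, but both sets fall in slots 0, 1, 3 and 4.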
Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong. Pre-screened cases must include all the top edge cases.
If range > 0, the above can be optimized a bit more:
|
Right. It still does not affect the running speed.
Makes sense! |
You might consider adding this top edge case to your unfinished draft. Which edge cases to pick does not matter (as long as it is the same side) |
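For illustration, a top-edge variant might look like the sketch below. It is not the `pcg32_random_bounded_divisionless_flipped` function from the gist, whose code is not reproduced in this thread. The names `rng32_fn`, `replay_next` and `bounded_divisionless_top` are hypothetical, and the replay source exists only so that a rejection can be triggered on purpose.

```c
#include <stdint.h>

/* Hypothetical source type: any function returning uniform 32-bit words. */
typedef uint32_t (*rng32_fn)(void);

/* Scripted source for experiments: replays a fixed list of values. */
static const uint32_t *replay_values;
static unsigned replay_pos;
static uint32_t replay_next(void) { return replay_values[replay_pos++]; }

/* Sketch: unbiased draw in [0, range), range > 0, rejecting the TOP edge. */
static uint32_t bounded_divisionless_top(rng32_fn next, uint32_t range) {
    uint64_t product  = (uint64_t)next() * range;
    uint32_t leftover = (uint32_t)product;            /* low 32 bits */
    /* Pre-screen: leftover >= 2^32 - range. Because threshold < range,
       this covers every top-edge case. */
    if (leftover > UINT32_MAX - range) {
        uint32_t threshold = (uint32_t)(0u - range) % range;  /* 2^32 mod range */
        /* Redraw while leftover >= 2^32 - threshold; never true if threshold == 0. */
        while (leftover > UINT32_MAX - threshold) {
            product  = (uint64_t)next() * range;
            leftover = (uint32_t)product;
        }
    }
    return (uint32_t)(product >> 32);                 /* high 32 bits: the slot */
}
```

With range = 6, the input 0x55555555 gives a leftover of 2^32 - 2, which is a top-edge case and forces a redraw. The input 0 is rejected by the bottom-edge rule but is accepted here.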
@achan001 Mathematically, this needs to be pointed out, but there is no reason to have two distinct implementations unless one has benefits, somehow. |
Sorry for the delay, I posted the reference tonight:
|
Thank you. |
I peeked at the code of pcg32_random_bounded_divisionless().
Why is the variable 'leftover' able to reject unbiased randoms?
No bias if leftover >= range or leftover(updated) >= threshold.
I don't follow the above logic ...
Does this produce an unbiased ranged number?
Or, is it an approximation, using your idea in another blog:
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
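The function body under discussion is not reproduced in this copy of the thread. The sketch below shows the logic being asked about ("no bias if leftover >= range or leftover (updated) >= threshold"), assuming a generic 32-bit source passed in as a function pointer. The names `rng32_fn`, `replay_next` and `bounded_divisionless` are hypothetical and are not the project's API.

```c
#include <stdint.h>

/* Hypothetical source type: any function returning uniform 32-bit words. */
typedef uint32_t (*rng32_fn)(void);

/* Scripted source for experiments: replays a fixed list of values. */
static const uint32_t *replay_values;
static unsigned replay_pos;
static uint32_t replay_next(void) { return replay_values[replay_pos++]; }

/* Sketch: draw in [0, range), range > 0, with one multiplication per draw
   and a division only on the rare slow path. */
static uint32_t bounded_divisionless(rng32_fn next, uint32_t range) {
    uint64_t product  = (uint64_t)next() * range;
    uint32_t leftover = (uint32_t)product;            /* low 32 bits */
    if (leftover < range) {                           /* cheap pre-screen: threshold < range */
        uint32_t threshold = (uint32_t)(0u - range) % range;  /* 2^32 mod range */
        while (leftover < threshold) {                /* bottom-edge case: redraw */
            product  = (uint64_t)next() * range;
            leftover = (uint32_t)product;
        }
    }
    return (uint32_t)(product >> 32);                 /* high 32 bits: the slot */
}
```

With range = 6 and the scripted inputs 0 then 0xFFFFFFFF, the first draw has leftover 0, which is below the threshold of 4 and is redrawn; the second is accepted and yields 5. In this reading, 'leftover' is not a second random number: it is the position of the product inside its slot, and only the lowest positions are discarded.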