Evaluate Mwc256XXA64 #18
Sorry @tkaitchuck for not replying sooner; yes I have seen this. I haven't had time to take a look yet, so the only thing I can say is that we like to have third-party review before switching to a new RNG (none of us here are experts on this). Perhaps though @vigna might be interested?
I started to read the link. After a few lines, there's a phrase "Xoshiro128+ does not pass the PractRand test suite" linked to a site that does not discuss xoshiro128+ at all. The "also fails BigCrush" is linked to a page about a completely different generator whose linear engine is by Marsaglia. And finally "has non-linear problems" points at a paper about linear problems 🤦🏻♂️. I really don't have time to read this properly, sorry. 😂

At a glance, it's a standard MWC + some xorshift/add, like the parallel generator by Agner Fog and many others—designs like this have been common in the last decades. It seems fine—it would be nice to have a C version and proper word-at-a-time benchmarks: filling arrays is OK, but that's not the standard way you use a generator. Personally (a matter of taste), I prefer the Goresky–Klapper generalization to Marsaglia's MWCs, as it makes it possible to get very good spectral scores, like in
at the expense of an additional multiplication. The generator above is excellent without any scrambling. MWCs in general have bad spectral scores because the multiplier is too small with respect to the modulus (MWCs and generalized MWCs simulate a multiplicative congruential generator). That's why they usually have a very large state (e.g., Marsaglia's mother-of-all) or some scrambling on top—to hide the defects. Both mechanisms have the usual issue of multiplicative generators: initial states differing by a small (and also not-so-small) multiplicative constant will generate strongly correlated sequences, because any multiplicative relationship between initial states is carried on forever. But that is fairly easy to fix with careful initialization, and it is lessened by the scrambling. Just don't let the user initialize the thing directly.
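For readers following along, the basic lag-r multiply-with-carry step being discussed can be sketched as follows. This is a generic illustration, not the actual Mwc256XXA64 code: the multiplier `A` and the seed values below are placeholders, and a real design (like the XXA variants in this thread) adds output scrambling on top of the raw state word.

```rust
// Generic lag-3 MWC sketch. NOTE: `A` is a placeholder constant for
// illustration only; a real generator must pick a multiplier with the
// right number-theoretic properties (see Marsaglia / Goresky-Klapper).
const A: u64 = 0xfeb3_4465_7c0a_f413;
const LAG: usize = 3;

struct Mwc {
    x: [u64; LAG], // lagged state words
    c: u64,        // carry
    i: usize,      // index into the circular buffer
}

impl Mwc {
    fn next(&mut self) -> u64 {
        // t = A * x_{n-LAG} + c, computed exactly in 128 bits
        let t = (A as u128) * (self.x[self.i] as u128) + (self.c as u128);
        let low = t as u64;        // low half becomes the new state word
        self.c = (t >> 64) as u64; // high half becomes the new carry
        self.x[self.i] = low;
        self.i = (self.i + 1) % LAG;
        low // a real design would scramble this before returning it
    }
}
```

The correlated-seeds issue described above is visible in this structure: the next state is a fixed multiplicative function of the old one, so any multiplicative relation between two initial states persists.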
Sorry, that should have read
Good point. The code as written does not allow user-initialized values to specify X3 or C, so trivial multiples should already be impossible with the code as-is.
I'd go for PractRand; it is much better than BigCrush for interleaving tests, because you can easily measure when the tests start to fail. My 2€¢. I don't understand how you can avoid small-constant multiples with a test—there could be millions of generators. You cannot keep track of all the seeds you used.
I'm not sure what you mean. What I meant to say was: when it is not initialized from a random source but from the user's code, while the seed is 256 bits, the user is only allowed to specify 128 of those bits. The upper 64 and lower 64 are hard-coded constants.
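The seeding restriction described in this comment can be sketched roughly like this (a minimal illustration; the constant values and field names below are placeholders, not the real Mwc256XXA64 initialization constants):

```rust
// Hypothetical hard-coded initialization words (placeholders, not the
// real values): the user supplies only 128 of the 256 state bits.
const X3_INIT: u64 = 0xcafe_f00d_d15e_a5e5;
const C_INIT: u64 = 0x1405_7b7e_f767_814f;

struct State256 {
    x1: u64,
    x2: u64,
    x3: u64, // not user-settable
    c: u64,  // carry, not user-settable
}

fn from_user_seed(seed: u128) -> State256 {
    State256 {
        x1: seed as u64,
        x2: (seed >> 64) as u64,
        // Fixing these two words means two user seeds can never place the
        // underlying multiplicative state at a small multiple of another.
        x3: X3_INIT,
        c: C_INIT,
    }
}
```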
Perfect—that's what I meant by "Just, don't let the user initialize the thing directly.".
BTW, I have no idea why you claim "For some reason this is not disclosed on the homepage" when talking about the linearity of the low bits of xoroshiro128+. My home page has a complete description of the linearity problems, and also of the Hamming-weight bias: "If, however, one has to generate only 64-bit floating-point numbers (by extracting the upper 53 bits) xoshiro256+ is a slightly (≈15%) faster generator with analogous statistical properties. For general usage, one has to consider that its lowest bits have low linear complexity and will fail linearity tests; however, low linear complexity of the lowest bits can have hardly any impact in practice, and certainly has no impact at all if you generate floating-point numbers using the upper bits (we computed a precise estimate of the linear complexity of the lowest bits). If you are tight on space, xoroshiro128++/xoroshiro128** (XOR/rotate/shift/rotate) and xoroshiro128+ have the same speed and use half of the space; the same comments apply. They are suitable only for low-scale parallel applications; moreover, xoroshiro128+ exhibits a mild dependency in Hamming weights that generates a failure after 5 TB of output in our test. We believe this slight bias cannot affect any application." There are many other debatable claims in your text, but this really jumps out. There is an entire page dedicated to the issue (and linked in the text above), as so much misinformation has been spread about it: http://prng.di.unimi.it/lowcomp.php
Except it's not just the low bits, nor is it just one test: https://www.pcg-random.org/posts/xoroshiro-fails-truncated.html I would not go so far as to say "that is likely to impact an application". I agree these are subtle problems. But a big part of PRNGs is thinking about headroom. My expectation would be that on the "PRNG shootout" page, under the "failures" column, it would link to http://prng.di.unimi.it/lowcomp.php
The above discussion about the keys gave me a really great idea: if the underlying MCG were an LCG, there wouldn't be such sequences, and the period would be larger. Since the modulus is already prime, any number should work. I'll do a second post with all the details.
That is a false statement. There is no linearity test including only the upper bits (say, the bits used for floating-point generation) that fails. If you have any evidence of that, please show it. Evidence is a linearity test from BigCrush or PractRand failing on such bits. Since there is a mathematical proof stating that this is not gonna happen, I am curious to see such evidence.
The page does not claim that—this is another false statement. "moreover, xoroshiro128+ exhibits a mild dependency in Hamming weights that generates a failure after 5 TB of output in our test". This dependency can be detected by different tests (such as the DC6 test in PractRand), but our test is presently the most sensitive and picks up the bias much earlier (e.g., PractRand needs 32 TB), which is why we refer to it. In any case, your claim of non-disclosure remains false (like most claims in your document, in fact).
There are no LCGs with prime modulus. See Knuth, TAoCP II. |
That is literally what the blog I linked to above tested. They threw out the lower 32 bits and ran PractRand and it fails at 2^47 bits. I'm not doing original research here. |
Re LCG: experimentally confirmed: adding an increment to the MWC has no effect on the period.

I also tested Mwc32XXA8 on PractRand with various variations to find the failure point. If I add an increment, it makes no discernible difference, and it also fails after 2^29 for all tested increment values. If I change it so the application passes the full seed instead of just 2 of the 4 components and do the same test, it fails after 2^20 (1 MB) outputs (though I suspect it would have actually failed even sooner; it just didn't run any statistical tests until the first 1 MB of output). This is expected, as the even and odd outputs should be just a shift difference from one another.

So adding the increment seems to be just as effective as simply only allowing the user to set two of the four parts of the key. Both have zero cost on x86, so I don't see much of an advantage one way or the other, and doing both doesn't seem to provide any noticeable advantage. I ran these same tests on the reduced-period 40-bit-state version with similar results.

All else being equal, I think it may be better to leave out the increment, for the benefit of non-x86 architectures and simplicity of explanation, and just continue the policy of hardcoding X3 and C.
If your notion of scientific evidence is a blog, the next step is chemtrails 😂. I've checked the blog you linked—there is no linear test failing the upper 32 bits of xoroshiro128+, as predicted by theory. After half a petabyte of testing there's a Hamming-weight dependency test failing (BCFN_FF), which is expected—as I already remarked, and as explained on the home page, Hamming-weight dependencies will show up at 5 TB, much earlier, with a more sensitive test. If you don't like that mild bias, pick another generator.

It appears, however, that you are very confused about what a linearity test is. Linearity tests are of two kinds: tests that measure the jumps in linear complexity of an output bit as measured by the Berlekamp–Massey algorithm, like the LinearComp test from TestU01, and tests that measure the distribution of the ranks of random matrices on F₂ built from a subset of bits (possibly all), like the MatrixRank test from TestU01. The only test of this kind in PractRand is BRank (the second kind), and there is no failure of this kind on the upper 32 bits of xoroshiro128+, even after half a petabyte of testing. Your claims, thus, remain false.

Considering your confusion about very basic testing techniques, the slew of mistakes in your document, and your comments above about fiddling experimentally with the sophisticated (and fragile) mechanisms of MWCs or LCGs without understanding the mathematics behind them, I'm sorry, but I would not particularly trust the generators you are proposing now.
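As a rough illustration of the second kind of test mentioned above: a binary-rank statistic reduces a block of output bits to the rank of a matrix over F₂, and output with hidden linear structure produces systematically low ranks, while truly random bit matrices are near full rank with high probability. The sketch below only shows the rank computation itself; it is not PractRand's BRank or TestU01's MatrixRank implementation.

```rust
// Rank over F2 of a bit matrix whose rows are u64 words, via Gaussian
// elimination (XOR is addition in F2). Toy illustration only.
fn rank_gf2(mut rows: Vec<u64>) -> usize {
    let mut rank = 0;
    for bit in (0..64).rev() {
        // find a pivot row (at or below `rank`) with this bit set
        if let Some(p) = (rank..rows.len()).find(|&r| rows[r] >> bit & 1 == 1) {
            rows.swap(rank, p);
            let pivot = rows[rank];
            // clear this bit from every other row
            for r in 0..rows.len() {
                if r != rank && rows[r] >> bit & 1 == 1 {
                    rows[r] ^= pivot;
                }
            }
            rank += 1;
        }
    }
    rank
}
```

A rank test would build many such matrices from generator output and compare the observed rank distribution with the theoretical one for random F₂ matrices.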
The only claims I made were:
If you don't like that these were posted on a blog, they are very easy results to reproduce. I did not make any claims about which type of tests fail or why they fail, and I said nothing about matrices or Hamming weights. Nor did I say anything about what these failures should mean to an application. You seem to be arguing about something I did not say.
If there is a factual inaccuracy, please post the relevant sentence and I will correct it. You are correct that I do not understand the theory deeply enough to operate without empirical testing; hence the generator design, the testing methodology, and the post explaining it rely heavily on empirical tests. My knowledge is not so deep that I can look at a failing PractRand or BigCrush result and say that it is not a problem. I don't pretend to know how applications use random numbers, so I treat any test failure as a failure. If there is any bug in the code, please point it out and I will fix it. If there is a test or experiment that should be run, name it and I will run it.
I read your blog post and I'm a bit confused about your argument against Rand's algorithm of choice. Later you look at randograms of xoshiro32++ and other reduced-state generators. However, this is only qualitative, as you acknowledge. You also look at the number of BigCrush failures for the reduced variants. I'm not sure whether the sum of failures is a good metric; it strongly depends on the tests you chose. Some were designed with specific RNG algorithms in mind, so this metric is biased against them! More importantly, it is not clear how your results generalize to variants with larger state. (Xoshiro32++ is also not an official variant, and it is not clear which constants you used. There is precedent for using a xoroshiro32++ variant that might be worth looking into.) If I understood correctly, the number of BigCrush failures for reduced variants is the only metric (besides runtime performance) by which you want to improve over xoshiro256++?
I am not proposing that. I am only asking that Mwc256XXA64 be evaluated for inclusion here as an option, because it offers a useful alternative to PCG in that it has a 256-bit state and faster performance. The comparison with xoshiro is simply because it is the default generator today; hence any new algorithm must be measured against it.
I was referring to what you mentioned here: rust-random/rand#905 (comment) I do not think that the full size version of either PCG or xoshiro256++ have any statistical problems. My analysis with the reduced size versions on PractRand and BigCrush seems to confirm this.
Yes. Agreed. The point where it fails PractRand is far more relevant, and I cover that too. The two methodologies result in closely correlated results.
Test suites are by their very nature regression suites: tests are added for flaws that generators have been found to exhibit. This is their purpose. It is "biased" in the sense that any new algorithm built afterwards will make sure not to repeat the mistakes of any previous one. Of course it is not guaranteed that it won't have some entirely new problem. At any given time, the most that can be said of any algorithm is that it passes every test invented so far.

The problem with this methodology is that it doesn't allow for progress. Usually what happens is that algorithms, since they cannot be proven correct, are adopted first and a problem is eventually found, or an algorithm is considered "good" because it became popular without anyone finding a flaw first. Meanwhile, most other ideas are dead on arrival because they aren't popular enough to be considered good and cannot be validated, so little progress is made.

By looking at where things fail, I am attempting to break out of this model and have a way to validate a new algorithm. The theory is that if there is some statistical flaw that no existing test can catch at full scale (like the one in middle-square), it may be found in a reduced-size version. Algorithms like SHISHUA trivially fail this analysis and hence should be looked upon with suspicion, whereas PCG and xoshiro++ both do very well. As does MwcXXA.
They generalize very well. I am working on a follow up blog post on this. I'll update here when it is ready. The constants were provided by David Blackman who is one of the authors on the paper.
Mwc256XXA64 should be compared against PCG64, as it is a member of that family: it is, after all, a permuted MCG internally. Compared to PCG64 it offers twice the state size, both rightward and leftward propagation in the state-update function, better-looking randograms, fewer BigCrush failures at various size points, passing PractRand for longer at various state sizes, the ability to add some vectorization, a more than 2x performance increase despite the larger state, and a 128-bit version with vastly better performance on 32-bit hardware. Many of these also apply in comparison with xoshiro256++, though it too does well on 32-bit hardware.
Perhaps it's time (or past!) to draw this "issue" to some kind of conclusion:
This thread has generated some commentary but no serious evaluation, and is unlikely to—doing so is considerably beyond the scope of the project (which, let's be clear, is a community effort only, lacking both the time and expertise required for useful evaluation).
I am undecided on this — we don't have any particular criteria for inclusion in this repository, but I think it is worth having at least one: that PRNGs be published elsewhere and this repository not be their primary "home". This PRNG is already published (in a way) at https://github.com/tkaitchuck/Mwc256XXA64, so adding it here isn't really useful in my opinion. (It might however be useful to publish on crates.io, which does not currently appear to be the case.)
I agree with @vks here — "the number of failing tests" is not a particularly useful metric, even if trying to beat tests is a worthwhile goal. To my (inexpert) eye, evaluating "scaled down" versions of the generators may be the most useful form of "metric" in your analysis, and it does appear that you have designed a reasonably good generator, but convincing me (or "the maintainers of this project") should not be your goal (see note above regarding evaluation). I think the only other thing worth discussing here is whether we add to the comparison in the book — I don't see any reason we shouldn't (though it should have some rewording: these aren't exactly "our RNGs"). @vks do you agree? Filling out the table column should be easy enough. A prerequisite is that this is published on crates.io (as mentioned above, this might just as well happen from the existing repository).
@dhardy I would be happy to add Mwc to the benchmarks here, but this does not require it to live in the same repository.
I agree with putting more RNGs in the comparison table. (We might even want to add the Mersenne Twister for reference.) Maybe "Comparison of RNG algorithms" is a better heading?
Fair enough. I misunderstood your initial issue description as proposing a different algorithm for
This is a bit off-topic, but I think the 20 000 CPU hours in the paper refer to PCG with an unknown increment? Anyway, rust-random/rand#905 is not about being worried that PCG is predictable (this is expected for a non-CSPRNG), but about the claim that it is difficult to predict. As far as I can see, we are not claiming that any of our non-CSPRNGs are "difficult to predict", including xoshiro. (We did in the past, which is what rust-random/rand#905 was about.) Xoshiro is not expected to be unpredictable.
I think we are in agreement about the value of tests. I'm just saying that the metric of looking at the number of failed tests without more context is problematic. Some tests are extremely specific (I'm thinking of Marsaglia's clever binary rank test), so any algorithm that is fundamentally different would be expected to pass that test. Should such an algorithm be rewarded for this, while possibly failing a more important test?
Those are nice improvements! As @dhardy said, we cannot really evaluate the quality of the randomness. Note that doubling the state size is expected to double performance for small RNGs due to SIMD: you can simply evaluate two RNGs at the same time. Better performance on 32-bit hardware is certainly good to have!
I have published the crate here: https://crates.io/crates/pcg-mwc
@tkaitchuck the API looks fine. I would suggest in the code you put both 32- and 64-bit versions into (private) sub-modules, then use I think I may as well close this now.
Done |
Background
There has been a lot of discussion about what to do about SmallRng when special instructions such as AES-NI are not available:
rust-random/rand#910, rust-random/rand#767, rust-random/rand#603
Previously it was PCG, but there were some issues:
rust-random/rand#907, rust-random/rand#905
including the small state. It was switched to Xoshiro, which was an improvement but is still not ideal.
I set about creating a better option for this case.
Design
I've written up the design and more about the motivation here:
https://tom-kaitchuck.medium.com/designing-a-new-prng-1c4ffd27124d
Explanation of quality is detailed there as well.
The code is checked in here:
https://github.com/tkaitchuck/Mwc256XXA64
@rust-random If there is agreement that this is worth adding as an option, I can create a PR to add a `rand_mwc` crate.