New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created PCG random number generator #82
Conversation
Apparently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I can't figure out how to do that without writing multiplatform code, which really seems excessive! Kotlin 1.3 will add functions like getTimeNanos(), plus the whole kotlin.random package. Until that's released, is there some place to grab a single number worth of entropy from without using platform specific code?
I know of klock, but I agree that waiting for 1.3 makes sense as it is imminent. In fact, I've already started work on build system support for 1.3.0-rc-190 which is currently blocked by a few upstreams bugs that have since been closed but not released yet. I'd say let's wait for 1.3, and open some issues to track these regressions in the meantime.
Apparently synchronized() isn't supported in Kotlin/Native. In fact, it seems they're pushing a totally different threading model in which memory is never directly shared between threads. How do you suggest handling this?
I think my long term vision for the rng is still a worker pool approach as per the discussion in #80 which should fit in well with the approach native uses with frozen objects and worker threads. Thus I'd recommend for now we just replace the synchronized
blocks with NOPs on native and leave it unsafe on that platform. I'm pretty sure koma can't be used in a multi-threaded k/native environment right now anyway, as most of that code was written pre-frozen objects and needs an update.
/** | ||
* This file implements random number generation using a PCG-XSH-RR generation. Some elements of this file are based | ||
* on the pcg-c-basic library (https://github.com/imneme/pcg-c-basic), which is also distributed under the Apache 2.0 | ||
* license. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IANAL but I believe the APL 2.0 requires us to retain the original copyright notices that were posted in the original source code, along with a notice that we've modified it. In particular:
You must cause any modified files to carry prominent notices stating that You changed the files; and
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
My understanding is that translations to another language like this count as a "derivative work" (see https://www.rosenlaw.com/lj19.htm for a discussion).
I've also been trying to keep a running list of files that are owned by other projects at the end of the LICENSE file as I expect that list to grow.
I'd rather be too careful than step on anybody's toes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll fix that.
koma-tests/test/koma/RandomTests.kt
Outdated
val tol = 4*sqrt(count/bins) | ||
for (i in 0 until bins) { | ||
assert(binCounts[i] >= count/bins - tol) | ||
assert(binCounts[i] <= count/bins + tol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know right now the seed is hard-coded by default, but that will hopefully change once we add a source of time-based randomness. I'd therefore suggest that in each of these tests we:
- Explicitly set the seed (one or more times) at the beginning of the test
- Assert that the output equals exactly the value we know to be the output for that seed
The idea is that we could catch subtle bugs which change the numerical characteristics on different platforms or implementations, and not allow our tests to fail stochastically in CI. For example, I'd expect the javascript backend to produce a different value because its numerics are different (js internally does 64-bit math for everything).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add more tests that check for specific values to make sure nothing has changed. But for the statistical tests it's important to use a different seed every time (which they currently don't, but Kotlin 1.3 will fix that). Otherwise you don't really know if the code works, only that the test passes for one particular seed.
Suppose you write a test that statistically should pass 99.9% of the time, but there's a bug that makes it fail 50% of the time. If you use a different seed each time, you'll quickly discover the problem because the test will keep failing. But if you use a fixed seed, there's a 50% chance you'll get unlucky and pick one for which the test passes. And it will keep on passing forever and you'll never know the code is broken.
(Yes, I've made that mistake before and gotten bitten by it!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, so the way I usually handle these cases is pick a set of static seeds (doesn't have to be one, could be a set of 100+ pre-picked numbers, but always the same ones) that I check on every commit (koma-tests
runs in the CI every time a commit is pushed) and then have a truly random testing mode which can be manually enabled, perhaps with a -PenableRandomTests
. When this gradle flag is set, the seeds are set completely randomly when it runs the random tests.
The reason to not allow truly randomized tests in the default test set that the CI runs (and that the user runs by default when they type ./gradlew test
) is to prevent someone's new commit to randomly fail because we happened to draw samples from a tail (i.e. there is a valid output of a true rng that generates a thousand 1.0
values in a row, it's just very unlikely). I've chased down many a CI failure that seemed to be a bug in new code that turned out to just be unlikely seed selection in the test. The enableRandomTests
mode can also run much longer (and have much lower tolerances) since it isn't the thing the user will be running by default.
We could also set the CI to run the truly randomized tests once a week, so that they aren't just forgotten and never used. That is, perform a periodic deep testing of the system.
class RandomTests { | ||
|
||
@Test | ||
fun testReference() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great test as it alleviates us from needing to implement our own statistical correctness suite (or call down into TestU01 ourself). How hard would it be to do something similar for all the output types (Long, Double, ...)? Alternatively, we could try and check our outputs directly against another JVM or native implementation of PCG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference implementation only generates 32 bit integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The java re-implementation seems to provide all the Random
variants, but theirs may be just as susceptible as ours to a bug.
Hmmm... perhaps I'll open a new issue on this, add it to the list of future improvements (i.e. "Test our rng against a statistical suite, including the Box-Muller output")
Sounds good.
That makes sense. It seems consistent with the recommended approach to threading on native. I can create a function |
I've revised the distribution tests to each run 20 times with reproducible seed values. |
I'm not sure what's going on with the CI failure. If I'm reading it correctly, it's saying that everything passes on Windows and Mac, but one test fails on Linux. Any idea what could be causing that? The test in question passes on my computer. |
Yep thats what's happening. I'm able to reproduce the failure locally on Ubuntu 18.04. Looks like the issue is that you aren't resetting the spare output of Box-Muller when setting the seed. This causes it to use the spare value for the first Gaussian requested after the seed was reset instead of generating a new one. However it only does that if a spare is available, so a 50% chance. This causes it to yield a different sequence with the same seed and ultimately the test to complain on ubuntu when run with gradle caching enabled. This patch fixes it for me: --- a/koma-core-api/common/src/koma/internal/randn.kt
+++ b/koma-core-api/common/src/koma/internal/randn.kt
@@ -84,6 +84,7 @@ class KomaRandom {
*/
fun setSeed(seed: Long) {
syncNotNative(this) {
+ gaussianIsValid = false
state1 = 0
state2 = 0
nextLongUnsafe() Should hopefully fix the CI as well. |
And the tests must get run in a different order on some platforms than others, so on some platforms it's produced an even number of values before then and on others it has produced an odd number. It's fixed now. |
Yeah. And if you turn off the daemon, or run it again without running This PR looks good here. I'll open those follow-up issues we mentioned and bring it in. |
Implements #80.
When creating the default random number generator, I've currently hardcoded the seed value as 0. It probably would be better to make it nondeterministic so you'll get different random numbers every time you run your program. But I can't figure out how to do that without writing multiplatform code, which really seems excessive! Kotlin 1.3 will add functions like
getTimeNanos()
, plus the wholekotlin.random
package. Until that's released, is there some place to grab a single number worth of entropy from without using platform specific code?