

Created PCG random number generator #82

Merged: 7 commits merged into kyonifer:master on Oct 30, 2018

Conversation

@peastman (Contributor)

Implements #80.

When creating the default random number generator, I've currently hardcoded the seed value as 0. It probably would be better to make it nondeterministic so you'll get different random numbers every time you run your program. But I can't figure out how to do that without writing multiplatform code, which really seems excessive! Kotlin 1.3 will add functions like getTimeNanos(), plus the whole kotlin.random package. Until that's released, is there some place to grab a single number worth of entropy from without using platform specific code?

@peastman (Contributor Author)

koma-core-api/common/src/koma/internal/randn.kt:63:9: error: using 'synchronized(Any, () -> R): R' is an error. Do not use 'synchronized' function in Kotlin/Native code
        synchronized(this) {
        ^

Apparently synchronized() isn't supported in Kotlin/Native. In fact, it seems they're pushing a totally different threading model in which memory is never directly shared between threads. How do you suggest handling this?

@kyonifer (Owner) left a comment

> But I can't figure out how to do that without writing multiplatform code, which really seems excessive! Kotlin 1.3 will add functions like getTimeNanos(), plus the whole kotlin.random package. Until that's released, is there some place to grab a single number worth of entropy from without using platform specific code?

I know of klock, but I agree that waiting for 1.3 makes sense, as it is imminent. In fact, I've already started work on build system support for 1.3.0-rc-190, which is currently blocked by a few upstream bugs that have been closed but not yet released. I'd say let's wait for 1.3, and open some issues to track these regressions in the meantime.

> Apparently synchronized() isn't supported in Kotlin/Native. In fact, it seems they're pushing a totally different threading model in which memory is never directly shared between threads. How do you suggest handling this?

I think my long term vision for the rng is still a worker pool approach as per the discussion in #80 which should fit in well with the approach native uses with frozen objects and worker threads. Thus I'd recommend for now we just replace the synchronized blocks with NOPs on native and leave it unsafe on that platform. I'm pretty sure koma can't be used in a multi-threaded k/native environment right now anyway, as most of that code was written pre-frozen objects and needs an update.

/**
* This file implements random number generation using a PCG-XSH-RR generation. Some elements of this file are based
* on the pcg-c-basic library (https://github.com/imneme/pcg-c-basic), which is also distributed under the Apache 2.0
* license.
@kyonifer (Owner):

IANAL, but I believe the Apache License 2.0 requires us to retain the original copyright notices that were posted in the original source code, along with a notice that we've modified it. In particular:

> You must cause any modified files to carry prominent notices stating that You changed the files; and
> You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

My understanding is that translations to another language like this count as a "derivative work" (see https://www.rosenlaw.com/lj19.htm for a discussion).

I've also been trying to keep a running list of files that are owned by other projects at the end of the LICENSE file as I expect that list to grow.

I'd rather be too careful than step on anybody's toes.

@peastman (Contributor Author):

Good point. I'll fix that.
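For context, the PCG-XSH-RR scheme named in that file header pairs a 64-bit LCG state update with an "xorshift-high, random-rotate" output step. A minimal Python sketch of the idea, following pcg-c-basic (this is an illustration, not the Kotlin code in this PR; the increment below is an arbitrary odd constant, since any odd value just selects a stream):

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 6364136223846793005   # standard PCG 64-bit LCG multiplier
INCREMENT = 1442695040888963407    # arbitrary odd constant for this sketch

def pcg32_next(state):
    """Advance the 64-bit LCG state and emit one 32-bit output.

    The XSH-RR output step: xorshift the high bits down, then rotate
    right by the top 5 bits of the *old* state.
    """
    old = state
    state = (old * MULTIPLIER + INCREMENT) & MASK64
    xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
    rot = old >> 59                      # top 5 bits, so 0..31
    out = ((xorshifted >> rot) | (xorshifted << ((32 - rot) & 31))) & 0xFFFFFFFF
    return state, out
```

The rotation amount coming from the state itself is what makes the permutation vary per step, which is the source of PCG's statistical quality relative to a raw truncated LCG.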

val tol = 4*sqrt(count/bins)
for (i in 0 until bins) {
assert(binCounts[i] >= count/bins - tol)
assert(binCounts[i] <= count/bins + tol)
@kyonifer (Owner):

I know right now the seed is hard-coded by default, but that will hopefully change once we add a source of time-based randomness. I'd therefore suggest that in each of these tests we:

  1. Explicitly set the seed (one or more times) at the beginning of the test
  2. Assert that the output equals exactly the value we know to be the output for that seed

The idea is that we could catch subtle bugs which change the numerical characteristics on different platforms or implementations, and not allow our tests to fail stochastically in CI. For example, I'd expect the javascript backend to produce a different value because its numerics are different (js internally does 64-bit math for everything).
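The fixed-seed pattern can be sketched generically. A Python illustration, with random.Random standing in for the koma generator (function names here are hypothetical):

```python
import random

def golden_values(seed, n, make_rng=random.Random):
    """Run the generator once with a known seed and record its outputs.
    The recorded constants then go verbatim into the test source."""
    rng = make_rng(seed)
    return [rng.random() for _ in range(n)]

def check_regression(seed, expected, make_rng=random.Random):
    """Exact-equality check: no statistical tolerance, so it can never
    fail stochastically in CI, and any platform-specific numeric
    difference (e.g. a JS backend doing 64-bit math) shows up at once."""
    return golden_values(seed, len(expected), make_rng) == expected
```

The design choice is that exactness trades statistical coverage for determinism: a regression test like this catches any behavioral change, while saying nothing about distribution quality.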

@peastman (Contributor Author):

I can add more tests that check for specific values to make sure nothing has changed. But for the statistical tests it's important to use a different seed every time (which they currently don't, but Kotlin 1.3 will fix that). Otherwise you don't really know if the code works, only that the test passes for one particular seed.

Suppose you write a test that statistically should pass 99.9% of the time, but there's a bug that makes it fail 50% of the time. If you use a different seed each time, you'll quickly discover the problem because the test will keep failing. But if you use a fixed seed, there's a 50% chance you'll get unlucky and pick one for which the test passes. And it will keep on passing forever and you'll never know the code is broken.

(Yes, I've made that mistake before and gotten bitten by it!)
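The argument above can be quantified: a bug that fails for a fraction p of all seeds survives k independently chosen seeds with probability (1 - p)^k, but survives a single fixed seed with probability (1 - p) forever. A one-line Python check:

```python
def miss_probability(fail_rate, runs):
    """Probability that a bug failing `fail_rate` of all seeds goes
    unnoticed after `runs` independently seeded test runs."""
    return (1.0 - fail_rate) ** runs

# A bug that fails for 50% of seeds survives one fixed seed half the
# time, but twenty fresh seeds only about once in a million runs.
```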

@kyonifer (Owner):

Sure. The way I usually handle these cases is to pick a set of static seeds (it doesn't have to be one; it could be a set of 100+ pre-picked numbers, but always the same ones) that I check on every commit (koma-tests runs in CI every time a commit is pushed), and then have a truly random testing mode which can be manually enabled, perhaps with a -PenableRandomTests gradle flag. When that flag is set, the seeds are chosen completely randomly when the random tests run.

The reason not to allow truly randomized tests in the default test set that CI runs (and that the user runs by default when they type ./gradlew test) is to prevent someone's new commit from failing randomly because we happened to draw samples from a tail (there is a valid output of a true rng that generates a thousand 1.0 values in a row; it's just very unlikely). I've chased down many a CI failure that seemed to be a bug in new code but turned out to be just an unlikely seed selection in the test. The enableRandomTests mode can also run much longer (and with much tighter tolerances) since it isn't what the user runs by default.

We could also set the CI to run the truly randomized tests once a week, so that they aren't just forgotten and never used. That is, perform a periodic deep testing of the system.
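The proposed split can be sketched in Python (the seed list and function names are hypothetical; in the real build, the -PenableRandomTests gradle property would drive this switch):

```python
import random

# Hypothetical fixed seed set: always the same values in the default CI
# run, so a failure always points at a code change, never at a seed.
FIXED_SEEDS = [11, 23, 37, 58, 813]

def seeds_for_run(random_mode, n=None):
    """Default mode replays the fixed seeds; the opt-in deep-test mode
    draws fresh seeds from the OS entropy source on every run."""
    if random_mode:
        sysrand = random.SystemRandom()
        return [sysrand.getrandbits(63) for _ in range(n or len(FIXED_SEEDS))]
    return list(FIXED_SEEDS)
```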

class RandomTests {

@Test
fun testReference() {
@kyonifer (Owner):

This is a great test, as it saves us from needing to implement our own statistical correctness suite (or call down into TestU01 ourselves). How hard would it be to do something similar for all the output types (Long, Double, ...)? Alternatively, we could check our outputs directly against another JVM or native implementation of PCG.

@peastman (Contributor Author):

The reference implementation only generates 32-bit integers.

@kyonifer (Owner):

The Java re-implementation seems to provide all the Random variants, but theirs may be just as susceptible to a bug as ours.

Hmmm... perhaps I'll open a new issue on this and add it to the list of future improvements (i.e. "Test our rng against a statistical suite, including the Box-Muller output").

@peastman (Contributor Author)

> I'd say let's wait for 1.3, and open some issues to track these regressions in the meantime.

Sounds good.

> Thus I'd recommend for now we just replace the synchronized blocks with NOPs on native and leave it unsafe on that platform.

That makes sense; it seems consistent with the recommended approach to threading on native. I can create a function syncNotNative() that calls synchronized() on JVM and JS and is a NOP on native.
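The shape of that helper can be sketched in Python with a context manager (hedged: the real Kotlin version would be an expect/actual pair across platforms; here a `native` flag stands in for the Kotlin/Native actual):

```python
import contextlib
import threading

def sync_not_native(lock, native=False):
    """Analog of the proposed syncNotNative(): a real lock on platforms
    that support it, and a do-nothing context on 'native'."""
    return contextlib.nullcontext() if native else lock

state_lock = threading.Lock()

with sync_not_native(state_lock):               # actually acquires the lock
    pass
with sync_not_native(state_lock, native=True):  # no-op, nothing acquired
    pass
```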

@peastman (Contributor Author)

I've revised the distribution tests to each run 20 times with reproducible seed values.

@peastman (Contributor Author)

I'm not sure what's going on with the CI failure. If I'm reading it correctly, it's saying that everything passes on Windows and Mac, but one test fails on Linux. Any idea what could be causing that? The test in question passes on my computer.

@kyonifer (Owner)

> I'm not sure what's going on with the CI failure. If I'm reading it correctly, it's saying that everything passes on Windows and Mac, but one test fails on Linux. Any idea what could be causing that? The test in question passes on my computer.

Yep, that's what's happening. I'm able to reproduce the failure locally on Ubuntu 18.04. Looks like the issue is that you aren't resetting the spare output of Box-Muller when setting the seed. This causes it to use the spare value for the first Gaussian requested after the seed is reset, instead of generating a new one. But it only does that if a spare is available, so there's a 50% chance. That makes it yield a different sequence for the same seed, and ultimately makes the test complain on Ubuntu when run with gradle caching enabled. This patch fixes it for me:

--- a/koma-core-api/common/src/koma/internal/randn.kt
+++ b/koma-core-api/common/src/koma/internal/randn.kt
@@ -84,6 +84,7 @@ class KomaRandom {
      */
     fun setSeed(seed: Long) {
         syncNotNative(this) {
+            gaussianIsValid = false
             state1 = 0
             state2 = 0
             nextLongUnsafe()

Should hopefully fix the CI as well.
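The bug pattern is easy to reproduce in miniature. A Python sketch of Box-Muller with a cached spare, with random.Random standing in for the PCG core (names are illustrative, not the code in randn.kt); the single line clearing the spare in set_seed plays the role of the `gaussianIsValid = false` line in the patch above:

```python
import math
import random

class GaussianSource:
    """Box-Muller transform with a cached spare value."""
    def __init__(self, seed):
        self.set_seed(seed)

    def set_seed(self, seed):
        self.rng = random.Random(seed)
        self.spare = None   # the fix: drop any cached spare on reseed

    def next_gaussian(self):
        if self.spare is not None:       # every other call hits the cache
            z, self.spare = self.spare, None
            return z
        u1 = self.rng.random() or 1e-12  # guard against log(0)
        u2 = self.rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        self.spare = r * math.sin(2.0 * math.pi * u2)
        return r * math.cos(2.0 * math.pi * u2)
```

Without clearing the spare in set_seed, the first value after a reseed depends on whether an odd or even number of Gaussians was drawn beforehand, which is exactly the test-order sensitivity that showed up in CI.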

@peastman (Contributor Author)

And the tests must get run in a different order on some platforms than others, so on some platforms an even number of values had been produced by that point and on others an odd number. It's fixed now.

@kyonifer (Owner)

Yeah. And if you turn off the daemon, run the tests again without running :clean first, or mess with the persistent caching, it doesn't happen, so the ordering is extremely brittle. Luckily this one reproduced on a clean first run on Linux.

This PR looks good here. I'll open those follow-up issues we mentioned and bring it in.

@kyonifer kyonifer merged commit e392717 into kyonifer:master Oct 30, 2018
@peastman peastman deleted the random branch October 30, 2018 15:56