
math/rand: documentation on rng.go is lacking important context and information #36133

Open
Plazmaz opened this issue Dec 13, 2019 · 7 comments

@Plazmaz commented Dec 13, 2019

Does this issue reproduce with the latest release?

Yes

What did you expect to see?

Details about the algorithm used, links to source material, context about why this algorithm was used and how it differs from other methods.

What did you see instead?

go/src/math/rand/rng.go

Lines 7 to 12 in c2edcf4

/*
* Uniform distribution
*
* algorithm by
* DP Mitchell and JA Reeds
*/

A single vague comment listing two names, without the titles of any sources, the name of the algorithm used, or any other details. The names alone don't seem to be sufficient for finding the source material (at least based on a good amount of Googling).
It seems like I'm not the only person having this issue:
https://www.seehuhn.de/blog/134.html

Could we add some more context on which algorithm is being used, why it was chosen, and how it differs from something like the Mersenne Twister? I think it would be valuable knowledge and a good addition to that documentation.

@robpike (Contributor) commented Dec 13, 2019

See #21835.
Perhaps this package should suggest using x/exp/rand.

@ALTree ALTree changed the title Documentation on rng.go is lacking important context and information math/rand: documentation on rng.go is lacking important context and information Dec 13, 2019
@zephyrtronium (Contributor) commented Dec 14, 2019

For whoever might update the documentation: the default Source is an additive lagged Fibonacci generator with lags j=273, k=607.

To answer how it differs from Mersenne Twister, LFG is essentially a simple LFSR over elements in GF(2^64), which as #21835 mentions has a number of empirical and theoretical problems; MT is a linear recurrence over a polynomial in GF(2^19937) in its usual implementation with another linear transform applied to its outputs, giving it strong theoretical performance. (I can offer more details on the differences if desired later, but I have a final to take now.)

While I agree that math/rand probably ought to recommend a different generator for many applications, I find it odd for a standard package to recommend something in x/exp. How close is x/exp/rand to being promoted to x/math/rand or similar?

@robpike (Contributor) commented Dec 14, 2019

The proposal has not even been accepted yet. If it does get accepted, it will still be a while, as the process described in the proposal makes clear.

@dmitshur dmitshur added this to the Backlog milestone Dec 23, 2019
@dmitshur (Contributor) commented Dec 23, 2019

@Plazmaz (Author) commented Dec 4, 2020

So I've done quite a bit of digging since originally opening this issue. Something I can't find an answer to is why the LCG used to seed the initial algorithm uses Schrage's method instead of 64-bit arithmetic. I'm wondering if that was an intentional decision or simply an artifact of the original systems this implementation was written for. It seems like using 32-bit logic has a few downsides:

  1. Seeds are truncated to 32 bits for use in the LCG (I am not positive that's why this was done, but it seems to be the case)
  2. It is computationally less efficient to use two single-width multiplications than one double-width multiplication, and 64-bit operations are used quite often elsewhere in this code.

If someone has more context or info on the decision behind that, that would be great. It's possible I would need to ask one of the original authors (Ken Thompson seems like the person who'd know, just looking at this comment https://groups.google.com/g/golang-nuts/c/RZ1G3_cxMcM/m/_7J7tnHhsU4J).
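Editor's note: for readers unfamiliar with Schrage's method, it computes (a*x) mod m without any intermediate wider than the machine word, by splitting m = a*q + r with q = m/a and r = m%a. A minimal sketch follows, using the classic Park-Miller "minimal standard" constants (a = 48271, m = 2^31-1); treat these as illustrative assumptions rather than a quote of the exact constants in rng.go.

```go
package main

import "fmt"

const (
	m = (1 << 31) - 1 // 2147483647, a Mersenne prime
	a = 48271         // Park-Miller multiplier (assumed for illustration)
	q = m / a         // 44488
	r = m % a         // 3399
)

// schrage returns (a*x) mod m using only 32-bit arithmetic.
// Because r < q, the intermediates a*lo and r*hi both fit in an int32,
// so no 64-bit multiplication is ever needed.
func schrage(x int32) int32 {
	hi := x / q
	lo := x % q
	x = a*lo - r*hi
	if x < 0 {
		x += m
	}
	return x
}

func main() {
	x := int32(12345)
	// Matches the naive double-width computation, without 64-bit ops.
	fmt.Println(schrage(x))
	fmt.Println(int32((int64(a) * int64(x)) % int64(m)))
}
```

This is exactly the trade being questioned above: two single-width multiplications plus a division in place of one double-width multiplication, which made sense when 64-bit arithmetic was expensive or unavailable.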

@randall77 (Contributor) commented Dec 4, 2020

> Seeds are truncated to 32 bits for use in the LCG (I am not positive that's why this was done, but it seems to be the case)

They are not quite truncated, as they are moduloed (is that a word?) with an odd number first. But yes, half the bits of entropy are thrown away.

Just a guess, but seedrand does division and modulo on int32s, and maybe those operations were considered expensive if widened to int64 on 32-bit machines. That may have been a big deal ~10 years ago, but I don't think it's a big problem nowadays.
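Editor's note: the reduction described above (modulo 2^31-1, an odd number, rather than plain truncation) can be sketched as below. The helper name `reduce` is hypothetical; the real Seed method does considerably more work after this step.

```go
package main

import "fmt"

const int32max = (1 << 31) - 1 // 2^31 - 1, the (odd) modulus

// reduce folds a 64-bit seed down into [0, 2^31-1) before it drives a
// 32-bit LCG. Any two seeds that differ by a multiple of 2^31-1 collapse
// to the same value, so roughly half the seed's entropy is discarded.
func reduce(seed int64) int32 {
	seed %= int32max
	if seed < 0 {
		seed += int32max
	}
	return int32(seed)
}

func main() {
	fmt.Println(reduce(1))            // 1
	fmt.Println(reduce(1 + int32max)) // also 1: a seed collision
}
```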

@Plazmaz (Author) commented Dec 4, 2020

I mean, it's being explicitly cast to an int32. The effect is either truncation of the higher bits or a modulus; they're the same thing in this context. Regardless, the rest of this code is littered with 64-bit logic. To seed the generator, three values from seedrand are combined into an int64 for each entry in the state vector, and that int64 is then XOR'd with the 64-bit value already in the vector, so I think there'd still be plenty of 64-bit work on a 32-bit machine. Anyway, I'm genuinely curious about the intent behind this. I have my own theories, but if someone actually knows why it was done this way, that would be helpful. Otherwise I'll see if I can get in contact with the people involved in the original implementation and find out!
