-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stdlib: add Random.int_in_range
and similar functions
#12459
Conversation
c13f863
to
a9f9da3
Compare
Answering my own question:
I guess this is for backwards compatibility, to preserve pseudo-random sequences with a given seed? In that case, the new function Waiting for confirmation. If my new understanding is correct, I believe it would be worthwhile to clarify the implementation by adding a small comment and by renaming |
I believe that the reason for |
Oh, that makes sense. In that case |
More on this question of intaux vs. int63aux:
It would be nice if |
It would also be nice to have the following properties:
I believe that the implementation as proposed does satisfy these properties: with Documenting them somewhere may be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that these functions are useful to have, and I find the naming choices tasteful.
I believe that the implementations are correct (including the conditional usage of intaux
), except possibly for corner cases of 32-64-compatibility of int_in_range
-- which I would prefer to see fixed.
In other words, I am heading towards approval. Stdlib PRs need two maintainer approvals, and PRNG stuff should get an opinion from @xavierleroy in any case.
I see only one documented compatibility property:
It trivially holds because Re-reading #9489, I see there was a special case for JS-of-OCaml and its 32-bit integers that disappeared in OCaml 5. Oh, well, nobody noticed :-) This will have to be discussed separately. |
Thanks for the background! I agree that the 32-64-compatibility is desirable, and that the current implementation does not satisfy it. I will add a test to check that. About what you said:
Unless I’m misunderstanding something,
I don’t follow here: are you saying that 32-64-compatibility implies to produce the same result on 32-bit and 64-bit platforms when 2^30 <= I agree with the other compatibility properties you mention. I will document them and try to add tests for them as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I usually prefer semi-open intervals [min, max+1)
to closed intervals [min, max]
, but I agree that it's nice to be able to take max = max_int
.
Trying to clarify the discussion: the "32-64-compatibility" property is ill-defined. There are THREE sizes for OCaml's type
The fancy footwork in #9489 was to guarantee consistent results for |
Concerning the equalities |
a9f9da3
to
8fbc154
Compare
I rebased, addressed all minor remarks but one, and in follow-up commits I addressed the "32-64-compatibility" question (the fact that, for 31-bit input,
No effort has been made to extend this compatibility to 32-bit integers (as found in JavaScript environments). Future work, as Xavier said. Regarding the other compatibility property (the fact that On my machine, I have one build failure that I cannot understand:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something is wrong in int31_in_range_aux
and more generally in all this fancy footwork for 31/32/63 bit compatibility. Time to go back to the drawing board. In the meantime, it might be good to add tests to testsuite/tests/lib-random/chi2.ml : bugs like "it should draw numbers in [-1000,1000] but the result is always positive" are nicely caught by such statistical tests.
cbfa2ac
to
ed4ffd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the code for int_in_range
carefully, going so far as to draw the flowchart, and I think it is correct, although there may be ways to write this logic more concisely. (I'll play with it later as I try to reintroduce the JS-of-OCaml special case that went away between 4.14 and 5.0.)
The chi-square tests may have biases, see below.
The 32/64 compatibility test is a good idea. It seems to be failing on 32-bit x86, see Github CI logs, but I didn't investigate.
For the record, here is Euclidean division from truncating division :
|
ed4ffd9
to
ef5801c
Compare
I just pushed a fix to the chi2 tests. Because of the way I had chosen constants, the coverage of some tests was very poor. In particular, it was not catching the biased test spotted during review. Now, if we replace unsigned divisions with signed divisions, “suspicious results” are reported as expected. Quoting the commit message:
Do you need anything from me? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At long last I was able to return to this code. The current state looks good to me in terms of functionality and algorithms used. The code is sometimes hard to read, see below for an example.
Cc @Octachron.
FYI: starting from this PR, I have two extensions in preparation:
|
@gmevel we discussed this PR at today's triaging meeting; it seems that the ball is in your court. When you have time, could you consider Xavier's last-minute comments (and decide to change your code or not) so that we can merge? |
I have submitted my changes as a pull request against this pull request: gmevel#1 . We will also need a second reviewer from the core dev team, since this is a standard library change. |
My apologies, I misunderstood that Xavier’s commits were part of a PR that would supersede this one, or that he would push his commits to this PR. Do you just need me to pull Xavier’s commits (I just did)? |
c00700a
to
de20393
Compare
Should the JavaScript compatibility be mentioned in the doc? As the 32-bit OCaml / 64-bit OCaml already is. |
As far as I can say, this PR is good for merging. (Plus, I'm tired of reviewing it.) A second approval from a core developer is required, however. |
Would @damiendoligez or @Ekdohibs be interested in reviewing this PR? |
stdlib/random.ml
Outdated
We must have [-2{^nbits - 1} <= min <= max < 2{^nbits - 1}]. | ||
Moreover, for the iteration to converge quickly, the interval | ||
[[min, max]] should have width at least [2{^nbits - 1}]. *) | ||
let rec int_in_range_alt s ~min ~max ~nbits = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we document that near the 2^{nbits - 1}
limit, we have a quite high standard deviation in the number of samples (around 2+ε
)?
The suffix _alt
is also quite unclear, and it is not reused for the other variants of those functions: Maybe int_in_large_range
would be more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done (7ebfbe4).
Explain a non-obvious bit of the implementation.
New functions: - `Random.int_in_range` - `Random.int32_in_range` - `Random.int64_in_range` - `Random.nativeint_in_range` - `Random.State.int_in_range` - `Random.State.int32_in_range` - `Random.State.int64_in_range` - `Random.State.nativeint_in_range`
- move the note about @before 5.0 to the top of the module, because it affects all functions rather than `bits` specifically - document all the Invalid_argument raised
This was already the case of `Random.full_int`. Comments have been added in the implementation to clarify that point. A test has been added, which covers `full_int` and `int_in_range`.
- `intaux` -> `int31aux` (because this one is for 31-bit integers) - `int63aux` -> `full_int_aux` (because this one is for any value of type `int`, regardless of the int size)
There are chi2 tests for `int_in_range` and similar, with intervals whose length is comprised between 2^k and 2^(k+1) for some k; in some tests, the length is 256*p, where p is some prime number such that 2^(k-8) < p < 2^(k-7). Previously, p had been chosen to be the smallest prime number larger than 2^(k-8), which made it very unlikely to draw samples larger than min_ + 2^k; hence, the coverage of the test was very poor. Now, p is chosen near 2^(k-8) * 3/2.
…bility JS-of-OCaml uses JavaScript's 32-bit integers. This commit makes sure that we get the same results with JavaScript and with 64-bit OCaml. For `Random.full_int` this was guaranteed in OCaml 4.14 but the code to do so was wrongly removed in 5.0.
also: - reword "fit in X bits" to "fit in X-bit signed integers" (disambiguation) - rename [int_in_range_gen] to [int_in_range_aux] (consistency with [int_aux])
Rename `int_in_range_aux` to `int_in_large_range`. Comment on the expected number of draws.
59ac2c4
to
57a498a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two rejection samplings are correct and restore the OS-independence of integer sampling.
Playing around with nodes confirmed that the implementation works and is testable with the js_of_ocaml
backend (even if I am now tempted to generalize a bit the chi2 testing).
And in term of API, the |
Great! Thanks @Octachron, this should now be ready to merge. The last remaining issue is the commit history. It is not so clean right now and I wonder what we would want to do about it. |
Squash! With a killer commit message that I'm about to write. |
This PR adds the following functions to
Random
:and their counterpart in the
Random.State
sub-module:Motivation
The already-existing function for drawing integers uniformly in an interval, namely
Random.full_int
, has limited functionality:(1) expressiveness: it does not support the full range of integers (it can only draw integers between 0 and
max_int
−1), so we cannot easily use it to emulate drawing from an arbitrary interval;(2) code clarity: even when its range suffices, it must be used in a convoluted way if wanting to draw from an interval whose lower bound is not zero; namely, the code pattern
min + Random.full_int (max - min + 1)
, which is not terribly explicit, prone to errors (it is easy to forget the+1
); worse, this code is broken in the general case becausemax - min + 1
may overflow [this is a re-statement of point (1)].Hence these new functions are useful. and they are not trivial to implement, (and indeed the implementation in Containers has the mentioned bug, for instance).
Interface
The proposed interface tries to be as abundantly explicit as possible about what the two integer arguments are: inclusive bounds (by contrast with exclusive bounds, or a length for the second argument, or something else yet…). This goes with the name "
int_in_range
" and labeled arguments "~min ~max
".Related functions in third-party libraries:
int_range : int -> int -> State.t -> int
.int_incl : int -> int -> int
and similar forint32
,int64
,nativeint
(it also hasfloat_range : float -> float -> float
where the upper bound is exclusive).Future work
An analogous function for
float
should be provided, but this PR does not tackle it. This might be as simple asmin +. float state (max -. min)
(as Base does it), but I do not feel bold enough to venture into the floating-point land (the subtraction scares me). Also, the documentation states that the bounds are inclusive, but the implementation makes it clear that both zero and the upper bound are in fact exclusive (unlessbound = 0.0
).Side comments
I also added one explanatory comment in the internal implementation of
Random
.Regarding said implementation: why isn’t
intaux
replaced withint63aux
entirely? In 32-bit OCaml, both functions are identical (so "int63aux
" is a misnomer). In 64-bit OCaml,intaux
produces the same thing asint63aux
, only with a restricted input range, and (if I get it right) a higher probability of re-drawing.intaux
does not even spare some random bits, it draws 64 bits of randomness anyway.