Implement backwards-compatible 'random' redesign #3619

occivink · 2016-12-04T08:51:46Z

This should preserve the behavior for 0 or 1 argument.

The seeding is a bit arbitrary (8*32 bits of random data for the ~~19937~~ 624*32 bits of internal state in the engine) but the initialization step of the algorithm is here to make the most of the initial data. It's definitely better than 32 bits seeding to produce 64 bits output.
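The seeding described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name `make_seeded_engine` is hypothetical, and it assumes `std::random_device` is the entropy source.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Hypothetical helper (not from the PR): seed a 64-bit Mersenne Twister
// from eight 32-bit words of entropy, i.e. 256 bits in total.
inline std::mt19937_64 make_seeded_engine() {
    std::random_device rd;
    std::array<std::uint32_t, 8> words;
    for (auto &w : words) w = rd();
    // seed_seq spreads the 256 input bits over the engine's whole state
    // (312 words of 64 bits each for mt19937_64).
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937_64(seq);
}
```

Only 2^256 of the possible initial states are reachable this way, which is the trade-off the comment calls "a bit arbitrary".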

Regarding performance, I'm not sure if there is any concern to be had. The first invocation should be slower due to initialization, but insignificantly so ($CMD_DURATION reports 0ms on my end).

ridiculousfish · 2016-12-04T09:01:50Z

Heh, this is the first C++11-only feature usage AFAICT.

ridiculousfish · 2016-12-04T09:08:58Z

The code looks good to me. Very modern.

From my reading, nobody seems to really like or want the step parameter, and the order of parameters is hard to remember. As written, we are also vulnerable to divide by 0, and probably LLONG_MIN/-1, leading to crashes. Let's just eliminate the step variant, unless someone champions it and wants to tackle the overflow issues.

occivink · 2016-12-04T09:15:06Z

Thank you. step is actually being checked for being strictly positive so I believe it should be okay.
Regarding overflow issues, the checks against start > end and step <= 0 should take care of them.

ridiculousfish · 2016-12-04T09:18:11Z

You're right, I missed those checks. How about end-start on lines 1836 and 1840? It looks like that may overflow if start is negative.
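One way to sidestep the signed overflow in end-start is to do the arithmetic in unsigned integers, where wraparound is well defined. The sketch below is an assumption about how that could look, not the code that was merged; `random_stepped` is a hypothetical name, and the caller is assumed to have already rejected start > end and step <= 0.

```cpp
#include <random>

// Hypothetical helper: returns start + k * step with start <= result <= end.
// Preconditions (checked by the caller): start <= end and step > 0.
template <class Engine>
long long random_stepped(Engine &engine, long long start, long long end, long long step) {
    // Unsigned subtraction wraps modulo 2^64, so this yields the true
    // difference even when start is negative and end is large.
    unsigned long long width =
        static_cast<unsigned long long>(end) - static_cast<unsigned long long>(start);
    unsigned long long ustep = static_cast<unsigned long long>(step);
    std::uniform_int_distribution<unsigned long long> dist(0, width / ustep);
    // offset <= width, so the multiplication cannot overflow.
    unsigned long long offset = dist(engine) * ustep;
    // The unsigned sum wraps back to the bit pattern of the signed result.
    // Converting it to long long is implementation-defined before C++20,
    // but modular on mainstream two's-complement compilers.
    return static_cast<long long>(static_cast<unsigned long long>(start) + offset);
}
```

With this shape there is no signed subtraction or division left that could hit the LLONG_MIN / -1 case.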

occivink · 2016-12-04T10:47:48Z

Indeed, this is really a minefield. I tried to come up with a solution but couldn't find a clean one. It might be better to just remove step.

faho · 2016-12-04T11:45:30Z

src/builtin.cpp

+    long long result;
+    if (end - start < step) {
+        // nine nine nine nine nine nine
+        result = start;


I'd hate to lose the dilbert reference, but I'm not sure returning something deterministic is the right thing to do. Error?

I don't know, I'm of the opinion that if it is technically possible to produce a result we might as well do it, even if it doesn't make sense. Same reason as to why the start == end case is accepted.
It's less potential errors for scripts to handle (for example choose with only one argument).

I'm inclined to argue that this case and start == end are errors, since they always return a constant. I don't like giving people enough rope to hang themselves. The "choose with only one argument" case is interesting in that the naive implementation would call random 1 (count $list), hence the reason you're allowing it. The problem with that logic is that it fails if $list is empty: you're then running random 1 0, which will return one, and $list[1] is obviously wrong. Shells by their nature tend to be lenient, but in this case I think we're being too lenient and thus likely to mask serious usage errors.
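The empty-list concern can be illustrated with a small sketch. This is not the fish implementation; `choose` here is a hypothetical C++ helper that reports failure for an empty list instead of producing an index that does not exist, while still accepting a single-item list.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: picks one item uniformly at random.
// Returns false (and leaves `out` untouched) when there is nothing to pick.
template <class Engine>
bool choose(Engine &engine, const std::vector<std::string> &items, std::string &out) {
    if (items.empty()) return false;  // the "random 1 0" case becomes an explicit error
    std::uniform_int_distribution<std::size_t> dist(0, items.size() - 1);
    out = items[dist(engine)];
    return true;
}
```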

krader1961 · 2016-12-05T02:08:37Z

src/builtin.cpp

-
-    int argc = builtin_count_args(argv);
+    static bool seeded = false;
+    static std::mt19937_64 engine;


I'm still opposed to using this RNG engine. For one thing, the link makes it crystal clear that initializing the RNG with fewer seed bits than it requires can cause surprising behavior. Second, we do not need the guarantees it provides. We shouldn't even hint via code inspection that our RNG is suitable for cryptographic applications.

I do not see any legitimate argument for our random implementation to have a range larger than 0 to 4 GiB (or -2 GiB to 2 GiB). If someone can provide such an argument then it is sufficient to call a RNG that returns 32 bit values and merge them to result in a 64 bit value. Yes, doing that can produce statistical anomalies but, again, we should not even pretend to produce sequences that satisfy strong statistical guarantees. Our random numbers are meant for casual applications such as picking a value at random from a small set of values.
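The "merge two 32-bit values" idea might look like the sketch below. The helper name `draw64` is hypothetical. A distribution is used for each half because an engine such as `std::minstd_rand` does not emit the full 32-bit range on its own.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: build one 64-bit value from two 32-bit draws.
template <class Engine32>
std::uint64_t draw64(Engine32 &engine) {
    // Default-constructed: covers 0 .. 2^32 - 1.
    std::uniform_int_distribution<std::uint32_t> dist;
    std::uint64_t high = dist(engine);
    std::uint64_t low = dist(engine);
    return (high << 32) | low;
}
```

As the comment says, this makes no claim about the statistical quality of the combined value.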

krader1961 · 2016-12-05T02:10:29Z

src/builtin.cpp

-
+    int argc = builtin_count_args(argv);
+    static const struct woption long_options[] = {{L"help", no_argument, 0, 'h'},
+                                                 {0,0,0,0}};


Please use NULL, or even better nullptr, for the args that represent pointers. I know that a literal zero is equivalent and large parts of the fish code do so (I'm slowly changing those). We shouldn't introduce more such bogosities 😄

krader1961 · 2016-12-05T02:12:18Z

src/builtin.cpp

+        // nine nine nine nine nine nine
+        result = start;
+    } else {
+        std::uniform_int_distribution<long long> dist(start, start+(end-start)/step);


You can run make style to ensure your code follows our documented style. In this case it would add whitespace around those binops.

krader1961 · 2016-12-05T02:16:15Z

share/functions/choose.fish

@@ -0,0 +1,6 @@
+function choose --description "Chooses a random item from a list"


I'd prefer to see this implemented via random choose (or random choice or random select) via a random function. See what we do with the history function to augment the history builtin. Adding new commands runs the risk of causing problems for someone with an external command by the same name. So we should not do so if there is a reasonable alternative.

I agree, let's not introduce a new function choose but instead make it a feature of random.

krader1961 · 2016-12-05T02:18:37Z

While I obviously have a strong opinion about a couple of aspects of this change, overall I like it and greatly appreciate your taking the time to create an implementation, @occivink. The cherry on top of the sundae (metaphorically) would be at least a handful of unit tests to verify basic behavior such as bounds checking.

occivink · 2016-12-05T17:23:53Z

Thanks for the feedback, I'll take care of the open points that were raised.

Regarding the engine, I really think we shouldn't use one with a period smaller than the maximum range we want to produce. Even if we restricted start and end to 32 bits and used std::minstd_rand0, it would be insufficient to ever produce the full range. Seeding the Mersenne Twister with 256 bits is not ideal, but it should give plenty of possible initial states and solves the most important issue mentioned in the article (the first value not covering the full 64 bit range).
And I think that it's better for somebody to look at the implementation and conclude that it's sufficient for their use (even if it might not be and then that's on them), than the opposite (i.e. somebody assuming that it's enough and getting really inadequate results, but we could have done better).
If you're categorically against it, I'll yield but I won't be really happy about it.

krader1961 · 2016-12-05T18:50:17Z

...somebody assuming that it's enough and getting really inadequate results...

This was discussed extensively. The decision was that we would not implement a random suitable for applications needing hard guarantees. People should use tools like openssl where appropriate. See @ridiculousfish's comment. As part of this change the man page should be modified to make it crystal clear no one should trust our implementation to be safe for use in cryptographic or equally demanding applications even if we use the Mersenne Twister engine. We do not have the expertise or desire to take on the responsibility for providing such guarantees. This is also why I am opposed to using a 64-bit generator. It makes it too likely someone will think they can safely use it in situations where it is not appropriate.

ridiculousfish · 2016-12-06T08:48:35Z

Where I come down is:

fish promises to generate random numbers that are good enough for a command line shell, which is a very low bar, therefore we
use whatever PRNG is easiest, least likely to be wrong, and least likely to raise eyebrows/questions.

Based on that, I think any of the (super over-designed, OMG) C++11 engines are fine. ~~MT is especially fine, since it's most widely used wikipedia says so so least likely to raise eyebrows.~~

Regarding whether to output 32 bit or 64 bit output: it seems to be no harder to use 64 bits. It's just changing the type, right? So we might as well just do 64 bit now and save ourselves the embarrassment in 10 years time.

Totally agree with krader to document that our PRNs are in no way suitable for cryptographic purposes.

Edit: I just noticed that MT is pretty porky, at 2.5 KB state. fish uses 1.4 MB currently according to Activity Monitor, so the MT's contribution is significant (~1.3%). Let's just use an LCG engine, which has a puny state. Shells are often invoked to recover from OOM scenarios, so we ought to be quite lean.
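The state-size difference can be checked at compile time. A small sketch follows; the constant names are illustrative, and the exact `sizeof` values depend on the standard library in use.

```cpp
#include <cstddef>
#include <random>

// Illustrative constants: object sizes of the two engines under discussion.
constexpr std::size_t mt_bytes = sizeof(std::mt19937_64);
constexpr std::size_t lcg_bytes = sizeof(std::minstd_rand);

// mt19937_64 stores 312 words of 64 bits (about 2.5 KB) plus bookkeeping.
static_assert(mt_bytes >= 312 * 8, "Mersenne Twister state is at least 2496 bytes");
// An LCG only needs to remember its single current value.
static_assert(lcg_bytes < mt_bytes, "LCG state is smaller than MT state");
```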

occivink · 2016-12-06T10:19:46Z

it seems to be no harder to use 64 bits
Let's just use a LCG engine

The C++11 default typedefs for LCG engines only support 32 bits output. Other possibilities include:

Using non-STL constants for an LCG, such as "Knuth's preferred 64-bit LCG" (as mentioned in the article or wikipedia). Dangerously close to rolling our own PRNG.
ranlux48 from the STL.
Ditching 64 bits output for 32 bits.

ranlux48 seems like a pretty conservative choice to me.
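For reference, the first two options above could be expressed with standard library templates roughly as follows. The alias name `knuth_lcg64` is hypothetical, and the constants are the ones commonly attributed to Knuth's MMIX generator, which is assumed to be the LCG the article refers to. The standard spelling of the second engine is `std::ranlux48`.

```cpp
#include <cstdint>
#include <random>

// Hypothetical alias: a 64-bit LCG with the MMIX multiplier and increment.
// A modulus of 0 means "2^64" for a 64-bit result type.
using knuth_lcg64 = std::linear_congruential_engine<
    std::uint64_t, 6364136223846793005ULL, 1442695040888963407ULL, 0>;

// std::ranlux48 ships with the standard library, but despite its 64-bit
// result type it only produces 48 bits per call.
static_assert(std::ranlux48::max() == (1ULL << 48) - 1,
              "ranlux48 outputs 48-bit values");
```

Neither choice yields a full 64 bits from a predefined standard typedef, which matches the trade-off listed above.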

krader1961 · 2016-12-07T04:14:55Z

use whatever PRNG is easiest, least likely to be wrong

That's why I'm arguing for one of the simpler 32 bit engines. One reason is: how do you manually seed the MT engine, given how many seed bits it requires? A simple 32 bit, or even 64 bit, int isn't sufficient. And that's all that random $seed can provide.

As for 32 versus 64 bit PRNGs, I still think that, given how random is used in shell scripts, even 32 bits should be more than sufficient even a decade from now. If someone really needs a range larger than 4 billion then the shell's random command is not the right tool for the job. Note that the range is not the same thing as the period. From what I can glean by googling, it looks like the two ranlux engines have a significantly larger period than the range of values they return.

The documentation at http://en.cppreference.com/w/cpp/numeric/random is awful regarding the characteristics of the various engines. And I suspect everyone else commenting on this change is just as confused as I am regarding which makes the most sense given our requirements.

occivink · 2016-12-11T12:36:38Z

Okay this should take care of the points that were raised.

The overflow handling is rather bulky but I'm reasonably confident in it (special mention to clang's undefined behaviour sanitizer). I'd understand if you'd rather completely drop STEP for simplicity of implementation.
I'm still somewhat concerned by the use of an engine with a period of 2^31-1 to cover 64 bits output, but at least uniform_int_distribution is making the result uniform over the entire range. So really there should only be a problem if you use random as many times as the period, which is already a weird use-case.
I've allowed the trivial case of only one entry for random choice.
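The combination mentioned above, a small LCG engine behind a 64-bit distribution, can be sketched like this. `random_in_range` is a hypothetical name, and start <= end is assumed to have been validated by the caller.

```cpp
#include <random>

// minstd_rand emits values in 1 .. 2^31 - 2, far narrower than long long.
static_assert(std::minstd_rand::min() == 1, "LCG never outputs 0");
static_assert(std::minstd_rand::max() == 2147483646u, "LCG output stays below 2^31 - 1");

// Hypothetical helper: the distribution calls the engine as many times as it
// needs in order to cover [start, end], so the narrow engine output is not a
// hard limit on the range of results.
inline long long random_in_range(std::minstd_rand &engine, long long start, long long end) {
    std::uniform_int_distribution<long long> dist(start, end);
    return dist(engine);
}
```

The engine's short period still bounds how many distinct sequences exist, which is the remaining concern raised in the comment.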

The documentation at http://en.cppreference.com/w/cpp/numeric/random is awful regarding the characteristics of the various engines.

No objections here.

ridiculousfish · 2016-12-16T03:08:43Z

I'm happy with this as is and would like to squash-merge it. Thank you again! I'd like to try to simplify some of the overflow checking but that can happen after merge. Any further comments @krader1961 ?

krader1961 · 2016-12-16T06:05:15Z

LGTM. I appreciate the comprehensiveness of the change. There are a handful of whitespace style issues and make lint warned about one semi-serious problem:

implicit conversion loses integer precision: 'long long' to 'result_type' (aka 'unsigned int')

for the engine.seed(seed); statement. Also, two lines later drop the } else {. Just do

            if (!parse_error) {
                engine.seed(seed);
                return STATUS_BUILTIN_OK;
            }
            return STATUS_BUILTIN_ERROR;

Even better would be to invert the logic:

if (parse_error) return STATUS_BUILTIN_ERROR;
engine.seed(seed);
return STATUS_BUILTIN_OK;

occivink · 2016-12-16T18:47:04Z

Thank you again! I'd like to try to simplify some of the overflow checking but that can happen after merge.

I'd appreciate that; I probably made this more complicated than necessary.

@krader1961: how are you getting that message? cppcheck is not warning me of anything in my changes when I run make lint-all.

krader1961 · 2016-12-16T19:01:39Z

Don't know why cppcheck isn't giving you that warning because it should. See http://www.cplusplus.com/reference/random/linear_congruential_engine/seed/ where it says

result_type is a member type, defined as an alias of the first class template parameter (UIntType).
default_seed is a member constant, defined as 1u.

occivink · 2016-12-16T19:11:17Z

Not sure why either, there are a lot of other hints but nothing on src/builtin.cpp. Can you tell me if this fixes it?

std::seed_seq seq{ seed };
engine.seed(seq);

ridiculousfish · 2016-12-16T21:32:33Z

Well it's a decision we have to make - the seed value for the standard engine is 32 bits, but the interface allows specifying a 64 bit seed. Assuming we don't really care, I think the right fix is to just cast to the smaller size:

engine.seed(static_cast<uint32_t>(seed));

Or the more precise and annoying:

engine.seed(static_cast<std::minstd_rand::result_type>(seed));

krader1961 · 2016-12-17T00:31:02Z

I was going to recommend the same solution that @ridiculousfish just provided. Keep it 64-bits at the user level for consistency and to give us flexibility if we change the implementation such that a 64-bit seed would be useful. Suppress the warning by explicitly casting the value to indicate we know we're throwing away information.

occivink · 2016-12-17T09:26:47Z

Alright, I was hoping that seed_seq would turn my 64-bit input into a 32-bit sequence automatically, but it doesn't and I don't want to do that manually. Truncating is a good enough solution imo.
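For comparison, here is a sketch of both options: the truncation that was chosen, and the manual split into 32-bit words that seed_seq would need in order to see all 64 bits. Both function names are hypothetical. The cast targets std::uint32_t rather than the engine's result_type in this sketch, because result_type can be wider than 32 bits on some platforms.

```cpp
#include <cstdint>
#include <random>

// The approach settled on in the thread: keep only the low 32 bits.
inline void seed_truncated(std::minstd_rand &engine, long long seed) {
    engine.seed(static_cast<std::uint32_t>(seed));
}

// The manual alternative (not adopted): hand both halves to seed_seq so
// the high bits also influence the initial state.
inline void seed_both_halves(std::minstd_rand &engine, long long seed) {
    std::uint64_t bits = static_cast<std::uint64_t>(seed);
    std::seed_seq seq{static_cast<std::uint32_t>(bits >> 32),
                      static_cast<std::uint32_t>(bits & 0xFFFFFFFFu)};
    engine.seed(seq);
}
```

With truncation, two seeds that differ only in their upper 32 bits start the same sequence, which the thread accepts as good enough.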

I couldn't find the whitespace issues you were talking about.
Btw, make style doesn't fix whitespace around binary operators.

krader1961 · 2016-12-21T00:53:15Z

Squash merged as 7996e15 and 1ace742. Many thanks, @occivink, for your hard work on this.

occivink · 2016-12-21T11:36:26Z

Likewise, I appreciate the time that you spent with me on this.

Implement part of the 'random' redesign proposal

dbb8676

ridiculousfish mentioned this pull request Dec 4, 2016

random should be redesigned #2642

Closed

faho reviewed Dec 4, 2016

krader1961 reviewed Dec 5, 2016

floam added the enhancement label Dec 6, 2016

random: error out in trivial cases

011a11d

occivink added 6 commits December 11, 2016 16:38

add unsigned long long variant to fish_wcsto*

215cfd8

random: add 'choice' variant

8bc19f0

random: add overflow checks

18160c8

random: add test

ee62a58

random: change engine to minstd_rand

118fafb

fish_wcstoull: disallow minus as leading character

a259347

random: Silence downcast warning

5574148

krader1961 closed this Dec 21, 2016

krader1961 added this to the fish 2.5.0 milestone Dec 21, 2016

github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
