-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement backwards-compatible 'random' redesign #3619
Conversation
Heh, this is the first C++11-only feature usage AFAICT. |
The code looks good to me. Very modern. From my reading, nobody seems to really like or want the |
Thank you. |
You're right, I missed those checks. How about |
Indeed, this is really a minefield. I tried to come up with a solution but couldn't find a clean one. It might be better to just remove |
long long result; | ||
if (end - start < step) { | ||
// nine nine nine nine nine nine | ||
result = start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hate to lose the dilbert reference, but I'm not sure returning something deterministic is the right thing to do. Error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I'm of the opinion that if it is technically possible to produce a result we might as well do it, even if it doesn't make sense. Same reason as to why the start == end
case is accepted.
It's less potential errors for scripts to handle (for example choose
with only one argument).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to argue that this case and start == end
is an error since it always returns a constant. I don't like giving people enough rope to hang themselves. The "choose
with only one argument" case is interesting in that the naive implementation would call random 1 (count $list)
hence the reason you're allowing it. The problem with that logic is it fails if $list
is empty as you're then running random 1 0
which will return one and $list[1]
is obviously wrong. Shells by their nature tend to be lenient but in this case I think we're being too lenient and thus likely to mask serious usage errors.
|
||
int argc = builtin_count_args(argv); | ||
static bool seeded = false; | ||
static std::mt19937_64 engine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still opposed to using this RNG engine. For one thing as the link makes it crystal clear that initializing the RNG with fewer seed bits than it requires can cause surprising behavior. Second, we do not need the guarantees it provides. We shouldn't even hint via code inspection that our RNG is suitable for cryptographic applications.
I do not see any legitimate argument for our random implementation to have a range larger than 0 to 4 GiB (or -2 GiB to 2 GiB). If someone can provide such an argument then it is sufficient to call a RNG that returns 32 bit values and merge them to result in a 64 bit value. Yes, doing that can produce statistical anomalies but, again, we should not even pretend to produce sequences that satisfy strong statistical guarantees. Our random numbers are meant for casual applications such as picking a value at random from a small set of values.
|
||
int argc = builtin_count_args(argv); | ||
static const struct woption long_options[] = {{L"help", no_argument, 0, 'h'}, | ||
{0,0,0,0}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use NULL, or even better nullptr, for the args that represent pointers. I know that a literal zero is equivalent and large parts of the fish code does so (I'm slowly changing those). We shouldn't introduce more such bogosities 😄
// nine nine nine nine nine nine | ||
result = start; | ||
} else { | ||
std::uniform_int_distribution<long long> dist(start, start+(end-start)/step); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can run make style
to ensure your code follows our documented style. In this case it would add whitespace around those binops.
@@ -0,0 +1,6 @@ | |||
function choose --description "Chooses a random item from a list" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to see this implemented via random choose
(or random choice
or random select
) via a random
function. See what we do with the history
function to augment the history builtin. Adding new commands runs the risk of causing problems for someone with an external command by the same name. So we should not do so if there is a reasonable alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, let's not introduce a new function choose
but instead make it a feature of random
While I obviously have a strong opinion about a couple aspects of this change overall I like it and greatly appreciate your taking the time to create an implementation, @occivink. The cherry on top of the sundae (metaphorically) would be at least a handful of unit tests to verify basic behavior such as bounds checking. |
Thanks for the feedback, I'll take care of the open points that were raised. Regarding the engine, I really think we shouldn't one use with a period of smaller than the maximum range we want to produce. Even if we restricted |
This was discussed extensively. The decision was that we would not implement a |
Where I come down is:
Based on that, I think any of the (super over-designed, OMG) C++11 engines are fine. Regarding whether to output 32 bit or 64 bit output: it seems to be no harder to use 64 bits. It's just changing the type, right? So we might as well just do 64 bit now and save ourselves the embarrassment in 10 years time. Totally agree with krader to document that our PRNs are in no way suitable for cryptographic purposes. Edit I just noticed that MT is pretty porky, at 2.5 KB state. fish uses 1.4 MB currently according to Activity Monitor, so the MT's contribution is significant (~1.3%). Let's just use a LCG engine, which has a puny state. shells are often invoked to recover from OOM scenarios, so we ought to be quite lean. |
The C++11 default typedefs for LCG engines only support 32 bits output. Other possibilities include:
|
That's why I'm arguing for one of the simpler 32 bit engines. One reason is how do you manually seed the MT engine given how many seed bits it requires. A simple 32 bit, or even 64 bit, int isn't sufficient. And that's all that As for 32 versus 64 bit PRNGs I still think that given how The documentation at http://en.cppreference.com/w/cpp/numeric/random is awful regarding the characteristics of the various engines. And I suspect everyone else commenting on this change is just as confused as I am regarding which makes the most sense given our requirements. |
Okay this should take care of the points that were raised. The overflow handling is rather bulky but I'm reasonably confident in it (special mention to clang's undefined behaviour sanitizer). I'd understand if you'd rather completely drop
No objections here. |
I'm happy with this as is and would like to squash-merge it. Thank you again! I'd like to try to simplify some of the overflow checking but that can happen after merge. Any further comments @krader1961 ? |
LGTM. I appreciate the comprehensiveness of the change. There are a handful of whitespace style issues and
for the
Even better would be to invert the logic:
|
I'd appreciate, I probably made this more complicated than necessary. @krader1961: how are you getting that message? |
Don't know why cppcheck isn't giving you that warning because it should. See http://www.cplusplus.com/reference/random/linear_congruential_engine/seed/ where it says
|
Not sure why either, there are a lot of other hints but nothing on
|
Well it's a decision we have to make - the seed value for the standard engine is 32 bits, but the interface allows specifying a 64 bit seed. Assuming we don't really care, I think the right fix is to just cast to the smaller size:
Or the more precise and annoying:
|
I was going to recommend the same solution that @ridiculousfish just provided. Keep it 64-bits at the user level for consistency and to give us flexibility if we change the implementation such that a 64-bit seed would be useful. Suppress the warning by explicitly casting the value to indicate we know we're throwing away information. |
Alright, I was hoping that I couldn't find the whitespace issues you were talking about. |
Likewise, I appreciate the time that you spent with me on this. |
#2642
This should preserve the behavior for 0 or 1 argument.
The seeding is a bit arbitrary (8*32 bits of random data for the
19937624*32 bits of internal state in the engine) but the initialization step of the algorithm is here to make the most of the initial data. It's definitely better than 32 bits seeding to produce 64 bits output.Regarding performance, I'm not sure if there is any concern to be had. The first invocation should be slower due to initialization, but insignificantly so ($CMD_DURATION reports 0ms on my end).