uniform_real_distribution produces variates in its range _inclusive_ rather than exclusive. #19141

Open
llvmbot opened this issue Feb 7, 2014 · 25 comments
Labels
bugzilla (Issues migrated from bugzilla) · libc++ (C++ Standard Library. Not GNU libstdc++. Not libc++abi.)

Comments

@llvmbot
Member

llvmbot commented Feb 7, 2014

Bugzilla Link 18767
Version unspecified
OS All
Reporter LLVM Bugzilla Contributor
CC @Quuxplusone,@hfinkel,@thiagomacieira

Extended Description

The following program should complete successfully.

#include <cassert>
#include <random>
#include <cmath>

int main() {
    double range[] = {1.0, std::nextafter(1.0, 2.0)};
    
    std::uniform_real_distribution<double> dist(range[0], range[1]);
    std::default_random_engine eng;
        
    for (int i=0; i<10; ++i) {
        double v = dist(eng);
        assert(range[0] <= v && v < range[1]);
    }
}

Using libc++ it instead produces an assertion failure. Other implementations (vc++ and libstdc++) exhibit the same behavior.

@llvmbot
Member Author

llvmbot commented Feb 7, 2014

There is an inconsistency between uniform_int_distribution and uniform_real_distribution:

§26.5.8.2.1p1 says:

| A uniform_int_distribution random number distribution produces random
| integers i, a ≤ i ≤ b

And §26.5.8.2.2p1 says:

| A uniform_real_distribution random number distribution produces random
| numbers x, a ≤ x < b

However this clashes with the requirements for uniform_real_distribution's constructor:

| Requires: a ≤ b

If a == b then it's impossible for uniform_real_distribution to produce a number that satisfies a ≤ x < b, see DR 2168: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3893.html#2168

I'm guessing all the standard libraries you mention assume that §26.5.8.2.2p1 means <= instead of <.

@llvmbot
Member Author

llvmbot commented Feb 8, 2014

I don't think this is due to a misunderstanding of the spec by implementors. I think it is due to a misunderstanding of how floating-point round-off can cause an unintended result.

The libc++ implementation is:

template<class _RealType>
template<class _URNG>
inline _LIBCPP_INLINE_VISIBILITY
typename uniform_real_distribution<_RealType>::result_type
uniform_real_distribution<_RealType>::operator()(_URNG& __g, const param_type& __p)
{
    return (__p.b() - __p.a())
        * _VSTD::generate_canonical<_RealType, numeric_limits<_RealType>::digits>(__g)
        + __p.a();
}

And generate_canonical appears to be correctly coded to deliver results in the half-open range [0, 1). However, when the range is small enough, the formulation above can, I believe, sometimes round up to exactly __p.b(), even though generate_canonical() never returns 1.

I believe a fix is simply:

template<class _RealType>
template<class _URNG>
inline _LIBCPP_INLINE_VISIBILITY
typename uniform_real_distribution<_RealType>::result_type
uniform_real_distribution<_RealType>::operator()(_URNG& __g, const param_type& __p)
{
    typename uniform_real_distribution<_RealType>::result_type __r;
    do
    {
        __r = (__p.b() - __p.a())
                * _VSTD::generate_canonical<_RealType, numeric_limits<_RealType>::digits>(__g)
                + __p.a();
    } while (__r >= __p.b());
    return __r;
}

@llvmbot
Member Author

llvmbot commented Feb 11, 2014

Attachment: "fix"
I've attached a patch that implements the fix described above and adds a test.

@Quuxplusone
Contributor

Quuxplusone commented Feb 11, 2014

Howard, I don't understand how your proposed patch (or mathematical explanation) would deal with Seth's original program modified in light of Jonathan's observation. Consider:

#include <cassert>
#include <random>
#include <cmath>

int main() {
    double range[] = {1.0, 1.0};
    std::uniform_real_distribution<double> dist(range[0], range[1]);
    std::default_random_engine eng;
    for (int i=0; i<10; ++i) {
        double v = dist(eng);
        assert(range[0] <= v && v < range[1]);
    }
}

This program obviously fails the assertion with current libc++. But after your proposed patch, the first call to dist(eng) loops forever instead. Is that an improvement? I propose that no, it's not.

It looks to me as if Jonathan's observation is valid and the Standard has a defect in this area; it doesn't make sense to patch libc++ until the defect has been filed and resolved one way or the other.

@llvmbot
Member Author

llvmbot commented Feb 11, 2014

Thanks Arthur. I think it would be a good idea for a debug-mode uniform_real_distribution to make noise (assert, whatever) if a == b. The uniform_real_distribution requirement of:

a <= b

is just a silly type-o in the standard. It should read:

a < b

and libc++ should just assume that the standard says that.

In Seth's test case: a < b.

My patch causes Seth's test to turn from failing to passing.

The case you and Jonathan are discussing is a different case than the one Seth is demonstrating.

@llvmbot
Member Author

llvmbot commented Feb 11, 2014

Hmmm… One reasonable response to a == b, even in release mode, is to return a nan. Thoughts on that?

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

If it is desired that illegal limits result in a nan being returned from the operator(), here is my recommendation:

template<class _RealType>
template<class _URNG>
inline _LIBCPP_INLINE_VISIBILITY
typename uniform_real_distribution<_RealType>::result_type
uniform_real_distribution<_RealType>::operator()(_URNG& __g, const param_type& __p)
{
    typename uniform_real_distribution<_RealType>::result_type __r;
    if (!(__p.a() < __p.b()))
        __r = numeric_limits<_RealType>::quiet_NaN();
    else
    {
        do
        {
            __r = (__p.b() - __p.a())
                * _VSTD::generate_canonical<_RealType, numeric_limits<_RealType>::digits>(__g)
                + __p.a();
        } while (__r >= __p.b());
    }
    return __r;
}

The if statement is carefully crafted such that if __p.a() or __p.b() is a nan, or if __p.a() >= __p.b(), then you get a nan result (and all just with the one test).

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

I agree with Howard that the spec makes sense as-is and is not a defect: a must be less than b and the behavior is undefined otherwise, and generally this asymmetry between integral uniform distributions and real uniform distributions is desired.

I think a debug mode assertion and an infinite loop otherwise is fine.

@Quuxplusone
Contributor

@Howard: I still think the appropriate "fix" (IMHO) is to release-note that the behavior of uniform_real_distribution is the subject of a Defect Report, and then wait for the DR to be resolved. IMHO libc++'s current behavior is strictly better than any of the conceivable alternatives.

Nitpick on your patch: is it guaranteed that "__r >= __p.b()" must be meaningful for a user-defined _RealType, or should you rewrite that as "!(__r < __p.b())"?

Nitpick on returning NaN: some platforms may not support NaN, and on those that do you'll have to decide what kind of NaN to return (signaling, non-signaling...) IMHO the least bad of the terrible alternatives is to simply return "__p.a()" in that case, which of course is what my preferred implementation (current ToT) already does. :)

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

Clarification:

Seth wrote:

the spec makes sense as-is and is not a defect:
a must be less than b and the behavior is undefined otherwise

The spec doesn't actually say this, but it should. The spec is defective.

Quoting N3797:

In 26.5.8.2.2 [rand.dist.uni.real]/p1:

A uniform_real_distribution random number distribution produces random numbers x, a <= x < b,

which is good. But then in p2 says:

Requires: a <= b

which is bad. It should say in p2:

Requires: a < b

This seems like an obvious defect to me with only one way to correct it. It seems so obvious to me that my recommendation is for libc++ to go ahead and assume that p2 is corrected to:

Requires: a < b

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

@Howard: I still think the appropriate "fix" (IMHO) is to
release-note that the behavior of uniform_real_distribution
is the subject of a Defect Report, and then wait for the
DR to be resolved.

This bug report is not about the case that a == b. The defect you are referring to is only about the case that a == b.

Nitpick on your patch: is it guaranteed that "__r >= __p.b()" must be
meaningful for a user-defined _RealType, or should you rewrite that
as "!(__r < __p.b())"?

26.5.4.4 [rand.req.genl]/p1/bd says:

d) that has a template type parameter named RealType is undefined
unless the corresponding template argument is cv-unqualified and
is one of float, double, or long double.

So I think "__r >= __p.b()" is fine.

Nitpick on returning NaN: some platforms may not support NaN,

To the best of my knowledge, libc++ is not supporting such a platform, nor claiming to be 100% portable to all platforms.

and on those that do you'll have to decide what kind of NaN
to return (signaling, non-signaling…)

In my patch I chose non-signaling. A signaling nan is nothing but a 25 year old bug in IEC 60559. They are not useful, unless non-portably transformed into exceptions. The idea for them (exceptions) was useful, just before its time.

IMHO the least bad of the terrible alternatives is to simply
return "__p.a()" in that case, which of course is what my preferred
implementation (current ToT) already does.

Ok, but this does not conform to [rand.dist.uni.real]/p1, a part of the current spec which is not broken. That being said, if p2 is fixed, this result will be undefined behavior, so libc++ can do anything it wants.

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

Clarification for all:

Two distinct issues are being discussed here, and I get the feeling that nobody is aware of this.

Issue 1:

When a < b, but b is only a tiny bit bigger than a, libc++ currently returns results greater than or equal to b, which violates [rand.dist.uni.real]/p1.

Issue 2:

What should happen when !(a < b)? This is a real issue, but it is not the issue reported in this bug report. Complicating this question is a defect in the standard that says it is ok for a == b.

@Quuxplusone
Contributor

@Howard: The Standard has

In [rand.dist.uni.real]/p1:
A uniform_real_distribution random number distribution produces random numbers x, a <= x < b,
And then in p2:
Requires: a <= b

We agree that this is an "obvious defect".

In your opinion the best way to correct it is to s/<=/</ in p2.
In my opinion the best way to correct it is to s/</<=/ in p1.

The advantages of s/</<=/ in p1 are:

  • it matches the currently shipping implementations
  • it brings [rand.dist.uni.real] into agreement with [rand.dist.uni.int], allowing people to use them in template contexts without special cases
  • it has arguably nice mathematical properties, such as making uniform_real_distribution(a,b)() generate exactly the same set of numbers as -uniform_real_distribution(-b,-a)()

The advantages of s/<=/</ in p2 are:

  • ?????

Perhaps you or Seth could speak to WHY the "asymmetry between integral uniform distributions and real uniform distributions is desired," because I'm not getting it.

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

In your opinion the best way to correct it is to s/<=/</ in p2.
In my opinion the best way to correct it is to s/</<=/ in p1.

Ah! I did not realize this, thank you for clarifying.

Note that no matter which way this is decided in the standard, Seth's test case is not impacted. In this test case a < b. So whatever the committee decides for the case a == b should not affect how we address this bug report.

I've asked Walter Brown to comment on this issue. He is the lead author of N1932, which first proposed uniform_real_distribution in 2006.

@Quuxplusone
Contributor

In your opinion the best way to correct it is to s/<=/</ in p2.
In my opinion the best way to correct it is to s/</<=/ in p1.

Ah! I did not realize this, thank you for clarifying.

Aha. We're getting closer and closer to being on the same page. ;)

Note that no matter which way this is decided in the standard,
Seth's test case is not impacted. In this test case a < b. And
so no matter what the committee decides for the case a == b
should not impact how we address this bug report.

You're still missing one point: If the defect is corrected by adopting "my" fix (s/</<=/ in p1), then Seth will have no bug report at all; libc++ ALREADY IMPLEMENTS that behavior! Seth was effectively complaining that libc++ had (accidentally) jumped the gun in implementing "my" fix before the DR had been addressed by the Committee.

I've asked Walter Brown to comment on this issue. He is the lead
author of N1932, which first proposed uniform_real_distribution in 2006.

Excellent. Walter should be able to shed much more light on the mathematical intent of uniform_real_distribution.

Interestingly, the original draft of N1932 had the same uniform_int_distribution as C++11, but its uniform_real_distribution had the precondition "a <= b" (same as C++11) and the postcondition "a < x < b" (different from C++11's "a <= x < b").

[So we still have two competing philosophies: In my view, C++11 corrected N1932's "a < x" to "a <= x" but forgot to correct "x < b" to "x <= b". In your view, C++11 corrected N1932's "a < x" to "a <= x" but forgot to correct "a <= b" to "a < b".]

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

But then in p2 says:

Requires: a <= b

which is bad. It should say in p2:

Requires: a < b

Well, p2 only says that a must be <= b, but it seems to me the spec fails to define the behavior when a==b because the equation defining the behavior in p1, p(x | a, b) = 1/(b − a), is undefined when a==b. So I think even in the current spec the behavior is undefined when a==b.

But I do agree it is a defect that p2 doesn't explicitly disallow a==b as well.

About your earlier comment:

And generate_canonical appears to be correctly coded to deliver results in the half open range [0, 1).

I'm attaching a program that seems to contradict this. I wonder if you could take a look and see if perhaps there's something wrong with my test?

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

Walter Brown asked me to post this comment on his behalf:

Thank you for inviting me to comment on the LWG issue associated with this bug report. Alas I seem currently unable to login to the bug tracker, so I hope you will be able to copy-and-paste the entirety of this note on my behalf; I apologize in advance for its length.

I have spent several hours now refreshing my memory by rereading substantial portions of the 20+ papers I (and others) wrote while <random> was being readied in the pre- and post-TR1 eras.

In brief, I am unconvinced that there is any "defect" or "type-o" in the C++11 specification of uniform_real_distribution, much less any "obvious" or "silly" one. Please permit me to elaborate.

The most recent change to the specification of uniform_real_distribution seems to have been in 2006. Prior to then, the requirement on the operator() functions was that they produce values x such that a < x < b. Since then we have instead required a <= x < b, as cited a few times in the bug report. We made that change in order to cater to requests from programmers who, at least in the C++ world, tend to be more comfortable with half-open intervals than they are with the fully open intervals that statisticians and mathematicians often prefer. (And I am, after all, a programmer!)

Please note that under no circumstances would we ever, ever, ever consider having a closed interval here. Sadly, the bug report mostly quotes the C++ standard out of context on this point, as did the LWG issue report, failing to cite the adjoining mathematics that carefully specifies the requirement on the associated probability density function, namely that it have the constant value given by 1/(b-a). This means that users must ensure that a!=b whenever operator() is called, else the resulting behavior is as undefined as is mathematical division by zero. This is not an accident, and must not be changed. ("Black holes are where God divided by zero." -- Steven Wright :)

[FWIW, my guidance to implementors here is to follow your existing/prevailing policy re users who violate a documented precondition: You can terminate(), or assert() first, or throw, or enter an infinite loop, or return a predetermined out-of-bandwidth value, or do whatever. But such actions are IMO at best a courtesy to users; as we all know, the C++ standard does not specify what happens when a precondition is violated.]

However, we don't need the same strict a!=b requirement when constructing an object of a uniform_real_distribution type. Such an object can live perfectly well with equal values of a and b. (Actually, we could have even permitted b < a, but that would have incurred unnecessary complexity and performance cost for implementations when calculating the behind-the-scenes scale factor. Based on this engineering assessment, we carefully crafted the user requirement to allow implementations to use the simple and cheap subtraction b-a.)

It can, of course, be argued that an object having a==b is inherently impotent, since it can never satisfy the precondition of its operator() and hence ought never be called. However, we had this argument many years ago, and it turns out to be false: It may be less well known, but there is in fact a well-specified overload of operator() that ignores the values with which the invoking object was constructed. Consider the following code fragment, which ought to compile happily after supplying appropriate header, namespace, and other context:

using dist_t = uniform_real_distribution<>;
using parm_t = typename dist_t::param_type;

default_random_engine e{};
dist_t d{0.0, 0.0};  // so far so good

auto variate = d(e, parm_t{4.5, 6.78});
 // in this call, ignore the distribution param values
 // that were supplied via d's c'tor, and instead use
 // the param values supplied here via parm_t's c'tor

I see no reason to forbid such code, as it seems perfectly well-formed, well-specified, well-behaved, and incredibly useful in applications whose distribution's parameters are in continual flux. Moreover, I see no inconsistency between the respective specified preconditions for constructing and invoking such a distribution object.

I intend to recommend, when the LWG issue is raised (likely later this week at the WG21 meeting), that the issue report be closed as Not A Defect. With your permission, I may turn this note into a short paper to document the reasoning underlying my position.

Finally, regarding the perceived inconsistency in specification between the uniform_real- and uniform_int- distributions, it is the latter that is the odd man out. We specified uniform_int_distribution with a closed range because we wanted to allow the full range of available values; we found no way, using a single integer type, to write an unambiguous half-open interval that encompasses all the values of that type.

Thanks again for drawing my attention to this report. Best,

-- WEB

@llvmbot
Member Author

llvmbot commented Feb 12, 2014

Very interesting report on generate_canonical Seth!

I've taken a look and it looks like libc++ is precisely following the specification. What is happening is that the standard specifies in this case that the result should be:

4294967295.f / 4294967296.f

which theoretically should be just less than 1. However float does not have enough mantissa digits to accurately represent 4294967295. If you print both of these numbers out using std::hexfloat, which gives you the exact binary representation, you get:

0x1p+32
0x1p+32

And subsequently the result becomes 1.

I thought about doing everything in long double and then casting to RealType. However the standard explicitly says:

Calculates a quantity … using arithmetic of type RealType.

I'm open to suggestions from anyone listening on this.

@Quuxplusone
Contributor

Quuxplusone commented Feb 12, 2014

Re Walter's explanation:
I admit I'm still reflexively skeptical, but apparently there is some kind of mathematical reason to prefer open intervals, so I withdraw my objection. I do still think that the defect deserves at least an explicit informative note in the Standard, though; it's very confusing to have an apparent typo in the official Standard, even though one's confusion can be easily cleared up after only 48 hours of discussion with the library implementor and the writer of the original proposal. ;)

Also, this morning I realized another reason that fully open intervals are more appropriate for RealType: under "my" closed-interval proposal, what on earth would we return for std::uniform_real_distribution<float>(-INFINITY, +INFINITY)?

@llvmbot
Member Author

llvmbot commented Feb 13, 2014

what on
earth would we return for std::uniform_real_distribution<float>(-INFINITY, +INFINITY)?

Could that work at all? How do you distribute random numbers uniformly across an infinite range?

@llvmbot
Member Author

llvmbot commented Sep 5, 2014

Is there any progress on this? I recently (re-)discovered the problem, see http://stackoverflow.com/questions/25668600/is-1-0-a-valid-random-number

@llvmbot
Member Author

llvmbot commented Nov 5, 2015

Unaware of this thread, I raised issue #1 yet again here:
http://stackoverflow.com/questions/33532853/c-bug-in-uniform-real-distribution

My apolitical and implementation-agnostic opinions on this subject are as follows:

On issue #1:

a. It seems to me very natural to have [a,b) as the output range for uniform_real_distribution<>(a,b). The main reason is that drawing x from urd(0.0,2.0) is equivalent to drawing uniform bool b and then x from urd(0.0+b,1.0+b). With both-open or both-closed ranges, this wouldn't be true. For an integral domain, it is much easier/natural to refer to "the point before b", and decompose, e.g., [0,19] as [0,9]+[10,19]. Over the real domain, this can only be achieved with arguably "strange" things like std::nextafter.

b. I find it particularly undesirable to have urd<float> differ in behaviour from urd<double>. A user should not need to worry about floating point rounding issues when reading the specification of urd<>. Concretely,

log(1.0 - urd<Float>(0.0, 1.0))

should not be returning NaNs depending on the Float type. Because of this issue, it is my opinion (as a user) that this is a bug with SL.

On issue #2:

c. IMO, the issue of what to do when a==b is more theoretical than #1. Walter's argument, that this is implicitly disallowed by the mentioning of the pdf, is not entirely convincing. There's absolutely no need to allow this type of ambiguity in a standard by writing a<=b and then relying on indirect inferences to disallow the possibility that a==b.

@mclow
Contributor

mclow commented Oct 13, 2017

*** Bug #34280 has been marked as a duplicate of this bug. ***

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
@llvm llvm deleted a comment from mclow Dec 28, 2021
@9t8

9t8 commented Sep 8, 2022

Is there a solution? C++20 makes it "clear" that the [a, b) behavior is intended.
