Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random.choice IndexError due to double-rounding #68755

Closed
stevendaprano opened this issue Jul 5, 2015 · 61 comments
Closed

random.choice IndexError due to double-rounding #68755

stevendaprano opened this issue Jul 5, 2015 · 61 comments
Assignees
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@stevendaprano
Copy link
Member

BPO 24567
Nosy @tim-one, @rhettinger, @mdickinson, @pitrou, @vstinner, @stevendaprano, @bitdancer, @skrah, @serhiy-storchaka
PRs
  • bpo-24567: Random subnormal.diff #7954
  • [3.7] bpo-24567: Random subnormal.diff (GH-7954) #7955
  • [3.6] bpo-24567: Random subnormal.diff (GH-7954) #7956
  • Files
  • choice_timings.txt
  • shuffle_timings.txt
  • handle_double_rounding2.diff
  • handle_double_rounding3.diff
  • sample_timings.txt
  • random_double_round.diff: For Python 3.6 and 3.7
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2018-06-27.07:07:38.703>
    created_at = <Date 2015-07-05.03:04:00.634>
    labels = ['3.7', 'type-bug', 'library']
    title = 'random.choice IndexError due to double-rounding'
    updated_at = <Date 2018-06-27.09:37:20.023>
    user = 'https://github.com/stevendaprano'

    bugs.python.org fields:

    activity = <Date 2018-06-27.09:37:20.023>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2018-06-27.07:07:38.703>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2015-07-05.03:04:00.634>
    creator = 'steven.daprano'
    dependencies = []
    files = ['39897', '39898', '39905', '39908', '39937', '47650']
    hgrepos = []
    issue_num = 24567
    keywords = ['patch']
    message_count = 61.0
    messages = ['246291', '246297', '246303', '246317', '246318', '246336', '246352', '246363', '246366', '246484', '246530', '246531', '246532', '246534', '246536', '246537', '246548', '246550', '246551', '246565', '246566', '246581', '246615', '246616', '246623', '246628', '246630', '246636', '246639', '246641', '246642', '246650', '246656', '246657', '246664', '246668', '246674', '246815', '267785', '267874', '275594', '277199', '277200', '304734', '320392', '320394', '320423', '320434', '320443', '320474', '320479', '320502', '320511', '320516', '320519', '320521', '320526', '320549', '320553', '320555', '320563']
    nosy_count = 10.0
    nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'pitrou', 'vstinner', 'steven.daprano', 'r.david.murray', 'skrah', 'serhiy.storchaka', 'Serge Anuchin']
    pr_nums = ['7954', '7955', '7956']
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24567'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']

    @stevendaprano
    Copy link
    Member Author

    While investigating bpo-24546, I discovered that at least some versions of Python on 32-bit Linux have a double-rounding bug in multiplication which may lead to an unexpected IndexError in random.choice.

    See http://bugs.python.org/issue24546 for additional discussion, but the short version is that due to double-rounding, if random.random returns (1. - 2.**-53), int(i * random.random()) may return 1 for some i >= 2049, and random.choice will then evaluate seq[len(seq)], raising IndexError.

    @stevendaprano stevendaprano added the type-bug An unexpected behavior, bug, or error label Jul 5, 2015
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 5, 2015

    I've finally gotten hold of a gcc (5.1.0) that produces the unwanted code.

    For a *different* test case, even the -O0 and -O2 results differ, since
    with -O2 gcc uses mpfr (I think!) for constant folding, which produces
    the correct result.

    Using -mfpmath=sse seems to fix all test cases that I've tried.

    @vstinner
    Copy link
    Member

    vstinner commented Jul 5, 2015

    There is a simple fix. If the result is invalid, drop it and generate a new
    number. It should keep the distribution uniform.

    Another fix is to only use integers. Why do we use float in choice by the
    way?

    @mdickinson
    Copy link
    Member

    Note that this affects other Random methods, too. There are several places in the source where something of the form int(i * random.random()) is used to select an integer in the half-open range [0, i).

    And a nitpick: it's a stretch to describe double-rounding on multiplication (or any other arithmetic operation) as a bug: Python-the-language makes no promises about adhering to IEEE 754 arithmetic rule. (In fact, it makes no promises about using IEEE 745 formats in the first place.)

    @mdickinson
    Copy link
    Member

    IEEE 745

    IEEE 754. Stupid fingers.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 5, 2015

    I suppose the simplest "fix" would be to replace relevant instances of

    int(random() * N)

    with

    min(int(random() * N), N-1)

    That would be robust against many kinds of arithmetic quirks, and ensure that platforms with and without such quirks would, if forced to the same internal random state, yield the same result when random() happens to return 1. - 2.**-53.

    Victor, what - precisely - do you have in mind with "only use integers"? The current method was used because it's simple, efficient, and obvious. Python 3.4.3 appears to (usually) use a different method, though, relying on C code to generate random bytes without going through floats.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 6, 2015

    Python-the-language makes no promises about adhering to IEEE 754 arithmetic rule.

    Still, I think we could switch to -mfpmath=sse, which would at least
    guarantee consistent behavior across different gcc versions.

    @mdickinson
    Copy link
    Member

    Still, I think we could switch to -mfpmath=sse, which would at least
    guarantee consistent behavior across different gcc versions.

    ... which would fix fsum on 32-bit Linux, too. (I thought there was an open issue for this, but now I can't find it.)

    @tim-one
    Copy link
    Member

    tim-one commented Jul 6, 2015

    Mark, closest I could find to a substantive SSE-vs-fsum report is here, but it was closed (because the fsum tests were changed to ignore the problem ;-) ):

    http://bugs.python.org/issue5593

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 9, 2015

    Qt had a similar initiative regarding -msse2 -mfpmath:

    http://lists.qt-project.org/pipermail/development/2013-December/014530.html

    They say that also Visual Studio 2012 has switched to sse2 by default.

    The only problem are the Linux distributions that are stuck in i386 land
    and who would probably disable the flags in their builds.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 9, 2015

    I suppose the simplest "fix" would be to replace relevant instances of

    int(random() * N)

    with

    min(int(random() * N), N-1)

    That sounds simple and generic. It skews the distribution a tiny little bit, but it doesn't sound significant (perhaps Mark would disagree).

    @tim-one
    Copy link
    Member

    tim-one commented Jul 9, 2015

    It skews the distribution a tiny little bit, ...

    But it doesn't - that's the point ;-)

    If double-rounding doesn't occur at all (which appears to be the case on most platforms), absolutely nothing changes (because min(int(random() * N), N-1) == int(random() * N) on such boxes).

    If double-rounding does occur, double-rounding itself may change results "all over the place", and I haven't tried to analyze what effects that has on the distribution. In the comparative handful of cases where int(random() * N) == N on such boxes, clamping that back to N-1 just yields the same result we would have gotten on a box that didn't do double-rounding.

    @vstinner
    Copy link
    Member

    vstinner commented Jul 9, 2015

    Again, why not using only integers?

    Pseudo-code to only use integers:

    def randint(a, b):
        num = b - a
        if not num:
            return a
        nbits = (num + 1).bit_length()
        while True:
            x = random.getrandbits(nbits)
            if x <= num:
                break
        return a + x

    (It doesn't handle negative numbers nor empty ranges.)

    @tim-one
    Copy link
    Member

    tim-one commented Jul 9, 2015

    Victor, if people want to use getrandbits(), we should backport the Python3 code, not reinvent it from scratch.

    Note too Mark's comment: "There are several places in the source where something of the form int(i * random.random()) is used". The min() trick is relatively easy to apply via simple, local editing of such places.

    Your "It doesn't handle negative numbers nor empty ranges" should be a hint about what a pain it is to rewrite everything wholesale to use a _fundamentally_ different method.

    @vstinner
    Copy link
    Member

    Victor, if people want to use getrandbits(), we should backport the Python3 code, not reinvent it from scratch.

    Sorry, I don't understand your comment. Backport what? getrandbits() is available on Python 2 and Python 3, for Random and SystemRandom.

    I propose to rewrite Random.randint() in random.py.

    Your "It doesn't handle negative numbers nor empty ranges" should be a hint about what a pain it is to rewrite everything wholesale to use a _fundamentally_ different method.

    In fact, I didn't check if the my code works for negative numbers. A few more lines should be enough to handle them. I just wanted to show a short pseudo-code to discuss the idea.

    I don't understand your point on the pain. It's easy to replace int(i * random.random()) with randint(0, i) (sorry, I'm too lazy to check if it should be i, i-1 or i+1 :-)).

    The Glib library used floats in their g_rand_int_range() method, but the function now uses only integers. The GSL library also uses integers. I'm surprised that Python uses floats to generate random numbers. For me, they are less reliable than integers, I always fear a bias caused by complex rounding and corner cases.

    Do other libraries and/or programming languages use float to generate a random integer?

    @tim-one
    Copy link
    Member

    tim-one commented Jul 10, 2015

    Victor, don't ask me, look at the code: the random.choice() implementations in Python 2 and Python 3 have approximately nothing in common, and "the bug" here should already be impossible in Python 3 (but I can't check that, because I don't have a platform that does double-rounding). I already pointed out (in my 2015-07-05 15:42 note) that Python 3 uses "only integers" in most cases.

    "It's easy to replace int(i * random.random()) with randint(0, i)". You didn't say that was your intent, and I didn't guess it. In that case you also have to weigh in the considerable extra expense of adding another Python-level function call. The speed of these things is important, and it's annoying enough just to add the expense of calling the C-level min().

    @pitrou
    Copy link
    Member

    pitrou commented Jul 10, 2015

    I propose to rewrite Random.randint() in random.py.

    If that would give a different sequence of random numbers, I'm not sure that's acceptable in a bugfix release. Raymond can shed a light.

    @vstinner
    Copy link
    Member

    As I wrote, glib switch from floats to integers to generate random
    numbers. To provide reproductible "random" numbers, an environment
    variable was added to select the old PRNG.

    Anyway, if we modify random.py, the generated numbers should be different, no?

    To me, it's always a strange concept of having reproductible "random"
    numbers :-) But I understand the use case. For example, to run a test
    suite, we want randomization and to be able to replay a test suite in
    the same order.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 10, 2015

    As I wrote, glib switch from floats to integers to generate random
    numbers.

    And as I wrote, this would be accepted in a feature release. Not necessarily a bugfix release.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 10, 2015

    Anyway, if we modify random.py, the generated
    numbers should be different, no?

    Not in a bugfix release. The min() trick changes no results whatsoever on a box that doesn't do double-rounding.

    On a box that does do double-rounding, the only difference in results is that the min() trick stops a nasty exception in a relative handful of cases (& makes no difference to any case in which that exception isn't raised).

    The sequence of results may be different on platforms with double-rounding and without double-rounding, but that's always been true. The min() trick changes nothing about that either, except to prevent unintended exceptions on double-rounding boxes.

    Note that switching to use SSE2 instead also changes nothing on boxes that don't do double-rounding. It would change some results (beyond _just_ stopping bogus exceptions) on boxes that do double-rounding.

    @vstinner
    Copy link
    Member

    Ok, it looks like most people are in favor of min(). Can anyone propose a patch?

    @serhiy-storchaka
    Copy link
    Member

    There are other affected methods: randrange(), randint(), shuffle(), sample().

    @rhettinger
    Copy link
    Contributor

    [Antoine]

    If that would give a different sequence of random numbers,
    I'm not sure that's acceptable in a bugfix release. Raymond
    can shed a light.

    You're right. It is not acceptable to give a different sequence of random numbers within a bugfix release.

    [Victor]

    Ok, it looks like most people are in favor of min().
    Can anyone propose a patch?

    I will work up a patch, but I can't say that I feel good about making everyone pay a price for a problem that almost no one ever has.

    @rhettinger
    Copy link
    Contributor

    FWIW, here are some variants (with differing degrees of brevity, clarity, and performance):

       def choice(self, seq):
            """Choose a random element from a non-empty sequence."""
            n = len(seq)
            i = int(self.random() * n)
            if i == n:
                i = n - 1
            return seq[i]
    
        def choice(self, seq):
            """Choose a random element from a non-empty sequence."""
            try:
                return seq[int(self.random() * len(seq))]                                                                                                         
            except IndexError:
                return seq[-1]
    
        def choice(self, seq):
            """Choose a random element from a non-empty sequence."""
            n = len(seq)
            return seq[min(int(self.random() * n), n-1)]

    @rhettinger rhettinger self-assigned this Jul 11, 2015
    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2015

    [Raymond]

    I can't say that I feel good about making everyone pay
    a price for a problem that almost no one ever has.

    As far as I know, nobody has ever had the problem. But if we know a bug exists, I think it's at best highly dubious to wait for a poor user to get bitten by it. Our bugs aren't their fault, and we have no idea in advance how much our bugs may cost them.

    Note that there's no need to change anything on boxes without double-rounding, and those appear to be the majority of platforms now, and "should" eventually become all platforms as people migrate to 64-bit platforms. So, e.g.,

        if double_rounding_happens_on_this_box:
            def choice(...):
                # fiddled code
        else:
            def choice(...):
                # current code just indented a level

    Then most platforms pay nothing beyond a single import-time test. Note that there's already code to detect double-rounding in test_math.py, but in that context not to _fix_ a problem but to ignore fsum() tests that fail in its presence.

    And just to be annoying ;-) , here's another timing variation:

        i = int(random() * n)
        return seq[i - (i == n)]

    @rhettinger
    Copy link
    Contributor

    See the attached timings. The performance hit isn't bad and the code's beauty isn't marred terribly. Yes, we'll fix it, but no I don't have to feel good about it ;-)

    @serhiy-storchaka
    Copy link
    Member

    For shuffle I would write "if j < i".

    I think 3.x should be fixed as well as 2.7. And I like Tim's suggestion about import-time patching.

    @serhiy-storchaka
    Copy link
    Member

    Unfortunately a link to Rietveld is not available for review and inlined comments.

    In sample() the selected set can be initialized to {n} to avoid additional checks in the loop for large population. In the branch for small population, non-empty pool can be extended ("pool.append(pool[-1])") to avoid additional check in the loop.

    In choice() I would write the condition as "i == n > 0" to avoid indexing with negative index. Custom collection can has non-standard behavior with negative indices. This doesn't add additional cost in normal case.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2015

    Hmm. Looks like the answer to my question came before, via "Custom collection can has non-standard behavior with negative indices." Really? We're worried about a custom collection that assigns some crazy-ass meaning to a negative index applied to an empty sequence? Like what? I don't really care about supporting insane sequences ;-)

    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2015

    [Serhiy Storchaka]

    ... I want to say that double rounding causes not
    only bias from ideal distribution, but a difference
    between platforms

    That's so, but not too surprising. On those platforms users will see differences between "primitive" floating addition and multiplication (etc) operations too.

    There's a tradeoff here. In Python's earliest days, the bias was strongly in favor of letting platform C quirks shine through. Partly to make Python-C interoperability easiest, but at least as much because hiding cross-platform fp quirks is a major effort and even Guido had only finite time ;-) As the years go by, the bias has been switching in favor of hiding platform quirks. I hope Python 3 continues in that vein, but Python 2 probably needs to stay pretty much where it is.

    @serhiy-storchaka
    Copy link
    Member

    See the attached timings for sample(). Patched sample2 is at worst 4% slower than original sample, and optimized sample3 is between sample and sample2. In any case the difference is pretty small, so I'm good with Raymond's variant if it looks better for him.

    Please note that 3.x also needs the patch.

    @serhiy-storchaka
    Copy link
    Member

    Ping.

    @rhettinger
    Copy link
    Contributor

    Will get to it shortly.

    @mdickinson
    Copy link
    Member

    A similar bug affects the new choices method: in the choices source, if random() * total happens to round up to total, the bisect call returns an out-of-range index.

    There are two ways that that could happen: (1) double rounding, as in this issue (which will occur very rarely and is hard to reproduce), and (2) total being subnormal (easy to reproduce, but unlikely to occur in practice).

    >>> from random import choices
    >>> choices(500, population=[1, 2], weights=[1e-323, 1e-323])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mdickinson/Python/cpython/Lib/random.py", line 360, in choices
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]
      File "/Users/mdickinson/Python/cpython/Lib/random.py", line 360, in <listcomp>
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]
    IndexError: list index out of range

    @serhiy-storchaka
    Copy link
    Member

    Ping.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Sep 22, 2016
    @serhiy-storchaka
    Copy link
    Member

    I'll write that as "'i == n and n > 0" which reads better and runs faster in the common case (look at the disassembly of each).

    bpo-27236

    @serhiy-storchaka
    Copy link
    Member

    Do you mind to create a pull request Raymond?

    @serhiy-storchaka serhiy-storchaka added the stdlib Python modules in the Lib dir label Oct 22, 2017
    @rhettinger
    Copy link
    Contributor

    Attaching a patch for 3.6 and 3.7. Haven't had a chance to run timings yet. Since it involves adding code to the inner loop, presumably it will have a noticeable affect on speed. Am not sure how to test this (I don't seem to be able to reproduce the OP's problem on any system I have access to). I'm not even sure whether the double rounding still exists any extant platform and if so whether we could just turn it off as part of the configuration.

    At any rate, I don't think the code in 2.7 should change at all. That tool is almost at end-of-life and we should be very conservative with it at this point.

    @tim-one
    Copy link
    Member

    tim-one commented Jun 25, 2018

    There are a couple bug reports here that have been open for years, and it's about time we closed them.

    My stance: if any platform still exists on which "double rounding" is still a potential problem, Python _configuration_ should be changed to disable double rounding on such platforms (there's usually a C compiler flag that can make this happen, but even if there isn't a couple lines of inline assembler could be added at Python startup to set the Pentium's FPU "precision control" bits to "round to 53 bits" mode).

    random is a red herring here! We don't want gratuitously different results across 754 boxes in any operations.

    @mdickinson
    Copy link
    Member

    Python _configuration_ should be changed to disable double rounding on such platforms

    That sounds good to me, with the nitpick that setting an x87 FPU to 53-bit precision still doesn't avoid potential double rounding on underflow, and I'm not aware of any easy way to "fix" x87-using code to avoid double rounding on underflow.

    That said, the vast majority of practical applications, including random.choice, math.fsum, dtoa.c, aren't going to be affected by double rounding on underflow, so it should only rear its ugly head in corner cases.

    If we do this, can we also persuade Guido to Pronounce that Python implementations assume IEEE 754 format and semantics for floating-point?

    @mdickinson
    Copy link
    Member

    [Tim]

    a couple lines of inline assembler could be added at Python startup to
    set the Pentium's FPU "precision control" bits to "round to 53 bits" mode

    On second thoughts, I don't think this will work; or at least, not easily. The libm on such a platform may expect the FPU to be in 64-bit precision mode. So we'd potentially need to switch the precision before calling any math library function and then switch it back again afterwards. And then there may be 3rd party libraries that need the same treatment.

    On the existence of platforms with double rounding, last time I checked, 32-bit Linux was the most obvious offender on Intel hardware. Windows used to use the x87 but set the precision to 53 bits, and I think newer versions may well use SSE2 in preference to x87. macOS has used SSE2 for a good number of releases now.

    And indeed, I just tested Python 3.5.2 on Ubuntu 16.04.4 LTS 32-bit (running in a VM on a not-too-ancient MacBook Pro), and it exhibits double rounding:

    >>> (1e16 + 2.9999).hex()
    '0x1.1c37937e08002p+53'
    >>> (1/2731).hex()
    '0x1.7ff4005ffd002p-12'
    >>> 1/(1 - 2**-53)
    1.0

    Expected results without double rounding:

    >>> (1e16 + 2.9999).hex()
    '0x1.1c37937e08001p+53'
    >>> (1/2731).hex()
    '0x1.7ff4005ffd001p-12'
    >>> 1/(1 - 2**-53)
    1.0000000000000002

    @tim-one
    Copy link
    Member

    tim-one commented Jun 25, 2018

    Mark, do you believe that 32-bit Linux uses a different libm? One that fails if, e.g., SSE2 were used instead? I don't know, but I'd sure be surprised it if did. Very surprised - compilers have been notoriously unpredictable in exactly when and where extended precision gets used in compiled code, so sane code (outside of assembler) doesn't rely on it.

    I'd be similarly surprised if hypothetical 3rd party libraries _assuming_ extended arithmetic existed. Any sane person writing such a library would take it upon themselves to force extended precision on entry (if that's what they wanted), and restore the original FPU control state on exit.

    I'm no more worried about this than, say, worried about that some dumbass platform may set the rounding mode to "to plus infinity" by default - and I wouldn't hesitate there either for Python startup to force it to nearest/even rounding. Sure, there _may_ be some library out there for such a platform that assumes +Inf rounding, but I fundamentally don't care ;-)

    In any case, random remains a red herring. There are potential gratuitous numeric differences all over the place.

    @mdickinson
    Copy link
    Member

    [Tim]

    Mark, do you believe that 32-bit Linux uses a different libm?

    I don't know. But I'd expect to see accuracy losses as a result of forcing 53-bit precision, and I wouldn't be totally surprised to see other more catastrophic failure modes (infinite iterations, for example). But I don't have any concrete information on the subject.

    There's this from (an unofficial mirror of) the glibc x86 sources: https://github.com/bminor/glibc/blob/43b1048ab9418e902aac8c834a7a9a88c501620a/sysdeps/x86/fpu_control.h#L69

    #define _FPU_EXTENDED 0x300 /* libm requires double extended precision. */

    But it's not clear to me whether that comment is intended to be a statement of fact, or indeed whether it's still true or just an ancient comment that should have been culled long ago.

    Like I said, I just don't know, but I'd be worried about the law of unintended consequences.

    Of course, with my Enthought hat on, I don't care, because we stopped worrying about 32-bit Linux long ago... :-)

    @mdickinson
    Copy link
    Member

    Some relevant notes from Vincent Lefèvre here: http://www.vinc17.net/research/extended.en.html

    @tim-one
    Copy link
    Member

    tim-one commented Jun 26, 2018

    Mark, ya, I agree it's most prudent to let sleeping dogs lie.

    In the one "real" complaint we got (bpo-24546) the cause was never determined - but double rounding was ruled out in that specific case, and no _plausible_ cause was identified (short of transient HW error).

    As the years go by, my interest in "doing something" about a bug nobody has apparently encountered in the wild, and remains theoretically possible on an ever-diminishing number of platforms ("32-bit Linux not using faster SSE arithmetic" seems to be the universe), has faded somewhat ;-)

    So I'd close as "won't fix". If I had to do something, I'd add different implementations under an "if this box does double-rounding" import-time test so nobody else has to pay for it.

    But I'm cool with whatever Raymond settles for.

    @tim-one
    Copy link
    Member

    tim-one commented Jun 26, 2018

    [Mark]

    If we do this, can we also persuade Guido to Pronounce that
    Python implementations assume IEEE 754 format and semantics
    for floating-point?

    On its own, I don't think a change to force 53-bit precision _on_ 754 boxes would justify that. That's just changing oddball boxes that already implement 754 to do so in the way most other 754 boxes currently do it (and in the way all future 754 platforms are expected to do it).

    To go beyond that seems to require addressing bigger questions, like:

    • _Are_ there currently, or expected to be, non-754 Python platforms? There have been in the past.

    • What does "754 semantics" actually mean? For example, as these bug reports point out, 754 is silent about whether the possibility of double rounding - even in primitive atomic operations - is "a bug" or "a feature" :-( Would we insist that Python may not work on platforms that flush denorms to 0? Etc.

    • What, concretely, would change in CPython if the Pronouncement were made? As I recall, we inherit most fp behaviors from the platform C compiler and libraries. That is, how would the Pronouncement benefit CPython? For example, do we have significant blobs of code that are trying to cater to non-754 boxes?

    Not that I'm asking for answers here. If there's something to be pursued here, I'd suggest it's better as a topic in python-ideas or python-dev.

    @vstinner
    Copy link
    Member

    The error goes avoid if we stop using floating point number to generate random integers. Can't we drop/deprecate support of RNG only providing getrandom() but not getrandbits()? I'm talking about the master branch.

    Things are simpler when you only manipulate integers...

    @tim-one
    Copy link
    Member

    tim-one commented Jun 26, 2018

    Victor, look at Raymond's patch. In Python 3, randrange() and friends already use the all-integer getrandbits(). He's changing three other lines, where some variant of int(random() * someinteger) is being used in an inner loop for speed.

    Presumably the

            return int(random() * n)
    

    line in the n >= maxsize branch of _randbelow_without_getrandbits() should also get clamped.

    @vstinner
    Copy link
    Member

    Let me see random_double_round.diff.

    In master:

        def shuffle(self, x, random=None):
            """Shuffle list x in place, and return None.
        Optional argument random is a 0-argument function returning a
        random float in [0.0, 1.0); if it is the default None, the
        standard random.random will be used.
    
        """
    

    This method has a weird API. What is the point of passing a random function, whereas shuffle() is already a method of an object which generates random numbers?! The bug only affects the case when random is set. I proposed to deprecate this argument and remove it later.

                return [population[_min(_int(random() * total), total)]
                        for i in range(k)]
    

    Why not using _randbelow() here? For speed according to:

    some variant of int(random() * someinteger) is being used in an inner loop for speed.

    Why not optimizing _randbelow() in this case? Like implementing it in C?

    @tim-one
    Copy link
    Member

    tim-one commented Jun 27, 2018

    [Victor]

    This method [shuffle()] has a weird API. What is
    the point of passing a random function,
    ... I proposed to deprecate this argument and remove it later.

    I don't care here. This is a bug report. Making backward-incompatible API changes does nothing to repair existing programs.

    ...
    return [population[_min(_int(random() * total), total)]
    for i in range(k)]

    Why not using _randbelow() here? For speed

    Yup.

    Why not optimizing _randbelow() in this case? Like
    implementing it in C?

    This is a bug report, not an invitation to redesign APIs and complicate implementations in non-trivial ways. If you're keen on that, I suggest opening a different issue and writing the patch.

    I'd personally oppose the API change (pointless thrashing), but it's possible the speed benefit of C acceleration of _randbelow() would outweigh the considerable downsides of moving code from Python source to C source.

    But none of that is needed to "fix the bug" at hand, if - indeed - people still think it's worth fixing at all. In the years that have passed since this started, I no longer do.

    @rhettinger
    Copy link
    Contributor

    I concur with Tim. Marking the original bug as "won't fix" for the reasons that he listed.

    In a separate PR, I'll modify the last line random.choices() to be "return [population[bisect(cum_weights, random() * total, 0, hi)] for i in range(k)]". That will make the function more robust (handling the exotic case with subnormal weights).

    @rhettinger
    Copy link
    Contributor

    New changeset ddf7171 by Raymond Hettinger in branch 'master':
    bpo-24567: Random subnormal.diff (bpo-7954)
    ddf7171

    @rhettinger
    Copy link
    Contributor

    New changeset 0eaf7b9 by Raymond Hettinger (Miss Islington (bot)) in branch '3.7':
    bpo-24567: Random subnormal.diff (GH-7954) (GH-7955)
    0eaf7b9

    @rhettinger
    Copy link
    Contributor

    New changeset acda5ea by Raymond Hettinger (Miss Islington (bot)) in branch '3.6':
    bpo-24567: Random subnormal.diff (GH-7954) (GH-7956)
    acda5ea

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants