Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

True division of integers could be more accurate #46136

Closed
mdickinson opened this issue Jan 12, 2008 · 12 comments
Closed

True division of integers could be more accurate #46136

mdickinson opened this issue Jan 12, 2008 · 12 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@mdickinson
Copy link
Member

BPO 1811
Nosy @tim-one, @terryjreedy, @mdickinson, @tiran
Files
  • int_truediv.py: More accurate truediv for integers: example code
  • long_division.patch: patch for Objects/longobject.c
  • long_division2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2009-12-27.15:10:11.018>
    created_at = <Date 2008-01-12.05:20:28.608>
    labels = ['interpreter-core', 'type-bug']
    title = 'True division of integers could be more accurate'
    updated_at = <Date 2009-12-27.15:10:11.016>
    user = 'https://github.com/mdickinson'

    bugs.python.org fields:

    activity = <Date 2009-12-27.15:10:11.016>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2009-12-27.15:10:11.018>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2008-01-12.05:20:28.608>
    creator = 'mark.dickinson'
    dependencies = []
    files = ['9137', '10363', '15667']
    hgrepos = []
    issue_num = 1811
    keywords = ['patch']
    message_count = 12.0
    messages = ['59798', '59824', '59830', '59838', '59839', '59988', '64034', '67031', '91020', '96834', '96840', '96910']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'terry.reedy', 'mark.dickinson', 'christian.heimes']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1811'
    versions = ['Python 2.7', 'Python 3.2']

    @mdickinson
    Copy link
    Member Author

    Division of two longs can produce results that are needlessly
    inaccurate:

    >>> from __future__ import division
    >>> 10**40/10**39
    10.000000000000002

    The correct result is, of course, 10.0, which is exactly representable
    as a float.

    The attached snippet of Python code shows that things don't have to be
    this way.

    @mdickinson mdickinson added type-bug An unexpected behavior, bug, or error interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jan 12, 2008
    @tiran
    Copy link
    Member

    tiran commented Jan 12, 2008

    How fast is your algorithm compared to the current algorithm and where
    starts the break even zone? Most users use only small integers so it
    might be a good idea to use a simpler algorithm for longs with Py_SIZE()
    == 1. This is important for py3k.

    @mdickinson
    Copy link
    Member Author

    It would be easy and safe to just use a/b = float(a)/float(b) if both a and b are less
    than 2**53 (assuming IEEE doubles). Then there wouldn't be any loss of speed for
    small integers.

    For large integers the algorithm I posted should run in time linear in the number of
    digits of max(a, b), at least in the worst case (though with appropriate optimizations
    it could be made much faster for 'random' inputs). The current algorithm has
    essentially O(1) runtime.

    To get proper timings I'd have to code this up properly. I'll do this, unless there's
    a consensus that it would be a waste of time :)

    @tiran
    Copy link
    Member

    tiran commented Jan 12, 2008

    Mark Dickinson wrote:

    To get proper timings I'd have to code this up properly. I'll do this, unless there's
    a consensus that it would be a waste of time :)

    Why don't you ask Tim? He is the right person for the discussion. I'm
    only an interested amateur mathematician.

    Christian

    @mdickinson
    Copy link
    Member Author

    Tim: is this worth fixing?

    @mdickinson
    Copy link
    Member Author

    A related problem is that float(n) isn't always correctly rounded for an integer
    n. A contrived example:

    >>> n = 2**68 + 2**16 - 1
    >>> float(n)
    2.9514790517935283e+20

    Here the difference between float(n) and the true value of n is around 0.99998
    ulps; a correctly rounded float() would have error at most 0.5 ulps.

    I don't regard this as terribly serious: from looking at the code, I *think*
    it's always true that the error is strictly less than 1 ulp, which is just
    enough to guarantee that float(n) == n whenever n is exactly representable as a
    float.

    In contrast, the division of two integers can produce results that are up to 3.5
    ulps out from the true value. This is, in my opinion, a worryingly large error
    for a simple calculation.

    @terryjreedy
    Copy link
    Member

    To my mind, the inaccurate result is a bug that should be fixed.

    Note: (3.0a3)
    >>> 10e40/10e39
    10.0

    The rationale for the division change is that (as far as reasonably
    possible) arithmetic operations with same values should give same result
    regardless of types.

    I have not looked at either algorithm, but if long/long started by
    finding divmod(), but added fractional value when remainer is non-zero
    instead of tossing it, exact quotients would easily be exact (unless too
    large).

    @mdickinson
    Copy link
    Member Author

    Here's a patch that fixes the rounding of integer division. It includes a
    fast path for the case where both integers are small (less than about
    3.5e12).

    @mdickinson
    Copy link
    Member Author

    An example of a case that's almost 3.5 ulps out (Python 2.6):

    Python 2.6.2 (r262:71600, Jul  8 2009, 09:56:31) 
    [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from __future__ import division
    >>> m = 295147931372582273023
    >>> n = 295147932265116303360
    >>> m/n
    0.99999999697597697

    The correctly rounded result would be the float given by
    0.9999999969759773.

    @mdickinson
    Copy link
    Member Author

    Stealing this from Tim, with the intention of acting on it in the next
    couple of weeks.

    @mdickinson mdickinson assigned mdickinson and unassigned tim-one Dec 23, 2009
    @mdickinson
    Copy link
    Member Author

    Here's an updated patch, against py3k. On my machine, a/b is a touch faster with this patch when abs(a),
    abs(b) are smaller than 1e15 or so; it's (inevitably) slower than the existing implementation for larger a
    and b. For 'random' a and b, average running time is proportional to the size of b, and is independent of
    the size of a; worst-case running time (which occurs when a has many trailing zero bits) grows as
    max(size(a), size(b)).

    Changing versions to 2.7 and 3.2, but I'm mostly aiming for 3.2. It may not be worth backporting to 2.7,
    given the extra effort required to deal correctly with ints as well as with longs.

    @mdickinson
    Copy link
    Member Author

    Fixed in r77062 (trunk), r77063 (py3k).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants