Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Private _nth_root function loses accuracy #71948

Open
stevendaprano opened this issue Aug 14, 2016 · 51 comments
Open

Private _nth_root function loses accuracy #71948

stevendaprano opened this issue Aug 14, 2016 · 51 comments
Assignees
Labels
3.7 type-bug An unexpected behavior, bug, or error

Comments

@stevendaprano
Copy link
Member

stevendaprano commented Aug 14, 2016

BPO 27761
Nosy @tim-one, @rhettinger, @mdickinson, @vstinner, @ned-deily, @stevendaprano, @serhiy-storchaka, @wm75
Files
  • roots.py: random-ish test driver
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/stevendaprano'
    closed_at = None
    created_at = <Date 2016-08-14.02:26:23.504>
    labels = ['type-bug', '3.7']
    title = 'Private _nth_root function loses accuracy'
    updated_at = <Date 2016-11-23.15:34:51.657>
    user = 'https://github.com/stevendaprano'

    bugs.python.org fields:

    activity = <Date 2016-11-23.15:34:51.657>
    actor = 'wolma'
    assignee = 'steven.daprano'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2016-08-14.02:26:23.504>
    creator = 'steven.daprano'
    dependencies = []
    files = ['44339']
    hgrepos = []
    issue_num = 27761
    keywords = []
    message_count = 51.0
    messages = ['272638', '272639', '272643', '272649', '272658', '272663', '272664', '272665', '272667', '272668', '272670', '272672', '272713', '272782', '272784', '272806', '272871', '272902', '272911', '272912', '272914', '272929', '272948', '273753', '273759', '273760', '273804', '273822', '273826', '273827', '273829', '273833', '273834', '273843', '273844', '273849', '274190', '274196', '275930', '276732', '276735', '276739', '276741', '276742', '276743', '276744', '276754', '276756', '276772', '276865', '276982']
    nosy_count = 8.0
    nosy_names = ['tim.peters', 'rhettinger', 'mark.dickinson', 'vstinner', 'ned.deily', 'steven.daprano', 'serhiy.storchaka', 'wolma']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27761'
    versions = ['Python 3.7']

    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Aug 14, 2016

    First reported by Martin Panter here:

    http://bugs.python.org/issue27181#msg272488

    I'm afraid I don't know enough about PowerPC to suggest a fix other than weakening the test on that platform.

    @stevendaprano stevendaprano self-assigned this Aug 14, 2016
    @stevendaprano stevendaprano added the type-bug An unexpected behavior, bug, or error label Aug 14, 2016
    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Aug 14, 2016

    Tests fail on a Power PC buildbot:

    http://buildbot.python.org/all/builders/PPC64LE%20Fedora%203.x/builds/1476/steps/test/logs/stdio
    ======================================================================
    FAIL: testExactPowers (test.test_statistics.Test_Nth_Root) (i=29, n=11)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64le/build/Lib/test/test_statistics.py", line 1216, in testExactPowers
        self.assertEqual(self.nroot(x, n), i)
    AssertionError: 29.000000000000004 != 29

    ======================================================================
    FAIL: testExactPowersNegatives (test.test_statistics.Test_Nth_Root) (i=-29, n=11)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64le/build/Lib/test/test_statistics.py", line 1228, in testExactPowersNegatives
        self.assertEqual(self.nroot(x, n), i)
    AssertionError: -29.000000000000004 != -29

    @rhettinger
    Copy link
    Contributor

    rhettinger commented Aug 14, 2016

    You can do no better than to consult with Uncle Timmy.

    @vadmium
    Copy link
    Member

    vadmium commented Aug 14, 2016

    The problem is more widespread than just Power PC. The same failures (+/-29) are also seen on the three s390x buildbots. There are multiple failures of testExactPowers(Negatives) on
    http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/11140/steps/test/logs/stdio

    The first is
    AssertionError: 9.000000000000002 != 9

    An slightly different failure happens on AMD64 Free BSD and Open Indiana buildbots:
    http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%203.x/builds/11077/steps/test/logs/stdio
    ======================================================================
    FAIL: testFraction (test.test_statistics.Test_Nth_Root)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_statistics.py", line 1247, in testFraction
        self.assertEqual(self.nroot(x**12, 12), float(x))
    AssertionError: 1.1866666666666665 != 1.1866666666666668

    @vadmium vadmium changed the title Private _nth_root function loses accuracy on PowerPC Private _nth_root function loses accuracy Aug 14, 2016
    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 14, 2016

    Steven: can you explain why you think your code *should* be giving exact results for exact powers? Do you have an error analysis that says that should be the case?

    One issue here is that libm pow functions vary hugely in quality, so any algorithm that depends on ** or math.pow is going to be hard to prove anything about.

    I think the tests should simply be weakened: it's unreasonable to expect perfect results in this case.

    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Aug 14, 2016

    On Sun, Aug 14, 2016 at 08:29:39AM +0000, Mark Dickinson wrote:

    Steven: can you explain why you think your code *should* be giving
    exact results for exact powers? Do you have an error analysis that
    says that should be the case?

    No error analysis, only experimental results.

    One issue here is that libm pow functions vary hugely in quality, so
    any algorithm that depends on ** or math.pow is going to be hard to
    prove anything about.

    pow() is just the starting point. An earlier version of the function
    started with an initial guess of x for the nth root of x, but it was
    very slow, and sometimes failed to converge in any reasonable time. By
    swapping to an initial guess calculated with pow(), I cut the number of
    iterations down to a much smaller number.

    Because I'm not to worried about being super-fast, the nth root function
    goes through the following steps:

    • use y = pow(x, 1/n) for an initial guess;
    • if y**n == x, return y;
    • improve the guess with the "Nth root algorithm";
    • if that doesn't converge after a while, return y;
    • finally, compare the epsilons abs(y**n - x) for the initial guess
      and improved version, returning whichever gives the smaller epsilon.

    So I'm confident that nth_root() should never be worse than pow().

    I think the tests should simply be weakened: it's unreasonable to
    expect perfect results in this case.

    Okay. I'll weaken the tests.

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 14, 2016

    • if y**n == x, return y;

    Here's the problem, though: you're using the libm pow again in this test, so the result of this being True doesn't give tight information about how close y is to the nth root of x (and the test result may well differ from platform to platform).

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 14, 2016

    • finally, compare the epsilons abs(y**n - x) for the initial guess
      and improved version, returning whichever gives the smaller epsilon.

    So I'm confident that nth_root() should never be worse than pow().

    Same deal here: those aren't the actual errors; they're approximations to the errors, since the computations of the epsilons depends on (a) the usual floating-point rounding, and more significantly (b) the accuracy of the y**n computation. It's entirely possible that the value giving the smaller epsilon is actually the worse of the two approximations.

    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Aug 14, 2016

    On Sun, Aug 14, 2016 at 10:52:42AM +0000, Mark Dickinson wrote:

    Same deal here: those aren't the actual errors; they're approximations
    to the errors, since the computations of the epsilons depends on (a)
    the usual floating-point rounding, and more significantly (b) the
    accuracy of the y**n computation. It's entirely possible that the
    value giving the smaller epsilon is actually the worse of the two
    approximations.

    Let's call the two calculated roots y and w, where y is generated by
    pow() and w is the "improved" version, and the actual true
    mathematical root is r (which might fall between floats).

    You're saying that it might turn out that abs(y**n - x) < abs(w**n - x),
    but abs(w - r) < abs(y - r), so I return the wrong calculated root.

    I can see how that could happen, but not what to do about it. I'm
    tempted to say "that platform's pow() is weird, and the best I can do
    is return whatever it returns". That way you're no worse than if I just
    used pow() and didn't try to improve the result at all.

    I think this would be a problem if I wanted to make nth_root() a public
    function with a guarantee that it will be better than pow(), but at
    worst here it just slows the geometric mean down a bit compared to a
    naive pow(product, 1/n).

    What else can I do? Since I'm only dealing with integer powers, should I
    try using my own ipow(y, n) for testing?

    def ipow(y, n):
        x = 1.0
        while n > 0:
            if n % 2 == 1:
                x *= y
            n //= 2
            y *= y
        return x

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 14, 2016

    What else can I do? Since I'm only dealing with integer powers, should I
    try using my own ipow(y, n) for testing?

    I'd expect that a square-and-multiply-style pow would be less accurate than math.pow, in general, simply because of the number of floating-point operations involved.

    But I don't think there's a real problem here so long as you don't have an expectation of getting super-accurate (e.g., correctly rounded or faithfully rounded) results; testing that results are accurate to within 10 ulps or so is probably good enough.

    If you want an actual correctly-rounded nth root just for testing purposes, it's certainly possible to construct one with integer arithmetic, but an easier alternative might simply be to use MPFR's "mpfr_root" function (as wrapped by gmpy2) to generate test values.

    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Aug 14, 2016

    On Sun, Aug 14, 2016 at 12:05:37PM +0000, Mark Dickinson wrote:

    But I don't think there's a real problem here so long as you don't
    have an expectation of getting super-accurate (e.g., correctly rounded
    or faithfully rounded) results; testing that results are accurate to
    within 10 ulps or so is probably good enough.

    Thanks for the advice Mark.

    This has certainly showed my naivety when it comes to floating point :-(
    I thought IEEE 754 was supposed to put an end to these sorts of cross-
    platform differences. If this was a trig function, I wouldn't have been
    so surprised, but I was not expecting pow() to differ like this. Once I
    started getting exact results on my machine, I thought everyone would
    see the same thing.

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 14, 2016

    I thought IEEE 754 was supposed to put an end to these sorts of cross-platform differences.

    Maybe one day. (Though I'm not holding my breath.)

    IEEE 754-2008 does indeed *recommend* (but not actually *require*) correctly rounded implementations of all the usual transcendental and power functions (including nth root). And if everyone followed those recommendations, then yes, the correct rounding would imply that those cross-platform differences would be a thing of the past.

    I don't see that happening any time soon, though: writing an implementation of log (for example) that's provably correctly rounded for all possible inputs is much harder than writing an implementation that's simply "almost always" correctly rounded (e.g., provably accurate to 0.53 ulps instead of 0.5 ulps), and the latter is going to be good enough in the vast majority of contexts. Going from "almost always correctly rounded" to "correctly rounded" isn't going to have much effect on the overall accuracy of a multistep numerical algorithm, so all you're buying (apart from a likely degraded running time) is the cross-platform reproducibility, and it's not really clear how useful cross-platform reproducibility is as a goal in its own right. Overall, it doesn't seem like a good tradeoff.

    The pow function is especially hard to make correctly rounded, not least because of its two inputs. At least for single-input transcendental functions there's some (perhaps remote) hope of doing exhaustive testing on the 18 million million million possible double-precision inputs; for a function taking two inputs that's completely infeasible.

    You can take a look at CRlibm [1] for efforts in the direction of a correctly-rounded libm; AFAIK it hasn't had wide adoption, and its pow function is still work in progress.

    [1] http://lipforge.ens-lyon.fr/www/crlibm/

    @tim-one
    Copy link
    Member

    tim-one commented Aug 15, 2016

    A meta-note: one iteration of Newton's method generally, roughly speaking, doubles the number of "good bits" in the initial approximation.

    For floating n'th root, it would an astonishingly bad libm pow() that didn't get more than half the leading bits in pow(x, 1/n) right.

    So a single Newton iteration, if carried out in infinite precision, should be enough to get all the bits "right" (meaning not significantly worse than 0.5 ulp error when converted back to a float).

    So if you find yourself doing more than one Newton iteration, you're just fighting floating-point noise. It's an illusion - nothing is actually getting better, except perhaps by accident.

    Which suggests one approach for doubles (in C - same as a Python float): get the pow() approximation. Feed it to a fractions.Fraction() constructor. Do one Newton iteration using the Fraction type. Then use float() to convert the result to a float.

    I believe that's the best you can possibly do without doing real work ;-)

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 15, 2016

    Just for fun, here's a recipe for a correctly-rounded nth root operation for positive finite floats. I'm not suggesting using this in the business logic: it's likely way too slow (especially for large n), but it may have a use in the tests. The logic for the Newton iteration in floor_nroot follows that outlined in this Stack Overflow answer: http://stackoverflow.com/a/35276426

    from math import frexp, ldexp
    
    def floor_nroot(x, n):
        """ For positive integers x, n, return the floor of the nth root of x. """
    
        # Initial guess: here we use the smallest power of 2 that exceeds the nth
        # root. But any value greater than or equal to the target result will do.
        a = 1 << -(-x.bit_length() // n)
        while True:
            d = x // a**(n-1)
            if a <= d:
                return a
            a = ((n-1) * a + d) // n
    
    def nroot(x, n):
        """
        Correctly-rounded nth root (n >= 2) of x, for a finite positive float x.
        """
        if not (x > 0 and n >= 2):
            raise ValueError("x should be positive; n should be at least 2")
    m, e = frexp(x)
    rootm = floor_nroot(int(m * 2**53) << (53*n + (e-1)%n - 52), n)
    return ldexp(rootm + rootm % 2, (e-1)//n - 53)
    

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 15, 2016

    for positive finite floats

    ... assuming IEEE 754 binary64 format, of course.

    @ned-deily
    Copy link
    Member

    ned-deily commented Aug 15, 2016

    Since test_NTh_Root is failing on so many different buildbot platforms and in different ways, I'm marking this as a "release blocker". It would be very helpful to get this resolved prior to 3.6.0b1 next month. Suggest checking the test results outputs of the various 3x build slaves to ensure all the cases have been addressed.

    https://www.python.org/dev/buildbot/

    @tim-one
    Copy link
    Member

    tim-one commented Aug 16, 2016

    Thanks, Mark! I had worked out the floor_nroot algorithm many years ago, but missed the connection to the AM-GM inequality. As a result, instead of being easy, proving correctness was a pain that stretched over pages. Delighted to see how obvious it can be!

    Just FYI, this was my "cheap" suggestion:

        from fractions import Fraction
    
        def rootn(x, n):
            g = Fraction(x**(1.0/n))
            g = ((n-1)*g + Fraction(x)/g**(n-1)) / n
            return float(g)

    For fun, I ran that and your nroot over several million random cases with x of the form

    ldexp(random(), randrange(-500, 501))
    

    and n in randrange(2, 501).

    They always delivered the same result, which differed from x**(1/n) about a quarter of the time. On average nroot took about 3.7 times longer.

    But reducing the n range to randrange(1, 100), over half the time the (common) result differed from x**(1/n), and nroot was significantly faster.

    Moral of the story: none I can see ;-)

    @tim-one
    Copy link
    Member

    tim-one commented Aug 17, 2016

    Noting that floor_nroot can be sped a lot by giving it a better starting guess. In the context of nroot, the latter could pass int(x**(1/n)) as an excellent starting guess. In the absence of any help, this version figures that out on its own; an optional a=None argument (to supply a starting guess, if desired) would make sense.

        def floor_nroot(x, n):
            """For positive integers x, n, return the floor of the nth root of x."""
    
            bl = x.bit_length()
            if bl <= 1000:
                a = int(float(x) ** (1.0/n))
            else:
                xhi = x >> (bl - 53)
                # x ~= xhi * 2**(bl-53)
                # x**(1/n) ~= xhi**(1/n) * 2**((bl-53)/n)
                # Let(bl-53)/n = w+f where w is the integer part.
                # 2**(w+f) is then 2**w * 2**f, where 2**w is a shift.
                a = xhi ** (1.0 / n)
                t = (bl - 53) / n
                w = int(t)
                a *= 2.0 ** (t - w)
                m, e = frexp(a)
                a = int(m * 2**53)
                e += w - 53
                if e >= 0:
                    a <<= e
                else:
                    a >>= -e
            # A guess of 1 can be horribly slow, since then the next
            # guess is approximately x/n.  So force the first guess to
            # be at least 2.  If that's too large, fine, it will be
            # cut down to 1 right away.
            a = max(a, 2)
            a = ((n-1)*a + x // a**(n-1)) // n
            while True:
                d = x // a**(n-1)
                if a <= d:
                    return a
                a = ((n-1) * a + d) // n

    I haven't yet found a case in the context of `nroot` where it doesn't get out on the first `if a <= d:` test. Of course you can provoke as many iterations as you like by passing `x` with a lot more than 53 "significant" bits (the large `x` passed by `nroot` are mostly long strings of trailing 0 bits).

    @vstinner
    Copy link
    Member

    vstinner commented Aug 17, 2016

    "Just for fun, here's a recipe for a correctly-rounded nth root operation for positive finite floats. I'm not suggesting using this in the business logic: it's likely way too slow (especially for large n), but it may have a use in the tests."

    I don't know well the statistics module, but it looks like it doesn't use directly floats, more a somehow higher level type of numbers to try to reduce rounding errors. For me, the math module is a thin wrapper on C library math functions, except of a few functions specific to Python like math.factorial. But the statistics module is at a higher level. Maybe we should draw a line between accuracy and speed. For example, explain in the statistics module that the module is designed for accuracy?

    Sorry if I completly misunderstood the design of the statistics module :-)

    @vstinner
    Copy link
    Member

    vstinner commented Aug 17, 2016

    About tradeoff, would it be possible to add an option to choose the quality of the accuracy? For example, a flag to choose between "fast nth root" or "accurate nth root".

    Python already has two kind of numbers: Decimal and float. Maybe the "flag" should be the type of input numbers, Decimal or float?

    I'm asking because I looked at http://lipforge.ens-lyon.fr/www/crlibm/ a few years ago, and such library is designed for accuracy, not for speed. crlibm is based on the scslib library which compose a number using multiple float numbers to get a better precision. To get correct rounding, crlibm requires more loop iterations and more floating point number operations, so as expected, it is slower.

    FYI my old blog article (in french!): http://www.haypocalc.com/blog/index.php/2009/02/20/188-bibliotheque-mathematique-arrondi-exact-crlibm

    @vstinner
    Copy link
    Member

    vstinner commented Aug 17, 2016

    Just to share my little experience with rounding numbers.

    Last years, I worked on a API to convert timestamps between float and integers, the private "PyTime C API":

    https://haypo.github.io/pytime.html

    At the beginning, I used various floatting point numbers which looks fine in decimal. But quickly, I got rounding issues on various buildbots. After many years fighting against compilers and trying to write a complete test suite, I decided to only use numbers which can be stored exactly in IEEE 754:

            if use_float:
                # numbers with an exact representation in IEEE 754 (base 2)
                for pow2 in (3, 7, 10, 15):
                    ns = 2.0 ** (-pow2)
                    ns_timestamps.extend((-ns, ns))

    If you are curious, look at CPyTimeTestCase, TestCPyTime and TestOldPyTime classes in Lib/test/test_time.py.

    At the end, I decided to reimplement each conversion function in pure Python using decimal.Decimal and compare the result with the C implementation. It makes the unit tests shorter and simpler.

    @mdickinson
    Copy link
    Member

    mdickinson commented Aug 17, 2016

    Victor: by "way too slow", I really *do* mean way too slow. :-)

    floor_nroot does arithmetic with integers of bit-length approximately 54*n. For small n, that's fine, but if someone tried to take the geometric mean of a list of 50000 values (which it seems to me should be a perfectly reasonable use-case), floor_nroot would then be trying to do computations with multi-million-bit integers. The a**(n-1) operation in particular would be a killer for large n.

    @vstinner
    Copy link
    Member

    vstinner commented Aug 17, 2016

    floor_nroot does arithmetic with integers of bit-length approximately 54*n.

    Oh, I see.

    But maybe the Decimal is fast enough for such giant numbers, since we
    can control the precision?

    @tim-one
    Copy link
    Member

    tim-one commented Aug 27, 2016

    I don't care about correct rounding here, but it is, e.g., a bit embarrassing that

    >>> 64**(1/3)
    3.9999999999999996

    Which you may or may not see on your box, depending on your platform pow(), but which you "should" see: 1/3 is not a third, it's a binary approximation that's a little less than 1/3, so 64 to that power should be less than 4.

    There are two practical problems with nroot, both already noted: (1) Its very cheap initial guess comes at the cost of requiring multiple iterations to get a guess good to 54 bits. That can be repaired, and already showed how. (2) The larger n, the more bits it requires. As Mark noted, at n=50000 we're doing multi-million bit arithmetic. That's inherent - and slow.

    My simple fractions.Fraction function suffers the second problem too, but is even slower for large n.

    But if you give up on guaranteed correct rounding, this is really quite easy: as I noted before, a single Newton iteration approximately doubles the number of "good bits", so _any_ way of getting the effect of extra precision in a single iteration will deliver a result good enough for all practical purposes. Note that an error bound of strictly less than 1 ulp is good enough to guarantee that the exact result is delivered whenever the exact result is representable.

    Here's one way, exploiting that the decimal module has adjustable precision:

        import decimal
        def rootn(x, n):
            c = decimal.getcontext()
            oldprec = c.prec
            try:
                c.prec = 40
                g = decimal.Decimal(x**(1.0/n))
                g = ((n-1)*g + decimal.Decimal(x)/g**(n-1)) / n
                return float(g)
            finally:
                c.prec = oldprec

    Now memory use remains trivial regardless of n. I haven't yet found a case where the result differs from the correctly-rounded nroot(), but didn't try very hard. And, e.g., even at the relatively modest n=5000, it's over 500 times faster than nroot (as modified to use a scaled x**(1/n) as an excellent starting guess, so that it too always gets out after its first Newton step). At n=50000, it's over 15000 times faster.

    For n less than about 70, though, nroot is faster. It's plenty fast for me regardless.

    Note: just in case there's confusion about this, rootn(64.0, 3) returns exactly 4.0 not necessarily because it's "more accurate" than pow() on this box, but because it sees the exact 3 instead of the approximation to 1/3 pow() sees. That approximation is perfectly usable to get a starting guess (64**(1/3)) good to nearly 53 bits. The real improvement comes from the Newton step using exactly 3.

    That said, I have seen rare cases where this (and nroot) give a better result than the platform pow() even for n=2 (where 1/n is exactly representable).

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Aug 27, 2016

    What if use pow() with exactly represented degree in approximating step?

        def rootn(x, n):
            g = x**(1.0/n)
            m = 1 << (n-1).bit_length()
            if n != m:
                g = (x*g**(m-n))**(1.0/m)
            return g

    Maybe it needs several iterations, because it converges slower than Newton approximation. But every step can be faster.

    @tim-one
    Copy link
    Member

    tim-one commented Aug 27, 2016

    Serhiy, I don't know what you're thinking there, and the code doesn't make much sense to me. For example, consider n=2. Then m == n, so you accept the initial g = x**(1.0/n) guess. But, as I said, there are cases where that doesn't give the best result, while the other algorithms do. For example, on this box:

    >>> serhiy(7.073208563506701e+46, 2)
    2.6595504438733062e+23
    >>> pow(7.073208563506701e+46, 0.5)  # exactly the same as your code
    2.6595504438733062e+23
    
    >>> nroot(7.073208563506701e+46, 2)  # best possible result
    2.6595504438733066e+23
    >>> import math
    >>> math.sqrt(7.073208563506701e+46) # also achieved by sqrt()
    2.6595504438733066e+23

    On general principle, you can't expect to do better than plain pow() unless you do _something_ that gets the effect of using more than 53 mantissa bits - unless the platform pow() is truly horrible. Using pow() multiple times is useless; doing Newton steps _in_ native float (C double) precision is useless; even some form of "binary search" is useless because just computing g**n (in native precision) suffers rounding errors worse than pow() suffered to begin with.

    So long as you stick to native precision, you're fighting rounding errors at least as bad as the initial rounding errors you're trying to correct. There's no a priori reason to even hope iterating will converge.

    @tim-one
    Copy link
    Member

    tim-one commented Aug 28, 2016

    Adding one more version of the last code, faster by cutting the number of extra digits used, and by playing "the usual" low-level CPython speed tricks.

    I don't claim it's always correctly rounded - although I haven't found a specific case where it isn't. I do claim it will return the exact result whenever the exact result is representable as a float.

    I also claim it's "fast enough" - this version does on the high end of 50 to 100 thousand roots per second on my box, pretty much independent of n.

        import decimal
        c = decimal.DefaultContext.copy()
        c.prec = 25
        def rootn(x, n,
                  D=decimal.Decimal,
                  pow=c.power,
                  mul=c.multiply,
                  add=c.add,
                  div=c.divide):
            g = D(x**(1.0/n))
            n1 = D(n-1)
            g = div(add(mul(n1, g),
                        div(D(x), pow(g, n1))),
                    n)
            return float(g)
        del decimal, c

    @vstinner
    Copy link
    Member

    vstinner commented Aug 28, 2016

    Hum, "g = D(x**(1.0/n))" computes a float and then convert it to a decimal,
    right?

    Can we please add comments to explain why you start with float, compute
    decimal and finish with float?

    And maybe also document the algorithm?

    @tim-one
    Copy link
    Member

    tim-one commented Aug 28, 2016

    Victor, happy to add comments, but only if there's sufficient interest in actually using this. In the context of this issue report, it's really only important that Mark understands it, and he already does ;-)

    For example, it starts with float ** because that's by far the fastest way to get a very good approximation. Then it switches to Decimal to perform a Newton step using greater-than-float precision, which is necessary to absorb rounding errors in intermediate steps. Then it rounds back to float, because it has to - it's a "float in, float out" function. It's for the same reason, e.g., that Mark's nroot converts floats to potentially gigantic integers for its Newton step(s), and my other fractions.Fraction function converts floats to potentially gigantic rationals. "Potentially gigantic" may be necessary to guarantee always-correct rounding of the final result, but isn't necessary to guarantee < 1 ulp error in the final result.

    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Aug 28, 2016

    This has been really eye-opening, and I just wanted to drop you a note
    that I am watching this thread carefully. My first priority is to get
    the tests all passing before beta 1 on 2016-09-12, even if (as seems
    likely) that means weakening the tests, and then come back and see if we
    can tighten it up again later.

    I haven't checked it in yet, but I've already managed to simplify the
    nth_root code by taking Tim's advice that more than one iteration of
    Newton's method is a waste of time. Thanks!

    Can somebody comment on my reasoning here?

    I start by taking an initial guess for the root by using
    r = pow(x, 1.0/n). Then I test if r**n == x, if it does I conclude that
    r is either the exact root, or as close as representable as a float, and
    just return it without bothering with even one iteration. Sensible?

    Or should I just always run one iteration of Newton and trust that it
    won't make things worse?

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Aug 28, 2016

    Wouldn't following implementation be faster?

        import decimal
        c = decimal.DefaultContext.copy()
        c.prec = 25
        def rootn(x, n,
                  D=decimal.Decimal,
                  sub=c.subtract,
                  mul=c.multiply,
                  log=c.ln):
            g = x ** (1.0/n)
            g += float(sub(log(D(x)), mul(log(D(g)), D(n)))) * g / n
            return g
        del decimal, c

    @tim-one
    Copy link
    Member

    tim-one commented Aug 28, 2016

    That's clever, Serhiy! Where did it come from? It's not Newton's method, but it also appears to enjoy quadratic convergence.

    As to speed, why are you asking? You should be able to time it, yes? On my box, it's about 6 times slower than the last code I posted. I assume (but don't know) that's because + - * / have trivial cost compared to power() and ln(), and your code uses two ln() while the last code I posted uses only one power(). Also, my power() has an exact integer exponent, which - for whatever reasons - appears to run much faster than using a non-integer Decimal exponent. It's for the latter reason that I didn't suggest just doing return float(D(x) ** (D(1)/n)) (much slower).

    Or maybe it's "a platform thing". FYI, I'm using the released Python 3.5.2 on 64-bit Win 10 Pro.

    @tim-one
    Copy link
    Member

    tim-one commented Aug 28, 2016

    Steven, you certainly can ;-) check first whether r**n == x, but can you prove r is the best possible result when it's true? Offhand, I can't. I question it because it rarely seems to be true (in well less than 1% of the random-ish test cases I tried). An expensive test that's rarely true tends to make things slower overall rather than faster.

    As to whether a Newton step won't make things worse, that requires seeing the exact code you're using. There are many mathematically equivalent ways to code "a Newton step" that have different numeric behavior. If for some unfathomable (to me) reason you're determined to stick to native precision, then - generally speaking - the numerically best way to do "a correction step" is to code it in the form:

    r += correction  # where `correction` is small compared to `r`
    

    Coding a Newton step here as, e.g., r = ((n-1)*r + x/r**(n-1))/n in native precision would be essentially useless: multiple rounding errors show up in the result's last few bits, but the last few bits are the only ones we're trying to correct.

    When correction is small compared to r in r += correction, the rounding errors in the computation of correction show up in correction's last few bits, which are far removed from r's last few bits (because correction is small compared to r). So that way of writing it may be helpful.

    @vstinner
    Copy link
    Member

    vstinner commented Aug 28, 2016

    Tim Peters: "Victor, happy to add comments, but only if there's
    sufficient interest in actually using this."

    It looks like tests fail on some platforms. I guess that it depends on
    the different factors: quality of the C math library, quality of the
    IEEE 754 hardware implementation, etc.

    I like the idea of providing an accurate and *portable* function
    (_nth_root) thanks to the usage of the decimal module to get a better
    precision.

    If someone considers that the performance of
    statistics.geometric_mean() matters, I suggest to document it. It's
    new feature of Python 3.6, it's not like a regression, so it's ok.

    What do you think?

    @tim-one
    Copy link
    Member

    tim-one commented Aug 28, 2016

    As I said, the last code I posted is "fast enough" - I can't imagine a real application can't live with being able to do "only" tens of thousands of roots per second. A geometric mean is typically an output summary statistic, not a transformation applied billions of times to input data.

    To get a 100% guarantee of 100% portability (although still confined to IEEE 754 format boxes), we can't use the libm pow() at all. Then you can kiss speed goodbye. Mark's nroot is portable in that way, but can take tens of thousands of times longer to compute a root.

    Here's another way, but routinely 100 times slower (than the last code I posted):

        import decimal
        c = decimal.DefaultContext.copy()
        c.prec = 25
    
        def slow(x, n,
                 D=decimal.Decimal,
                 pow=c.power,
                 div=c.divide,    
                 d1=decimal.Decimal(1)):
            return float(pow(D(x), div(d1, n)))
    del decimal, c
    

    Too slow for my tastes, but at least it's obvious ;-)

    @tim-one
    Copy link
    Member

    tim-one commented Aug 29, 2016

    Let's spell one of these out, to better understand why sticking to native precision is inadequate. Here's one way to write the Newton step in "guess + relatively_small_correction" form:

        def plain(x, n):
            g = x**(1.0/n)
            return g - (g - x/g**(n-1))/n

    Alas, it's pretty much useless. That's because when g is a really good guess, g and x/g**(n-1) are, in native precision, almost the same. So the subtraction cancels out almost all the bits, leaving only a bit or two of actual information. For this kind of approach to be helpful in native precision, it generally requires a clever way of rewriting the correction computation that doesn't suffer massive cancellation.

    Example:

    >> x = 5.283415219603294e-14

    and we want the square root. Mark's nroot always gives the best possible result:

    >>> nroot(x, 2)
    2.298568080262861e-07

    In this case, so does sqrt(), and also x**0.5 (on my box - pow() may not on yours):

    >>> sqrt(x)
    2.298568080262861e-07
    >>> x**0.5
    2.298568080262861e-07

    You're going to think it needs "improvement", because the square of that is not x:

    >>> sqrt(x)**2 < x
    True

    Let's see what happens in the plain() function above:

    >> g = x**0.5
    >> temp = x/g**(2-1)

    >>> g.hex()
    '0x1.ed9d1dd7ce57fp-23'
    >>> temp.hex()
    '0x1.ed9d1dd7ce580p-23'

    They differ by just 1 in the very last bit. There's nothing wrong with "the math", but _in native precision_ the subtraction leaves only 1 bit of information.

    Then:

    >>> correction = (g - temp)/2
    >>> correction
    -1.3234889800848443e-23

    is indeed small compared to g, but it only has one "real" bit:

    >>> correction.hex()
    '-0x1.0000000000000p-76'

    Finally,

    >>> g - correction
    2.2985680802628612e-07

    is _worse_ than the guess we started with. Not because of "the math", but because sticking to native precision leaves us with an extremely crude approximation to the truth.

    This can be repaired in a straightforward way by computing the crucial subtraction with greater than native precision. For example,

        import decimal
        c = decimal.DefaultContext.copy()
        c.prec = 25
    
        def blarg(x, n,
                  D=decimal.Decimal,
                  pow=c.power,
                  sub=c.subtract,
                  div=c.divide):
            g = x**(1.0/n)
            Dg = D(g)
            return g - float(sub(Dg, div(D(x), pow(Dg, n-1)))) / n
    del decimal, c
    

    Then the difference is computed as

    Decimal('-2.324439989147273024835272E-23')

    and the correction (convert that to float and divide by 2) as:

    -1.1622199945736365e-23

    The magnitude of that is smaller than of the -1.3234889800848443e-23 we got in native precision, and adding the new correction to g makes no difference: the correction is (correctly!) viewed as being too small (compared with g) to matter.

    So, bottom line: there is no known way of writing the Newton step that won't make things worse at times, so long as it sticks to native precision. Newton iterations in native precision are wonderful when, e.g., you know you have about 20 good bits and want to get about 40 good bits quickly. The last bit or two remain pure noise, unless you can write the correction computation in a way that retains "a bunch" of significant bits. By far the easiest way to do the latter is simply to use more precision.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 2, 2016

    Attched file "roots.py" you can run to get a guess as to how bad pow(x, 1/n) typically is on your box.

    Note that it's usually "pretty darned good" the larger n is. There's a reason for that. For example, when n=1000, all x satisfying 1 <= x < 21000 have a root r satisfying 1 <= r < 2. Mathematically, that's a 1-to-1 function, but in floating point there are only 252 representable r in the lstter range: the larger n, the more of a many-to-1 function the n'th root is in native floating point precision. That makes it much easier to get the best-possible result even by accident ;-) So, e.g., this kind of output is common for "large" n:

    n = 2272
        x**(1/n)
               -1    10
                0   982
                1     8
        with 1 native-precision step
            all correct
        with 1 extended-precision step
            all correct

    That means that, out of 1000 "random-ish" x, x**(1/2272) returned a result 1 ulp smaller than best-possible 10 times, 1 ulp too large 8 times, and best-possible 982 times ("ulp" is relative to the best-possible result). Doing any form of a single "guess + small_correction" Newton step (in either native or extended precision) repaired all the errors.

    Things look very different for small n. Like:

    n = 2
        x**(1/n)
               -1     1
                0   997
                1     2
        with 1 native-precision step
               -1   117
                0   772
                1   111
        with 1 extended-precision step
            all correct

    1/2 is exactly representable, so errors are few. But doing a native-precsion "correction" made results significantly worse.

    n=3 is more interesting, because 1/3 is not exactly representable:
    
    n = 3
        x**(1/n)
              -55     1
              -54     2
              -53     2
              -52     1
              -51     3
              -50     3
              -49     2
              -48     3
              -47     4
              -46     2
              -45     6
              -44     5
              -43     7
              -42     5
              -41     6
              -40     4
              -39     7
              -38     7
              -37     8
              -36     7
              -35     5
              -34     8
              -33     8
              -32    14
              -31     8
              -30     9
              -29    15
              -28    13
              -27    21
              -26    14
              -25     7
              -24    14
              -23    15
              -22     7
              -21    18
              -20    14
              -19    12
              -18    17
              -17     8
              -16    13
              -15    15
              -14     9
              -13    10
              -12    14
              -11    11
              -10     7
               -9    10
               -8    14
               -7    11
               -6     7
               -5    11
               -4    12
               -3    12
               -2    12
               -1     8
                0    12
                1     7
                2    11
                3    11
                4    12
                5     5
                6     9
                7    14
                8    12
                9     8
               10    14
               11    15
               12    13
               13    12
               14     9
               15    15
               16    17
               17     9
               18    10
               19    17
               20    15
               21     9
               22     9
               23     6
               24    11
               25    20
               26    24
               27    21
               28    16
               29    11
               30    12
               31     3
               32    11
               33    12
               34    11
               35    11
               36     2
               37     8
               38     8
               39     9
               40     3
               41     5
               42     4
               43     7
               44     4
               45     7
               46     5
               47     3
               48     4
               50     2
               53     3
               54     3
               56     1
        with 1 native-precision step
               -1    42
                0   917
                1    41
        with 1 extended-precision step
            all correct

    There a single native-precision step helped a lot overall. But for n=4 (where 1/n is again exactly representable), it again hurts:

    n = 4
        x**(1/n)
                0   999
                1     1
        with 1 native-precision step
               -1    50
                0   894
                1    56
        with 1 extended-precision step
            all correct

    And at n=5 it helps again; etc.

    Of course my story hasn't changed ;-) That is, use the single-step extended-precision code. It's "almost always" correctly rounded (I still haven't seen a case where it isn't, and the test code here assert-fails if it finds one), and is plenty fast. The reason this code gets very visibly slower as n increases is due to using the always-correctly-rounded nroot() for comparison.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 2, 2016

    BTW, add this other way of writing a native-precision Newton step to see that it's much worse (numerically) than writing it in the "guess + small_correction" form used in roots.py. Mathematically they're identical, but numerically they behave differently:

    def native2(x, n):
        g = x**(1.0/n)
        if g**n == x:
            return g
        return ((n-1)*g + x/g**(n-1)) / n

    @stevendaprano
    Copy link
    Member Author

    stevendaprano commented Sep 12, 2016

    As discussed with Ned via email, this issue shouldn't affect the public geometric_function and the failing tests (currently skipped) are too strict. The existing tests demand an unrealistic precision which should be loosened, e.g. from assertEqual to assertAlmostEqual. Ned wrote:

    "If you are only planning to make changes to the tests themselves, I think that can wait for b2."

    I am currently unable to build 3.6, and I won't have time to resolve that before b1.

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 16, 2016

    I think this whole nth root discussion has become way more complicated than it needs to be, and there's a simple and obvious solution.

    Two observations:

    1. What we actually need is not nth_root(x), but nth_root(x*2**e) for a float x and integer e. That's easily computed as exp((log(x) + e*log(2)) / n), and if we (a) rescale x to lie in [1.0, 2.0) and (b) remove multiples of n from e beforehand and so replace e with e % n, all intermediate values in this computation are small (less than 2.0).

    2. It's easy to compute the above expression either directly using the math module (which gives an nth root operation with an error of a handful of ulps), or with extra precision using the Decimal module (which gives an nth root operation that's almost always correctly rounded).

    If we do this, this also fixes issue bpo-28111, which is caused by the current algorithm getting into difficulties when computing the nth root of the 2**e part of x*2**e.

    Here's a direct solution using the Decimal module. If my back-of-the-envelope forward error analysis is correct, the result is always within 0.502 ulps of the correct result. That means it's *usually* going to be correctly rounded, and *always* going to be faithfully rounded. If 0.502 ulps isn't good enough for you, increase the context precision from 20 to 30, then we'll always be within 0.5000000000002 ulps instead.

    The code deals with the core problem where x is finite and positive. Extending to negative values, zeros, nans and infinities is left as an exercise for the reader.

    import decimal
    import math
    
    ROOTN_CONTEXT = decimal.Context(prec=20)
    LOG2 = ROOTN_CONTEXT.ln(2)
    
    
    def rootn(x, e, n):
        """
        Accurate value for (x*2**e)**(1/n).
    Result is within 0.502 ulps of the correct result.
    """
    if not 0.0 < x < math.inf:
        raise ValueError("x should be finite and positive")
    
    m, exp = math.frexp(x)
    q, r = divmod(exp + e - 1, n)
    d = decimal.Decimal(m*2.0)
    c = ROOTN_CONTEXT  # for brevity
    y = c.exp(c.divide(c.add(c.ln(d), c.multiply(r, LOG2)), n))
    return math.ldexp(y, q)
    

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 16, 2016

    Here's the error analysis, for the record. It's crude, but effective. Assume
    our decimal context precision is p (so p = 20 for the above code).

    1. d is represented exactly with value in [1.0, 2.0). (Conversions from float
      to Decimal are exact.)

    2. c.ln(d) < 1.0, and Decimal's ln operation is correctly rounded, so the
      *absolute* error in c.ln(d) is at most 0.5 * 10**-p.

    3. Similarly, LOG2 has an absolute error of at most 0.5 * 10**-p.

    4. Let q be the unique integer satisfying 10**q <= n < 10**(q+1).

    5. r * LOG2 < n < 10**(q+1), so ulp(r * LOG2) <= 10**(q + 1 - p),
      and the rounding error in computing r * LOG2 is at most
      0.5 * 10**(q + 1 - p). We also inherit the error from LOG2, scaled
      up by r (which is less than 10**(q+1)), so this inherited error
      is also bounded by 0.5 * 10**(q + 1 - p). Our total absolute error
      is now bounded by 10**(q + 1 - p).

    6. The result of the addition of d to r * LOG2 is also bounded by
      n < 10**(q+1), so introduces a new rounding error of up to
      0.5 * 10**(q + 1 - p) on top of the errors in d and r * LOG2
      (steps 2 and 5). Our total error is now bounded by 1.5*10**(q + 1 - p).

    7. Division by n produces a result <= 1.0. It divides our absolute error so
      far by n (>= 10**q), giving a new error of at most 15 * 10**-p, and
      introduces a new rounding error of at most 0.5 * 10**-p. Bound
      so far is 15.5 * 10**-p.

    8. The exponential operation gives a result in the range [1.0, 2.0],
      and converts our absolute error of 15.5 * 10**-p into a relative
      error of at most 15.5 * 10**-p (let's call it 16 * 10**-p to be
      safe), which since our result is bounded by 2.0 amounts to an absolute
      error of at most 32 * 10**-p. We get a rounding error on the result
      of at most 5 * 10**-p (like ln, the decimal module's "exp" operation
      is also correctly rounded), making our total error bounded by 37 * 10**-p.

    Finally, we want to express the error bound above in terms of ulps of the
    original IEEE 754 binary64 format that float (almost certainly) uses.
    Since our result (before the ldexp) is in the range [1.0, 2.0], one ulp
    for the binary format is 2**-52. So the error bound in step 8, expressed
    in binary64 ulps, is 37 / 2 **-52 * 10**-p < 2 * 10**(17-p) ulps. Rounding
    from the final Decimal value to float also incurs an error of at most
    0.5 ulps, so we end up with an error bound of

    final_error <= 0.5 + 2 * 10**(17-p) ulps.

    For p = 20, this is 0.502 ulps.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 16, 2016

    Mark, the code I showed in roots.py is somewhat more accurate and highly significantly faster than the code you just posted. It's not complicated at all: it just uses Decimal to do a single Newton correction with extended precision.

    Since it doesn't use the Decimal exp() or ln(), it's faster. It does use the Decimal pow(), but with an integer exponent, so this specific use of pow() doesn't invoke the Decimal exp() or ln() either. And it's still the case that I haven't found a case where its result isn't correctly rounded. My testing framework found multiple not-correctly-rounded cases in your new code within seconds.

    Presumably you could boost the precision to improve that, but then it would get even slower.

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 16, 2016

    Mark, the code I showed in roots.py is somewhat more accurate and highly significantly faster than the code you just posted.

    Okay, fair enough. In that case, we still need a solution for computing rootn(x * 2**e) in the case where x*2**e itself overflows / underflows an IEEE 754 float.

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 16, 2016

    It does use the Decimal pow(), but with an integer exponent, so this specific use of pow() doesn't invoke the Decimal exp() or ln() either.

    Decimal pow doesn't special-case integer exponents; the solution will still be based on exp and ln.

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 16, 2016

    Decimal pow doesn't special-case integer exponents; the solution will still be based on exp and ln.

    Ah, sorry; I'm wrong. That was true for the Python version of the decimal module, not for the C implementation.

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 16, 2016

    And it's still the case that I haven't found a case where its result isn't correctly rounded.

    Here's one: :-)

    >>> rootn(1 + 2**-52, 2)
    1.0000000000000002

    The correctly rounded result would be 1.0.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 16, 2016

    Mark, thanks for the counterexample! I think I can fairly accuse you of thinking ;-)

    I expect the same approach would be zippy for scaling x by 2**e, provided that the scaled value doesn't exceed the dynamic range of the decimal context. Like so:

    def erootn(x, e, n,
              D=decimal.Decimal,
              D2=decimal.Decimal(2),
              pow=c.power,
              sub=c.subtract,
              div=c.divide):
        g = x**(1.0/n) * 2.0**(e/n) # force e to float for Python 2?
        Dg = D(g)
        return g - float(sub(Dg, div(D(x)*D2**e, pow(Dg, n-1)))) / n

    The multiple errors in the native-float-precision starting guess shouldn't hurt. In this case D(x)*D2**e may not exactly equal the scaled input either, but the error(s) are "way at the end" of the extended-precision decimal format.

    I don't know the range of e you have to worry about. On my box, decimal.MAX_EMAX == 999999999999999999 ... but the docs say that on a 32-bit box it's "only" 425000000. If forcing the context Emax to that isn't enough, then your code is still graceful but this other approach would need to get uglier.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 16, 2016

    Oops! The D2**e in that code should be pow(D2, e), to make it use the correct decimal context.

    @tim-one
    Copy link
    Member

    tim-one commented Sep 17, 2016

    Let me give complete code for the last idea, also forcing the scaling multiplication to use the correct context:

        import decimal
        c = decimal.DefaultContext.copy()
        c.prec = 25
        c.Emax = decimal.MAX_EMAX
        c.Emin = decimal.MIN_EMIN
    
        def erootn(x, e, n,
                  D=decimal.Decimal,
                  D2=decimal.Decimal(2),
                  pow=c.power,
                  sub=c.subtract,
                  mul=c.multiply,
                  div=c.divide):
            g = x**(1.0/n) * 2.0**(e/n)
            Dg = D(g)
            return g - float(sub(Dg, div(mul(D(x), pow(D2, e)),
                                         pow(Dg, n-1)))) / n
    del decimal, c
    

    Both pow() instances use an integer exponent, so the C implementation of decimal still avoids expensive exp() and ln() computations - it's all just basic - * / under the covers.

    On a 32-bit box, the scaled input (x*2**e) must be <= 9.999...999E425000000, which is about 2**1411819443 (an exponent of about 1.4 billion). On the close-to-0 side, approximately the reciprocals of those giant numbers.

    The range is much wider on a 64-bit box (because Emax and Emin have much larger absolute values).

    Then, e.g.,

    >>> erootn(1, 1411819443, 1411819443)
    2.0
    >>> erootn(1, -1411819443, 1411819443)
    0.5

    So it's obviously perfect ;-) Except that, because it reduces to the previous algorithm when e=0, Mark's counterexample (to correct rounding) still holds:

    >>> erootn(1 + 2**-52, 0, 2)
    1.0000000000000002

    Boosting precision to 28 (from 25) is enough to repair that, but it doesn't bother me (there was never a claim that it was always correctly rounded - but at prec=25 it went through hours & hours of testing randomish inputs and n values without finding a less-than-perfect case - but those tests effectively always used e=0 too).

    Basic error analysis for Newton's method usually goes like this: suppose the true root is t and the current guess is g = t*(1+e) for some small |e|. Then plug t*(1+e) into the equation and do a Taylor expansion around e=0. In this case, dropping terms at or above cubic in e leaves that the new guess is about

    t*(1 + (n-1)/2 * e**2)

    That the relative error goes from e to (about) e**2 (which is typical of Newton's method) is the source of the claim that the number of good bits approximately doubles on each iteration. In this specific case, it's not quite that good: the (n-1)/2 factor means convergence becomes slower the larger n is. But e in this case is perhaps on the order of 2**-50, and (n-1)/2 times 2**-100 (e**2) remains tiny for any plausible value of n.

    Good enough for me ;-)

    @tim-one
    Copy link
    Member

    tim-one commented Sep 18, 2016

    Let me clarify something about the extended algorithm: the starting guess is no longer the most significant source of error. It's the mul(D(x), pow(D2, e)) part. D(x) is exact, but pow(D2, e) may not be exactly representable with 26 decimal digits, and the multiplication may also need to round. Those 2 operations may each suffer a half ulp error (but "ulp" in this case is 1 in the 26th decimal digit).

    The rough error analysis is straightforward to extend, but I used wxMaxima to do the Taylor expansions and to ensure I didn't drop any terms.

    So suppose that instead of exactly x2**e we get x2e*(1+r1) for some small |r1| (on the order of 10-25), and instead of the exact n'th root t the starting guess g = t*(1+r2) for some small |r2| (on the order of 2**-50). Then the mathematical value of the corrected g is t times (dropping terms higher than quadratic in r1 and r2):

    1 + r1/n - (n-1)/n * r1*r2 + (n-1)/2 * r2**2

    The largest term (besides 1) by far is now r1/n (assuming a not-insanely-large n). But that's still "way at the end" of the extended-precision decimal format, which should be ;-) intuitively obvious. It's too small to have a direct effect on the last bit of the 53-bit result (but can certainly affect "close call" rounding cases).

    @mdickinson
    Copy link
    Member

    mdickinson commented Sep 19, 2016

    Good enough for me ;-)

    Me too.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants