Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add math.gcd() #66676

Closed
scoder opened this issue Sep 24, 2014 · 48 comments
Closed

Add math.gcd() #66676

scoder opened this issue Sep 24, 2014 · 48 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@scoder
Copy link
Contributor

scoder commented Sep 24, 2014

BPO 22486
Nosy @mdickinson, @abalkin, @pitrou, @scoder, @vstinner, @serhiy-storchaka, @wm75
PRs
  • bpo-39350: Remove deprecated fractions.gcd() #18021
  • Files
  • lehmer_gcd_4.patch
  • lehmer_gcd_5.patch
  • lehmer_gcd_6.patch
  • lehmer_gcd_7.patch
  • euclidean_gcd.patch
  • lehmer_gcd_8.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-05-12.23:26:09.896>
    created_at = <Date 2014-09-24.20:40:35.148>
    labels = ['library', 'performance']
    title = 'Add math.gcd()'
    updated_at = <Date 2020-01-16.10:02:58.267>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2020-01-16.10:02:58.267>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-05-12.23:26:09.896>
    closer = 'benjamin.peterson'
    components = ['Library (Lib)']
    creation = <Date 2014-09-24.20:40:35.148>
    creator = 'scoder'
    dependencies = []
    files = ['36722', '36723', '36726', '36741', '36742', '37422']
    hgrepos = []
    issue_num = 22486
    keywords = ['patch']
    message_count = 48.0
    messages = ['227487', '227507', '227522', '227524', '227525', '227527', '227528', '227529', '227531', '227532', '227533', '227534', '227535', '227536', '227537', '227538', '227554', '227555', '227570', '227619', '227620', '227623', '227629', '227630', '227637', '227638', '227640', '227641', '227663', '227664', '227670', '227672', '227711', '228544', '228545', '228546', '228547', '228548', '228549', '228550', '228551', '228552', '228556', '232521', '232523', '242048', '243015', '360112']
    nosy_count = 9.0
    nosy_names = ['mark.dickinson', 'belopolsky', 'pitrou', 'scoder', 'vstinner', 'python-dev', 'serhiy.storchaka', 'wolma', 'gladman']
    pr_nums = ['18021']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue22486'
    versions = ['Python 3.5']

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 24, 2014

    fractions.gcd() is required for normalising numerator and denominator of the Fraction data type. Some speed improvements were applied to Fraction in bpo-22464, now the gcd() function takes up about half of the instantiation time in the benchmark in bpo-22458, which makes it quite a heavy part of the overall Fraction computation time.

    The current implementation is

    def gcd(a, b):
        while b:
            a, b = b, a%b
        return a

    Reimplementing it in C would provide for much faster calculations. Here is a Cython version that simply drops the calculation loop into C as soon as the numbers are small enough to fit into a C long long int:

    def _gcd(a, b):
        # Try doing all computation in C space.  If the numbers are too large
        # at the beginning, retry until they are small enough.
        cdef long long ai, bi
        while b:
            try:
                ai, bi = a, b
            except OverflowError:
                pass
            else:
                # switch to C loop
                while bi:
                    ai, bi = bi, ai%bi
                return ai
            a, b = b, a%b
        return a

    It's substantially faster already because the values will either be small enough right from the start or quickly become so after a few iterations with Python objects.

    Further improvements should be possible with a dedicated PyLong implementation based on Lehmer's GCD algorithm:

    https://en.wikipedia.org/wiki/Lehmer_GCD_algorithm

    @scoder scoder added performance Performance or resource usage stdlib Python modules in the Lib dir labels Sep 24, 2014
    @mdickinson
    Copy link
    Member

    In case it's useful, see issue bpo-1682 for my earlier Lehmer gcd implementation. At the time, that approach was dropped as being premature optimisation.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 25, 2014

    If Python grows an optimized implementation, how about exposing it in the math module?

    @mdickinson
    Copy link
    Member

    If Python grows an optimized implementation, how about exposing it in the math module?

    +1.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 25, 2014

    That's what the patch does anyway. +1

    @pitrou
    Copy link
    Member

    pitrou commented Sep 25, 2014

    Hmm... which patch?

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 25, 2014

    @serhiy-storchaka
    Copy link
    Member

    Here is updated Mark's patch from bpo-1682. It is ported to 3.5, slightly simplified and optimized (I did not touched the main algorithm still), utilized in the fractions module, added tests and documentation.

    It speeds up Stefan's fractions benchmark about 20%.

    @serhiy-storchaka serhiy-storchaka added extension-modules C modules in the Modules dir type-feature A feature request or enhancement and removed performance Performance or resource usage labels Sep 25, 2014
    @serhiy-storchaka serhiy-storchaka changed the title Speed up fractions.gcd() Add math.gcd() Sep 25, 2014
    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 25, 2014

    The problem is that this changes the behaviour of fractions.gcd() w.r.t. negative numbers. It's a public function, so we should keep it for backwards compatibility reasons, *especially* when adding a new function in the math module.

    @scoder scoder added performance Performance or resource usage and removed extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Sep 25, 2014
    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 25, 2014

    Oh, and thanks for working on it, Serhiy! :)

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Sep 25, 2014

    see bpo-22477 for a discussion of whether the behavior of fractions.gcd should be changed or not

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Sep 25, 2014

    sorry, forgot to format the link:
    issue<22477>

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 25, 2014

    The thing is, if we add something new in a substantially more exposed place (the math module), then why break legacy code *in addition*? Just leaving it the way it is won't harm anyone, really.

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented Sep 25, 2014

    I wasn't arguing for or against anything, just providing a link to the relevant discussion.

    @serhiy-storchaka
    Copy link
    Member

    Well, here is a patch which keeps the same weird behavior of fractions.gcd().

    @gladman
    Copy link
    Mannequin

    gladman mannequin commented Sep 25, 2014

    I am inclined to think that a maths.gcd() makes sense as this would be where I would go first to find this function. And the prospect of better performance is attractive since the gcd is an important operation in work with number theory algorithms.

    Would it co-exist with fractions.gcd(), with identical semantics?

    Or would it co-exist with fractions.gcd(), with the 'less surprising' semantics that are under discussion in the 'GCD in Fractions' thread?

    Would it take on the suggestion of operating on one or more input parameters?

    @mdickinson
    Copy link
    Member

    Or would it co-exist with fractions.gcd(), with the 'less surprising'
    semantics that are under discussion in the 'GCD in Fractions' thread?

    Yes, exactly. math.gcd will always give a nonnegative result. The output of fractions.gcd remains unchanged for integer inputs, for backwards compatibility.

    @mdickinson
    Copy link
    Member

    Serhiy: thank you! I've been meaning to update that patch for a long time, but hadn't found the courage or time to face the inevitable bitrot.

    @serhiy-storchaka
    Copy link
    Member

    Now I spent more time on the patch. Changes in updated patch:

    • Removed code duplication for odd and even k.
    • Temporary buffers c and d no longer allocated on every iteration.
    • Long result now compacted. No longer unused allocated size.
    • Added checks for results of long_abs() (it can fail).
    • Merged _PyLong_GCD and long_gcd. Fast path for small negative integers no longer need to copy long objects in long_abs().
    • Added tests for large negative numbers and for case Py_SIZE(a) - Py_SIZE(b) > 3.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 26, 2014

    Thanks, Serhiy. However, something is wrong with the implementation. The benchmark runs into an infinite loop (it seems). And so do the previous patches. Does it work for you?

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 26, 2014

    I compiled it with 30 bit digits, in case that's relevant. (It might be.)

    @serhiy-storchaka
    Copy link
    Member

    It works to me (compiled with 15-bit digits). Cold you please add debugging
    prints (before and after the call of math.gcd()) and find which operation is
    looping (math.gcd() itself, and for what arguments, or some Python code)?

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 26, 2014

    This is what hangs for me:

    math.gcd(1216342683557601535506311712, 436522681849110124616458784)

    "a" and "b" keep switching between both values, but otherwise, the loop just keeps running.

    The old fractions.gcd() gives 32 for them.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 26, 2014

    I can confirm that it works with 15 bit digits.

    @mdickinson
    Copy link
    Member

    To avoid regressions, please can we leave the old fractions.gcd exactly as it was?

    For example, the current fractions.gcd does work for Fraction instances [1]. That's certainly not its intended use, but I wouldn't be surprised if there's code out there that uses it in that way. It also just happens to work for nonnegative finite float inputs, because a % b gives exact results when a and b are positive floats, so no error is introduced at any point.

    I'd also worry about breaking existing uses involving integer-like objects (instances of numpy.int64, for example) in place of instances of ints.

    [1] By "works", I mean that if a and b are Fractions then gcd(a, b) returns a Fraction such that (1) a and b are integer multiples of gcd(a, b), and (2) gcd(a, b) is an integer multiple of any other number with this property.

    @mdickinson
    Copy link
    Member

    This is what hangs for me:

    Uh-oh. Sounds like I screwed up somewhere. I'll take a look this weekend, unless Serhiy beats me too it.

    @mdickinson
    Copy link
    Member

    too it.

    Bah. "to it". Stupid fingers.

    @serhiy-storchaka
    Copy link
    Member

    Thank you Stefan. I confirm that it hangs with 30-bit digits.

    One existing bug is in the use of PyLong_AsLong() before simple Euclidean
    loop. It should be PyLong_AsLongLong() if the long is not enough for two
    digits. But there is another bug in inner loop...

    @serhiy-storchaka
    Copy link
    Member

    Here is fixed patch.

    There was integer overflow. In C short*short is extended to int, but int*int
    results int.

    @serhiy-storchaka
    Copy link
    Member

    And for comparison here is simpler patch with Euclidean algorithm.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 27, 2014

    Patch 7 works for me. Why are the two Py_ABS() calls at the end needed when we start off the algorithm with long_abs()?

    The Lehmer code is complex (I guess that's why you added the pure Euclidean implementation), but it's the right algorithm to use here, so I'd say we should. It's 4% faster than the Euclidean code for the fractions benchmark when using 30 bit digits, but (surprisingly enough) about the same speed with 15 bit digits. There is no major difference to expect here as the numbers are perpetually normalised in Fractions and thus kept small (usually small enough to fit into a 64bit integer), i.e. Euclid should do quite well on them.

    The difference for big numbers is substantial though:

    Euclid:

    $ ./python -m timeit -s 'from math import gcd; a = 2**123 + 3**653 + 5**23 + 7**49; b = 2**653 + 2**123 + 5**23 + 11**34' 'gcd(a,b)'
    10000 loops, best of 3: 71 usec per loop

    Lehmer:

    $ ./python -m timeit -s 'from math import gcd; a = 2**123 + 3**653 + 5**23 + 7**49; b = 2**653 + 2**123 + 5**23 + 11**34' 'gcd(a,b)'
    100000 loops, best of 3: 11.6 usec per loop

    @serhiy-storchaka
    Copy link
    Member

    Why are the two Py_ABS() calls at the end needed when we start off the
    algorithm with long_abs()?

    Because long_abs()'s are omitted for small enough numbers (common case). So we
    avoid a copying for negative numbers or int subclasses.

    I guess that's why you added the pure Euclidean implementation

    Euclidean algorithm is required step at the end of Lehmer algorithm.

    It's 4% faster than the Euclidean code for the fractions benchmark
    when using 30 bit digits, but (surprisingly enough) about the same speed
    with 15 bit digits.

    May be because Lehmer code uses 64-bit computation for 30-bit digits, and
    Euclidean code always uses 32-bit computation.

    The difference for big numbers is substantial though:

    1000-bit integers are big, but can be encountered in real word (e.g. in
    cryptography). So may be there is need in Lehmer algorithm.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 27, 2014

    My personal take is: if there is an implementation in the stdlib, it should be the one that's most widely applicable. And that includes large numbers. We have a working implementation that is algorithmically faster for large numbers, so I can't see why we should drop it unused.

    I'm for merging patch 7.

    @scoder
    Copy link
    Contributor Author

    scoder commented Oct 5, 2014

    Any objections to merging the last patch?

    @mdickinson
    Copy link
    Member

    Any objections to merging the last patch?

    Yes! Please don't make these changes to Fractions.gcd: they'll cause regressions for existing uses of Fractions.gcd with objects not of type int.

    @scoder
    Copy link
    Contributor Author

    scoder commented Oct 5, 2014

    There are not such changes in patch 7. The fractions.gcd() function is unchanged but no longer used by the Fraction type, which now uses math.gcd() internally instead.

    @mdickinson
    Copy link
    Member

    Ah, I misread; thanks.

    What happens with this patch if a Fraction has been created with Integrals that aren't of type int? (E.g., with NumPy int32 instances, for example?)

    @vstinner
    Copy link
    Member

    vstinner commented Oct 5, 2014

    Why patching fraction.Fraction constructor instead of fractions.gcd()?

    I don't like the idea of having two functions, math.gcd and fractions.gcd, which do almost the same, but one is slow, whereas the other is fast. It's harder to write efficient code working on Python < 3.5 (use fractions) and Python >= 3.5 (use math or fractions?).

    I suggest to modify fractions.gcd() to use math.gcd() if the two parameters are int. We just have to adjust the sign: if the second parameter is negative, return -math.gcd(a, b). (I guess that we have unit tests for fractions.gcd checking the 4 cases for signed parameters.)

    @mdickinson
    Copy link
    Member

    I suggest to modify fractions.gcd() to use math.gcd() if the two parameters are int.

    Sounds fine to me, so long as the code (both fractions.gcd and the fractions.Fraction implementation) continues to function as before for objects that don't have exact type int.

    @scoder
    Copy link
    Contributor Author

    scoder commented Oct 5, 2014

    +1

    I mean, there is already such a type check in Fraction.__init__(), but I can see a case for also optimising fraction.gcd() for exact ints.

    @mdickinson
    Copy link
    Member

    One other suggestion: I think math.gcd should work with arbitrary Python objects implementing __index__, and not just with instances of int.

    @mdickinson
    Copy link
    Member

    I mean, there is already such a type check in Fraction.__init__()

    That type-check doesn't protect us from non-int Integrals, though, as far as I can tell. It looks to me as though doing Fraction(numpy.int32(3), numpy.int32(2)) would fail with a TypeError after this patch. (It works in Python 3.4.)

    @serhiy-storchaka
    Copy link
    Member

    I suggest just add deprecation warning in fractions.gcd(). Or at least add
    notes which recommend math.gcd() in the docstring and the documentation of
    fractions.gcd().

    One other suggestion: I think math.gcd should work with arbitrary Python
    objects implementing __index__, and not just with instances of int.

    Agree.

    @vstinner
    Copy link
    Member

    What's the status of this issue? See also the issue bpo-22477.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch which addresses both Mark's suggestions.

    • math.gcd() now work with arbitrary Python objects implementing __index__.
    • fractions.gcd() and Fraction's constructor now use math.gcd() if both arguments are int, but also support non-ints (e.g. Fractions or floats).
    • fractions.gcd() now is deprecated.

    But before committing I want to experiment with simpler implementation and compare it with current complex implementation. If the difference will be not too large, we could use simpler implementation.

    @scoder
    Copy link
    Contributor Author

    scoder commented Apr 26, 2015

    Any more comments on this? The deadlines for new features in Py3.5 are getting closer. It seems we're just discussing details here, but pretty much everyone wants this feature.

    So, what are the things that still need to be done? Serhiy submitted working patches months ago.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 12, 2015

    New changeset 34648ce02bd4 by Serhiy Storchaka in branch 'default':
    Issue bpo-22486: Added the math.gcd() function. The fractions.gcd() function now is
    https://hg.python.org/cpython/rev/34648ce02bd4

    @vstinner
    Copy link
    Member

    New changeset 4691a2f by Victor Stinner in branch 'master':
    bpo-39350: Remove deprecated fractions.gcd() (GH-18021)
    4691a2f

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants