Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up fractions implementation #66654

Closed
scoder opened this issue Sep 22, 2014 · 18 comments
Closed

Speed up fractions implementation #66654

scoder opened this issue Sep 22, 2014 · 18 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@scoder
Copy link
Contributor

scoder commented Sep 22, 2014

BPO 22464
Nosy @birkenfeld, @rhettinger, @mdickinson, @scoder, @serhiy-storchaka
Files
  • special_case_int.patch: special case "int" values in the constructor
  • special_case_int2.patch: special case only exact "int" values in the constructor
  • special_case_int3.patch: same as before with a little twist in type check
  • special_case_int4.patch: same as before + faster "f == intval"
  • avoid_redundant_property_lookups.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-09-26.07:53:10.446>
    created_at = <Date 2014-09-22.17:37:14.731>
    labels = ['library', 'performance']
    title = 'Speed up fractions implementation'
    updated_at = <Date 2014-09-26.08:30:22.188>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2014-09-26.08:30:22.188>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-09-26.07:53:10.446>
    closer = 'scoder'
    components = ['Library (Lib)']
    creation = <Date 2014-09-22.17:37:14.731>
    creator = 'scoder'
    dependencies = []
    files = ['36687', '36688', '36689', '36690', '36691']
    hgrepos = []
    issue_num = 22464
    keywords = ['patch']
    message_count = 18.0
    messages = ['227284', '227290', '227292', '227294', '227295', '227298', '227299', '227300', '227302', '227308', '227395', '227406', '227407', '227408', '227488', '227576', '227591', '227593']
    nosy_count = 6.0
    nosy_names = ['georg.brandl', 'rhettinger', 'mark.dickinson', 'scoder', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue22464'
    versions = ['Python 3.5']

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    Fractions are an excellent way to do exact money calculations and largely beat Decimal in terms of simplicity, accuracy and safety. Clearly not in terms of speed, though.

    The current implementation does some heavy type checking and dispatching in __new__() and a simplistic gcd based normalisation. Here is a profiling run from the benchmark proposed in bpo-22458 (which matches more or less the results with my own code):

         6969671 function calls (6969278 primitive calls) in 4.835 seconds
    

    Ordered by: internal time

    ncalls tottime percall cumtime percall filename:lineno(function)
    519644 1.324 0.000 2.928 0.000 fractions.py:73(new)
    519632 0.654 0.000 0.654 0.000 fractions.py:17(gcd)
    319744 0.637 0.000 2.694 0.000 fractions.py:400(_add)
    1039260 0.507 0.000 0.950 0.000 abc.py:178(instancecheck)
    4 0.459 0.115 4.821 1.205 bm_telco_fractions.py:38(run)
    1039300 0.443 0.000 0.443 0.000 _weakrefset.py:70(contains)
    519616 0.301 0.000 4.329 0.000 fractions.py:373(forward)
    199872 0.272 0.000 1.335 0.000 fractions.py:416(_mul)
    1598720 0.117 0.000 0.117 0.000 fractions.py:278(denominator)
    959232 0.074 0.000 0.074 0.000 fractions.py:274(numerator)

    The instantiation itself takes twice as much time as the gcd calculations, and both dominate the overall runtime of the benchmark by about 60%. Improving the instantiation time would thus bring a substantial benefit.

    @scoder scoder added the stdlib Python modules in the Lib dir label Sep 22, 2014
    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    Adding Mark Dickinson to the noisy list who mentioned having worked on a C implementation for gcd(). I think this would be a good thing to try. However, the most important part would be to restructure and specialise Fraction.__new__().

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    Here is a straight forward patch that special cases int values in the constructor. It gives me a 35% speedup in the benchmark.

    @scoder scoder added the performance Performance or resource usage label Sep 22, 2014
    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    Updated patch - there is a somewhat hidden attempt in the code to keep nominator and denominater plain int values, not subtypes. That means that it's safer to restrict the optimisation to plain ints as well, which should still hit >95% of the use cases.

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Sep 22, 2014

    What speedup give you second change?

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    The second isn't a big difference because it only hits plain instantiations from integers. They are less likely to be performance critical than those from a quotient, which happen for all calculations.

    It's more for symmetry, I guess.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    I found one more place where special casing helps: equal comparisons to integers, e.g. f == 0 or f == 1 or so. timeit shows me a speedup by a factor of three for this, with only a tiny slow-down when comparing fractions on both sides.

    I put all of them in one patch, deselecting after applying should be easy enough.

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Sep 22, 2014

    Microbenchmarks show about 2x speedup in both cases.

    The patch LGTM.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    Benchmark profile after the patch:

         5930670 function calls (5930288 primitive calls) in 3.748 seconds
    

    Ordered by: internal time

    ncalls tottime percall cumtime percall filename:lineno(function)
    519632 0.828 0.000 0.828 0.000 fractions.py:17(gcd)
    519644 0.806 0.000 1.700 0.000 fractions.py:73(new)
    319744 0.630 0.000 1.983 0.000 fractions.py:408(add)
    4 0.393 0.098 3.735 0.934 bm_telco_fractions.py:38(run)
    519616 0.277 0.000 3.316 0.000 fractions.py:381(forward)
    199872 0.275 0.000 0.901 0.000 fractions.py:424(mul)
    1598720 0.161 0.000 0.161 0.000 fractions.py:285(denominator)
    520257 0.155 0.000 0.155 0.000 {built-in method isinstance}
    959232 0.118 0.000 0.118 0.000 fractions.py:281(numerator)
    519657 0.066 0.000 0.066 0.000 {built-in method __new
    of type object at 0x9d1c40}

    Note that the gcd() call inside of __new__() now takes about as much time as the rest of the __new__() itself. It's clearly still worth optimising that.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 22, 2014

    Here is another little optimisation that removes the redundant property lookups for the denominator in __add__() and __sub__().

    New profile:

         5291182 function calls (5290800 primitive calls) in 3.596 seconds
    

    Ordered by: internal time

    ncalls tottime percall cumtime percall filename:lineno(function)
    519632 0.843 0.000 0.843 0.000 fractions.py:17(gcd)
    519644 0.800 0.000 1.709 0.000 fractions.py:73(new)
    319744 0.520 0.000 1.816 0.000 fractions.py:408(add)
    4 0.401 0.100 3.582 0.896 bm_telco_fractions.py:38(run)
    519616 0.291 0.000 3.156 0.000 fractions.py:381(forward)
    199872 0.274 0.000 0.904 0.000 fractions.py:424(mul)
    520257 0.145 0.000 0.145 0.000 {built-in method isinstance}
    959232 0.108 0.000 0.108 0.000 fractions.py:285(denominator)
    959232 0.108 0.000 0.108 0.000 fractions.py:281(numerator)
    519657 0.066 0.000 0.066 0.000 {built-in method __new
    of type object at 0x9d1c40}

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 23, 2014

    This simple Cython variant of gcd() is substantially faster for me (you may consider it pseudo-code for a C implementation):

    def _gcd(a, b):
        # Try doing all computation in C space.  If the numbers are too large
        # at the beginning, retry until they are small enough.
        cdef long long ai, bi
        while b:
            try:
                ai, bi = a, b
            except OverflowError:
                pass
            else:
                # switch to C loop
                while bi:
                    ai, bi = bi, ai%bi
                return ai
            a, b = b, a%b
        return a

    It tries to drop the two numbers into C long-long-ints as soon as possible, and only runs the calculation in Python space until they are small enough to fit. In most cases, this will be either right from the start or fairly quickly after only a few iterations.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 24, 2014

    BTW, the last two patches (int4 and redundant_property) are ready to be applied. Anyone?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 24, 2014

    New changeset 646bc7d3544b by Georg Brandl in branch 'default':
    bpo-22464: Speed up common Fraction operations by special-casing several
    https://hg.python.org/cpython/rev/646bc7d3544b

    @birkenfeld
    Copy link
    Member

    birkenfeld commented Sep 24, 2014

    Left this open in case you have additional patches coming.

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 24, 2014

    I created bpo-22486 about the gcd() performance. I think we can close this ticket - I don't see any more obvious low hanging fruit and future findings can have their own ticket.

    Out of interest, I modified the fractions module to compile Fraction into an extension type with Cython. For the benchmark, it currently gives me a 10x speedup over the original fractions module in Py3.4. I uploaded my initial attempt to PyPI and it's on github:

    https://pypi.python.org/pypi/quicktions

    https://github.com/scoder/quicktions

    That makes a C implementation of Fraction generally worth considering for CPython.

    @scoder scoder closed this as completed Sep 24, 2014
    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 25, 2014

    Sorry for reopening this, but I found one more thing. Division is pretty heavy on PyLong objects and there doesn't seem to be an internal optimisation for division by 1 and -1. Since many Fraction input values can already be normalised for some reason, the following change shaves off almost 30% of the calls to PyNumber_InPlaceFloorDivide() in the telco benchmark during Fraction instantiation according to callgrind, thus saving 20% of the CPU instructions that go into tp_new().

    Instead of always executing

            numerator //= g
            denominator //= g
    

    this avoids unnecessary divisions:

                if g is not 1:
                    if g is -1:
                        numerator = -numerator
                        denominator = -denominator
                    else:
                        numerator //= g
                        denominator //= g

    Using the "is" operator here is CPythonish, but doing the same with != and == should also be an improvement already, although not as cheap. Now, given that CPython caches small integers internally, I would suggest actually not adding these two special cases to the fractions module but more generally to the division routines in longobject.c.

    @scoder scoder reopened this Sep 25, 2014
    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 26, 2014

    I tried it, but it seems better to open a new ticket for this as there are behavioural changes. See bpo-22501.

    @scoder scoder closed this as completed Sep 26, 2014
    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Sep 26, 2014

    Please do not use "is" for number comparison. This can be broken unexpectedly in future or on alternative implementation.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants