-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
Speed up fractions implementation #66654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Fractions are an excellent way to do exact money calculations and largely beat Decimal in terms of simplicity, accuracy and safety. Clearly not in terms of speed, though. The current implementation does some heavy type checking and dispatching in __new__() and a simplistic gcd based normalisation. Here is a profiling run from the benchmark proposed in bpo-22458 (which matches more or less the results with my own code):
Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) The instantiation itself takes twice as much time as the gcd calculations, and both dominate the overall runtime of the benchmark by about 60%. Improving the instantiation time would thus bring a substantial benefit. |
Adding Mark Dickinson to the noisy list who mentioned having worked on a C implementation for gcd(). I think this would be a good thing to try. However, the most important part would be to restructure and specialise Fraction.__new__(). |
Here is a straight forward patch that special cases int values in the constructor. It gives me a 35% speedup in the benchmark. |
Updated patch - there is a somewhat hidden attempt in the code to keep nominator and denominater plain int values, not subtypes. That means that it's safer to restrict the optimisation to plain ints as well, which should still hit >95% of the use cases. |
What speedup give you second change? |
The second isn't a big difference because it only hits plain instantiations from integers. They are less likely to be performance critical than those from a quotient, which happen for all calculations. It's more for symmetry, I guess. |
I found one more place where special casing helps: equal comparisons to integers, e.g. f == 0 or f == 1 or so. timeit shows me a speedup by a factor of three for this, with only a tiny slow-down when comparing fractions on both sides. I put all of them in one patch, deselecting after applying should be easy enough. |
Microbenchmarks show about 2x speedup in both cases. The patch LGTM. |
Benchmark profile after the patch:
Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) Note that the gcd() call inside of __new__() now takes about as much time as the rest of the __new__() itself. It's clearly still worth optimising that. |
Here is another little optimisation that removes the redundant property lookups for the denominator in __add__() and __sub__(). New profile:
Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) |
This simple Cython variant of gcd() is substantially faster for me (you may consider it pseudo-code for a C implementation): def _gcd(a, b):
# Try doing all computation in C space. If the numbers are too large
# at the beginning, retry until they are small enough.
cdef long long ai, bi
while b:
try:
ai, bi = a, b
except OverflowError:
pass
else:
# switch to C loop
while bi:
ai, bi = bi, ai%bi
return ai
a, b = b, a%b
return a It tries to drop the two numbers into C long-long-ints as soon as possible, and only runs the calculation in Python space until they are small enough to fit. In most cases, this will be either right from the start or fairly quickly after only a few iterations. |
BTW, the last two patches (int4 and redundant_property) are ready to be applied. Anyone? |
New changeset 646bc7d3544b by Georg Brandl in branch 'default': |
Left this open in case you have additional patches coming. |
I created bpo-22486 about the gcd() performance. I think we can close this ticket - I don't see any more obvious low hanging fruit and future findings can have their own ticket. Out of interest, I modified the fractions module to compile Fraction into an extension type with Cython. For the benchmark, it currently gives me a 10x speedup over the original fractions module in Py3.4. I uploaded my initial attempt to PyPI and it's on github: https://pypi.python.org/pypi/quicktions https://github.com/scoder/quicktions That makes a C implementation of Fraction generally worth considering for CPython. |
Sorry for reopening this, but I found one more thing. Division is pretty heavy on PyLong objects and there doesn't seem to be an internal optimisation for division by 1 and -1. Since many Fraction input values can already be normalised for some reason, the following change shaves off almost 30% of the calls to PyNumber_InPlaceFloorDivide() in the telco benchmark during Fraction instantiation according to callgrind, thus saving 20% of the CPU instructions that go into tp_new(). Instead of always executing
this avoids unnecessary divisions: if g is not 1:
if g is -1:
numerator = -numerator
denominator = -denominator
else:
numerator //= g
denominator //= g Using the "is" operator here is CPythonish, but doing the same with != and == should also be an improvement already, although not as cheap. Now, given that CPython caches small integers internally, I would suggest actually not adding these two special cases to the fractions module but more generally to the division routines in longobject.c. |
I tried it, but it seems better to open a new ticket for this as there are behavioural changes. See bpo-22501. |
Please do not use "is" for number comparison. This can be broken unexpectedly in future or on alternative implementation. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: