bpo-43420: Simple optimizations for Fraction's arithmetics #24779

Merged
merged 13 commits on Mar 22, 2021
125 changes: 116 additions & 9 deletions Lib/fractions.py
@@ -380,32 +380,139 @@ def reverse(b, a):

return forward, reverse

# Rational arithmetic algorithms: Knuth, TAOCP, Volume 2, 4.5.1.
#
# Assume input fractions a and b are normalized.
#
# 1) Consider addition/subtraction.
#
# Let g = gcd(da, db). Then
#
#              na   nb    na*db ± nb*da
#     a ± b == -- ± -- == ------------- ==
#              da   db        da*db
#
#              na*(db//g) ± nb*(da//g)    t
#           == ----------------------- == -
#                      (da*db)//g         d
#
# Now, if g > 1, we're working with smaller integers.
#
# Note that t, (da//g) and (db//g) are pairwise coprime.
#
# Indeed, (da//g) and (db//g) share no common factors (they were
# removed) and da is coprime with na (since input fractions are
# normalized), hence (da//g) and na are coprime. By symmetry,
# (db//g) and nb are coprime too. Then,
#
# gcd(t, da//g) == gcd(na*(db//g), da//g) == 1
# gcd(t, db//g) == gcd(nb*(da//g), db//g) == 1
#
# The above allows us to optimize reduction of the result to lowest
# terms. Indeed,
#
# g2 = gcd(t, d) == gcd(t, (da//g)*(db//g)*g) == gcd(t, g)
#
#                       t//g2                   t//g2
#     a ± b == ----------------------- == ----------------
#              (da//g)*(db//g)*(g//g2)    (da//g)*(db//g2)
#
# is a normalized fraction. This is useful because the unnormalized
# denominator d could be much larger than g.
#
# We should special-case g == 1 (and g2 == 1), since about 60.8% of
# randomly-chosen pairs of integers are coprime:
# https://en.wikipedia.org/wiki/Coprime_integers#Probability_of_coprimality
# Note that g2 == 1 always holds for fractions obtained from floats: here
# g is a power of 2 and the unnormalized numerator t is an odd integer.
#
# 2) Consider multiplication.
#
# Let g1 = gcd(na, db) and g2 = gcd(nb, da), then
#
#            na*nb    na*nb    (na//g1)*(nb//g2)
#     a*b == ----- == ----- == -----------------
#            da*db    db*da    (db//g1)*(da//g2)
#
# Note that after the divisions we're multiplying smaller integers.
#
# Also, the resulting fraction is normalized, because each of the
# two factors in the numerator is coprime to each of the two factors
# in the denominator.
#
# Indeed, pick (na//g1). It's coprime with (da//g2), because input
# fractions are normalized. It's also coprime with (db//g1), because
# common factors are removed by g1 == gcd(na, db).
#
# As with addition/subtraction, we should special-case g1 == 1
# and g2 == 1 for the same reason. That case also arises when
# multiplying rationals obtained from floats.

     def _add(a, b):
         """a + b"""
-        da, db = a.denominator, b.denominator
-        return Fraction(a.numerator * db + b.numerator * da,
-                        da * db)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g = math.gcd(da, db)
+        if g == 1:
+            return Fraction(na * db + da * nb, da * db, _normalize=False)
+        s = da // g
+        t = na * (db // g) + nb * s
+        g2 = math.gcd(t, g)
+        if g2 == 1:
+            return Fraction(t, s * db, _normalize=False)
+        return Fraction(t // g2, s * (db // g2), _normalize=False)

__add__, __radd__ = _operator_fallbacks(_add, operator.add)
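As an illustration (not part of the patch), here is a minimal standalone sketch of the same addition path using plain ints and math.gcd. The helper name add_normalized is made up for this example; its variable names na, da, nb, db, g, s, t and g2 mirror the comment block above, and returning the raw numerator/denominator pair shows that the result is already in lowest terms.

from math import gcd

def add_normalized(na, da, nb, db):
    """Add na/da and nb/db, both assumed to be in lowest terms."""
    g = gcd(da, db)
    if g == 1:
        return na * db + da * nb, da * db
    s = da // g
    t = na * (db // g) + nb * s
    g2 = gcd(t, g)
    if g2 == 1:
        return t, s * db
    return t // g2, s * (db // g2)

# 1/6 + 1/10: g = 2, s = 3, t = 1*5 + 1*3 = 8, g2 = gcd(8, 2) = 2, so the
# result 4/15 is reached without ever forming the unreduced 16/60.
assert add_normalized(1, 6, 1, 10) == (4, 15)
# 1/4 + 1/6: g = 2, s = 2, t = 1*3 + 1*2 = 5, g2 = gcd(5, 2) = 1, giving 5/12.
assert add_normalized(1, 4, 1, 6) == (5, 12)

The first case exercises the g2 > 1 branch, the second the g2 == 1 branch.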

     def _sub(a, b):
         """a - b"""
-        da, db = a.denominator, b.denominator
-        return Fraction(a.numerator * db - b.numerator * da,
-                        da * db)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g = math.gcd(da, db)
+        if g == 1:
+            return Fraction(na * db - da * nb, da * db, _normalize=False)
+        s = da // g
+        t = na * (db // g) - nb * s
+        g2 = math.gcd(t, g)
+        if g2 == 1:
+            return Fraction(t, s * db, _normalize=False)
+        return Fraction(t // g2, s * (db // g2), _normalize=False)

__sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)

     def _mul(a, b):
         """a * b"""
-        return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g1 = math.gcd(na, db)
+        if g1 > 1:
Contributor

I think the "> 1" tests should be removed. The cost of a test is in the same ballpark as the cost of a division. The code is clearer if you just always divide. The performance gain mostly comes from two small gcd's being faster than a single big gcd.

Contributor Author

> I think the "> 1" tests should be removed.

Maybe; I didn't benchmark this closely, but...

> The cost of a test is in the same ballpark as the cost of a division.

This is a micro-optimization, yes. But ">" does not have the same cost as "//", even in CPython:

$ ./python -m timeit -s 'a = 1' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 82.3 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111111' -s 'b = 1' 'a > b'
5000000 loops, best of 5: 86.1 nsec per loop
$ ./python -m timeit -s 'a = 11111' -s 'b = 1' 'a // b'
2000000 loops, best of 5: 120 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 203 nsec per loop
$ ./python -m timeit -s 'a = 1111111111111111111111111111111111111111111111' -s 'b = 1' 'a // b'
1000000 loops, best of 5: 251 nsec per loop
$ ./python -m timeit -s 'a = 111111111111111111111111111111122222222222222111111111111111' \
 -s 'b = 1' 'a // b'
1000000 loops, best of 5: 294 nsec per loop

Maybe my measurements are primitive, but there is some pattern...

> The code is clearer if you just always divide.

I tried to explain the reasons for branching in the comment above the code. A careful reader might ask: why not include the same optimizations in _mul as in _add/_sub?

On the other hand, gmplib doesn't use these optimizations (cf. the same in mpz_add/sub), and neither does SymPy's PythonRational class.

So, I did my best to defend those branches, but I'm fine with removing them if @tim-one has no arguments against.

> The performance gain mostly comes from two small gcd's being faster than a single big gcd.

This is more important, yes.

Member

> The cost of a test is in the same ballpark as the cost of a division.

? Comparing g to 1 takes, at worst, small constant time, independent of how many digits g has. But n // 1 takes time proportional to the number of internal digits in n - it's a slow, indirect way of making a physical copy of n. Since we expect g to be 1 most of the time, and these are unbounded ints, the case for checking g > 1 seems clear to me.

Contributor

I don't mind either way.

If it were just me, I would leave out the '> 1' tests because:

  1. For small inputs, the code is about 10% faster without the guards.
  2. For random 100,000 bit numerators and denominators where the gcd's were equal to 1, my timings show no discernible benefit.
  3. The code and tests are simpler without the guards.

Member (@tim-one, Mar 22, 2021)

That just doesn't make much sense to me. On my box just now, with the released 3.9.1:

C:\Windows\System32>py -m timeit -s "g = 1" "g > 1"
10000000 loops, best of 5: 24.2 nsec per loop

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "g > 1; t // g"
10000 loops, best of 5: 32.4 usec per loop

So doing the useless division not only drove the per-iteration measure from 24.2 to 32.4, but the unit also increased by a factor of 1000(!), from nanoseconds to microseconds. That's using, by construction, an exactly 100,001-bit numerator.

And, no, removing the test barely improves that relative disaster at all:

C:\Windows\System32>py -m timeit -s "g = 1; t = 1 << 100000" "t // g"
10000 loops, best of 5: 32.3 usec per loop

BTW, as expected, boost the number of numerator bits by a factor of 10 (to a million+1) and the per-loop expense also becomes 10 times as much; cut it by a factor of 10, and similarly the per-loop expense is also cut by a factor of 10.

With respect to the BPO message you linked to, it appeared atypical: both gcds were greater than 1 in (10 / 3) * (6 / 5), so there was never any possibility of the > 1 test paying off (while in real life a gcd of 1 is probably most common, and when multiplying Fractions obtained from floats a gcd of 1 is certain unless one of the inputs is an even integer (edited: used to say 0, but it's broader than just that)).
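A quick way to check that last claim (an illustrative snippet, not from the thread): a Fraction built from a non-integral float has a power-of-two denominator and an odd numerator, so when two such values are multiplied, gcd(numerator, other denominator) is necessarily 1.

from fractions import Fraction

f = Fraction(0.1)
assert f.denominator & (f.denominator - 1) == 0   # denominator is a power of two
assert f.numerator % 2 == 1                       # numerator is odd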

+            na //= g1
+            db //= g1
+        g2 = math.gcd(nb, da)
+        if g2 > 1:
+            nb //= g2
+            da //= g2
+        return Fraction(na * nb, db * da, _normalize=False)

__mul__, __rmul__ = _operator_fallbacks(_mul, operator.mul)
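As an illustration of the multiplication path (again not part of the patch), a standalone sketch with a made-up helper mul_normalized; it reuses the (10 / 3) * (6 / 5) case mentioned in the discussion above and returns the raw pair to show no final reduction is needed.

from math import gcd

def mul_normalized(na, da, nb, db):
    """Multiply na/da by nb/db, both assumed to be in lowest terms."""
    g1 = gcd(na, db)
    if g1 > 1:
        na //= g1
        db //= g1
    g2 = gcd(nb, da)
    if g2 > 1:
        nb //= g2
        da //= g2
    return na * nb, db * da

# (10/3) * (6/5): g1 = gcd(10, 5) = 5 and g2 = gcd(6, 3) = 3, so the actual
# multiplications are 2*2 and 1*1; the result 4/1 needs no final gcd.
assert mul_normalized(10, 3, 6, 5) == (4, 1)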

     def _div(a, b):
         """a / b"""
-        return Fraction(a.numerator * b.denominator,
-                        a.denominator * b.numerator)
+        # Same as _mul(), with b inverted.
+        na, da = a.numerator, a.denominator
+        nb, db = b.numerator, b.denominator
+        g1 = math.gcd(na, nb)
+        if g1 > 1:
+            na //= g1
+            nb //= g1
+        g2 = math.gcd(db, da)
+        if g2 > 1:
+            da //= g2
+            db //= g2
+        n, d = na * db, nb * da
+        if d < 0:
+            n, d = -n, -d
+        return Fraction(n, d, _normalize=False)

__truediv__, __rtruediv__ = _operator_fallbacks(_div, operator.truediv)
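A matching sketch for division (not part of the patch; the helper name div_normalized is made up) shows why the final sign check matters: b's numerator ends up in the result's denominator, so a negative divisor would otherwise leave a negative denominator.

from math import gcd

def div_normalized(na, da, nb, db):
    """Compute (na/da) / (nb/db), both assumed to be in lowest terms."""
    g1 = gcd(na, nb)
    if g1 > 1:
        na //= g1
        nb //= g1
    g2 = gcd(db, da)
    if g2 > 1:
        da //= g2
        db //= g2
    n, d = na * db, nb * da
    if d < 0:            # move the sign into the numerator
        n, d = -n, -d
    return n, d

# (3/4) / (-2/5): without the sign fix-up the pair would be (15, -8).
assert div_normalized(3, 4, -2, 5) == (-15, 8)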

2 changes: 2 additions & 0 deletions Lib/test/test_fractions.py
@@ -369,7 +369,9 @@ def testArithmetic(self):
         self.assertEqual(F(1, 2), F(1, 10) + F(2, 5))
         self.assertEqual(F(-3, 10), F(1, 10) - F(2, 5))
         self.assertEqual(F(1, 25), F(1, 10) * F(2, 5))
+        self.assertEqual(F(5, 6), F(2, 3) * F(5, 4))
         self.assertEqual(F(1, 4), F(1, 10) / F(2, 5))
+        self.assertEqual(F(-15, 8), F(3, 4) / F(-2, 5))
         self.assertTypedEquals(2, F(9, 10) // F(2, 5))
         self.assertTypedEquals(10**23, F(10**23, 1) // F(1))
         self.assertEqual(F(5, 6), F(7, 3) % F(3, 2))
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -902,6 +902,7 @@ James King
W. Trevor King
Jeffrey Kintscher
Paul Kippes
Sergey B Kirpichev
Steve Kirsch
Sebastian Kirsche
Kamil Kisiel
@@ -0,0 +1,2 @@
Improve performance of :class:`fractions.Fraction` arithmetic for large
components. Contributed by Sergey B. Kirpichev.