Implement Stein's algorithm for gcd #15

Emerentius · 2018-01-07T01:10:24Z

This implements Stein's algorithm for bigints.
Asymptotically this has the same runtime complexity as the Euclidean algorithm, but it's faster because it avoids division in favor of bit shifts and subtractions.
There are faster algorithms for large bigints. For small ones, GMP uses the binary gcd too.
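For readers unfamiliar with the technique, here is a minimal sketch of Stein's (binary) gcd on plain `u64` values. It is not the code from this PR, which works on `BigUint`; the function name `stein_gcd` is made up for illustration.

```rust
/// Binary (Stein's) gcd for machine-sized integers.
/// Illustrative only: the PR applies the same idea to bigints.
fn stein_gcd(mut a: u64, mut b: u64) -> u64 {
    // gcd(a, 0) = a and gcd(0, b) = b.
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // The common power of two is a factor of the result; remember it.
    let shift = (a | b).trailing_zeros();
    // Remaining factors of two are not shared, so drop them: both become odd.
    a >>= a.trailing_zeros();
    b >>= b.trailing_zeros();
    while a != b {
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        // Here b > a and both are odd, so b - a is even and nonzero.
        b -= a;
        b >>= b.trailing_zeros();
    }
    // Restore the shared power of two.
    a << shift
}
```

The loop uses only comparison, subtraction, shifts and `trailing_zeros()`, which is where the advantage over division-based Euclid is expected to come from.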

I've run some benchmarks with the code in this repo.
This iterates through sizes of 1 to 10 BigDigits, generates 300 uniformly distributed random bigints at each size, and computes the gcd for each combination with both Euclid's and Stein's algorithm. I'm only looking at combinations of numbers with the same number of BigDigits.
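The shape of such a benchmark can be sketched with the standard library alone. This is not the code from the linked repo: it uses `u64` instead of bigints, a toy xorshift generator instead of a proper uniform sampler, and it assumes "each combination" means every ordered pair. All names (`XorShift64`, `make_samples`, `time_all_pairs`) are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Division-based Euclidean gcd, used here as the function under test.
fn euclid_gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Generates `count` pseudo-random samples from a fixed seed.
fn make_samples(count: usize, seed: u64) -> Vec<u64> {
    let mut rng = XorShift64(seed);
    (0..count).map(|_| rng.next_u64()).collect()
}

/// Runs `gcd` over every ordered pair of samples.
/// Returns the elapsed time and a checksum; the checksum keeps the
/// optimizer from discarding the work and lets two algorithms be compared.
fn time_all_pairs(samples: &[u64], gcd: fn(u64, u64) -> u64) -> (Duration, u64) {
    let start = Instant::now();
    let mut checksum = 0u64;
    for &a in samples {
        for &b in samples {
            checksum = checksum.wrapping_add(gcd(a, b));
        }
    }
    (start.elapsed(), checksum)
}
```

Calling `time_all_pairs` once per algorithm on the same samples and dividing the two durations gives a speedup figure comparable in spirit to the table below.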

The speed gains are sizeable. See the benchmark results below. I'm running this on an ultrabook with a 15W CPU (i5-4210U). Performance may differ on other architectures, in particular if there is no intrinsic for counting trailing zeroes.

Please run the benchmark on your machine. It's just a simple

git clone https://github.com/Emerentius/bigint_gcd_bench
cargo run --release

2^32n bits	euclidean gcd	binary gcd	speedup
n:  1 =>	0.3050s		0.0728s		4.19
n:  2 =>	0.6228s		0.1453s		4.29
n:  3 =>	0.9618s		0.2214s		4.34
n:  4 =>	1.3021s		0.3028s		4.30
n:  5 =>	1.6469s		0.3875s		4.25
n:  6 =>	2.0017s		0.4759s		4.21
n:  7 =>	2.3636s		0.5667s		4.17
n:  8 =>	2.7284s		0.6418s		4.25
n:  9 =>	3.0712s		0.7302s		4.21
n: 10 =>	3.4822s		0.8223s		4.23

The GMP developers say these algorithms are quadratic in N; I'm not sure why they seem almost linear here.

Emerentius · 2018-01-07T01:22:06Z

This patch also adds a private trailing_zeros() function to BigUint. Should this be made public? It can be useful for performance optimizations because it tells you the multiplicity for the prime factor 2 very efficiently.
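To make the "multiplicity of the prime factor 2" point concrete, here is a rough sketch over a little-endian slice of 32-bit digits. The digit width, the free-function form and the `Option` return for zero are assumptions for illustration, not the signature added by this patch.

```rust
/// Number of trailing zero bits of a little-endian digit slice,
/// i.e. the exponent of 2 in the value's prime factorization.
/// Returns `None` for zero, where that exponent is undefined.
fn trailing_zeros(digits: &[u32]) -> Option<usize> {
    // Index of the lowest nonzero digit; all digits below it are zero.
    let i = digits.iter().position(|&d| d != 0)?;
    // Each skipped digit contributes 32 zero bits.
    Some(i * 32 + digits[i].trailing_zeros() as usize)
}
```

For example, the single-digit value 12 = 2^2 * 3 yields `Some(2)`.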

count_ones() might also be useful. leading_zeros() and count_zeros() seem a bit strange for a bignum, where you could imagine an infinite number of preceding zeros.

cuviper · 2018-01-07T03:03:06Z

Ah, yes, num-integer already uses Stein's for the primitive integers, so this sounds good to me.

FWIW, I have trailing_zeros in #8 too. You might want to compare the performance between our approaches. (But that might not be a bottleneck anyway.)

Emerentius · 2018-01-07T03:32:03Z

Without testing, I'm pretty sure that your implementation is faster or can be made to be faster. If you switch out that .enumerate() for a (0..).step_by(big_digit::BITS) (or equivalent code) you can skip the multiplication and you'll be doing strictly less work.
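A sketch of the two variants being compared, assuming 32-bit digits stored least significant first. Neither function is copied from either PR; the names `tz_enumerate` and `tz_step_by` and the constant `BITS` are placeholders standing in for `big_digit::BITS`.

```rust
// Assumed digit width; stands in for big_digit::BITS.
const BITS: usize = 32;

/// Variant using `.enumerate()`: the digit index is multiplied by BITS.
fn tz_enumerate(digits: &[u32]) -> Option<usize> {
    digits
        .iter()
        .enumerate()
        .find(|&(_, &d)| d != 0)
        .map(|(i, &d)| i * BITS + d.trailing_zeros() as usize)
}

/// Variant using a stepped counter: the bit offset is carried along
/// directly, so no multiplication is needed.
fn tz_step_by(digits: &[u32]) -> Option<usize> {
    (0usize..)
        .step_by(BITS)
        .zip(digits)
        .find(|&(_, &d)| d != 0)
        .map(|(offset, &d)| offset + d.trailing_zeros() as usize)
}
```

Both return the same result for any input; whether the second is measurably faster would have to be benchmarked.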

Probably worth it for CPUs without ctz anyway.

cuviper · 2018-01-07T03:56:28Z

Multiplying by BITS should get optimized as a simple left shift, maybe even LEA on x86. Almost surely not a performance target, anyway.

Emerentius · 2018-01-07T04:06:47Z

I meant that without the multiplication our two versions do identical work, except that my code also does a superfluous trailing_zeros() call at every step.


cuviper · 2018-02-08T06:27:25Z

I added an optimization to shr that eliminated most of the allocation overhead.
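The actual change is in the "reduce allocations in shr" commit and is not reproduced here. As a general illustration of how a right shift can reuse its operand's buffer instead of building a new vector, a sketch on a little-endian `Vec<u32>` might look like this; `shr_in_place` is a hypothetical name.

```rust
/// Shifts a little-endian digit vector right by `bits`, reusing its buffer.
/// Illustrative sketch only, not the num-bigint implementation.
fn shr_in_place(digits: &mut Vec<u32>, bits: usize) {
    let digit_shift = bits / 32;
    let bit_shift = (bits % 32) as u32;

    // Shifting out every digit leaves zero (an empty vector).
    if digit_shift >= digits.len() {
        digits.clear();
        return;
    }
    // Drop whole digits from the low end; this moves data but does not allocate.
    digits.drain(..digit_shift);

    if bit_shift > 0 {
        let mut carry = 0u32;
        // Walk from the most significant digit down, passing each digit's
        // low bits into the top of the next lower digit.
        for d in digits.iter_mut().rev() {
            let new_carry = *d << (32 - bit_shift);
            *d = (*d >> bit_shift) | carry;
            carry = new_carry;
        }
    }
    // Normalize: strip zero digits at the most significant end.
    while digits.last() == Some(&0) {
        digits.pop();
    }
}
```

This matters for the binary gcd because every iteration shifts an operand, so a shift that allocates would do so on each step.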

Thanks for the PR!

bors r+

15: Implement Stein's algorithm for gcd r=cuviper a=Emerentius

bors · 2018-02-08T06:40:02Z

Build succeeded

continuous-integration/travis-ci/push

Emerentius and others added 7 commits February 7, 2018 21:38

allow_unused AsciiExt

b2ff86b

the methods are implemented on the types directly since Rust 1.23; the trait's still needed for backwards compatibility

implement Stein's algorithm for gcd

ba54f17

same asymptotic complexity as euclidean but faster thanks to bitshifts and subtractions rather than division

add a gcd benchmark

8893408

Remove unnecessary core import

a46f422

use an iterator for trailing_zeros

f5c0546

small gcd cleanups

4714f82

reduce allocations in shr

e45b2b7

cuviper force-pushed the master branch from 9cd1443 to e45b2b7 on February 8, 2018 06:23

bors bot merged commit e45b2b7 into rust-num:master Feb 8, 2018
