Implement Stein's algorithm for gcd #15

Merged
merged 7 commits into from Feb 8, 2018


Emerentius
Contributor

This implements Stein's algorithm for bigints.
Asymptotically, this has the same runtime complexity as the Euclidean algorithm, but it's faster in practice because it replaces division with bit shifts and subtractions.
There are faster algorithms for large bigints. For small ones, GMP uses the binary gcd too (https://gmplib.org/manual/Binary-GCD.html).
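
For readers who haven't seen the binary gcd before, here is a minimal sketch of Stein's algorithm on plain `u64` values. It is not the `BigUint` code from this patch; the function names `binary_gcd` and `euclid_gcd` are made up for illustration, and the Euclidean version is included only as a reference to compare against.

```rust
/// Binary (Stein's) gcd on u64. Illustrative only, not the BigUint code.
fn binary_gcd(mut a: u64, mut b: u64) -> u64 {
    // gcd(x, 0) = gcd(0, x) = x
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // Factors of two shared by both inputs belong to the result.
    let shift = (a | b).trailing_zeros();
    // Make `a` odd; any remaining factors of two are not shared.
    a >>= a.trailing_zeros();
    loop {
        // `b` is nonzero here, so this shift makes it odd as well.
        b >>= b.trailing_zeros();
        // Keep a <= b so the subtraction cannot underflow.
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        // odd - odd is even (or zero), so the next shift removes at least one bit.
        b -= a;
        if b == 0 {
            return a << shift;
        }
    }
}

/// Reference implementation using division (Euclid).
fn euclid_gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn main() {
    assert_eq!(binary_gcd(48, 18), euclid_gcd(48, 18));
    println!("gcd(48, 18) = {}", binary_gcd(48, 18));
}
```

The loop only uses shifts, comparisons and subtractions, which is where the speedup over the division-based version is expected to come from.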

I've run some benchmarks with the code in this repo: https://github.com/Emerentius/bigint_gcd_bench
It iterates through sizes of 1-10 BigDigits, generates 300 uniformly distributed random bigints at each size, and computes the gcd of each combination with both Euclid's and Stein's algorithm. I'm only looking at combinations of numbers with the same number of BigDigits.

The speed gains are sizeable. See the benchmark results below. I'm running this on an ultrabook with a 15W CPU (i5-4210U). Performance may differ on other architectures, in particular if there is no intrinsic for counting trailing zeros.

Please run the benchmark on your machine. It only takes:

git clone https://github.com/Emerentius/bigint_gcd_bench
cd bigint_gcd_bench
cargo run --release
32n bits	euclidean gcd	binary gcd	speedup
n:  1 =>	0.3050s		0.0728s		4.19
n:  2 =>	0.6228s		0.1453s		4.29
n:  3 =>	0.9618s		0.2214s		4.34
n:  4 =>	1.3021s		0.3028s		4.30
n:  5 =>	1.6469s		0.3875s		4.25
n:  6 =>	2.0017s		0.4759s		4.21
n:  7 =>	2.3636s		0.5667s		4.17
n:  8 =>	2.7284s		0.6418s		4.25
n:  9 =>	3.0712s		0.7302s		4.21
n: 10 =>	3.4822s		0.8223s		4.23

The GMP developers say these algorithms are quadratic in N. I'm not sure why they seem almost linear here.

@Emerentius
Contributor Author

Emerentius commented Jan 7, 2018

This patch also adds a private trailing_zeros() function to BigUint. Should this be made public? It can be useful for performance optimizations because it tells you the multiplicity of the prime factor 2 very efficiently.

count_ones() might also be useful. leading_zeros() and count_zeros() seem a bit strange for a bignum, where you could imagine an infinite number of preceding zeros.
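
To make the point about these bit-counting methods concrete, here is a small sketch using the methods that already exist on primitive integers. The helper `split_power_of_two` is a made-up name for illustration and is not part of this patch.

```rust
/// Splits a nonzero `n` into (k, odd) with n == odd << k, so `k` is the
/// multiplicity of the prime factor 2. Hypothetical helper, not from the patch.
fn split_power_of_two(n: u64) -> Option<(u32, u64)> {
    if n == 0 {
        // Zero has no well-defined multiplicity of 2.
        return None;
    }
    let k = n.trailing_zeros();
    Some((k, n >> k))
}

fn main() {
    // 96 = 2^5 * 3
    assert_eq!(split_power_of_two(96), Some((5, 3)));
    // count_ones() does not depend on the integer width.
    assert_eq!(96u8.count_ones(), 96u64.count_ones());
    // leading_zeros() does: the same value gives different answers,
    // which is why it is awkward to define for an unbounded integer.
    assert_eq!(1u8.leading_zeros(), 7);
    assert_eq!(1u64.leading_zeros(), 63);
}
```

trailing_zeros() and count_ones() give the same answer regardless of how wide the integer is, which is why they carry over to a bignum cleanly while leading_zeros() and count_zeros() do not.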

@cuviper
Member

cuviper commented Jan 7, 2018

Ah, yes, num-integer already uses Stein's for the primitive integers, so this sounds good to me.

FWIW, I have trailing_zeros in #8 too. You might want to compare the performance between our approaches. (But that might not be a bottleneck anyway.)

@Emerentius
Contributor Author

Emerentius commented Jan 7, 2018

Without testing, I'm pretty sure that your implementation is faster or can be made to be faster. If you switch out that .enumerate() for a (0..).step_by(big_digit::BITS) (or equivalent code) you can skip the multiplication and you'll be doing strictly less work.

Probably worth it for CPUs without ctz anyway.
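
As a rough illustration of the two variants being compared, here is a sketch over a little-endian slice of 32-bit digits. Neither function is the code from this PR or from #8; the names, the `&[u32]` representation, the local `BITS` constant and the `Option` return type are all assumptions made for the example.

```rust
/// Bits per digit; stands in for `big_digit::BITS` in this sketch.
const BITS: usize = 32;

/// Variant 1: enumerate the digits and multiply the index by BITS.
fn trailing_zeros_enumerate(digits: &[u32]) -> Option<usize> {
    digits
        .iter()
        .enumerate()
        .find(|&(_, &d)| d != 0)
        .map(|(i, &d)| i * BITS + d.trailing_zeros() as usize)
}

/// Variant 2: carry the bit offset along directly, no multiplication.
fn trailing_zeros_step_by(digits: &[u32]) -> Option<usize> {
    (0..)
        .step_by(BITS)
        .zip(digits)
        .find(|&(_, &d)| d != 0)
        .map(|(offset, &d)| offset + d.trailing_zeros() as usize)
}

fn main() {
    // Digits are least significant first: this is 8 * 2^64.
    let digits = [0u32, 0, 8];
    assert_eq!(trailing_zeros_enumerate(&digits), Some(67));
    assert_eq!(trailing_zeros_step_by(&digits), Some(67));
    // All-zero input has no lowest set bit.
    assert_eq!(trailing_zeros_step_by(&[0, 0]), None);
}
```

Both variants call trailing_zeros() on a digit only once, after the first nonzero digit has been found.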

@cuviper
Member

cuviper commented Jan 7, 2018

Multiplying by BITS should get optimized as a simple left shift, maybe even LEA on x86. Almost surely not a performance target, anyway.

@Emerentius
Contributor Author

I meant that without the multiplication, our two versions do identical work, except that my code also does a superfluous trailing_zeros() call at every step.

Emerentius and others added 7 commits February 7, 2018 21:38
the methods are implemented on the types directly since
rust 1.23
the trait's still needed for backwards compatibility
same asymptotic complexity as euclidean but faster
thanks to bitshifts and subtractions rather than division
@cuviper
Member

cuviper commented Feb 8, 2018

I added an optimization to shr that eliminated most of the allocation overhead.
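
For context on why shr matters here: the gcd loop shifts on every iteration, so a shift that builds a new digit vector each time allocates once per step. The sketch below shows the general idea of shifting an owned digit vector in place instead. It is an assumption about the kind of change meant, not the actual commit; the function name `shr_in_place` and the little-endian `Vec<u32>` layout are made up for illustration.

```rust
/// Shifts a little-endian digit vector right by `bits`, reusing its buffer.
/// Hypothetical sketch, not the code that was committed.
fn shr_in_place(digits: &mut Vec<u32>, bits: usize) {
    let whole = bits / 32; // digits dropped entirely
    let rem = (bits % 32) as u32; // remaining shift within a digit
    if whole >= digits.len() {
        digits.clear();
        return;
    }
    // Drop the low digits without allocating a new vector.
    digits.drain(..whole);
    if rem > 0 {
        // Walk from the most significant digit down, carrying the
        // bits that fall out of each digit into the one below it.
        let mut carry = 0u32;
        for d in digits.iter_mut().rev() {
            let next_carry = *d << (32 - rem);
            *d = (*d >> rem) | carry;
            carry = next_carry;
        }
    }
    // Normalize: strip zero digits at the most significant end.
    while digits.last() == Some(&0) {
        digits.pop();
    }
}

fn main() {
    // 2^32 >> 1 == 2^31
    let mut n = vec![0u32, 1];
    shr_in_place(&mut n, 1);
    assert_eq!(n, vec![0x8000_0000]);
}
```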

Thanks for the PR!

bors r+

bors bot added a commit that referenced this pull request Feb 8, 2018
15: Implement Stein's algorithm for gcd r=cuviper a=Emerentius

@bors
Contributor

bors bot commented Feb 8, 2018

Build succeeded

@bors bors bot merged commit e45b2b7 into rust-num:master Feb 8, 2018