-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
math/big: GCD too slow #15833
Comments
go1.6.2 Same experience of me, math/big is very slow, I have not see the math/big's source code yet, so I try gccgo-4.9 for better performance, BUT it's even slower then gc, 3x slower then gc go build -compiler gc gcd.go go build -compiler gccgo -gccgoflags "-O3 -ffast-math -march=native" gcd.go At last I try https://github.com/ncw/gmp, It's a libgmp wrapper for go, this will be more faster than above |
I was taking a look at this as well as the code in misc/cgo and noticed that in the tree under CGO examples ie misc/cgo/gmp we have a wrapper for the gmp library https://github.com/golang/go/blob/master/misc/cgo/gmp/gmp.go#L365-L372. It was added in 2009, but am sure @ianlancetaylor and @griesemer know about it since y'all edited that file before https://github.com/golang/go/commits/master/misc/cgo/gmp/gmp.go. The code isn't directly a drop in replacement for --- cmd/gcd.go 2016-06-04 12:56:10.000000000 -0700
+++ orig.go 2016-06-04 12:56:59.000000000 -0700
@@ -2,24 +2,21 @@
import (
"fmt"
+ "math/big"
"time"
-
- big "github.com/odeke-em/gmp"
)
func main() {
- a := new(big.Int)
- _ = a.SetString("162446598402363997673493179860749206481570626148774722480464916217519998582106439959196344215073041683672767407342600465416394761370312812652387014077287533579464234592025270747189685651849855568219894257068017511394846018248946392555048664581847849090691446151175543365199139365693828091221296651564817601708173519333480541557916813321048094446738924484963396047875101501912007668905398696596051247666259795425384659975708544211774775625965457167714100319140254438470405084497675188274084057803534621424329554676703869536840619010332513253366351288042987043711316273802950111102675169897156927510722936205243934593761221732977646338069405205393572558504296440846844654915994056051080138684289188618799496532006547298248428910424360291035687113795912784248641162116970198942033380494094025012100218442198181604277428406480729238384656494840163546922286562841336237622223094660753989052636783736736730051815788481272143711008904458168016828251588196506829619252076432097202449298271511577883563293048604787739890118213783329525568780738050849374005547963863874473115226603810373230563816359760094564252571054145767564049725284205901365726049711717022372349529374435379628254191205730293550748181543891163723281674777750059512413586588828314084535671582111735660379159552799083036587458583555292293962217151577840489845885188592504493431102401049590622433757873291425858434918697762180295687147582650879514071801716939533681208102052410202696047460966976913845601990028270324422645505289049580017114480330931133826162290356916061574256540132190171694393638181573510025576808968516762515657212669911734833269039513284253725561578543014863535935809511150296456155470508927211411840144748006797665686217130525378029137093695488159569875226988769985175538075995372140625158000424680862039472332990803861888330870444263274715912499029641167624951512824402013661747715825752288086558167740183961939884790195081886224217177152869350897055478137773963535642498587936119572587325769768711699791252679336168006908091872154677467014758193802812900921873466740528890306650385439501239427958166273861080306246501556064395044538359837700353874621248486133241226805310121486599455004347652284602607212474583483132546550311381260917418037865509077618272877792484328607271857806356", 10)
- b := new(big.Int)
- _ = b.SetString("101784352192926179474691847690295364412146644944337147893018137607751766917729750758328166579925569899765459590972517709626629422864683438193290125043063518436017102066763671846630815217722251657259307553099132772201209113951039569557515190691205890052686828351751884803712581582093817758343808197167567247907540122605321423275330224580551817963867742145560163431380567362003538753856228060374866951036259004102917220472306669995666505172223628113212045865085784184336530203837908112187832616046195538504845768119298751526550607442806456808908879070037709190448307774015887964660855587392536795762810554496174847323701119347378258018659963334626467144614011455875192800289417692534392258372638105441472555031300945385673635743106163778851484848839379940456113937207911595141513680574367846648636353220309193154584372103496857062568070324916862436127133316127989083231670428229702525886963156897871633715273896484904539939624236288247009844489720556858425695244949730267384132132441488654009449316215449359564589915898961035516198007811431244695623908883885142048155515509053435754497981693091320819719426795331186500179253864761247316755487225443097006915430516922716275267385443349373289960634713632572446080364203373228601555282278183545440347991778185107399327758589346260248433939637585528206559787605105051710971698307363677567320558959899872756874248500394667863120050131533976493034552093698090232503340912301095656404952421422358796326151351099379863658968843482129929673487863396575147218726229000469299309801093105964119662371898258712567149943154476612246157910925561894764017985426180288097588055722886924974722109733733123413154279708050676188619796030777587411061473509487256419260726999082015872941209426850601596056518119145537343224544999826199864836904118369929404501365516712854950664599501540383745231341313574626357265253728062815992109070862862165578453391683605892693204046869207638687457235701255574852208310767024525571635865352089126234994777689748203394347241246694312344558416290642852901684770713683262440625612790798884009774609112528636374285113491984040948539722165924050855717036844286332805554836911465419002994297397099377669991903562736772713728306241921475535135050281780162431545636315231851527333384539192469227100575974540", 10)
+
+ a, _ := new(big.Int).SetString("162446598402363997673493179860749206481570626148774722480464916217519998582106439959196344215073041683672767407342600465416394761370312812652387014077287533579464234592025270747189685651849855568219894257068017511394846018248946392555048664581847849090691446151175543365199139365693828091221296651564817601708173519333480541557916813321048094446738924484963396047875101501912007668905398696596051247666259795425384659975708544211774775625965457167714100319140254438470405084497675188274084057803534621424329554676703869536840619010332513253366351288042987043711316273802950111102675169897156927510722936205243934593761221732977646338069405205393572558504296440846844654915994056051080138684289188618799496532006547298248428910424360291035687113795912784248641162116970198942033380494094025012100218442198181604277428406480729238384656494840163546922286562841336237622223094660753989052636783736736730051815788481272143711008904458168016828251588196506829619252076432097202449298271511577883563293048604787739890118213783329525568780738050849374005547963863874473115226603810373230563816359760094564252571054145767564049725284205901365726049711717022372349529374435379628254191205730293550748181543891163723281674777750059512413586588828314084535671582111735660379159552799083036587458583555292293962217151577840489845885188592504493431102401049590622433757873291425858434918697762180295687147582650879514071801716939533681208102052410202696047460966976913845601990028270324422645505289049580017114480330931133826162290356916061574256540132190171694393638181573510025576808968516762515657212669911734833269039513284253725561578543014863535935809511150296456155470508927211411840144748006797665686217130525378029137093695488159569875226988769985175538075995372140625158000424680862039472332990803861888330870444263274715912499029641167624951512824402013661747715825752288086558167740183961939884790195081886224217177152869350897055478137773963535642498587936119572587325769768711699791252679336168006908091872154677467014758193802812900921873466740528890306650385439501239427958166273861080306246501556064395044538359837700353874621248486133241226805310121486599455004347652284602607212474583483132546550311381260917418037865509077618272877792484328607271857806356", 10)
+ b, _ := new(big.Int).SetString("101784352192926179474691847690295364412146644944337147893018137607751766917729750758328166579925569899765459590972517709626629422864683438193290125043063518436017102066763671846630815217722251657259307553099132772201209113951039569557515190691205890052686828351751884803712581582093817758343808197167567247907540122605321423275330224580551817963867742145560163431380567362003538753856228060374866951036259004102917220472306669995666505172223628113212045865085784184336530203837908112187832616046195538504845768119298751526550607442806456808908879070037709190448307774015887964660855587392536795762810554496174847323701119347378258018659963334626467144614011455875192800289417692534392258372638105441472555031300945385673635743106163778851484848839379940456113937207911595141513680574367846648636353220309193154584372103496857062568070324916862436127133316127989083231670428229702525886963156897871633715273896484904539939624236288247009844489720556858425695244949730267384132132441488654009449316215449359564589915898961035516198007811431244695623908883885142048155515509053435754497981693091320819719426795331186500179253864761247316755487225443097006915430516922716275267385443349373289960634713632572446080364203373228601555282278183545440347991778185107399327758589346260248433939637585528206559787605105051710971698307363677567320558959899872756874248500394667863120050131533976493034552093698090232503340912301095656404952421422358796326151351099379863658968843482129929673487863396575147218726229000469299309801093105964119662371898258712567149943154476612246157910925561894764017985426180288097588055722886924974722109733733123413154279708050676188619796030777587411061473509487256419260726999082015872941209426850601596056518119145537343224544999826199864836904118369929404501365516712854950664599501540383745231341313574626357265253728062815992109070862862165578453391683605892693204046869207638687457235701255574852208310767024525571635865352089126234994777689748203394347241246694312344558416290642852901684770713683262440625612790798884009774609112528636374285113491984040948539722165924050855717036844286332805554836911465419002994297397099377669991903562736772713728306241921475535135050281780162431545636315231851527333384539192469227100575974540", 10)
t := time.Now()
for i := 0; i < 10000; i++ {
- divisor := new(big.Int)
- x, y := new(big.Int), new(big.Int)
- big.GcdInt(divisor, x, y, a, b)
+ new(big.Int).GCD(nil, nil, a, b)
}
fmt.Println(time.Since(t))
-}
+
+} $ go run gcd.go
1.347481406s |
CL https://golang.org/cl/50530 mentions this issue. |
CL above is a minor optimization for the extended euclidean algorithm of only calculating one set of cofactors inside the division loop. On my machine it speeds up the extended GCD about 20% and should improve the ModInverse more since only only coefficient is needed in that case. Since the euclidean algorithm is the base case for Lehmer's algorithm, this should still prove useful when Lehmer's algorithm is implemented. |
The current implementation of the extended Euclidean GCD algorithm calculates both cosequences x and y inside the division loop. This is unneccessary since the second Bezout coefficient can be obtained at the end of calculation via a multiplication, subtraction and a division. In case only one coefficient is needed, e.g. ModInverse this calculation can be skipped entirely. This is a standard optimization, see e.g. "Handbook of Elliptic and Hyperelliptic Curve Cryptography" Cohen et al pp 191 Available at: http://cs.ucsb.edu/~koc/ccs130h/2013/EllipticHyperelliptic-CohenFrey.pdf Updates #15833 Change-Id: I1e0d2e63567cfed97fd955048fe6373d36f22757 Reviewed-on: https://go-review.googlesource.com/50530 Reviewed-by: Robert Griesemer <gri@golang.org>
@griesemer I had a go at implementing a basic version of Lehmer's algorithm and got the following results for the problem benchmark above on my machine. Current binary GCD code: Lehmer's algorithm: For around a 7x speed up, but it is still about a factor of two slower than the gmp wrapper 1.010165176s It should be straightforward to use Lehmer's algorithm on the extended GCD and speed that up as well. |
Change https://golang.org/cl/59450 mentions this issue: |
Updates #15833 Lehmer's GCD algorithm uses single precision calculations to simulate several steps of multiple precision calculations in Euclid's GCD algorithm which leads to a considerable speed up. This implementation uses Collins' simplified testing condition on the single digit cosequences which requires only one quotient and avoids any possibility of overflow. name old time/op new time/op delta GCD10x10/WithoutXY-4 1.82µs ±24% 0.28µs ± 6% -84.40% (p=0.008 n=5+5) GCD10x10/WithXY-4 1.69µs ± 6% 1.71µs ± 6% ~ (p=0.595 n=5+5) GCD10x100/WithoutXY-4 1.87µs ± 2% 0.56µs ± 4% -70.13% (p=0.008 n=5+5) GCD10x100/WithXY-4 2.61µs ± 2% 2.65µs ± 4% ~ (p=0.635 n=5+5) GCD10x1000/WithoutXY-4 2.75µs ± 2% 1.48µs ± 1% -46.06% (p=0.008 n=5+5) GCD10x1000/WithXY-4 5.29µs ± 2% 5.25µs ± 2% ~ (p=0.548 n=5+5) GCD10x10000/WithoutXY-4 10.7µs ± 2% 10.3µs ± 0% -4.38% (p=0.008 n=5+5) GCD10x10000/WithXY-4 22.3µs ± 6% 22.1µs ± 1% ~ (p=1.000 n=5+5) GCD10x100000/WithoutXY-4 93.7µs ± 2% 99.4µs ± 2% +6.09% (p=0.008 n=5+5) GCD10x100000/WithXY-4 196µs ± 2% 199µs ± 2% ~ (p=0.222 n=5+5) GCD100x100/WithoutXY-4 10.1µs ± 2% 2.5µs ± 2% -74.84% (p=0.008 n=5+5) GCD100x100/WithXY-4 21.4µs ± 2% 21.3µs ± 7% ~ (p=0.548 n=5+5) GCD100x1000/WithoutXY-4 11.3µs ± 2% 4.4µs ± 4% -60.87% (p=0.008 n=5+5) GCD100x1000/WithXY-4 24.7µs ± 3% 23.9µs ± 1% ~ (p=0.056 n=5+5) GCD100x10000/WithoutXY-4 26.6µs ± 1% 20.0µs ± 2% -24.82% (p=0.008 n=5+5) GCD100x10000/WithXY-4 78.7µs ± 2% 78.2µs ± 2% ~ (p=0.690 n=5+5) GCD100x100000/WithoutXY-4 174µs ± 2% 171µs ± 1% ~ (p=0.056 n=5+5) GCD100x100000/WithXY-4 563µs ± 4% 561µs ± 2% ~ (p=1.000 n=5+5) GCD1000x1000/WithoutXY-4 120µs ± 5% 29µs ± 3% -75.71% (p=0.008 n=5+5) GCD1000x1000/WithXY-4 355µs ± 4% 358µs ± 2% ~ (p=0.841 n=5+5) GCD1000x10000/WithoutXY-4 140µs ± 2% 49µs ± 2% -65.07% (p=0.008 n=5+5) GCD1000x10000/WithXY-4 626µs ± 3% 628µs ± 9% ~ (p=0.690 n=5+5) GCD1000x100000/WithoutXY-4 340µs ± 4% 259µs ± 6% -23.79% (p=0.008 n=5+5) GCD1000x100000/WithXY-4 3.76ms ± 4% 3.82ms ± 5% ~ (p=0.310 n=5+5) GCD10000x10000/WithoutXY-4 3.11ms ± 3% 0.54ms ± 2% -82.74% (p=0.008 n=5+5) GCD10000x10000/WithXY-4 7.96ms ± 3% 7.69ms ± 3% ~ (p=0.151 n=5+5) GCD10000x100000/WithoutXY-4 3.88ms ± 1% 1.27ms ± 2% -67.21% (p=0.008 n=5+5) GCD10000x100000/WithXY-4 38.1ms ± 2% 38.8ms ± 1% ~ (p=0.095 n=5+5) GCD100000x100000/WithoutXY-4 208ms ± 1% 25ms ± 4% -88.07% (p=0.008 n=5+5) GCD100000x100000/WithXY-4 533ms ± 5% 525ms ± 4% ~ (p=0.548 n=5+5) Change-Id: Ic1e007eb807b93e75f4752e968e98c1f0cb90e43 Reviewed-on: https://go-review.googlesource.com/59450 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
I saw your discussion on Gerrit about when is it worth to switch to sub-quadratic algorithm w.r.t. to number of words. This is how GMP goes it, thought you might find it interesting: @bmkessler thanks a lot for the implementation! |
Change https://golang.org/cl/78755 mentions this issue: |
Updates #15833 The extended GCD algorithm can be implemented using Lehmer's algorithm with additional updates for the cosequences following Algorithm 10.45 from Cohen et al. "Handbook of Elliptic and Hyperelliptic Curve Cryptography" pp 192. This brings the speed of the extended GCD calculation within ~2x of the base GCD calculation. There is a slight degradation in the non-extended GCD speed for small inputs (1-2 words) due to the additional code to handle the extended updates. name old time/op new time/op delta GCD10x10/WithoutXY-4 262ns ± 1% 266ns ± 2% ~ (p=0.333 n=5+5) GCD10x10/WithXY-4 1.42µs ± 2% 0.74µs ± 3% -47.90% (p=0.008 n=5+5) GCD10x100/WithoutXY-4 520ns ± 2% 539ns ± 1% +3.81% (p=0.008 n=5+5) GCD10x100/WithXY-4 2.32µs ± 1% 1.67µs ± 0% -27.80% (p=0.008 n=5+5) GCD10x1000/WithoutXY-4 1.40µs ± 1% 1.45µs ± 2% +3.26% (p=0.016 n=4+5) GCD10x1000/WithXY-4 4.78µs ± 1% 3.43µs ± 1% -28.37% (p=0.008 n=5+5) GCD10x10000/WithoutXY-4 10.0µs ± 0% 10.2µs ± 3% +1.80% (p=0.008 n=5+5) GCD10x10000/WithXY-4 20.9µs ± 3% 17.9µs ± 1% -14.20% (p=0.008 n=5+5) GCD10x100000/WithoutXY-4 96.8µs ± 0% 96.3µs ± 1% ~ (p=0.310 n=5+5) GCD10x100000/WithXY-4 196µs ± 3% 159µs ± 2% -18.61% (p=0.008 n=5+5) GCD100x100/WithoutXY-4 2.53µs ±15% 2.34µs ± 0% -7.35% (p=0.008 n=5+5) GCD100x100/WithXY-4 19.3µs ± 0% 3.9µs ± 1% -79.58% (p=0.008 n=5+5) GCD100x1000/WithoutXY-4 4.23µs ± 0% 4.17µs ± 3% ~ (p=0.127 n=5+5) GCD100x1000/WithXY-4 22.8µs ± 1% 7.5µs ±10% -67.00% (p=0.008 n=5+5) GCD100x10000/WithoutXY-4 19.1µs ± 0% 19.0µs ± 0% ~ (p=0.095 n=5+5) GCD100x10000/WithXY-4 75.1µs ± 2% 30.5µs ± 2% -59.38% (p=0.008 n=5+5) GCD100x100000/WithoutXY-4 170µs ± 5% 167µs ± 1% ~ (p=1.000 n=5+5) GCD100x100000/WithXY-4 542µs ± 2% 267µs ± 2% -50.79% (p=0.008 n=5+5) GCD1000x1000/WithoutXY-4 28.0µs ± 0% 27.1µs ± 0% -3.29% (p=0.008 n=5+5) GCD1000x1000/WithXY-4 329µs ± 0% 42µs ± 1% -87.12% (p=0.008 n=5+5) GCD1000x10000/WithoutXY-4 47.2µs ± 0% 46.4µs ± 0% -1.65% (p=0.016 n=5+4) GCD1000x10000/WithXY-4 607µs ± 9% 123µs ± 1% -79.70% (p=0.008 n=5+5) GCD1000x100000/WithoutXY-4 260µs ±17% 245µs ± 0% ~ (p=0.056 n=5+5) GCD1000x100000/WithXY-4 3.64ms ± 1% 0.93ms ± 1% -74.41% (p=0.016 n=4+5) GCD10000x10000/WithoutXY-4 513µs ± 0% 507µs ± 0% -1.22% (p=0.008 n=5+5) GCD10000x10000/WithXY-4 7.44ms ± 1% 1.00ms ± 0% -86.58% (p=0.008 n=5+5) GCD10000x100000/WithoutXY-4 1.23ms ± 0% 1.23ms ± 1% ~ (p=0.056 n=5+5) GCD10000x100000/WithXY-4 37.3ms ± 0% 7.3ms ± 1% -80.45% (p=0.008 n=5+5) GCD100000x100000/WithoutXY-4 24.2ms ± 0% 24.2ms ± 0% ~ (p=0.841 n=5+5) GCD100000x100000/WithXY-4 505ms ± 1% 56ms ± 1% -88.92% (p=0.008 n=5+5) Change-Id: I25f42ab8c55033acb83cc32bb03c12c1963925e8 Reviewed-on: https://go-review.googlesource.com/78755 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
An update: using Go 1.11.1 it now completes here in 1.74s (down from 9.11s) |
@TuomLarsen I am going to close this. If you feel there's need for more improvement, feel free to re-open. Thanks. |
@griesemer Well, it still almost 50% slower than Python 3, which does not use any assembler subroutines. But then it is also more than 5x faster than it used to be so I'm not complaining. On the contrary so thank you @bmkessler very much for the improvement! |
why is it slower than python? What does python do to be faster. Is it implemented in C? |
@TuomLarsen, @pjebs I'm guessing here but I suspect Python uses GMP underneath (which is highly optimized and does use assembly cores, like we do). There's definitively more work that we could do on the Go side (e.g. faster division - there's a pending CL I believe). We have open issues for more improvements. In short, while our code is actually written entirely in Go, with some assembly support at the bottom, the Python code is very likely just a thin wrapper around GMP. |
@griesemer Python uses their own C-only implementation of Lehmer's algorithm (the one also implemented in Go). The relevant parts in Go are written in assembly but, and now I'm guessing, the small things accumulated: not inlining assembler, passing arguments via memory and not registers, not distinguishing GCD and extended GCD, function calls apparently being slower in Go than in C, etc. I'm not complaining, just trying to come up with an explanation. |
go version
)?go1.6.1
go env
)?linux/amd64
Recently I found out that the bottleneck in my program is the computation of greatest common divisor. CPython 3.5 uses Lehmer's algorithm and runs almost 8x faster. (For very large integers there are even faster algorithms.)
(play.golang.org: "process took too long")
Following Python program runs under CPython 3.5 in 1.18s
The above go program runs on my machine 9.11s
The text was updated successfully, but these errors were encountered: