math/big: better multiply primitives #9245

Open
griesemer opened this issue Dec 10, 2014 · 8 comments

@griesemer
Contributor

@griesemer griesemer commented Dec 10, 2014

Suggestions from Torbjörn Granlund (personal e-mail):

"
The multiply primitives, in particular addMulVVW surely deserves more
attention:

Offset the pointers so that you can index with a counter register
which goes from -n to 0, saving the CMPQ.

Unroll. You can save most of the ADCQ $0, R that way. Basically,
do one run with just MULQ where you sum the old highpart (DX) with
the new lowpart (AX). You will need some MOVQ to move DX
out-of-the-way too. Then do a new run over these sums where you
bring in the memory addend. This should double the speed on some
newer CPUs.

A good addMulVVW is probably really the first thing to write in
assembly; addition and subtraction are much less important, usually.
"
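To make the unrolling suggestion concrete, here is a minimal Go sketch (not the math/big code). It assumes 64-bit words and assumes addMulVVW computes z[i] += x[i]*y and returns the carry-out word, as math/big's unexported routine of that name does. The function names `addMulVVWRef`, `addMulVVWTwoPass` and `agree` are hypothetical. The first run only multiplies, summing each product's high word into the next product's low word; the second run brings in the memory addend z, so each run carries only a single carry chain. The scratch slice exists for clarity; an unrolled assembly loop would presumably keep a small block of these sums in registers. The pointer-offset trick (counting from -n up to 0) has no direct Go counterpart and is not modeled.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

// addMulVVWRef is a straightforward reference: z[i] += x[i]*y, returning the
// carry-out word. Each iteration mixes two carry chains (carry-in and the
// memory addend); the "hi +=" steps correspond to the ADCQ $0 instructions.
func addMulVVWRef(z, x []uint64, y uint64) (c uint64) {
	for i := range z {
		hi, lo := bits.Mul64(x[i], y)
		var cc uint64
		lo, cc = bits.Add64(lo, c, 0)
		hi += cc
		z[i], cc = bits.Add64(z[i], lo, 0)
		c = hi + cc
	}
	return c
}

// addMulVVWTwoPass (hypothetical) computes the same result in two runs.
func addMulVVWTwoPass(z, x []uint64, y uint64) uint64 {
	n := len(z)
	s := make([]uint64, n) // scratch: words of x*y

	// Run 1: multiplies only. Word i of x*y is lo_i + hi_(i-1),
	// accumulated along one carry chain.
	var prevHi, c uint64
	for i := 0; i < n; i++ {
		hi, lo := bits.Mul64(x[i], y)
		s[i], c = bits.Add64(lo, prevHi, c)
		prevHi = hi
	}
	top := prevHi + c // high word of x*y; hi <= 2^64-2, so no overflow

	// Run 2: add the memory addend z along a second, independent chain.
	c = 0
	for i := 0; i < n; i++ {
		z[i], c = bits.Add64(z[i], s[i], c)
	}
	return top + c // z + x*y < 2^(64*(n+1)), so no overflow
}

// agree reports whether both versions produce identical words and carry.
func agree(x, z []uint64, y uint64) bool {
	z1 := append([]uint64(nil), z...)
	z2 := append([]uint64(nil), z...)
	if addMulVVWRef(z1, x, y) != addMulVVWTwoPass(z2, x, y) {
		return false
	}
	for i := range z1 {
		if z1[i] != z2[i] {
			return false
		}
	}
	return true
}

func main() {
	m := ^uint64(0)
	// All-ones operands produce the largest possible carries.
	ok := agree([]uint64{m, m, m}, []uint64{m, m, m}, m)

	r := rand.New(rand.NewSource(1))
	for trial := 0; trial < 10000; trial++ {
		n := r.Intn(8)
		x, z := make([]uint64, n), make([]uint64, n)
		for i := range x {
			x[i], z[i] = r.Uint64(), r.Uint64()
		}
		ok = ok && agree(x, z, r.Uint64())
	}
	fmt.Println(ok) // true
}
```

Both versions produce the (n+1)-word value z + x*y, so they must agree; the two-pass form only reorders the additions so that neither loop has to fold a second carry into the high word.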

@griesemer griesemer self-assigned this Dec 10, 2014
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@griesemer griesemer added this to the Go1.9Maybe milestone Feb 25, 2017
@griesemer griesemer removed this from the Unplanned milestone Feb 25, 2017
@griesemer griesemer added this to the Go1.10 milestone May 9, 2017
@griesemer griesemer removed this from the Go1.9Maybe milestone May 9, 2017
@griesemer griesemer removed this from the Go1.10 milestone Nov 3, 2017
@griesemer griesemer added this to the Go1.11 milestone Nov 3, 2017
@vielmetti

@vielmetti vielmetti commented Nov 9, 2017

See also https://go-review.googlesource.com/c/go/+/76270 (for arm64) where it is reported:

The lack of proper addMulVVW implementation for arm64 hurts RSA
performance

This is an optimized implementation, it improves RSA2048 performance
by 10X to 15X on ARMv8 based server processors.

@odeke-em
Member

@odeke-em odeke-em commented Mar 5, 2018

@griesemer
Contributor Author

@griesemer griesemer commented May 24, 2018

Pushing to the next release. There are some discussions about other math/bits primitive operations; maybe we can write some of this code in Go rather than assembly at some point.
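For illustration, a pure-Go addMulVVW could look like the sketch below. It uses the extended-precision functions bits.Mul and bits.Add, which were added to math/bits in Go 1.12 (after this comment). The `Word` type mirrors math/big's machine-word type; the function is a hypothetical stand-in for the unexported routine, not its actual source, and it assumes len(x) >= len(z).

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word mirrors math/big's Word: an unsigned machine word (32 or 64 bits).
type Word uint

// addMulVVW computes z[i] += x[i]*y for i < len(z) and returns the carry.
// Hypothetical pure-Go version; requires len(x) >= len(z).
func addMulVVW(z, x []Word, y Word) (c Word) {
	for i := range z {
		hi, lo := bits.Mul(uint(x[i]), uint(y)) // double-width product
		lo, cc := bits.Add(lo, uint(c), 0)      // add incoming carry word
		hi += cc
		sum, cc2 := bits.Add(uint(z[i]), lo, 0) // add memory addend
		z[i] = Word(sum)
		c = Word(hi + cc2) // cannot wrap: z[i] + x[i]*y + c < 2^(2*wordsize)
	}
	return c
}

func main() {
	z := []Word{1, 2}
	c := addMulVVW(z, []Word{3, 4}, 5)
	fmt.Println(z, c) // [16 22] 0
}
```

Because the code is written against `uint`, the same source works for 32- and 64-bit words; whether it approaches hand-written assembly depends on how well the compiler lowers the math/bits calls on a given architecture.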

@griesemer griesemer removed this from the Go1.11 milestone May 24, 2018
@griesemer griesemer added this to the Go1.12 milestone May 24, 2018
@griesemer griesemer removed this from the Go1.12 milestone Sep 17, 2018
@griesemer griesemer added this to the Unplanned milestone Sep 17, 2018
@andig
Contributor

@andig andig commented Oct 13, 2021

Has this potentially been solved by https://go-review.googlesource.com/c/go/+/74851/ mentioned in #20058 (comment)?

Sorry if OT, I was researching around Go performance topics and stumbled here.

@griesemer
Contributor Author

@griesemer griesemer commented Oct 13, 2021

I believe this was for ARMv8; there's more to do here. The Go team is preoccupied with generics for 1.18, so this is unlikely to happen for 1.18 unless somebody else wants to step in, preferably with experience in performance-critical arithmetic routines.

@andig
Contributor

@andig andig commented Oct 13, 2021

"I believe this was for ARMv8; there's more to do here"

The CL mentioned was created after this issue was raised, addresses amd64, and has been merged ;)

@griesemer
Contributor Author

@griesemer griesemer commented Oct 13, 2021

Indeed, I misread; my apologies. So what's left to do is porting this to other architectures?

@andig
Contributor

@andig andig commented Oct 14, 2021

I read the issue as being about x86. As far as I can tell, addMulVVW already has special-cased implementations for amd64 and arm64.

7 participants