-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Open
Labels
FeatureRequestIssues asking for a new feature that does not need a proposal.Issues asking for a new feature that does not need a proposal.ImplementationIssues describing a semantics-preserving change to the Go implementation.Issues describing a semantics-preserving change to the Go implementation.NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.arch-arm64compiler/runtimeIssues related to the Go compiler and/or runtime.Issues related to the Go compiler and/or runtime.
Milestone
Description
Consider the following Montgomery Reduction implementation:
func MontReduce(v uint64) uint32 {
const (
R = 1 << 32
qInv = 2164260865
)
m := uint32(v) * qInv
t, borrow := bits.Sub64(v, uint64(m)*q, 0)
res := uint32(t / R)
if borrow != 0 {
res += q
}
return res
}Using objdump I get the following assembly (the comments are my own):
MOVD $-65535, R1 // low part of qInv
MOVK $(33024<<16), R1 // high part of qInv. Now R1 = qInv
MULW R1, R0, R1 // R0 = v, R1 = m = v * qInv
MOVD $1, R2
MOVK $(32512<<16), R2 // R2 = q
MUL R2, R1, R1 // R1 = m*q
SUBS R1, R0, R1 // R1 = t = v - m*q
NGC ZR, R3 // R3 = -borrow
NEG R3, R3 // R3 = borrow
LSR $32, R1, R1 // R1 = res = t / R
ADD R2, R1, R2 // R2 = res + q
CMP $0, R3
CSEL NE, R2, R1, R0 // if borrow ! = 0 etc
RET
However, the following simpler code also works:
TEXT ·MontReduce(SB), NOSPLIT, $16-8
MOVD v+0(FP), R0 // R0 = v
MOVD $1, R2 // low part of qInv
MOVK $(33024<<16), R2 // high part of qInv. Now R2 = qInv
MULW R2, R0, R1 // R0 = v, R1 = m = v * qInv
MOVK $(32512<<16), R2 // R2 = q
MUL R2, R1, R1 // R1 = m*q
SUBS R1, R0, R1 // R1 = t = v - m*q
LSR $32, R1, R1 // R1 = res = t / R
ADD R2, R1, R2 // R2 = res + q
CSEL CS, R1, R2, R0
MOVD R0, ret+8(FP)
RET
In the generated code, we're using an extra instruction to make sure a nonzero borrow value would be equal to 1, not -1, even though the condition only checks if it is zero, and is thus unaffected by the NEG.
Furthermore, since none of the instructions after the SUBS modify the carry flag, it doesn't have to explicitly reside in a register at all. One can directly use it in CSEL and remove the CMP (saving an addition equivalent.)
I don't know if something like this is easy enough to patch to be worth the modest perf gains. Just thought it was interesting. :)
Metadata
Metadata
Assignees
Labels
FeatureRequestIssues asking for a new feature that does not need a proposal.Issues asking for a new feature that does not need a proposal.ImplementationIssues describing a semantics-preserving change to the Go implementation.Issues describing a semantics-preserving change to the Go implementation.NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.arch-arm64compiler/runtimeIssues related to the Go compiler and/or runtime.Issues related to the Go compiler and/or runtime.
Type
Projects
Status
Todo