Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

> Am I missing any? Does that sound like a good categorization? #47664

Closed
Yawning opened this issue Aug 12, 2021 · 1 comment
Closed

> Am I missing any? Does that sound like a good categorization? #47664

Yawning opened this issue Aug 12, 2021 · 1 comment

Comments

@Yawning
Copy link

Yawning commented Aug 12, 2021

Am I missing any? Does that sound like a good categorization?

It maybe worth revisiting some of these in the medium to long-term, since the performance penalty for using fiat can be reduced.

As some of the participants in this issue know, I have been looking into fiat's Go performance recently. As of right now, there are a number of changes that can be made to fiat-crypto's 64-bit Go output that will provide performance increases for "real-world" use-cases.

  • Removing the addcarryxU64/subcarryxU64 wrappers dramatically improves CarryMul/CarrySquare performance (Go code cosmetic fixes mit-plv/fiat-crypto#949).
  • Selectznz is really really slow, compared to how edwards25519 does it.
  • Working around the compiler (aka "At most, people must copy and paste Go code.") provides gains for at least the following calls:
    • Fusing Add/Sub/Opp with Carry (go: Additional curve25519 routines mit-plv/fiat-crypto#1004). I filed the issue for curve25519 because that was what I was looking at, but skimming the P-521 commit, it applies there as well.
    • Providing a routine that does repeated squaring in a for loop is a moderate gain for curve25519, due to making inversions faster.
    • Providing routines that merge Add/Sub with CarryMul and CarrySquare is a minor gain, though this depends on how the code is called.
* curve25519 and edwards25519 64-bit
  
  * Using the code we are already landing for edwards25519 (https://golang.org/cl/276272, https://golang.org/cl/315269) is a little faster on arm64, and way faster on amd64 (https://golang.org/cl/c/crypto/+/314889/4#message-f16cf2274fc82aaf4a5df6836517ed52eadd32b5). Fiat is 15% faster on POWER9, but that alone doesn't feel worth carrying it (https://golang.org/cl/c/crypto/+/315269/1#message-22671e8bfc8583e4ae329a88ba25ab5c49cc3016).

With the addcarryxU64 removal, and fused Add/Sub/Opp + Carry calls, the existing code is slightly faster than (fiat runtime is +9~17%) on amd64[0]. I did not benchmark copy-pasting CarrySquare into a for-loop, since the existing inversion code won't leverage it (so there is an optimization opportunity in the base code as well). Likewise I did not benchmark the impact of adding a * (b + c)/a * (b - c)/(a + b)^2 routines.

Under the assumption that upstream pulls in at least the optimizations I did benchmark (and they seemed open to the idea), the tradeoffs from my perspective are:

  • (Pro) "ed25519 and curve25519 32-bits" for "free". This will almost certainly be faster than the existing code as well.
  • (Pro) Formal verification of part of the implementation (though the current code is quite nice)
  • (Pro) Less assembly to maintain (amd64/arm64 multiply/square have assembly impls)
  • (Con) Slightly slower.

But this is contingent on changes to the generated code happening, so there is no rush. On a positive note, when the changes do happen P-521 will be faster for "free".

[0]: ScalarMult/ScalarBaseMult performance will be significantly worse if fiat's Selectznz is used, so I didn't.

Originally posted by @Yawning in #40171 (comment)

@Yawning
Copy link
Author

Yawning commented Aug 12, 2021

Whoops. Did not mean to open this here, sorry about the noise.

@Yawning Yawning closed this as completed Aug 12, 2021
@golang golang locked and limited conversation to collaborators Aug 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants