Implement blinding for scalar multiplication #11361
Thanks for your submission! @romen - can you take a look at this? Note that we will need a CLA from all authors of the patch. See https://www.openssl.org/policies/cla.html
The other point to note is that 1.1.0 and 1.0.2 are both out of public support and therefore we're not pushing any fixes to the public branches for those releases. There are some people on extended support for 1.0.2, so we can potentially make this fix available to them.
Nice work @dfaranha and team! Out of curiosity, is there any paper with more details publicly available somewhere?
Thanks! A paper will be written when our ongoing computation finishes. I also added a patch to fix the prime curve case while preserving the code shared with binary curves.
@dfaranha is there any update on the CLA matter from all code authors?
Collecting the last one, hopefully today!
Close/reopen to kick CLA bot
Ping @romen - all CLA issues are now resolved
Thanks @dfaranha and all the other authors for this contribution, and sorry for the delay in my review: the initial part was done quite early, but then I got delayed entering the rabbit hole of how binary curves end up using ec_mul_consttime().
I have a few comments that I would like to address together.
LGTM, thanks @dfaranha for all the work!
@paulidale does it make sense for 1.0.2 to have to deal with what would happen if an entropy source is not available or faulty?
I will defer to @romen on the maths - but the code looks good.
crypto/ec/ec_mult.c
/* first randomize r->Z to blind s. */
do {
    if (!BN_rand(&r->Z, BN_num_bits(&group->field), 0, 0)) {
Edit: sorry wrong place in the code for that comment :\
Here you want
BN_rand_range(&r->Z, &group->field);
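The distinction behind this suggestion can be sketched without OpenSSL: drawing a fixed number of random bits can produce a value at or above a prime modulus, whereas a range-based draw (BN_rand_range() returns a value in [0, range)) always yields a valid residue. The following is only an illustration with 64-bit integers. The helper names and the xorshift generator are hypothetical stand-ins, not OpenSSL APIs, and the generator is not cryptographically secure.

```c
#include <assert.h>
#include <stdint.h>

/* Toy PRNG (xorshift64), for illustration only: NOT cryptographically secure.
 * The state must be seeded with a nonzero value. */
static uint64_t toy_next(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Number of significant bits in v (0 for v == 0). */
static unsigned bit_length(uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Draw a value of at most `bits` bits; the result may be >= p
 * whenever p is not a power of two. */
static uint64_t sample_bits(uint64_t *state, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return toy_next(state) >> (64 - bits);
}

/* Draw from [0, bound) by rejection: redraw until the value is below bound. */
static uint64_t sample_range(uint64_t *state, uint64_t bound)
{
    uint64_t v;
    assert(bound > 0);
    do {
        v = sample_bits(state, bit_length(bound));
    } while (v >= bound);
    return v;
}

/* Nonzero value in [1, p), as needed for a blinding factor. */
static uint64_t sample_nonzero_below(uint64_t *state, uint64_t p)
{
    uint64_t v;
    assert(p > 1);
    do {
        v = sample_range(state, p);
    } while (v == 0);
    return v;
}
```

Calling sample_nonzero_below(&state, p) with a nonzero seed in state mirrors the do/while pattern in the patch: the outer loop rejects zero, the inner loop rejects anything outside the field.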
crypto/ec/ec2_mult.c
/* now generate another random field element to blind (x1,z1) */
do {
    if (!BN_rand(z1, BN_num_bits(&group->field), 0, 0)) {
I don't think this is correct. E.g. for B-233 the field polynomial will have 234 bits while field elements have 233 bits. I think it's BN_rand(z1, BN_num_bits(&group->field) - 1, -1, 0), where I think top=-1 allows the top bit to be random and not fixed.
Can you verify in the debugger?
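The off-by-one can be illustrated with a toy binary field. The B-233 reduction polynomial x^233 + x^74 + 1 has degree 233 and therefore occupies 234 bits, while reduced field elements have degree at most 232, i.e. at most 233 bits. The sketch below uses GF(2^7) with x^7 + x + 1 (bit mask 0x83) as a small stand-in; the function names are hypothetical and polynomials are stored as bit masks in a uint64_t rather than as BIGNUMs.

```c
#include <assert.h>
#include <stdint.h>

/* Bit length of a GF(2)[x] polynomial stored as a bit mask: degree + 1. */
static unsigned poly_num_bits(uint64_t poly)
{
    unsigned n = 0;
    while (poly != 0) {
        n++;
        poly >>= 1;
    }
    return n;
}

/* Reduce polynomial a modulo field_poly by cancelling the top bit repeatedly. */
static uint64_t gf2m_reduce(uint64_t a, uint64_t field_poly)
{
    unsigned fbits = poly_num_bits(field_poly);
    assert(fbits >= 2);
    while (poly_num_bits(a) >= fbits) {
        a ^= field_poly << (poly_num_bits(a) - fbits);
    }
    return a;
}

/* Keep only num_bits(field_poly) - 1 random bits, so the result is already
 * a reduced field element and its top bit is free to be 0 or 1. */
static uint64_t gf2m_element_from_bits(uint64_t random_bits, uint64_t field_poly)
{
    unsigned m = poly_num_bits(field_poly) - 1; /* extension degree */
    return random_bits & ((UINT64_C(1) << m) - 1);
}
```

With field_poly = 0x83, an 8-bit draw such as 0x80 is not a reduced element (it reduces to 0x03), whereas any 7-bit draw is left unchanged by gf2m_reduce().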
Uhm, good point, let me check!
FWIW this is what we're doing in 1.1.1 and master
Lines 731 to 738 in 3cb55fe
    /* s blinding: make sure lambda (s->Z here) is not zero */
    do {
        if (!BN_priv_rand_ex(s->Z, BN_num_bits(group->field) - 1,
                             BN_RAND_TOP_ANY, BN_RAND_BOTTOM_ANY, ctx)) {
            ECerr(EC_F_EC_GF2M_SIMPLE_LADDER_PRE, ERR_R_BN_LIB);
            return 0;
        }
    } while (BN_is_zero(s->Z));
BN_num_bits(group->field) - 1 makes total sense. We wanted to fix the top bit to guarantee that z1 and z2 had exactly the same length, but this is indeed not necessary anymore. I can commit and push both fixes shortly.
Reconfirm (sorry this slipped under the radar)
Note to self/committer: we should squash all commits in this PR together.
Draft of the commit message as I intend it to be written upon final merge; @dfaranha, please confirm that this is fine with you (I am trying to avoid asking you for a force-push with squash+msg_edit, as that would technically require reconfirmation and restart the grace period). Ping to @mattcaswell and @mspncp as well, as we discussed these editorial changes privately. Note to @mattcaswell: please check that in the
(also I noticed I was too hasty in applying the
Fully agreed, thanks for the writeup!
@romen the commit message looks great, only two nits:
@romen another nit: instead of adding a markdown link to the pull request
via an explicit reference link
I would just use the hashtag notation which GitHub autoconverts to links:
Because the URL in your commit message would duplicate (Side note: I am surprised that
TIL: it's called a shortcut reference link.
Here is the updated commit message taking into account @mspncp feedback:
24 hours has passed since 'approval: done' was set, but as this PR has been updated in that time the label 'approval: ready to merge' is not being automatically set. Please review the updates and set the label manually.
This commit implements coordinate blinding for the generic implementations of both binary and prime elliptic curves in 1.0.2, to avoid leaking bits of the scalar and, potentially, bug attacks.
While blinding is implemented in the 1.1.1 and master branches, it was deliberately decided to avoid backporting those changes as they were originally written for the newer branches, as the solution adopted there required major restructuring of code and structures that was deemed not suitable for 1.0.2.
A group of security researchers and cryptographers from academia and industry, listed below, reported a successful cache timing attack in OpenSSL 1.0.2u against specific prime and binary curves whose order or field length is close to a word boundary.
In this commit, as a possible fix, the authors propose implementing coordinate randomization to balance the two possibilities for the key bit in the first loop iteration of the Montgomery ladder. This way, the Z coordinates of both accumulator points will be non-trivial and the multiplication latency will be similar, with a tiny performance penalty.
The original GitHub Pull Request openssl#11361 includes more details about the reported attack, literature references and discussions on how the originally proposed fix was incrementally edited to reflect the relevant details of the 1.1.1 and master branches regarding coordinate blinding.
The authors of the original report and fix are Diego F. Aranha and Akira Takahashi (both from Aarhus University), Mehdi Tibouchi (NTT Corporation) and Yuval Yarom (University of Adelaide).
Co-authored-by: Akira Takahashi <takahashi@cs.au.dk>
Co-authored-by: Mehdi Tibouchi <tibouchi.mehdi@lab.ntt.co.jp>
Co-authored-by: Yuval Yarom <yval@cs.adelaide.edu.au>
I have now made this available to our support customers (via git), and it will be included the next time we do a 1.0.2 release for them. Thank you very much for your contribution!
@dfaranha is this the related paper? If it is, I would suggest editing the description to add a link to the manuscript for future reference! Thanks again to all the team for your contribution!
Updated openssl to 1.0.2v to include this PR: openssl/openssl#11361 Also, removed changes in tools/c_rehash as this repo have tools/c_rehash.in which will reflect the changes in tools/c_rehash. Change-Id: Iacba3c048365374b672fa9c8350d77d128c71fbb Signed-off-by: Tapas Kundu <tkundu@vmware.com> Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/10154 Tested-by: gerrit-photon <photon-checkins@vmware.com> Reviewed-by: Anish Swaminathan <anishs@vmware.com>
Updated openssl to 1.0.2v to include this PR: openssl/openssl#11361 Also, removed changes in tools/c_rehash as this repo have tools/c_rehash.in which will reflect the changes in tools/c_rehash. Change-Id: I7bbca7f521af67becd1a6a963c1608b65ae872d3 Signed-off-by: Tapas Kundu <tkundu@vmware.com> Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/10153 Tested-by: gerrit-photon <photon-checkins@vmware.com> Reviewed-by: Anish Swaminathan <anishs@vmware.com>
Updated openssl to 1.0.2v to include this PR: openssl/openssl#11361 Also, removed changes in tools/c_rehash as this repo have tools/c_rehash.in which will reflect the changes in tools/c_rehash. Change-Id: Id2ef76f5416504ba5a959d430ef7fd9319356cdf Signed-off-by: Tapas Kundu <tkundu@vmware.com> Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/10152 Tested-by: gerrit-photon <photon-checkins@vmware.com> Reviewed-by: Anish Swaminathan <anishs@vmware.com>
Updated openssl to 1.0.2v to include this PR: openssl/openssl#11361 Also, removed changes in tools/c_rehash as this repo have tools/c_rehash.in which will reflect the changes in tools/c_rehash. Change-Id: I927ce163ca2965cdb8a6b7ecc58efc21ee1fcdac Signed-off-by: Tapas Kundu <tkundu@vmware.com> Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/10151 Tested-by: gerrit-photon <photon-checkins@vmware.com> Reviewed-by: Anish Swaminathan <anishs@vmware.com>
Note: today I stumbled on Twitter over this blog post about the LadderLeak attack:
@dfaranha Can your work be adapted for Koblitz curves (secp256k1)? Or does it also rely on how OpenSSL signs things, in addition to the bit leakage?
@ytrezq projective coordinate blinding, including secp256k1, has been around since about 2018 from .. v1.1.0+? This particular PR was just about backporting that feature to older v1.0.2, with a very different OpenSSL API.
@bbbrumley I wasn't thinking about OpenSSL but about other projects where 1 bit of the nonce is leaked in general. So I wanted to know if the current exploit method could be repurposed for Koblitz curves.
Ah I see. Yes, I believe the answer is yes. You need Bleichenbacher-style techniques for less than 3 bits of nonce leakage. But historically speaking, yes -- nonce bias in ElGamal-family signatures doesn't end well, from the security perspective.
Yes, an implementation of secp256k1 leaking one nonce bit per ECDSA signature would be vulnerable in the same way as LadderLeak describes. Please send me an email if you want me to take a look. :) I should quickly add that curves with efficient endomorphisms such as secp256k1 might also be affected in a different way if they produce biased subscalars through the GLV method. We study such attacks in our Asiacrypt'14 paper. For a similar vulnerability found mere days ago (more leakage -> lattice attacks) see https://cert.europa.eu/publications/security-advisories/2024-039/
@dfaranha: just a question: knowing the variables disclosed at https://etherscan.io/address/0x271682deb8c4e0901d1a1550ad2e64d568e69909#code#F27#L562 would it be possible to do something similar on this special type of secp256k1 curve use?
We are a group of security researchers and cryptographers from academia and industry, and would like to continue the report of a security vulnerability in OpenSSL's implementation of binary/prime field ECDSA signatures. We initiated contact by e-mail in December 2019 and decided to open a pull request publicly to collaborate further with a fix.
Affected versions: 1.0.2u and 1.1.0l (current stable releases except 1.1.1 branch)
Affected curve parameters:
sect163r1, sect283r1/k1, sect409k1, sect571r1 (i.e. binary curves with group order slightly below a power of two) for 1.0.2u and 1.1.0l
secp192r1/k1, secp224r1, secp256r1/k1, secp384r1, secp521r1 (i.e. prime curves with group order slightly below a power of two) for 1.0.2u
Severity: full key exposure via cache timing attack
Executive summary
We discovered non-constant-time implementations of Montgomery ladder scalar multiplication in the aforementioned releases, which enable an attacker to learn 1 bit of the secret nonce with high precision using the FLUSH+RELOAD cache timing attack technique [1]. Even such a small nonce leakage leads to key-recovery attacks given sufficiently many ECDSA signatures, thanks to our optimized version of Bleichenbacher's technique [2,3].
A full description of the attack can be found in [4] below.
[1] Y. Yarom, K. Falkner. "FLUSH+RELOAD: a High Resolution, Low Noise, L3 Cache Side-Channel Attack". USENIX Security 2014.
[2] D. Bleichenbacher. "On the generation of one-time keys in DL signature schemes". Presentation at IEEE P1363 working group meeting. 2000.
[3] A. Takahashi, M. Tibouchi, M. Abe. "New Bleichenbacher records: fault attacks on qDSA signatures". TCHES 2018(3), pp. 331–371, 2018.
[4] D. F. Aranha, F. R. Novaes, A. Takahashi, M. Tibouchi, Y. Yarom. "LadderLeak: Breaking ECDSA With Less Than One Bit Of Nonce Leakage". Cryptology ePrint Archive: Report 2020/615, available at https://eprint.iacr.org/2020/615
Overview of the vulnerability
The attack starts by detecting the second topmost bit of the scalar using a cache-timing attack, and then follows the Bleichenbacher methodology. Although the vulnerabilities are similar, we split the discussion into the binary and prime curve cases.
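For orientation, the ladder structure shared by both cases can be sketched on a toy group. This is a minimal illustration, not the OpenSSL code: the group is the integers modulo n under addition (so "kP" is simply k * p mod n), and the names cswap and toy_ladder are hypothetical. It shows why the second topmost bit of a fixed-length scalar decides which accumulator is doubled in the first iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free conditional swap: exchanges *a and *b when bit == 1. */
static void cswap(uint64_t bit, uint64_t *a, uint64_t *b)
{
    uint64_t mask = (uint64_t)0 - bit; /* all ones if bit == 1, else zero */
    uint64_t t = mask & (*a ^ *b);
    *a ^= t;
    *b ^= t;
}

/* Montgomery ladder on the toy group (Z/nZ, +).
 * k must have exactly k_bits bits, i.e. its top bit is set. */
static uint64_t toy_ladder(uint64_t k, unsigned k_bits, uint64_t p, uint64_t n)
{
    assert(k_bits >= 2 && k_bits <= 64);
    assert(n > 0 && n < (UINT64_C(1) << 32));
    assert((k >> (k_bits - 1)) == 1);

    uint64_t r0 = p % n;        /* accounts for the top bit: P */
    uint64_t r1 = (2 * r0) % n; /* 2P, so that r1 - r0 == P throughout */

    /* The first iteration processes the second topmost bit. */
    for (int i = (int)k_bits - 2; i >= 0; i--) {
        uint64_t bit = (k >> i) & 1;
        cswap(bit, &r0, &r1);
        r1 = (r0 + r1) % n; /* "point addition" */
        r0 = (2 * r0) % n;  /* "point doubling" of whichever value is in r0 */
        cswap(bit, &r0, &r1);
    }
    return r0;
}
```

For example, toy_ladder(13, 4, 5, 1000003) walks the bits 1, 0, 1 after the leading 1 and returns 65. In the first iteration the doubled value is P when the bit is 0 and 2P when the bit is 1, which is the asymmetry the attack observes when the two accumulators start in different representations.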
For the binary curve case, the Montgomery ladder is implemented in function ec_GF2m_montgomery_point_multiply() within file ec2_mult.c using López-Dahab coordinates. The function computes scalar multiplication kP for fixed-length scalar k and input point P = (x,y). The ladder starts by initializing two points (X1,Z1) = (x, 1) and (X2,Z2) = 2P = (x^4 + b, x^2). The first loop iteration follows after a conditional swap function that exchanges these two points based on the value of the second topmost key bit. The first function to be called within the first iteration is gf2m_Madd(), which starts by multiplying by value Z1. However, since the finite field arithmetic is not implemented in constant time for binary fields, there is a timing difference between multiplying by (1) or (x^2), since modular reduction is only needed in the second case. In particular, a modular reduction will be computed when Z1 is x^2 after the conditional swap. This happens when the second topmost bit is 1, because the conditional swap effectively swapped the two sets of values. Although the timing difference is very small, it can be amplified by running a FLUSH+RELOAD attack that measures the amount of time the first multiplication takes while multiple threads in the background penalize the modular reduction code by evicting it from the cache. We observed that it is possible to amplify the timing difference to more than 100,000 cycles on multiple processors, which allows for a detection probability of success above 95% when FLUSH+RELOAD is used.
For the prime curve case, the analysis is a little more involved. OpenSSL implements the Montgomery ladder by using optimized formulas for elliptic curve arithmetic in the Weierstrass model. The algorithm is implemented in function ec_mul_consttime(), which nevertheless does not run in constant time from a cache perspective. The ladder starts again by initializing two accumulators r = P (in affine coordinates) and s = 2P (in projective coordinates). The first loop iteration is non-trivial and computes a point addition and a point doubling after a conditional swap. Depending on the key bit, the conditional swap is effective and only one point will remain stored in projective coordinates. Both the point addition and point doubling functions have optimizations in place for mixed addition, and our detection works on the point doubling case implemented in function ec_GFp_simple_dbl(). When the input point for the doubling function is in affine coordinates, a field multiplication is replaced by a faster call to BN_copy(). This happens when the two accumulators are not swapped in the ladder, which means that point r in affine coordinates is doubled and the second topmost bit is 0. The timing difference is again very small, but can be amplified to at least 15,000 cycles using performance degradation threads that evict the BN_copy() code from the cache. Our detection code implements the FLUSH+RELOAD technique and correctly determines the second topmost bit with around 99% probability of success.
Validation of the attack
We have conducted an experiment that recovers the signing key of a sect163r1 ECDSA key pair given about 2^26 signatures generated by OpenSSL, with relatively modest computational resources (around 3000 CPU hours and 720GB on a high-performance workstation). Even fewer signatures would suffice with a slightly bigger computation.
Our attack code also generalizes to other larger parameters in theory, although the required number of signatures and time complexity are orders of magnitude larger. We're currently executing a practical experiment of our attack against secp192r1.
Impact of the vulnerability
The vulnerability impacts private keys for ECDSA signatures instantiated with the affected curves. The most likely attack scenario is targeting a server's private key, in which the attacker has execution capabilities in the same machine.
How to fix
A possible fix amounts to implementing coordinate randomization to balance the two possibilities for the key bit in the first loop iteration of the Montgomery ladder. This way, the Z coordinates of both accumulator points will be non-trivial and the multiplication latency will be similar, with a tiny performance penalty.
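The idea of coordinate randomization can be sketched on a toy prime field. This is an illustration under stated assumptions, not the patch itself: it assumes a Jacobian-style representation in which (X, Y, Z) stands for the affine point (X/Z^2, Y/Z^3), uses the small prime 2^31 - 1 in place of a real curve field, and does not check that the point lies on any curve. All names (TOY_P, jac_point, blind, to_affine) are hypothetical. For an x-only ladder with x = X/Z, the analogous scaling would be (X, Z) -> (lambda*X, lambda*Z).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_P UINT64_C(2147483647) /* 2^31 - 1, a small prime standing in for the field */

/* Jacobian-style triple representing the affine point (X/Z^2, Y/Z^3). */
typedef struct {
    uint64_t X, Y, Z;
} jac_point;

/* Field multiplication; inputs are reduced first so the product fits in 64 bits. */
static uint64_t fmul(uint64_t a, uint64_t b)
{
    return ((a % TOY_P) * (b % TOY_P)) % TOY_P;
}

/* Square-and-multiply exponentiation in the toy field. */
static uint64_t fpow(uint64_t a, uint64_t e)
{
    uint64_t r = 1;
    a %= TOY_P;
    while (e != 0) {
        if (e & 1)
            r = fmul(r, a);
        a = fmul(a, a);
        e >>= 1;
    }
    return r;
}

/* Inverse via Fermat's little theorem: a^(p-2) mod p, for a != 0. */
static uint64_t finv(uint64_t a)
{
    assert(a % TOY_P != 0);
    return fpow(a, TOY_P - 2);
}

/* Re-randomize the representation with a nonzero lambda:
 * (X, Y, Z) -> (lambda^2 * X, lambda^3 * Y, lambda * Z).
 * The affine point is unchanged, but Z is no longer the trivial value 1. */
static jac_point blind(jac_point pt, uint64_t lambda)
{
    assert(lambda % TOY_P != 0);
    uint64_t l = lambda % TOY_P;
    uint64_t l2 = fmul(l, l);
    jac_point out = { fmul(pt.X, l2), fmul(pt.Y, fmul(l2, l)), fmul(pt.Z, l) };
    return out;
}

/* Recover the affine coordinates x = X/Z^2 and y = Y/Z^3. */
static void to_affine(jac_point pt, uint64_t *x, uint64_t *y)
{
    uint64_t zi = finv(pt.Z);
    uint64_t zi2 = fmul(zi, zi);
    *x = fmul(pt.X, zi2);
    *y = fmul(pt.Y, fmul(zi2, zi));
}
```

Blinding the triple (12345, 67890, 1) with any nonzero lambda and converting back with to_affine() returns the same affine values, while the blinded Z is a full-size field element. After such a step, neither accumulator in the ladder has Z = 1, so the first-iteration multiplications cost about the same for either value of the key bit.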
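This pull request implements such a countermeasure for the binary case in version 1.0.2, but we are happy to contribute additional patches for prime curves and version 1.1.0 if necessary.
Contact information
Diego F. Aranha @dfaranha (Aarhus University)
Akira Takahashi @akiratk0355 (Aarhus University)
Mehdi Tibouchi @mti (NTT Corporation)
Yuval Yarom @javali7 (University of Adelaide)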