Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve FIPS RSA keygen performance - change BN_gcd() test #19578

Closed
wants to merge 8 commits into from

Conversation

slontis
Copy link
Member

@slontis slontis commented Nov 2, 2022

FIPS 186-4 has 5 different algorithms for key generation, and all of them rely on testing GCD(a,n) == 1 many times.

Cachegrind was showing that the function BN_gcd() was taking a considerable percentage of the total cycles, during a rsa keygen operation.

The default provider uses multiprime keygen, which seemed to be much faster. This is because it uses BN_mod_inverse() instead.

For a 4096 bit key, the entropy of a key that was taking a long time to generate was recorded and fed back into subsequent runs. Roughly 40% of the cycle time was BN_gcd() with most of the remainder in the prime testing. Changing to use the inverse resulted in the cyle count being 96% in the prime testing.

Checklist
  • documentation is added or updated
  • tests are added or updated

@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Nov 2, 2022
@slontis
Copy link
Member Author

slontis commented Nov 2, 2022

inv
gcd

crypto/bn/bn_rsa_fips186_4.c Outdated Show resolved Hide resolved
crypto/bn/bn_rsa_fips186_4.c Outdated Show resolved Hide resolved
@slontis slontis added branch: master Merge to master branch branch: 3.1 Merge to openssl-3.1 labels Nov 2, 2022
@t8m t8m added the triaged: feature The issue/pr requests/adds a feature label Nov 2, 2022
Copy link
Member

@t8m t8m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization, Shane!

crypto/bn/bn_rsa_fips186_4.c Outdated Show resolved Hide resolved
@paulidale
Copy link
Contributor

My primary concern is that BN_mod_inverse() isn't strictly constant time:

  • It copies the input BNs, dropping any constant time flags
    • It adds constant times flags for the modulo and division operations
    • It doesn't for the multiplication and addition
  • It doing Euclid's algorithm by changing pointers rather than constant time swaps or copies
    • This was flagged as badness in #19309
    • However, all values are accessed each iteration so exploiting the cache would seem to be a difficult side channel

These points can be addressed easily enough but will cost some performance.

  • Constant time multiply and add will be fast compared to division and modulo
  • Constant time BN copy will be much faster than the BN swap in #19309 because it uses memcpy(2) instead of constant time select in a loop

@t8m
Copy link
Member

t8m commented Nov 2, 2022

I would like to see even a theoretical attack on this BN_mod_inverse use in the RSA keygen. How would it work given the key is generated once so there is no real chance for the attacker to do repeated measurement to gather some statistics. But yeah, if making BN_mod_inverse consttime would not cost much, it would make sense.

@paulidale
Copy link
Contributor

If this attack were only theoretical, why did we go to such lengths to be constant time in the BN mod function for keygen? I'm not saying it's exploitable, just that we need to be cautious.

@romen
Copy link
Member

romen commented Nov 2, 2022

I would like to see even a theoretical attack on this BN_mod_inverse use in the RSA keygen. How would it work given the key is generated once so there is no real chance for the attacker to do repeated measurement to gather some statistics. But yeah, if making BN_mod_inverse consttime would not cost much, it would make sense.

@t8m These seem to be very closely related:

@tomato42
Copy link
Contributor

tomato42 commented Nov 2, 2022

If the attacker can probe the keygen process often enough (either through power side channel or through microarchitecutral means), they may be able to deduce exact code path taken

That being said, this:

            /* (Step 1) GCD(2r1, r2) = 1 */
            && gcd_isone(tmp, r1x2, r2, ctx)
            /* (Step 2) R = ((r2^-1 mod 2r1) * r2) - ((2r1^-1 mod r2)*2r1) */
            && BN_mod_inverse(R, r2, r1x2, ctx)

is calling BN_mod_inverse twice over the same arguments, so if inverse is leaky in an exploitable way, it's a problem already, and this change doesn't make the situation worse.

The second change is much harder to say if it makes the code more vulnerable or not.

@kroeckx
Copy link
Member

kroeckx commented Nov 2, 2022

Do you think it would be useful to have a public symbol BN_gcd_isone()?

@slontis
Copy link
Member Author

slontis commented Nov 3, 2022

Not sure if anyone noticed the comment related to the default provider. i.e the multi prime keygen already uses the inverse instead of BN_gcd() - so this has been used like this for quite a long time.

@paulidale
Copy link
Contributor

Do you think it would be useful to have a public symbol BN_gcd_isone()?

I think we should expose it as a BN_are_coprime() call.

@slontis
Copy link
Member Author

slontis commented Nov 7, 2022

What am I supposed to do with the tmp inv param? I was trying to avoid allocating it internally? (add it as an optional last param?)

@paulidale
Copy link
Contributor

BN_mod_inverse is allocating a pile of BIGNUMs internal, I doubt one more will hurt any.

@t8m
Copy link
Member

t8m commented Nov 9, 2022

Needs rebase or putting the libcrypto.num entry at some more clever place 😁

FIPS 186-4 has 5 different algorithms for key generation,
and all of them rely on testing GCD(a,n) == 1 many times.

Cachegrind was showing that during a RSA keygen operation,
the function BN_gcd() was taking a considerable percentage
of the total cycles.

The default provider uses multiprime keygen, which seemed to
be much faster. This is because it uses BN_mod_inverse()
instead.

For a 4096 bit key, the entropy of a key that was taking a
long time to generate was recorded and fed back into subsequent
runs. Roughly 40% of the cycle time was BN_gcd() with most of the
remainder in the prime testing. Changing to use the inverse
resulted in the cycle count being 96% in the prime testing.
crypto/bn/bn_gcd.c Show resolved Hide resolved
doc/man3/BN_mod_inverse.pod Outdated Show resolved Hide resolved
@slontis slontis added the tests: present The PR has suitable tests present label Nov 16, 2022
@paulidale paulidale added the approval: review pending This pull request needs review by a committer label Nov 16, 2022
crypto/bn/bn_gcd.c Outdated Show resolved Hide resolved
@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Nov 16, 2022
@slontis slontis changed the title Improve FIPS RSA keygen performance. Improve FIPS RSA keygen performance - change BN_gcd() test Nov 17, 2022
@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Nov 17, 2022
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

doc/man3/BN_cmp.pod Outdated Show resolved Hide resolved
@hlandau hlandau removed the approval: ready to merge The 24 hour grace period has passed, ready to merge label Nov 18, 2022
@t8m t8m added the approval: review pending This pull request needs review by a committer label Nov 18, 2022
@t8m
Copy link
Member

t8m commented Nov 18, 2022

@paulidale please reconfirm

@@ -5481,3 +5481,4 @@ BIO_f_zstd ? 3_2_0 EXIST::FUNCTION:COMP
d2i_PUBKEY_ex_fp ? 3_2_0 EXIST::FUNCTION:STDIO
d2i_PUBKEY_ex_bio ? 3_2_0 EXIST::FUNCTION:
COMP_zlib_oneshot ? 3_2_0 EXIST::FUNCTION:COMP
BN_are_coprime ? 3_2_0 EXIST::FUNCTION:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So - 3_1_0 or 3_2_0 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3_1_0. I'll fix it when merging.

@paulidale paulidale added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Nov 18, 2022
@openssl-machine openssl-machine removed the approval: done This pull request has the required number of approvals label Nov 19, 2022
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@openssl-machine openssl-machine added the approval: ready to merge The 24 hour grace period has passed, ready to merge label Nov 19, 2022
openssl-machine pushed a commit that referenced this pull request Nov 21, 2022
FIPS 186-4 has 5 different algorithms for key generation,
and all of them rely on testing GCD(a,n) == 1 many times.

Cachegrind was showing that during a RSA keygen operation,
the function BN_gcd() was taking a considerable percentage
of the total cycles.

The default provider uses multiprime keygen, which seemed to
be much faster. This is because it uses BN_mod_inverse()
instead.

For a 4096 bit key, the entropy of a key that was taking a
long time to generate was recorded and fed back into subsequent
runs. Roughly 40% of the cycle time was BN_gcd() with most of the
remainder in the prime testing. Changing to use the inverse
resulted in the cycle count being 96% in the prime testing.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #19578)
@t8m
Copy link
Member

t8m commented Nov 21, 2022

Merged to master and 3.1 branches (after fixing the version in libcrypto.num entry). Thank you.

@t8m t8m closed this Nov 21, 2022
openssl-machine pushed a commit that referenced this pull request Nov 21, 2022
FIPS 186-4 has 5 different algorithms for key generation,
and all of them rely on testing GCD(a,n) == 1 many times.

Cachegrind was showing that during a RSA keygen operation,
the function BN_gcd() was taking a considerable percentage
of the total cycles.

The default provider uses multiprime keygen, which seemed to
be much faster. This is because it uses BN_mod_inverse()
instead.

For a 4096 bit key, the entropy of a key that was taking a
long time to generate was recorded and fed back into subsequent
runs. Roughly 40% of the cycle time was BN_gcd() with most of the
remainder in the prime testing. Changing to use the inverse
resulted in the cycle count being 96% in the prime testing.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #19578)

(cherry picked from commit dd1d7bc)
beldmit pushed a commit to beldmit/openssl that referenced this pull request Dec 26, 2022
FIPS 186-4 has 5 different algorithms for key generation,
and all of them rely on testing GCD(a,n) == 1 many times.

Cachegrind was showing that during a RSA keygen operation,
the function BN_gcd() was taking a considerable percentage
of the total cycles.

The default provider uses multiprime keygen, which seemed to
be much faster. This is because it uses BN_mod_inverse()
instead.

For a 4096 bit key, the entropy of a key that was taking a
long time to generate was recorded and fed back into subsequent
runs. Roughly 40% of the cycle time was BN_gcd() with most of the
remainder in the prime testing. Changing to use the inverse
resulted in the cycle count being 96% in the prime testing.

Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from openssl#19578)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch branch: 3.1 Merge to openssl-3.1 severity: fips change The pull request changes FIPS provider sources tests: present The PR has suitable tests present triaged: feature The issue/pr requests/adds a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants