Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSA decryption API is very hard to use safely #13421

Closed
tomato42 opened this issue Nov 16, 2020 · 13 comments
Closed

RSA decryption API is very hard to use safely #13421

tomato42 opened this issue Nov 16, 2020 · 13 comments
Labels
triaged: feature The issue/pr requests/adds a feature triaged: OTC evaluated This issue/pr was triaged by OTC
Milestone

Comments

@tomato42
Copy link
Contributor

tomato42 commented Nov 16, 2020

It's well known that incorrect use of RSA decryption functions leads to Bleichenbacher oracles and thus vulnerability to Bleichenbacher attack, allowing for ciphertext decryption or signature forgery.

What's far less well known is how to use RSA decryption APIs correctly. Two most recent examples close to OpenSSL are CVE-2020-25659 (in pyca/cryptography) and CVE-2020-25657 (in M2Crypto).

As such, I think the issue is with the OpenSSL API: it's very hard to use correctly, so applications that do use it will get it wrong.

While the simple solution is to "just stop using PKCS#1 v1.5 decryption", I don't think it's a realistic approach for OpenSSL, it's too widely deployed for us to remove it completely (e.g. JSON Web Tokens use it).

So, I'd like to propose a change to the way the decryption API works, both the RSA_private_decrypt and EVP_PKEY_decrypt with RSA_PKCS1_PADDING.

My proposal is similar to the one recommended in RFC5246: make decryption always generate a decrypted message, even if padding checks failed.

The Bleichenbacher attack depends on the attacker being able to tell apart the whole systems behaviour in case the
decryption failed because of RSA padding check and between the case where the decryption was successful but later use of the decrypted message failed (for example, because the later HMAC check failed). Note: during the attack, certain fraction of attacker-generated ciphertexts will have valid PKCS#1 padding, may decrypt to the message of expected length or even have expected structure (like the case of TLS pre-master secret starting with the TLS version).
What's important, is that the attacker doesn't know what the ciphertexts he or she sends decrypt to.

Let's consider a case of an attacker that can both perform the API call and get the decrypted value itself.
In such case, if it feeds the messages generated from the Bleichenbacher algorithm to the API and gets some random message
back, the only way of knowing if that value is an actual decrypted value or if it was the randomly generated secret is to generate all possible ciphertexts with that message using all possible padding values: task with a work factor of at least 2^64 (as the minimal padding length is 8 bytes) and far more for if the returned message is smaller.
If the attacker can't see the actual decrypted value, just that the application received a secret message of specific length, then there is no offline operation that it can perform to guess if that message length really came from the ciphertext or if it was the result of the random number generator.

So, I think that both EVP_PKEY_decrypt and RSA_private_decrypt with PKCS#1 padding should use something like HMAC(hash(private_key), ciphertext) to seed an RNG that they then use to select the length of returned message and the contents of it.

Any arguments against it?

(yes, I've read #8446)

@tomato42 tomato42 added the issue: bug report The issue was opened to report a bug label Nov 16, 2020
@kroeckx
Copy link
Member

kroeckx commented Nov 16, 2020 via email

@davidben
Copy link
Contributor

davidben commented Nov 16, 2020

I agree changing the behavior to map the error away wouldn't work well. While any application that relies on it is likely already vulnerable to RSAES-PKCS1-v1_5's flaws, silently changing the success criteria won't make them any better. It's also not clear to me how it'd work anyway. EVP_PKEY_decrypt and RSA_private_decrypt do not know the expected length, so they can't generate the replacement string. (Lengths are closely related to memory access and allocation patterns, so constant-time code needs to understand all the public and private lengths involved, and tailor the API shape accordingly.)

Rather, I'd suggest:

  • Document clearly that, absent some external source of authentication, the existing RSAES-PKCS1-v1_5 decryption API is broken and should not be used. I don't think there is any saving it. Maybe eventually remove it. Though this is made a little tricky by RSAES-PKCS1-v1_5 being the default for RSA keys when using EVP_PKEY_decrypt. (Perhaps OpenSSL 3.0 should start by removing the notion of a default RSAES padding and require callers specify it?)

  • Study existing protocols' use of RSAES-PKCS1-v1_5, their established workarounds, and build API(s) tailored to them. TLS needs a constant-time swap to random, with a known length. I expect this pattern is common. Go has this API, though note it doesn't cover the funny (and somewhat useless) version check. I believe Go's crypto/tls implementation just doesn't bother with it. Another possibility is an optional parameter. My understanding is SSH needs a similar constant-time replacement workaround. Are there other patterns? Note this isn't compatible with EVP_PKEY_decrypt / RSA_private_decrypt at all due to the pre-supplied length.

  • Make sure existing protocols' use of RSAES-PKCS1-v1_5 are suitably deprecated so we can start the (very long) removal clock. We've done this for TLS. I didn't know JWT also used it. That's disappointing. Is there work anywhere to deprecate it in JWT?

We also need to be clear on what we can accomplish here. The swap to random isn't a general solution to make RSAES-PKCS1-v1_5 safe. It's something that specifically works for TLS (and SSH?) because of how the rest of the protocol is constructed. For instance, it doesn't work for SSL2.

@tomato42
Copy link
Contributor Author

@kroeckx wrote:
I think changing what the current RSA_PKCS1_PADDING does is probably not a good idea. People will most likely rely on the current behaviour where an error is returned,

but when would that matter in regular use case, when not under attack?

and I think your proposal includes never returning an error. I have no idea how most applications will then continue with that random data. So if you want to change the behaviour, I think we need to add a new option to the function that then documents that behaviour.

the basic assumption of the attack is that the attacker can force the application to process an RSA ciphertext; the attacker generally also has access to the public key; so the application already must handle messages that will decrypt to unexpected lengths—the attacker can create ciphertexts that decrypt to arbitrary messages already

@tomato42
Copy link
Contributor Author

tomato42 commented Nov 16, 2020

@davidben wrote:
I agree changing the behavior to map the error away wouldn't work well. While any application that relies on it is likely already vulnerable to RSAES-PKCS1-v1_5's flaws, silently changing the success criteria won't make them any better. It's also not clear to me how it'd work anyway. EVP_PKEY_decrypt and RSA_private_decrypt do not know the expected length, so they can't generate the replacement string. (Lengths are closely related to memory access and allocation patterns, so constant-time code needs to understand all the public and private lengths involved, and tailor the API shape accordingly.)

that's exactly my point, we don't need to know what's the length application expects: if there are two cases that can generate message of given length, namely an actual valid PKCS#1 ciphertext that decrypts to such length, and a ciphertext for which we create a message of a given length, and the attacker can't differentiate between them, then they can't use it as an oracle in the Bleichenbacher attack, making the whole attack fail

please read the opening post again: I'm proposing a modification that works even if the attacker can get access to the supposedly decrypted value

Rather, I'd suggest:

  • Document clearly that, absent some external source of authentication, the existing RSAES-PKCS1-v1_5 decryption API is broken and should not be used. I don't think there is any saving it. Maybe eventually remove it. Though this is made a little tricky by RSAES-PKCS1-v1_5 being the default for RSA keys when using EVP_PKEY_decrypt. (Perhaps OpenSSL 3.0 should start by removing the notion of a default RSAES padding and require callers specify it?)

just because we document it, won't change code that already uses OpenSSL, it will remain insecure

the only option is to remove support for PKCS#1 v1.5, but that's not realistic

  • Study existing protocols' use of RSAES-PKCS1-v1_5, their established workarounds, and build API(s) tailored to them. TLS needs a constant-time swap to random, with a known length. I expect this pattern is common. Go has this API, though note it doesn't cover the funny (and somewhat useless) version check. I believe Go's crypto/tls implementation just doesn't bother with it. Another possibility is an optional parameter. My understanding is SSH needs a similar constant-time replacement workaround. Are there other patterns? Note this isn't compatible with EVP_PKEY_decrypt / RSA_private_decrypt at all due to the pre-supplied length.

Is there an SSH Key exchange that uses RSA encryption? I'm not familiar with it, but then, I've only looked at what OpenSSH and libssh support...

  • Make sure existing protocols' use of RSAES-PKCS1-v1_5 are suitably deprecated so we can start the (very long) removal clock. We've done this for TLS. I didn't know JWT also used it. That's disappointing. Is there work anywhere to deprecate it in JWT?

I don't think so

We also need to be clear on what we can accomplish here. The swap to random isn't a general solution to make RSAES-PKCS1-v1_5 safe. It's something that specifically works for TLS (and SSH?) because of how the rest of the protocol is constructed. For instance, it doesn't work for SSL2.

that's because nobody tried to implement workarounds in SSLv2, we just killed it

@davidben
Copy link
Contributor

that's exactly my point, we don't need to know what's the length application expects: if there are two cases that can generate message of given length, namely an actual valid PKCS#1 ciphertext that decrypts to such length, and a ciphertext for which we create a message of a given length, and the attacker can't differentiate between them, then they can't use it as an oracle in the Bleichenbacher attack, making the whole attack fail

please read the opening post again: I'm proposing a modification that works even if the attacker can get access to the supposedly decrypted value

Ah, I see. My initial thought is that the synthesized length will be unlikely to match the application's expected output size. (E.g. consider a naive TLS-like protocol using OpenSSL's API. It would check for RSA_private_decrypt's error, then check the output size is 48. The synthesized length is most likely not 48 (say 1/245 if picking valid RSA-2048 lengths entirely uniformly), so we still go down an error path, so it seems we still have an S-PKCS-conformity oracle (borrowing terminology from https://eprint.iacr.org/2003/052.pdf), albeit a noisy one.

But that's not really the probabilities we're comparing. The oracle will return 1 if:

  • the input is S-PKCS-conforming, OR
  • the input is not PKCS-conforming and the synthesized length happens to be 48

PKCS-conforming and S-PKCS-conforming plaintexts are common enough that we can find them, but they're still fairly rare, so I think the second bucket is... usually larger. (Bleichenbacher's paper bounds the probability of PKCS-conforming ciphertext between 0.18 * 2^-16 and 0.97 * 2^-8 for RSA-512 and higher. For RSA-2048, the first term seems to be around 0.6. The second term is closer to 2^16 if your modulus is a multiple of 8 bits.)

That might be enough to frustrate the attack? This probably needs some more analysis.

(This also assumes we're okay blanket-removing the error case from all uses of RSAES-PKCS1-v1_5. See note below. If we're making a new API, I don't think there's any need to play games with lengths and we can just ask the caller to tell us the length they expect. An API tailored to session keys also nudges developers away from using RSAES-PKCS1-v1_5 as a general-purpose encryption scheme.)

just because we document it, won't change code that already uses OpenSSL, it will remain insecure

Yeah, any fix to RSAES-PKCS1-v1_5's flaws requires changing the behavior of that existing code. Whether that behavior change comes from OpenSSL or the application, depends on the change. We can add a new API with the fixed behavior and get folks to migrate, or we can change the existing API's behavior. The latter gets us where we want faster, but carries a risk of breaking things if applications were relying on the old behavior.

If, say, fixing a bug in the TLS implementation, it makes sense to just change the existing API. But here we're talking about dramatically changing the meaning of an existing API.

the only option is to remove support for PKCS#1 v1.5, but that's not realistic

That's not the only option. We can remove support for that API, which would ask the remaining RSAES-PKCS1-v1_5 users move to the new one.

Is there an SSH Key exchange that uses RSA encryption? I'm not familiar with it, but then, I've only looked at what OpenSSH and libssh support...

I'm not very familiar with SSH either. It looks like I may be wrong here (yay!). For some reason I remembered seeing DecryptPKCS1v15SessionKey used by Go's SSH implementation as well, but now I can't find it. I'm probably misremembering. (Or perhaps it was some old mode and they since removed it?)

that's because nobody tried to implement workarounds in SSLv2, we just killed it

Hehe, you've gotten me to read the DROWN paper again to remind myself how this works. :-) The SSLv3/TLS swap-with-random countermeasure works because we're effectively using the client Finished message (which immediately follows ClientKeyExchange) as an authenticator. That is, whether the counter measure works depends on the rest of the protocol.

SSLv2 doesn't look like that. See section 4.1. It's got a ServerVerify message in between ClientMasterKey (SSLv2 ClientKeyExchange) and client Finished, where the server encrypts some data. When using an export cipher, the master key was too short so the attacker could observe the mitigation triggering some non-determinism. There's also a variation based on some OpenSSL bug (section 5.1). Of course, export ciphers and the OpenSSL bug are a problem in themselves, but the point remains that we depend on the rest of the protocol.

DROWN also relied on the countermeasure using a random substitute. Perhaps some deterministic replacement, like the HMAC you suggest, would have worked.

@tomato42
Copy link
Contributor Author

that's exactly my point, we don't need to know what's the length application expects: if there are two cases that can generate message of given length, namely an actual valid PKCS#1 ciphertext that decrypts to such length, and a ciphertext for which we create a message of a given length, and the attacker can't differentiate between them, then they can't use it as an oracle in the Bleichenbacher attack, making the whole attack fail
please read the opening post again: I'm proposing a modification that works even if the attacker can get access to the supposedly decrypted value

Ah, I see. My initial thought is that the synthesized length will be unlikely to match the application's expected output size. (E.g. consider a naive TLS-like protocol using OpenSSL's API. It would check for RSA_private_decrypt's error, then check the output size is 48. The synthesized length is most likely not 48 (say 1/245 if picking valid RSA-2048 lengths entirely uniformly), so we still go down an error path, so it seems we still have an S-PKCS-conformity oracle (borrowing terminology from https://eprint.iacr.org/2003/052.pdf), albeit a noisy one.

and that noisiness is precisely what helps us, there are two scenarios that can generate S-PKCS conformant plaintextext and the attacker can't differentiate between them, even if the hypothetical TLS implementation using the API leaks like a sieve

But that's not really the probabilities we're comparing. The oracle will return 1 if:

  • the input is S-PKCS-conforming, OR

  • the input is not PKCS-conforming and the synthesized length happens to be 48

PKCS-conforming and S-PKCS-conforming plaintexts are common enough that we can find them, but they're still fairly rare, so I think the second bucket is... usually larger. (Bleichenbacher's paper bounds the probability of PKCS-conforming ciphertext between 0.18 * 2^-16 and 0.97 * 2^-8 for RSA-512 and higher. For RSA-2048, the first term seems to be around 0.6. The second term is closer to 2^16 if your modulus is a multiple of 8 bits.)

That might be enough to frustrate the attack? This probably needs some more analysis.

yes, it's enough if we synthesize the message deterministically; as the attack will fail if even one positive guess is incorrect

and we will return "yes, the decryption went fine" for invalid ciphertexts far more often than for valid, so for the few tens of thousands of oracle calls necessary (in the optimistic scenario of system giving positive signal for positive decryptions of any length) with the improved algorithm we're basically certain that at least one guess by the attacker will be incorrect (I mean 0.5^20000 is as good as impossible, and attacker has lower chances than 0.5 of hitting real S-PKCS conforming ciphertexts)

(This also assumes we're okay blanket-removing the error case from all uses of RSAES-PKCS1-v1_5. See note below. If we're making a new API, I don't think there's any need to play games with lengths and we can just ask the caller to tell us the length they expect. An API tailored to session keys also nudges developers away from using RSAES-PKCS1-v1_5 as a general-purpose encryption scheme.)

yes, there might be uses that require to get a real error in case of decryption failure, but I hazard a guess that they are far and few in between, so it's better to document well and special case them than to special case regular use (and fix the 10000 code examples on the Internet that tell how to use the RSA decryption API)

just because we document it, won't change code that already uses OpenSSL, it will remain insecure

Yeah, any fix to RSAES-PKCS1-v1_5's flaws requires changing the behavior of that existing code. Whether that behavior change comes from OpenSSL or the application, depends on the change. We can add a new API with the fixed behavior and get folks to migrate, or we can change the existing API's behavior. The latter gets us where we want faster, but carries a risk of breaking things if applications were relying on the old behavior.

If, say, fixing a bug in the TLS implementation, it makes sense to just change the existing API. But here we're talking about dramatically changing the meaning of an existing API.

I'm afraid it's more like "hope that at least some people migrate, maybe, after much time" than "get folks to migrate"

the only option is to remove support for PKCS#1 v1.5, but that's not realistic

That's not the only option. We can remove support for that API, which would ask the remaining RSAES-PKCS1-v1_5 users move to the new one.

true, but I still worry that there may be users that don't know the size of the decrypted message a-priori... and an API that behaves like I've described will be still needed
also, we have 1.1.1, where we can't remove this API

@tomato42
Copy link
Contributor Author

I did some back-of-the-envelope calculations, and if we assume 20000 oracle calls necessary (very optimistic, for bad version oracle we are usually talking about few million calls) to decrypt an RSA ciphertext, then an oracle having a true positive rate of 0.99555 (a false positive rate of just 0.00445) would still give us a a security margin of 128bits (log_2(0.99555^20000) ~= -128.69)

the only realistic attack scenario is with an attacker having access to the decrypted message, and then looking for the ciphertexts that are max size (not uncommon) and then brute-forcing the padding bytes (embarrassingly parallelizable but not cheap at 2^64 RSA encryptions per oracle call²), since it's the attacker that decides if the decryption was successful or not, the few tens of thousands of oracle calls per decryption are realistic, let's assume the 20k calls, ie. 14 bits, and 2^16 cycles per RSA encryption (what I get from 2048 bit RSA on i7 4790K), we're talking about security margin of about 94 bits, which I think is good enough

² - I'm using 2^64 not the 2^32 birthday paradox because the vast majority of ciphertexts will be false positives (only few thousand true positive oracle calls are necessary to decrypt the ciphertext, most oracle calls are expected to fail) and we can prove that the oracle is a false positive only by exhaustion: by showing that there are no padding bytes that will encrypt to the matching ciphertext

@paulidale paulidale added the triaged: OTC evaluated This issue/pr was triaged by OTC label Nov 17, 2020
@paulidale paulidale added this to the Post 3.0.0 milestone Nov 17, 2020
@paulidale paulidale added triaged: feature The issue/pr requests/adds a feature and removed issue: bug report The issue was opened to report a bug labels Nov 17, 2020
@tomato42
Copy link
Contributor Author

@paulidale added this to the Post 3.0.0 milestone

should I read it as "none of the core OpenSSL developers will work on this before 3.0.0 is out" or as "we will not accept a PR for 3.0.0 alpha that implements this"

@mattcaswell
Copy link
Member

should I read it as "none of the core OpenSSL developers will work on this before 3.0.0 is out" or as "we will not accept a PR for 3.0.0 alpha that implements this"

Strictly speaking it means the latter - but I'd argue we should should remove the milestone altogether to mean the former. I personally, have no objection to this being addressed, but we don't have the bandwidth within the dev team to take this on at the moment. Note if there are API changes then it would have be completed and merged before beta 1 to be eligible for inclusion in 3.0.0.

@tomato42
Copy link
Contributor Author

should I read it as "none of the core OpenSSL developers will work on this before 3.0.0 is out" or as "we will not accept a PR for 3.0.0 alpha that implements this"

Strictly speaking it means the latter - but I'd argue we should should remove the milestone altogether to mean the former. I personally, have no objection to this being addressed, but we don't have the bandwidth within the dev team to take this on at the moment. Note if there are API changes then it would have be completed and merged before beta 1 to be eligible for inclusion in 3.0.0.

would you consider addition of a new flag to be an API change? not a new symbol, just a new #define?

@mattcaswell
Copy link
Member

would you consider addition of a new flag to be an API change?

Yes - that would be considered an API change IMO.

@tomato42
Copy link
Contributor Author

I've implemented the proposed fix in #13817, please review.

@tomato42
Copy link
Contributor Author

merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triaged: feature The issue/pr requests/adds a feature triaged: OTC evaluated This issue/pr was triaged by OTC
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants