Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.1.1 SNI breaks hand shake #7244

Closed
litespeedtech opened this issue Sep 17, 2018 · 12 comments
Closed

1.1.1 SNI breaks hand shake #7244

litespeedtech opened this issue Sep 17, 2018 · 12 comments
Milestone

Comments

@litespeedtech
Copy link

@litespeedtech litespeedtech commented Sep 17, 2018

In the process of debugging an hand shake failure with combination of openssl 1.1.1 + openlitespeed web server, we found that there is a bug that causes handshake failure.

The handshake failure error is:
error:14201076:SSL routines:tls_choose_sigalg:no suitable signature algorithm

The code execution flow is:
inside tls_post_process_client_hello()

  1. s->cert->shared_sigalgslen was set to non-zero by tls1_set_shared_sigalgs()
  2. s->cert->cert_cb() is called to lookup new SSL_CTX based on SSL_get_servername()
  3. When new SSL_CTX found, SSL_set_SSL_CTX(s, new_ctx) is called.
  4. inside SSL_set_SSL_CTX(s, new_ctx) , s->cert->shared_sigalgslen was set to 0 when ssl_cert_free(ssl->cert) is called.
  5. later, when tls_choose_sigalg() is called, it errors out at t1_lib.c:2646
                if (i == s->cert->shared_sigalgslen) {
                    if (!fatalerrs)
                        return 1;
                    SSLfatal(s, SSL_AD_HANDSHAKE_FAILURE,
                             SSL_F_TLS_CHOOSE_SIGALG,
                             SSL_R_NO_SUITABLE_SIGNATURE_ALGORITHM);
                    return 0;
                }

looks like tls1_set_shared_sigalgs() need to be called again, or called between step 4 and 5 instead at step 1.

@kaduk
Copy link
Contributor

@kaduk kaduk commented Sep 17, 2018

I am having trouble matching some of your description to the code; can you confirm that it is the final 1.1.1 release (as opposed to a prerelease version) being used?

Also, independently of the actual issue here, I would suggest moving from the cert_cb to the client_hello_cb, which runs at the very start of the handshake, allowing for greater control over things like version negotiation, as well as virtually eliminating the risk of this sort of ordering issue between library handshake functioning and callback modifications.

@shinrich
Copy link

@shinrich shinrich commented Sep 18, 2018

I am seeing something very similar trying the release version of openssl 1.1.1 with Apache Traffic Server. I had not worked back to steps 1-4, but we do indeed use the SNI to select the certificate as @litespeedtech outlines. But our increase in failures definitely hit step 5 in the same function. s->cert->shared_sigalgslen == 0 and s->cert->shared_sigalgs is null at that point.

We don't see these problems in 1.1.0, and I don't think we saw them with the pre-1.1.1.

Will dig back into our cert selection logic and try to get more details. Not all of our handshakes fail, and they all do the same certificate selection, so that is puzzling.

@shinrich
Copy link

@shinrich shinrich commented Sep 18, 2018

Looking into this more closely, the cases that were working for us were the default case, so SSL_set_SSL_CTX does not need to replace s->cert, and the cert->shared_sigalgs are not cleared. The values set in the earlier call to tls1_set_shared_sigalgs() are preserved (called from tls_early_post_process_client_hello). Best that I can tell, this function is not called again to set the shared->sigalgs for the replacement cert.

shinrich pushed a commit to shinrich/openssl that referenced this issue Sep 18, 2018
When called from the certcallback there does not seem to be another
opportunity in the state machine to reset shared_sigalgs on the new
cert, so this patch just copies over the shared_sigalgs from the origin
cert.

Identified in issue openssl#7244
shinrich pushed a commit to shinrich/openssl that referenced this issue Sep 18, 2018
When called from the certcallback there does not seem to be another
opportunity in the state machine to reset shared_sigalgs on the new
cert, so this patch just copies over the shared_sigalgs from the origin
cert.

Fixes openssl#7244
@shinrich
Copy link

@shinrich shinrich commented Sep 18, 2018

I put up a rather ham-handed patch at PR #7256

With this change TrafficServer running against openssl 1.1.1 will successfully negotiate with client requests that use the non-default certificate.

mattcaswell added a commit to mattcaswell/openssl that referenced this issue Sep 18, 2018
Otherwise the sig algs are reset if SSL_set_SSL_CTX() gets called.

Fixes openssl#7244
@mattcaswell
Copy link
Member

@mattcaswell mattcaswell commented Sep 18, 2018

@litespeedtech or @shinrich - do you think you could test out the patch in #7257? (An alternative approach to the one in #7256). If that works I'll try an pull together a test case for it.

@mattcaswell
Copy link
Member

@mattcaswell mattcaswell commented Sep 18, 2018

Incidentally this issue seems to come about because of using the certificate callback for SNI handling, rather than the intended SNI callback (or the new, and now preferred client_hello_cb as mentioned by @kaduk). The cert_cb is called relatively late in the process, whilst the SNI callback is called near the beginning of extensions processing. The client_hello_cb is called even earlier still.

@shinrich
Copy link

@shinrich shinrich commented Sep 18, 2018

I thought we were supposed to use the cert_cb to select the certificate for the SSL object. Originally we were using the sni callback for the cert selection (back in 1.0.1 and early 1.0.2). We moved the cert selection logic to the cert_callback because we wanted to have the option of stalling the handshake processing while wandering off to find the certificate to use. At the time the SNI callback was not stallable but the cert_cb was stallable.

@mattcaswell
Copy link
Member

@mattcaswell mattcaswell commented Sep 18, 2018

Changing the SSL_CTX and changing the cert are different (although related) things. The intention (I think) is to change the SSL_CTX in the servername callback. Typically you are swapping to an SSL_CTX that already has the new cert associated with it. If that's not the case then you can additionally update the cert at the same time (i.e. in the servername callback), or (if you want it to be stallable) do it in the cert_cb. In practice though it seems people use these callbacks interchangeably so we probably have to continue to support changing the SSL_CTX in the cert_cb.

@shinrich
Copy link

@shinrich shinrich commented Sep 18, 2018

Hmm. I didn't realize that you could set the cert on an SSL object independent of the SSL_CTX. SSL_set_SSL_CTX() was the call in place when I first came upon the relevant Traffic Server code so that is what I have concentrated on.

I just noticed your comment about testing the patch. I must run out this evening, but I will take the patch for a spin in the morning. Thanks for the quick response!

@shinrich
Copy link

@shinrich shinrich commented Sep 18, 2018

Had a bit of time after all. The patch looks good. I no longer see the no suitable signature algorithm error. Will leave it run overnight.

@litespeedtech
Copy link
Author

@litespeedtech litespeedtech commented Sep 19, 2018

I tested patch #7257, looks good. prefer it over #7256 . Thanks for the quick fix.
We use cert_cb because we need to pause handshake, and it is compatible with BoringSSL.
Looks like client_hello_cb is a better choice now, as it can pause handshake as well.
It makes more sense to select certificate as early as possible. We will look into using that.

@shinrich
Copy link

@shinrich shinrich commented Sep 19, 2018

Looks very good overnight. Closed PR #7256. Thanks for the fix.

levitte pushed a commit that referenced this issue Sep 21, 2018
Otherwise the sig algs are reset if SSL_set_SSL_CTX() gets called.

Fixes #7244

Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from #7257)

(cherry picked from commit 524006d)
@levitte levitte closed this in 524006d Sep 21, 2018
@mspncp mspncp added this to the 1.1.1a milestone Oct 23, 2018
kaduk added a commit to kaduk/openssl that referenced this issue Jun 13, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from openssl#7244 will follow.
kaduk added a commit to kaduk/openssl that referenced this issue Jun 14, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from openssl#7244 will follow.
kaduk added a commit to kaduk/openssl that referenced this issue Jun 17, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from openssl#7244 will follow.
kaduk added a commit to kaduk/openssl that referenced this issue Jun 20, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from openssl#7244 will follow.
kaduk added a commit to kaduk/openssl that referenced this issue Jun 20, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from openssl#7244 will follow.
kaduk added a commit to kaduk/openssl that referenced this issue Jun 26, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from openssl#7244 will follow.
levitte pushed a commit that referenced this issue Jun 26, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from #7244 will follow.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #9157)
levitte pushed a commit that referenced this issue Jun 26, 2019
…alled"

This reverts commit 524006d.

While this change did prevent the sigalgs from getting inadvertently
clobbered by SSL_set_SSL_CTX(), it also caused the sigalgs to not be
set when the cert_cb runs.  This, in turn, caused significant breakage,
such as SSL_check_chain() failing to find any valid chain.  An alternate
approach to fixing the issue from #7244 will follow.

Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #9157)

(cherry picked from commit 6f34d7b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants
You can’t perform that action at this time.