New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.1.1 SNI breaks hand shake #7244

Closed
litespeedtech opened this Issue Sep 17, 2018 · 12 comments

Comments

Projects
None yet
4 participants
@litespeedtech

litespeedtech commented Sep 17, 2018

In the process of debugging an hand shake failure with combination of openssl 1.1.1 + openlitespeed web server, we found that there is a bug that causes handshake failure.

The handshake failure error is:
error:14201076:SSL routines:tls_choose_sigalg:no suitable signature algorithm

The code execution flow is:
inside tls_post_process_client_hello()

  1. s->cert->shared_sigalgslen was set to non-zero by tls1_set_shared_sigalgs()
  2. s->cert->cert_cb() is called to lookup new SSL_CTX based on SSL_get_servername()
  3. When new SSL_CTX found, SSL_set_SSL_CTX(s, new_ctx) is called.
  4. inside SSL_set_SSL_CTX(s, new_ctx) , s->cert->shared_sigalgslen was set to 0 when ssl_cert_free(ssl->cert) is called.
  5. later, when tls_choose_sigalg() is called, it errors out at t1_lib.c:2646
                if (i == s->cert->shared_sigalgslen) {
                    if (!fatalerrs)
                        return 1;
                    SSLfatal(s, SSL_AD_HANDSHAKE_FAILURE,
                             SSL_F_TLS_CHOOSE_SIGALG,
                             SSL_R_NO_SUITABLE_SIGNATURE_ALGORITHM);
                    return 0;
                }

looks like tls1_set_shared_sigalgs() need to be called again, or called between step 4 and 5 instead at step 1.

@kaduk

This comment has been minimized.

Show comment
Hide comment
@kaduk

kaduk Sep 17, 2018

Contributor

I am having trouble matching some of your description to the code; can you confirm that it is the final 1.1.1 release (as opposed to a prerelease version) being used?

Also, independently of the actual issue here, I would suggest moving from the cert_cb to the client_hello_cb, which runs at the very start of the handshake, allowing for greater control over things like version negotiation, as well as virtually eliminating the risk of this sort of ordering issue between library handshake functioning and callback modifications.

Contributor

kaduk commented Sep 17, 2018

I am having trouble matching some of your description to the code; can you confirm that it is the final 1.1.1 release (as opposed to a prerelease version) being used?

Also, independently of the actual issue here, I would suggest moving from the cert_cb to the client_hello_cb, which runs at the very start of the handshake, allowing for greater control over things like version negotiation, as well as virtually eliminating the risk of this sort of ordering issue between library handshake functioning and callback modifications.

@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 18, 2018

I am seeing something very similar trying the release version of openssl 1.1.1 with Apache Traffic Server. I had not worked back to steps 1-4, but we do indeed use the SNI to select the certificate as @litespeedtech outlines. But our increase in failures definitely hit step 5 in the same function. s->cert->shared_sigalgslen == 0 and s->cert->shared_sigalgs is null at that point.

We don't see these problems in 1.1.0, and I don't think we saw them with the pre-1.1.1.

Will dig back into our cert selection logic and try to get more details. Not all of our handshakes fail, and they all do the same certificate selection, so that is puzzling.

shinrich commented Sep 18, 2018

I am seeing something very similar trying the release version of openssl 1.1.1 with Apache Traffic Server. I had not worked back to steps 1-4, but we do indeed use the SNI to select the certificate as @litespeedtech outlines. But our increase in failures definitely hit step 5 in the same function. s->cert->shared_sigalgslen == 0 and s->cert->shared_sigalgs is null at that point.

We don't see these problems in 1.1.0, and I don't think we saw them with the pre-1.1.1.

Will dig back into our cert selection logic and try to get more details. Not all of our handshakes fail, and they all do the same certificate selection, so that is puzzling.

@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 18, 2018

Looking into this more closely, the cases that were working for us were the default case, so SSL_set_SSL_CTX does not need to replace s->cert, and the cert->shared_sigalgs are not cleared. The values set in the earlier call to tls1_set_shared_sigalgs() are preserved (called from tls_early_post_process_client_hello). Best that I can tell, this function is not called again to set the shared->sigalgs for the replacement cert.

shinrich commented Sep 18, 2018

Looking into this more closely, the cases that were working for us were the default case, so SSL_set_SSL_CTX does not need to replace s->cert, and the cert->shared_sigalgs are not cleared. The values set in the earlier call to tls1_set_shared_sigalgs() are preserved (called from tls_early_post_process_client_hello). Best that I can tell, this function is not called again to set the shared->sigalgs for the replacement cert.

shinrich pushed a commit to shinrich/openssl that referenced this issue Sep 18, 2018

Susan Hinrichs
Set shared_sigalgs from SSL_set_SSL_CTX
When called from the certcallback there does not seem to be another
opportunity in the state machine to reset shared_sigalgs on the new
cert, so this patch just copies over the shared_sigalgs from the origin
cert.

Identified in issue openssl#7244

shinrich pushed a commit to shinrich/openssl that referenced this issue Sep 18, 2018

Susan Hinrichs
Set shared_sigalgs from SSL_set_SSL_CTX
When called from the certcallback there does not seem to be another
opportunity in the state machine to reset shared_sigalgs on the new
cert, so this patch just copies over the shared_sigalgs from the origin
cert.

Fixes openssl#7244
@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 18, 2018

I put up a rather ham-handed patch at PR #7256

With this change TrafficServer running against openssl 1.1.1 will successfully negotiate with client requests that use the non-default certificate.

shinrich commented Sep 18, 2018

I put up a rather ham-handed patch at PR #7256

With this change TrafficServer running against openssl 1.1.1 will successfully negotiate with client requests that use the non-default certificate.

mattcaswell added a commit to mattcaswell/openssl that referenced this issue Sep 18, 2018

Delay setting the sig algs until after the cert_cb has been called
Otherwise the sig algs are reset if SSL_set_SSL_CTX() gets called.

Fixes openssl#7244
@mattcaswell

This comment has been minimized.

Show comment
Hide comment
@mattcaswell

mattcaswell Sep 18, 2018

Member

@litespeedtech or @shinrich - do you think you could test out the patch in #7257? (An alternative approach to the one in #7256). If that works I'll try an pull together a test case for it.

Member

mattcaswell commented Sep 18, 2018

@litespeedtech or @shinrich - do you think you could test out the patch in #7257? (An alternative approach to the one in #7256). If that works I'll try an pull together a test case for it.

@mattcaswell

This comment has been minimized.

Show comment
Hide comment
@mattcaswell

mattcaswell Sep 18, 2018

Member

Incidentally this issue seems to come about because of using the certificate callback for SNI handling, rather than the intended SNI callback (or the new, and now preferred client_hello_cb as mentioned by @kaduk). The cert_cb is called relatively late in the process, whilst the SNI callback is called near the beginning of extensions processing. The client_hello_cb is called even earlier still.

Member

mattcaswell commented Sep 18, 2018

Incidentally this issue seems to come about because of using the certificate callback for SNI handling, rather than the intended SNI callback (or the new, and now preferred client_hello_cb as mentioned by @kaduk). The cert_cb is called relatively late in the process, whilst the SNI callback is called near the beginning of extensions processing. The client_hello_cb is called even earlier still.

@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 18, 2018

I thought we were supposed to use the cert_cb to select the certificate for the SSL object. Originally we were using the sni callback for the cert selection (back in 1.0.1 and early 1.0.2). We moved the cert selection logic to the cert_callback because we wanted to have the option of stalling the handshake processing while wandering off to find the certificate to use. At the time the SNI callback was not stallable but the cert_cb was stallable.

shinrich commented Sep 18, 2018

I thought we were supposed to use the cert_cb to select the certificate for the SSL object. Originally we were using the sni callback for the cert selection (back in 1.0.1 and early 1.0.2). We moved the cert selection logic to the cert_callback because we wanted to have the option of stalling the handshake processing while wandering off to find the certificate to use. At the time the SNI callback was not stallable but the cert_cb was stallable.

@mattcaswell

This comment has been minimized.

Show comment
Hide comment
@mattcaswell

mattcaswell Sep 18, 2018

Member

Changing the SSL_CTX and changing the cert are different (although related) things. The intention (I think) is to change the SSL_CTX in the servername callback. Typically you are swapping to an SSL_CTX that already has the new cert associated with it. If that's not the case then you can additionally update the cert at the same time (i.e. in the servername callback), or (if you want it to be stallable) do it in the cert_cb. In practice though it seems people use these callbacks interchangeably so we probably have to continue to support changing the SSL_CTX in the cert_cb.

Member

mattcaswell commented Sep 18, 2018

Changing the SSL_CTX and changing the cert are different (although related) things. The intention (I think) is to change the SSL_CTX in the servername callback. Typically you are swapping to an SSL_CTX that already has the new cert associated with it. If that's not the case then you can additionally update the cert at the same time (i.e. in the servername callback), or (if you want it to be stallable) do it in the cert_cb. In practice though it seems people use these callbacks interchangeably so we probably have to continue to support changing the SSL_CTX in the cert_cb.

@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 18, 2018

Hmm. I didn't realize that you could set the cert on an SSL object independent of the SSL_CTX. SSL_set_SSL_CTX() was the call in place when I first came upon the relevant Traffic Server code so that is what I have concentrated on.

I just noticed your comment about testing the patch. I must run out this evening, but I will take the patch for a spin in the morning. Thanks for the quick response!

shinrich commented Sep 18, 2018

Hmm. I didn't realize that you could set the cert on an SSL object independent of the SSL_CTX. SSL_set_SSL_CTX() was the call in place when I first came upon the relevant Traffic Server code so that is what I have concentrated on.

I just noticed your comment about testing the patch. I must run out this evening, but I will take the patch for a spin in the morning. Thanks for the quick response!

@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 18, 2018

Had a bit of time after all. The patch looks good. I no longer see the no suitable signature algorithm error. Will leave it run overnight.

shinrich commented Sep 18, 2018

Had a bit of time after all. The patch looks good. I no longer see the no suitable signature algorithm error. Will leave it run overnight.

@litespeedtech

This comment has been minimized.

Show comment
Hide comment
@litespeedtech

litespeedtech Sep 19, 2018

I tested patch #7257, looks good. prefer it over #7256 . Thanks for the quick fix.
We use cert_cb because we need to pause handshake, and it is compatible with BoringSSL.
Looks like client_hello_cb is a better choice now, as it can pause handshake as well.
It makes more sense to select certificate as early as possible. We will look into using that.

litespeedtech commented Sep 19, 2018

I tested patch #7257, looks good. prefer it over #7256 . Thanks for the quick fix.
We use cert_cb because we need to pause handshake, and it is compatible with BoringSSL.
Looks like client_hello_cb is a better choice now, as it can pause handshake as well.
It makes more sense to select certificate as early as possible. We will look into using that.

@shinrich

This comment has been minimized.

Show comment
Hide comment
@shinrich

shinrich Sep 19, 2018

Looks very good overnight. Closed PR #7256. Thanks for the fix.

shinrich commented Sep 19, 2018

Looks very good overnight. Closed PR #7256. Thanks for the fix.

levitte pushed a commit that referenced this issue Sep 21, 2018

Delay setting the sig algs until after the cert_cb has been called
Otherwise the sig algs are reset if SSL_set_SSL_CTX() gets called.

Fixes #7244

Reviewed-by: Ben Kaduk <kaduk@mit.edu>
(Merged from #7257)

(cherry picked from commit 524006d)

@levitte levitte closed this in 524006d Sep 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment