Massive performance degradation in OpenSSL 3.0 when used in a heavily multi-threaded server application #17064

Open
thkdev2 opened this issue Nov 18, 2021 · 127 comments

Comments

@thkdev2

thkdev2 commented Nov 18, 2021

We have updated our multi-threaded server application from OpenSSL 1.0.2 to 3.0. It works, but it is nevertheless unusable.
We mainly use the AES and SHA algorithms.
Our code runs on Windows and Linux.
We always use a self-built library from a release version.
In our load tests, many parallel instances run in different threads.
With OpenSSL 1.0.2 on a 32-core machine, this results in a load of about 7% (of 100%) on Windows and about 800% (of 3200%) on Linux.
Merely switching to OpenSSL 3.0.0 results in a load of about 21% (of 100%) on Windows and about 3000% (of 3200%) on Linux with the same test.
The Linux version performs so badly that our overload/blocking checker kills the server!
So OpenSSL 3.0 is absolutely unusable from our perspective.

I took a look with a profiler on Linux and found that most of the time is spent in the pthread wait functions called by CRYPTO_THREAD_read_lock, CRYPTO_THREAD_unlock and CRYPTO_THREAD_write_lock, which are in turn called by ossl_lib_ctx_get_data and others.
I know that this time is also counted while the threads are sleeping.
But it is clear that all the threads are colliding on such a hot code path, which is now de facto single-threaded.
This is a major change from the 1.x versions, where the instances running in different threads really did run in parallel.

I don't know if there is something we can do on our side to improve the situation.
Otherwise we see this as a serious bug preventing us from using OpenSSL 3.0.

Let me know if you need further information.

@t8m
Member

t8m commented Nov 18, 2021

Let me know if you need further information.

It would be really helpful if you could describe the concrete operations you perform in the parallel threads.

@mattcaswell
Member

Are you able to get stack traces for particularly "hot" locks? It would be useful to see which code paths are encountering this, and it might suggest ways to improve performance. For example, algorithm fetching can be quite an expensive operation. Doing explicit fetches up front and then using the pre-fetched algorithm can have performance benefits and avoid a lot of code that requires locking.

@thkdev2
Author

thkdev2 commented Nov 18, 2021

Let me know if you need further information.

It would be really helpful if you could describe the concrete operations you perform in the parallel threads.

Thank you for your fast response.

Since our server is a kind of media server, we use the crypto functionality mainly for SRTP.
This means we call the appropriate functions to encrypt/decrypt with AES and to calculate/verify SHA hashes.
So we use EVP_CipherInit_ex, EVP_CipherUpdate and EVP_CipherFinal_ex for encrypting/decrypting, and EVP_DigestInit_ex, EVP_DigestUpdate and EVP_DigestFinal_ex for the hash calculation.
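
A rough sketch of the per-packet cipher path described above (purely illustrative: the function name, cipher/mode, and error handling are not our actual code):

#include <openssl/evp.h>

/* Hypothetical per-packet routine; ctx, key and iv management is simplified. */
static int encrypt_packet(EVP_CIPHER_CTX *ctx,
                          const unsigned char *key, const unsigned char *iv,
                          const unsigned char *in, int inlen,
                          unsigned char *out, int *outlen)
{
    int len = 0, fin = 0;

    /* With the legacy EVP_aes_128_ctr() an implicit fetch happens on every init. */
    if (!EVP_CipherInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv, 1)
            || !EVP_CipherUpdate(ctx, out, &len, in, inlen)
            || !EVP_CipherFinal_ex(ctx, out + len, &fin))
        return 0;
    *outlen = len + fin;
    return 1;
}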

@thkdev2
Author

thkdev2 commented Nov 18, 2021

Are you able to get stack traces for particularly "hot" locks? It would be useful to see which code paths are encountering this, and it might suggest ways to improve performance. For example, algorithm fetching can be quite an expensive operation. Doing explicit fetches up front and then using the pre-fetched algorithm can have performance benefits and avoid a lot of code that requires locking.

Yes, the algorithm fetch is the problematic operation.
I can only provide a screenshot at the moment.
I hope it is helpful.
[profiler screenshot attached]

@mattcaswell
Member

The stack traces don't quite supply enough "depth" to see where they are being called from. However, it does look like it is related to fetching.

If you have code like this:

void *thread_worker(void *arg)
{
    /* ctx, key, iv and enc are assumed to be set up elsewhere */
    for (int i = 0; i < 1000; i++) {
        EVP_CipherInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv, enc);
        /* more stuff */
    }
    return NULL;
}

int main(void)
{
    /* Create lots of threads based on thread_worker */
}

Then you might want to consider refactoring it to look more like this:

EVP_CIPHER *aes128cbc;

void *thread_worker(void *arg)
{
    /* ctx, key, iv and enc are assumed to be set up elsewhere */
    for (int i = 0; i < 1000; i++) {
        EVP_CipherInit_ex(ctx, aes128cbc, NULL, key, iv, enc);
        /* more stuff */
    }
    return NULL;
}

int main(void)
{
    aes128cbc = EVP_CIPHER_fetch(NULL, "AES-128-CBC", NULL);
    /* Create lots of threads based on thread_worker */
}

Similarly with digest fetches.
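
For example, a pre-fetched digest version might look like the following (a sketch only; the digest choice, names and error handling are illustrative):

#include <openssl/evp.h>

EVP_MD *sha1; /* fetched once at start up */

int init_digests(void)
{
    sha1 = EVP_MD_fetch(NULL, "SHA1", NULL);
    return sha1 != NULL;
}

int hash_packet(const unsigned char *in, size_t inlen,
                unsigned char *md, unsigned int *mdlen)
{
    EVP_MD_CTX *mctx = EVP_MD_CTX_new();
    int ok = mctx != NULL
             && EVP_DigestInit_ex(mctx, sha1, NULL) /* no implicit fetch here */
             && EVP_DigestUpdate(mctx, in, inlen)
             && EVP_DigestFinal_ex(mctx, md, mdlen);

    EVP_MD_CTX_free(mctx);
    return ok;
}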

@thkdev2
Author

thkdev2 commented Nov 18, 2021

Thank you for your suggestion!
Yes, our code looks more or less like your first code snippet.
I will try to change it so that the algorithms are fetched only once and will let you know if it helps.

@kroeckx
Member

kroeckx commented Nov 18, 2021 via email

@kroeckx
Member

kroeckx commented Nov 18, 2021 via email

@paulidale
Contributor

The underlying problem is implicit fetching, not caching. Implicit fetching is done late and always calls fetch. Fetches are cached but can be expensive -- here it is lock contention causing problems. I suspect that there is only one library context and accesses to it are locked, and some of the locks are write locks: hence the contention. This is expected in a heavily multi-threaded server.

This was a known trade-off that was made a long time back in order to maintain compatibility: old applications will run, but not necessarily fast. The update to calling fetch up front wasn't considered to be too onerous.

@thkdev2
Author

thkdev2 commented Nov 19, 2021

I tried what @mattcaswell suggested.
Unfortunately without any success.
The profiler log didn't change.
I even verified the generated assembly code to be sure.
So I guess this isn't the problem.

I could try to find out what is calling all this locking code if you could give me some suggestions on how to enable more tracing or statistics.
I can also apply local test patches, because we build the library ourselves.

@mattcaswell
Member

Hmm.

Are you able to expand out ossl_lib_ctx_get_data from the screenshot you showed so we can see which callers are causing the problem?

@thkdev2
Author

thkdev2 commented Nov 19, 2021

Yes, I can do that.
[expanded profiler screenshot attached]

@mattcaswell
Member

Ok...keep going! The presence of evp_generic_fetch shows us that there is still a fetch happening somewhere. Keep expanding that out so we can find the source of the fetch.

@thkdev2
Author

thkdev2 commented Nov 19, 2021

The previous screenshots are from the original version.
The attached text file contains all expanded calls, now from a profile of the version with your suggested patch.
You can find the call tree at the end of the file.
Of course the init functions are still called; only the fetching of the AES and SHA algorithms should no longer be there.
profile_0006.zoom.txt

@t8m
Member

t8m commented Nov 19, 2021

It seems the contention is on fetching the HMAC algorithm from the legacy EVP_PKEY_CTX implementation and on the fetch of the digest algorithm for the HMAC. The first could be removed by using the EVP_MAC API directly. Unfortunately the digest algorithm fetch would still be there. Perhaps preinitializing the EVP_MAC_CTX once and duplicating it in the threads would help?

@mattcaswell
Member

Yes, as @t8m says it appears to be due mostly to the EVP_DigestSignInit call in m5t::CSrtp::HashMsgAuthenticationCodeWithRoc. This way of doing MACs is considered legacy (although not actually deprecated). Using the new EVP_MAC APIs directly should be much more performant. See:

https://www.openssl.org/docs/man3.0/man3/EVP_MAC_init.html

Avoiding the digest fetch is trickier but @t8m's method should work, i.e. create an EVP_MAC_CTX, call EVP_MAC_CTX_set_params() to set the digest to use (which does the fetch) - just do that once, and then each time you need to use it call EVP_MAC_CTX_dup() to get a new EVP_MAC_CTX pre-initialised with the fetched digest. The main issue to be careful of is that a single EVP_MAC_CTX must not be shared between threads.

There also seems to be an EVP_CIPHER_fetch still happening. This is happening in an EVP_CipherInit_ex call in m5t::CAesMitosFw::Begin - did you miss changing this one?

@thkdev2
Author

thkdev2 commented Nov 19, 2021

@mattcaswell The function m5t::CAesMitosFw::Begin is the only function used for AES, and it is the one I changed.
I can try to figure out in the debugger where the call is being triggered from.

I will also try the rest of your suggestions.

Even if this works, I see it only as half a workaround.

We also have the requirement to use the FIPS provider, which can be enabled at runtime.
So I would have to cache all algorithms at least twice, once with and once without FIPS, and I would have to deal with the decision of which one to use.
This is probably feasible if I read the documentation again and find the right place.
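
If it comes to that, one possible shape for the dual pre-fetch (purely illustrative; it assumes the FIPS provider has been loaded and uses property query strings to select the implementation) would be:

#include <openssl/evp.h>

EVP_CIPHER *aes_default, *aes_fips; /* fetched once at start up */

int init_ciphers(void)
{
    aes_default = EVP_CIPHER_fetch(NULL, "AES-128-CBC", "provider=default");
    aes_fips    = EVP_CIPHER_fetch(NULL, "AES-128-CBC", "provider=fips");
    return aes_default != NULL && aes_fips != NULL;
}

const EVP_CIPHER *cipher_for_mode(int fips_enabled)
{
    return fips_enabled ? aes_fips : aes_default;
}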

Additionally, we use another third-party library, Google's gRPC, which itself also uses OpenSSL.
I don't know exactly what it does, but I suspect we will run into the same problems there if they haven't already adapted their code.

Every other client working in this way also has to overcome this problem with all these workarounds.
So I would suggest thinking about implementing something inside the library to overcome these problems, if that is possible.

@t8m
Member

t8m commented Nov 19, 2021

I suppose that to make the fetches lockless we would have to add some kind of OSSL_LIB_CTX_freeze() API that would disallow loading/unloading of providers into the libctx. Then the cached implementations could sit under just a read lock.

@mattcaswell
Member

I actually think we should consider refactoring the OSSL_LIB_CTX code. It's based around CRYPTO_EX_DATA, which makes conceptual sense - but it makes the locking considerably harder than it needs to be. Take a look at the locking in ossl_lib_ctx_get_data and then back slowly away. If we could get away without having to take the ctx->lock all the time (which I think might be possible), that might be a big win.

@thkdev2
Author

thkdev2 commented Nov 19, 2021

I commented out too much code for testing purposes. :(
After re-enabling it, I still see the calls to EVP_CIPHER_fetch ending at the appropriate ossl_xxx functions, which do the (un)locking.
Was it expected that they would no longer be called?

@mattcaswell
Member

Was it expected that they would no longer be called?

They should not need to be called. In the code snippet I posted way back above in this thread, the only fetch that should be happening is at start-up. Subsequent EVP_CipherInit_ex calls should be passing in the EVP_CIPHER that you previously fetched. Since it has already been fetched, it should not need to be fetched again.

@thkdev2
Author

thkdev2 commented Nov 19, 2021

I have now built a Windows debug version.
As you can see in the attached call stack, EVP_CIPHER_fetch is always called from EVP_CipherInit_ex, regardless of whether we already have a cipher or not:
[debugger call stack screenshot attached]

@paulidale
Contributor

That call only happens if the cipher isn't from a provider. I.e. it is one that hasn't been fetched already. If you pre-fetch the cipher, that code path is not taken. Line 166 is where this magic happens.

@mattcaswell
Member

@thkdev2 - are you sure you are passing the pre-fetched cipher to the EVP_CipherInit_ex call? As @paulidale says - on line 166 cipher->prov should not be NULL if you are passing a pre-fetched cipher.

@thkdev2
Author

thkdev2 commented Nov 22, 2021

Thank you both for the hint.
The cipher wasn't properly fetched; I have fixed this, and now it is.
However, the situation didn't really change.
I'm now in the process of pre-loading the hash algorithm.
Pre-fetching alone doesn't seem to help here.
I will try to follow your instructions above for this.

@mike-zukowski

mike-zukowski commented Jan 31, 2024

It's unbearable. OpenSSL 1.1.1 is now considered outdated and v3 is unusable.

All older versions (including 1.1.1, 1.1.0, 1.0.2, 1.0.0 and 0.9.8) are now out of support and should not be used. Users of these older versions are encouraged to upgrade to 3.2 or 3.0 as soon as possible.

https://www.openssl.org/source/

@nhorman
Contributor

nhorman commented May 22, 2024

Is there a status update that I can get here? Reading through this, it seems like the problem still exists in some form, but several improvements have been made. I resurrected the perf tests that @hlandau made earlier and re-ran them on my system, and performance for those seems to be at parity.
I'd like to try to drive this to closure if possible. It would be helpful:

  1. To have a clear statement as to where performance sits with any of the applications referenced here on OpenSSL 3.3.0 vs OpenSSL 1.0.2 and 1.1.1.

  2. If possible, a description of where the slowdown might be (something more granular than connections per second, if possible). A perf run comparison (or the Windows equivalent) would be ideal.

@rsbeckerca
Contributor

@nhorman I will have a chance to retest two different threading models in the next few weeks. Would you like data from this performance test? I might not be able to share it publicly, but you can reach me OOB. I would need details on what @hlandau did.

@nhorman
Contributor

nhorman commented May 22, 2024

@hlandau tests are here

I would like performance numbers, please, but while they are interesting, I'm not sure they're relevant to this issue (though I'm not 100% sure). From what I read here, this particular issue might be constrained to Windows. Part of the problem in re-reading this is that I feel we've lost the focus of the thread. We started with a description of the problem, fixed a bunch of things in the interim, and now I'm not sure where the disposition of the issue stands. So as much as new numbers, I was hoping to get a refresh on the problem statement: what the original reporters are experiencing in terms of performance, on what platforms, with what workloads, etc.

@glic3rinu

glic3rinu commented May 23, 2024

I think you are right, @nhorman. I haven't tested 3.3.0 extensively yet, but my preliminary tests indicate that concurrency is now on the same order of magnitude compared to 1.1.1. My use cases revolve around using Python with gevent/threads and making lots of HTTP requests. I wrote a Dockerfile to build Python against either OpenSSL version and then run a simple test script: https://gist.github.com/glic3rinu/0878f9c2d1e72dc07932325bca8f6a4a

@nhorman
Contributor

nhorman commented May 23, 2024

Oh, that is helpful, @glic3rinu, thank you. That makes these larger-scale test cases significantly more accessible. I think if I can reproduce similar results using that, we can perhaps consider this issue closed and open new issues for any subsequent performance degradation.

@glic3rinu

glic3rinu commented May 23, 2024

I ran some quick tests against www.google.com: 3K requests with 300 concurrent threads (I was being rate-limited by Google, so these are all 302 body-less responses; more variation could probably be observed with bigger HTTP responses, but I'm not sure how to quickly test that):

  • 1.1.1 194.7 req/sec
  • 3.3.0 141.6 req/sec
  • 3.0.12 55.1 req/sec

@nhorman
Contributor

nhorman commented May 23, 2024

Ok, so I'm reading that as a 27% decrease from 1.1.1 in 3.3.0, which is much better than 3.0.12, but still not great. I don't suppose in that testing harness you are able to run perf on the 3.3.0 server and report the flame graph back, are you?

@glic3rinu

glic3rinu commented May 23, 2024

Unfortunately I am not very familiar with C profiling. I did try to profile it with Python's built-in cProfile and, as expected, the major timing differences are happening inside the OpenSSL C bindings, but I don't get to see what's going on inside, only the entry calls. E.g.:

version | ncalls | tottime | percall | cumtime | percall | filename:lineno(function)

1.1.1  3000    19.46   0.006487    19.46   0.006487    ~:0(<method 'set_default_verify_paths' of '_ssl._SSLContext' objects>)
3.3.0  3000    22.78   0.007594    22.78   0.007594    ~:0(<method 'set_default_verify_paths' of '_ssl._SSLContext' objects>)
# note requests library creates a new ssl context per request 

1.1.1  77094/77087 0.3912  5.074e-06   0.3921  5.086e-06   ~:0(<built-in method __new__ of type object at 0x555555a5ff50>)
3.3.0  77094/77087 2.082   2.7e-05     2.082   2.701e-05   ~:0(<built-in method __new__ of type object at 0x555555a5ff50>)

1.1.1  6258    3.826   0.0006114   3.826   0.0006114   ~:0(<method 'do_handshake' of '_ssl._SSLSocket' objects>)
3.3.0  6209    4.212   0.0006784   4.212   0.0006784   ~:0(<method 'do_handshake' of '_ssl._SSLSocket' objects>)

@mattcaswell
Member

# note requests library creates a new ssl context per request

@nhorman this is an actionable thing we can work on. Creation of an SSL_CTX is known to be expensive. At the time of writing 3.0 we implemented certain caches in the SSL_CTX with the thinking that SSL object creation is common, but SSL_CTX creation is less frequent. Since then we have come to realise that many applications create an SSL_CTX per connection. A better model is probably to move the caches into the OSSL_LIB_CTX instead. With the changes made by @Sashan in #24414 we now have a mechanism for accessing lib ctx data indexes from inside libssl which was previously a barrier to pursuing this.
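
For reference, the pattern being suggested (one SSL_CTX created up front, SSL objects per connection) looks roughly like the sketch below, assuming the application's semantics allow sharing a context (see the caveat from @davidben later in the thread):

#include <openssl/ssl.h>

static SSL_CTX *g_ctx; /* created once at start up and shared by all connections */

int init_tls(void)
{
    g_ctx = SSL_CTX_new(TLS_client_method());
    if (g_ctx == NULL)
        return 0;
    /* verification settings, certificate paths, etc. are configured once here */
    SSL_CTX_set_default_verify_paths(g_ctx);
    return 1;
}

SSL *new_connection(int fd)
{
    SSL *ssl = SSL_new(g_ctx); /* much cheaper than creating an SSL_CTX per request */

    if (ssl != NULL)
        SSL_set_fd(ssl, fd);
    return ssl;
}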

@mattcaswell
Member

We should also have a variant of our handshake performance test that reflects this model of connection creation.

@nhorman
Contributor

nhorman commented May 24, 2024

I've reproduced the results that @glic3rinu reported (or close to them: 281 req/sec on OpenSSL 1.1.1v, 230 req/sec on OpenSSL 3.3.0).

Notable hotspots as reported by perf, using the reproducer in the gist here: https://gist.github.com/glic3rinu/0878f9c2d1e72dc07932325bca8f6a4a

openssl 3.3.0:

+   39.81%     0.00%  python3.11  [.] by_file_ctrl_ex                                                                     ◆
+   39.81%     0.00%  python3.11  [.] X509_load_cert_crl_file_ex                                                          ▒
+   39.81%     0.00%  python3.11  [.] X509_STORE_set_default_paths_ex                                                     ▒
+   39.06%     0.01%  python3.11  [.] PEM_X509_INFO_read_bio_ex                                                           ▒
+   31.49%     0.02%  python3.11  [.] ASN1_item_d2i                                                                       ▒
+   31.44%     1.47%  python3.11  [.] asn1_item_embed_d2i                                                                 ▒
+   31.15%     0.65%  python3.11  [.] asn1_template_noexp_d2i                                                             ▒
+   15.89%     0.04%  python3.11  [.] x509_pubkey_ex_d2i_ex                                                               ▒
+   14.19%     0.01%  python3.11  [.] OSSL_DECODER_from_data                                                              ▒
+   14.00%     0.01%  python3.11  [.] OSSL_DECODER_from_bio                                                               ▒
+   13.96%     0.08%  python3.11  [.] decoder_process                                                                     ▒
+   13.72%     0.04%  python3.11  [.] spki2typespki_decode                                                                ▒
+   11.64%     0.09%  python3.11  [.] x509_name_ex_d2i                                                                    ▒
+   10.56%     0.04%  python3.11  [.] der2key_decode                                                                      ▒
+    7.72%     0.62%  python3.11  [.] x509_name_canon                                                                     ▒
+    7.62%     0.22%  python3.11  [.] PEM_read_bio_ex                                                                     ▒
+    5.61%     0.02%  python3.11  [.] d2i_PUBKEY_int                                                                      ▒
+    5.17%     3.62%  python3.11  [.] EVP_DecodeUpdate         

openssl 1.1.1v:

+   32.72%     0.00%  python3.11  [.] by_file_ctrl                                                                        ◆
+   32.72%     0.01%  python3.11  [.] X509_load_cert_crl_file                                                             ▒
+   32.72%     0.00%  python3.11  [.] X509_STORE_set_default_paths                                                        ▒
+   32.25%     0.01%  python3.11  [.] PEM_X509_INFO_read_bio                                                              ▒
+   23.15%     0.01%  python3.11  [.] ASN1_item_d2i                                                                       ▒
+   23.12%     1.50%  python3.11  [.] asn1_item_embed_d2i                                                                 ▒
+   21.96%     0.60%  python3.11  [.] asn1_template_noexp_d2i                                                             ▒
+   12.77%     0.08%  python3.11  [.] x509_name_ex_d2i                                                                    ▒
+    9.44%     0.28%  python3.11  [.] PEM_read_bio_ex                                                                     ▒
+    8.44%     0.70%  python3.11  [.] x509_name_canon                                                                     ▒
+    6.35%     4.42%  python3.11  [.] EVP_DecodeUpdate                                                                    ▒
+    5.66%     0.04%  python3.11  [.] pubkey_cb                                                                           ▒
+    5.03%     0.21%  python3.11  [.] OPENSSL_sk_pop_free                                                                 ▒
+    5.02%     0.01%  python3.11  [.] x509_pubkey_decode        

This suggests to me that we're slowing down most significantly in our loading and decoding of X509 certificates. I believe @vdukhovni merged a patch here (#24140) that may help with this. I'm going to try the head of the master branch for comparison.

@nhorman
Contributor

nhorman commented May 24, 2024

Hmm, similar results on the master branch, so no help there. Looking a little more closely at where our cycle samples are coming from:

openssl 1.1.1v:

-   32.72%     0.00%  python3.11  [.] by_file_ctrl                                                                        ◆
   - 32.72% by_file_ctrl                                                                                                  ▒
      - 32.71% X509_load_cert_crl_file                                                                                    ▒
         - 32.24% PEM_X509_INFO_read_bio                                                                                  ▒
            - 22.34% ASN1_item_d2i                                                                                        ▒
               - 22.31% asn1_item_embed_d2i                                                                               ▒
                  - 21.26% asn1_template_noexp_d2i                                                                        ▒
                     - 21.19% asn1_item_embed_d2i                                                                         ▒
                        - 18.98% asn1_template_noexp_d2i                                                                  ▒
                           - 12.46% x509_name_ex_d2i                                                                      ▒
                              + 8.04% x509_name_canon                                                                     ▒
                              + 3.63% ASN1_item_ex_d2i                                                                    ▒
                           + 6.32% asn1_item_embed_d2i                                                                    ▒
                        + 1.83% asn1_template_ex_d2i                                                                      ▒
                  + 0.70% asn1_item_embed_new                                                                             ▒
            - 9.34% PEM_read_bio_ex                                                                                       ▒
               - 6.33% EVP_DecodeUpdate                                                                                   ▒
                    1.91% evp_decodeblock_int                                                                             ▒
               - 1.37% BIO_puts                                                                                           ▒
                  + 1.01% mem_write            

openssl-master:

-   40.20%     0.00%  python3.11  [.] by_file_ctrl_ex                                                                     ◆
   - by_file_ctrl_ex                                                                                                      ▒
      - 40.20% X509_load_cert_crl_file_ex                                                                                 ▒
         - 39.39% PEM_X509_INFO_read_bio_ex                                                                               ▒
            - 30.41% ASN1_item_d2i                                                                                        ▒
               - 30.40% asn1_item_embed_d2i                                                                               ▒
                  - 30.10% asn1_template_noexp_d2i                                                                        ▒
                     - 30.04% asn1_item_embed_d2i                                                                         ▒
                        - 27.74% asn1_template_noexp_d2i                                                                  ▒
                           - 16.03% asn1_item_embed_d2i                                                                   ▒
                              - 15.09% x509_pubkey_ex_d2i_ex                                                              ▒
                                 + 13.39% OSSL_DECODER_from_data                                                          ▒
                                   0.81% OSSL_DECODER_CTX_new_for_pkey                                                    ▒
                                 + 0.55% ASN1_item_ex_d2i                                                                 ▒
                           + 11.54% x509_name_ex_d2i                                                                      ▒
                        + 1.91% asn1_template_ex_d2i                                                                      ▒
            - 7.74% PEM_read_bio_ex                                                                                       ▒
               + 5.24% EVP_DecodeUpdate                                                                                   ▒
               + 1.21% BIO_puts                                                                                           ▒
            + 0.77% X509_new_ex                                                                                           ▒
         + 0.72% X509_STORE_add_cert 

I need to look at this more closely, but it appears to suggest that the addition of the decoder code is primarily responsible for the slowdown here, specifically the OSSL_DECODER_from_data function and its subordinates.

@glic3rinu

glic3rinu commented May 24, 2024

Hey, I updated the test script to be able to reuse the SSL context across calls (using the urllib library instead of requests). I've also made a few other changes to make the runtime more consistent (timeouts and a better thread pool). Preliminary results between both versions seem to be on par when reusing the context 🥳

OpenSSL 3.3.0 9 Apr 2024 - 238.3 req/sec
OpenSSL 1.1.1v  1 Aug 2023 - 236.7 req/sec

@nhorman
Contributor

nhorman commented May 24, 2024

I concur; I've rerun here with the new tests and gotten similar results (387 req/s on 1.1.1v and 386 req/s with 3.3.0).

So it seems @mattcaswell was correct that creating a new SSL_CTX per request was the slowdown here (perf, perhaps corroborating this, shows a significant reduction in the number of cycles spent in OSSL_DECODER_from_data).

So I suppose the takeaways here are:

Am I missing anything here? Is there additional work to be done on this ticket, or can it be closed?

@davidben
Contributor

Reuse SSL_CTX whenever possible

This was discussed in another bug, but this is not good general advice. SSL_CTX sharing reflects application semantics.

  1. It is convenient for callers to configure a host of common options on SSLs once
  2. More importantly, TLS has features, foremost resumption, that share state across connections. SSL_CTX exists to facilitate that sharing. There are many complex considerations (security, privacy, etc.) around when an application should and should not share this.

Simply saying "Reuse SSL_CTX whenever possible" forgets that there was a purpose to this API. OpenSSL 3.x reused it to work around some performance regressions, but it is only a workaround. It does not work for applications whose semantics demand that they make separate SSL_CTXs. Making new SSL_CTXs is an important use case for OpenSSL 3.x to support performantly, if it wishes to be a backwards-compatible, robust, and full-featured toolkit for TLS.

@mattcaswell
Member

mattcaswell commented May 27, 2024

@nhorman as I said above "A better model is probably to move the caches into the OSSL_LIB_CTX instead" - so we should ensure we have an issue to do this work.

@mattcaswell
Member

The DECODER slowdown also still warrants further work. While we can make optimisations to avoid calling decoders in certain circumstances, there will still be workloads that need to use them.

@nhorman
Contributor

nhorman commented May 27, 2024

@mattcaswell I agree on the decoder work, but aren't we already tracking that here: openssl/project#100?

@mattcaswell
Member

Ah - yes, right.
