Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: OpenSSL 3/1.1.1 is using shared ERR_STATE in all workers #3695

Closed
space88man opened this issue Jan 2, 2024 · 10 comments
Closed

tls: OpenSSL 3/1.1.1 is using shared ERR_STATE in all workers #3695

space88man opened this issue Jan 2, 2024 · 10 comments

Comments

@space88man
Copy link
Contributor

space88man commented Jan 2, 2024

Description

References: #3635

Early initialization by tls of OpenSSL 3/1.1.1 in rank 0 results in the use of shared state ERR_STATE *. Under heavy traffic the workers will corrupt the state. This is a race condition which results in intermittent crashes (as in #3635).

Initialization calls such as OPENSSL_init_ssl are responsible for creating and initializing ERR_STATE in rank 0, which is then inherited and reused by all worker processes without reinitialization. This results in memory corruption (not observable on a lightly loaded system).

This bug is much less evident with OpenSSL 1.1.1 as that version of the library has less aggressive dynamic memory management (particularly in crypto/err/err.c). Most kamailio + OpenSSL 3.x bug reports show that SIGSEGV happens mostly inside crypto/err/*.[ch].

[Update] OpenSSL 1.1.1—for a similar reason a set of thread-local variables(public_drbg, private_drbg) is the reason why there is a need for RAND replacement in tls_rand.c. These variables are inherited by the worker without proper initialization—hence failure of the RAND system if it is not replaced.


Note: this is not related to shared memory allocation contention as this protected by a (multi-process) futex. Ping @miconda re:

1a9b0b6

Since qm/fm et al are already protected by a multi-process futex this commit is redundant (it puts a pthread mutex around the futex). I have been able to reproduce OpenSSL 3 crashes with heavy loading with this commit.


The shared object in question is the thread-local ERR_STATE. It should be thread-local but:

CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer
    (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free
    tls: tls_init.c: ser_free(535) - ignoring

shows that multiple workers are accessing the same object.

Troubleshooting

N/A

Reproduction

  • verify that ERR_STATE * is initialized in rank 0
  • create heavy loading of TLS clients
  • observe shm logging errors

Debugging Data

Dump ERR_STATE * from two different processes: observe that these are identical meaning both workers are using the same struct.

# !!!!IMPORTANT!!!! err.c:err_thread_local == OpenSSL thread-local key for ERR_STATE *
(gdb) p err_thread_local
$3 = 2

# this is worker 1, process_no 7, rank 3
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$1 = (void *) 0x7f1d700a65a0

# this is worker 2, process_no 8, rank 4
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/stun.so...
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$2 = (void *) 0x7f1d700a65a0

Log Messages

CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring

SIP Traffic

N/A

Possible Solutions

  • avoid doing OpenSSL initialization in rank 0; this will require cooperation from all modules that use OpenSSL. Alternately run all OpenSSL initialization in a worker thread—ERR_STATE * is thread-local so will not be propagated to the main thread
  • run tls hooks in a worker thread—this ensures that they get their own copy of ERR_STATE *—and will not be affected by rank 0

Additional Information

  • OpenSSL(1.1.1, 3.x) has initialize-once and initialize-once-per-thread semantics. The (per-thread-singleton) object ERR_STATE is of the type initialize-once-per-thread.

  • when kamailio does OpenSSL initialization in rank 0, the workers inherit all "initialize-once*" objects. If these objects are intended to be mutable the workers will contend for the same state

  • due to the design of the OpenSSL 3 a lot of this state (static variables/functions, one time initialization, no accessors) cannot be reset in child processes. Initialize-once-per-thread state can only be "renewed" in new threads.

  • at the point of fork(): all mutable thread-local state created in rank 0 should be uninitialized. Currently (as of OpenSSL 3.2.0) the main culprit is ERR_STATE. I haven't found any other points of contention.

  • I have a local branch based on master, it skips all OpenSSL initialization in rank 0 (by the time of fork() - pthread_getspecific(err_thread_local) is still returning NULL), and it passes all my load testing. To complicate the scenario I load outbound.so in the config and finesse ERR_STATE * initialization by running most of outbound_mod.c:mod_init in pthread_create.

Rank 0 calls that set ERR_STATE *

In kamailio start-up these are the functions that call ossl_err_state_get_int: this will set ERR_STATE in rank 0 and all child processes. This start-up does not include any other modules that use OpenSSL.

  • OPENSSL_init_ssl() in tls_h_mod_pre_init_f()
  • SSL_load_error_strings() in tls_h_mod_pre_init_f()
  • RAND_xxxxxxxxxxxx calls that enable threading in the randctx in tls_h_mod_pre_init_f()
  • tls_fix_domains_cfg() in mod_child() for PROC_INIT

Example module using OpenSSL: outbound.so - this calls ossl_err_get_state_int() in mod_init(); to avoid initializing ERR_STATE in the primary thread of rank 0, run the crypto parts of mod_init() in pthread_create(...).

@space88man space88man changed the title tls: OpenSSL 3 root cause of crashes is shared state tls: OpenSSL 3 root cause of crashes is shared ERR_STATE Jan 2, 2024
@space88man
Copy link
Contributor Author

@space88man space88man changed the title tls: OpenSSL 3 root cause of crashes is shared ERR_STATE tls: OpenSSL 3/1.1.1 root cause of crashes is shared ERR_STATE Jan 4, 2024
@space88man space88man changed the title tls: OpenSSL 3/1.1.1 root cause of crashes is shared ERR_STATE tls: OpenSSL 3/1.1.1 is using shared ERR_STATE in all workers Jan 4, 2024
@henningw
Copy link
Contributor

henningw commented Jan 4, 2024

Thank you for the detailed analysis and also the POC. I remember from a custom module that I maintained for some year that using threads in Kamailio can bring some challenges due to its multi-process architectures. So its probably need to be discussed more thoroughly. The other option you mentioned would be to just don't do the TLS initialization in rank 0, right? If we need to touch all openssl using modules anyway, maybe this is an easier and less intrusive way?

@space88man
Copy link
Contributor Author

space88man commented Jan 4, 2024

...just don't do the TLS initialization in rank 0, right? If we need to touch all openssl using modules anyway, maybe this is an easier and less intrusive way?

One solution is to have each module declare a mod_init_openssl() if needed; then provide a helper macro to run mod_init_openssl() in a transient thread inside mod_init().

// this can be done in rank 0 using
// a transient thread to sandbox OpenSSL
int mod_init(void){

// do mod_init stuff here..
// run OpenSSL init in transient thread
if(mod_init_openssl!=NULL) {
pthread_create(... mod_init_openssl, )
// do OpenSSL stuff here
pthread_join(...)
}

// do mod_init continued...

}

Then this thread will disappear after mod_init in rank 0—all the OpenSSL thread-local varables in rank 0(thread#1) will be "clean". I give this example in the POC for outbound.

BTW this study explains why even OpenSSL 1.1.1 is so odd - per child replicated SSL_CTX*, and RNG replacement with RAND_set_rand_method. The root cause is the same: there are thread-local variables in rank 0(thread#1) that are replicated in the workers—after fork() OpenSSL doesn't properly reinitialize these states. In the case of RAND there are two thread-local variables (public_drbg, private_drbg)— because these vars are inherited by the worker the RAND system will fail—hence kamailio had to replace the RAND system using tls_rand.c.

I have also gone back to look at the OpenSSL 1.1.1 implementation - by putting all initialization (SSL_CTX_new, tls_fix_domains_cfg etc) into a transient thread none of the workarounds are necessary any more(!) - in particular the tls_rand.c stuff is not needed.

To be clear, the dlsym-pthreads stuff(src/main.c) is still needed to handle multi-process locks.

If we fix the handling of rank 0 thread-local variables correctly, then all the pain that Kamailio has experience with OpenSSL over the years might be over!

@miconda
Copy link
Member

miconda commented Jan 4, 2024

@space88man: thanks for digging deep into this one!

I am fine to try the proposed approach in the PR #3696 and see how it goes, the only question would be about the impact of other libs that link behind with libssl, like libcurl or libmysqlclient. Does a similar approach needs for the modules linked with such libraries?

Regarding multi-threading in Kamailio, there are couple of modules already using threads, like lwsc, secsipid/_proc. Issues with multi-threading and multi-processing is when there are active threads at fork() time, but if I got right the PR #3696, the process with rank 0 waits for the created libssl-initialization thread to terminate. Thats why secsipid_proc was split out of secsipid, to bind to it after fork(), which then initialize the golang library (that creates internally threads for memory management). lwsc creates threads only inside child processes, which is after fork().

@oej
Copy link
Member

oej commented Jan 4, 2024

I have mentioned this before as an issue and it was a driving factor behind creating the API to the curl module. Having multiple modules using Curl that initiates OpenSSL (or non-curl modules initiating OpenSSL) will lead to problems. I remember that Kevin Fleming while working with Asterisk wrote a wrapper library that initialised OpenSSL once only for all modules. I have no idea if that still exists, but if it does, it could be an inspiration.

@miconda
Copy link
Member

miconda commented Jan 4, 2024

The tls module initializes very early the libssl, which was ok for libssl 0.9.x, 1.0.x. With libssl 1.1.x we had to import random number generator to go around some of the libssl-specific multi-threading. With libssl 3.x seems to be more impact, somehow related to what was introduced in libssl 1.1.x, but expanded to other globals.

In other words, I don't think that a wrapper library done long time ago, pre libssl 1.1.x/3.x can help nowadays.

@space88man
Copy link
Contributor Author

space88man commented Jan 4, 2024

Ok thanks guys for your feedback: I am going to proceed with a series of commits to master. These can then be reverted easily. To each commit message I will also add the label thread-local
to facilitate review and search.

@miconda you are correct regarding libssl-initialization threads in rank 0; they are run-and-done type and will complete before fork().

@miconda
Copy link
Member

miconda commented Jan 4, 2024

@space88man: one remark regarding the commit message format in the PR #3696, do not make first line like:

tls: POC for OpenSSL 3

Where I assume POC stands for proof of concept. Make it more explicit about what the commit does, maybe like:

tls: init libssl in separate thread for v3.+

@space88man
Copy link
Contributor Author

space88man commented Jan 4, 2024

I have implemented no RAND replacement in OpenSSL 1.1.1.

The initial set of commits for 1.1.1/3.x are in master. Verification:
OpenSSL 3:

  • after kamailio is started, gdb attach rank 0 PID, verify that

      (gdb) p pthread_getspecific(err_thread_local)
      # should be 0x0
    

OpenSSL 1.1.1:

  • after kamailio is started, gdb attach rank 0 PID and a worker PID, verify that

      # in rank 0 and worker rank these pointers should be different
      (gdb) p ERR_get_state()
      # in rank 0 these two values should be 0x0
      (gdb) p pthread_getspecific(public_drbg)
      (gdb) p pthread_getspecific(private_drbg)
    

@space88man
Copy link
Contributor Author

Closing now with commits on master—reopen with separate issues for OpenSSL 1.1.1/OpenSSL 3.

space88man added a commit that referenced this issue Jan 5, 2024
- the 2nd lock was put in place as defensive programming for shm contention
- GH #3695: the underlying issue is early init of thread-locals
space88man added a commit that referenced this issue Jan 9, 2024
- the 2nd lock was put in place as defensive programming for shm contention
- GH #3695: the underlying issue is early init of thread-locals

(cherry-pick from 1c70775)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants