tlsa + OpenSSL 3.0.x: multiple unpredictable shm corruption #3222

Closed
space88man opened this issue Aug 22, 2022 · 14 comments

Comments

@space88man
Contributor

space88man commented Aug 22, 2022

Description

  • kamailio 5.6.1 on AlmaLinux 9 with OpenSSL 3.0.5
  • [update] kamailio 5.6.2 on AlmaLinux 9 with OpenSSL 3.0.7
  • Used tlsa to decouple from the OS version of OpenSSL (which is 3.0.1). The same issue
    also happens with the OS version of OpenSSL
  • Generate TLS client traffic at > 200 conn/sec
  • Multiple unpredictable types of shm corruption occur

Troubleshooting

  • No issue with tlsa + OpenSSL 1.1.1q, 1.1.1s; the module can sustain 500 conn/sec up to 500 persistent connections.

Reproduction

  1. Generate > 200 conn/sec up to 500 client connections
  2. Use the kamailio configuration from the outbound module documentation, i.e., configure kamailio as a TLS edge proxy. To isolate the problem, the registrar and lookup() run on another kamailio system.
  3. Use a SIP load tester to generate REGISTER traffic (Expires: 600); at the end of the REGISTER, keep the connection alive and re-REGISTER at 300 secs.
  4. Generate traffic at > 250 conn/sec (with < 200 conn/sec the test usually succeeds)

Debugging Data

Backtraces (BT1 to BT4) are in the comments below.

Additional Information

  • Kamailio version: 5.6.1
  • Operating System:
      • AlmaLinux 9 with (OS) OpenSSL 3.0.1; tested with tlsa + OpenSSL 3.0.5
      • AlmaLinux 9, tlsa + OpenSSL 3.0.7

@space88man
Contributor Author

BT1

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007fa59a93b4a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007fa59a8eed06 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007fa59a8c17d3 in __GI_abort () at abort.c:79
#4  0x00000000006fac4e in qm_debug_check_frag (qm=0x7fa588343000, f=0x7fa58cf9ff40, file=0x7fa59871b6e5 "tlsa: ../tls/tls_init.c", line=326, efile=0x881b19 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:129
#5  0x00000000006fee76 in qm_free (qmp=0x7fa588343000, p=0x7fa58cf9ff78, file=0x7fa59871b6e5 "tlsa: ../tls/tls_init.c", func=0x7fa59871d090 <__func__.0> "ser_free", line=326, mname=0x7fa59871b6e0 "tlsa")
    at core/mem/q_malloc.c:511
#6  0x000000000070997c in qm_shm_free (qmp=0x7fa588343000, p=0x7fa58cf9ff78, file=0x7fa59871b6e5 "tlsa: ../tls/tls_init.c", func=0x7fa59871d090 <__func__.0> "ser_free", line=326, mname=0x7fa59871b6e0 "tlsa")
    at core/mem/q_malloc.c:1350
#7  0x00007fa59844d337 in ser_free (ptr=0x7fa58cf9ff78, fname=0x7fa59874ea41 "crypto/stack/stack.c", fline=415) at ../tls/tls_init.c:326
#8  0x00007fa598486561 in ssl_cert_clear_certs () from /usr/local/lib64/kamailio/modules/tlsa.so
#9  0x00007fa598486600 in ssl_cert_free () from /usr/local/lib64/kamailio/modules/tlsa.so
#10 0x00007fa5984946f4 in SSL_free () from /usr/local/lib64/kamailio/modules/tlsa.so
#11 0x00007fa598472a64 in tls_h_tcpconn_clean_f (c=0x7fa58cee7e90) at ../tls/tls_server.c:701
#12 0x000000000066ec91 in _tcpconn_free (c=0x7fa58cee7e90) at core/tcp_main.c:1582
#13 0x000000000066ece6 in _tcpconn_rm (c=0x7fa58cee7e90) at core/tcp_main.c:1593
#14 0x0000000000695c77 in tcpconn_destroy_all () at core/tcp_main.c:4766
#15 0x0000000000697e76 in destroy_tcp () at core/tcp_main.c:4952
#16 0x000000000041f243 in cleanup (show_status=1) at main.c:580
#17 0x0000000000420ac1 in shutdown_children (sig=15, show_status=1) at main.c:704
#18 0x0000000000423a67 in handle_sigs () at main.c:802
#19 0x00000000004308b1 in main_loop () at main.c:1900
#20 0x000000000043995c in main (argc=13, argv=0x7ffe4cbd5b98) at main.c:3078

@space88man
Contributor Author

BT2

#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00007fa59a91e228 in __vfprintf_internal (s=s@entry=0xf78980, format=format@entry=0x881ca0 "%s: %.*s%s%s%sBUG: qm: fragm. %p (address %p) beginning overwritten (%lx)! Memory allocator was called from %s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n", 
    ap=ap@entry=0x7ffe4cbb4210, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1647
#2  0x00007fa59a9e1c4a in __vsyslog_internal (pri=<optimized out>, fmt=0x881ca0 "%s: %.*s%s%s%sBUG: qm: fragm. %p (address %p) beginning overwritten (%lx)! Memory allocator was called from %s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n", ap=0x7ffe4cbb4210, 
    mode_flags=0) at syslog.c:229
#3  0x00007fa59a9e212a in __syslog (pri=<optimized out>, fmt=<optimized out>) at syslog.c:109
#4  0x00000000006fac27 in qm_debug_check_frag (qm=0x7fa588343000, f=0x7fa58cf9ff30, file=0x7fa59871b6e5 "tlsa: ../tls/tls_init.c", line=326, efile=0x881b19 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:123
#5  0x00000000006fee76 in qm_free (qmp=0x7fa588343000, p=0x7fa58cf9ff68, file=0x7fa59871b6e5 "tlsa: ../tls/tls_init.c", func=0x7fa59871d090 <__func__.0> "ser_free", line=326, mname=0x7fa59871b6e0 "tlsa") at core/mem/q_malloc.c:511
#6  0x000000000070997c in qm_shm_free (qmp=0x7fa588343000, p=0x7fa58cf9ff68, file=0x7fa59871b6e5 "tlsa: ../tls/tls_init.c", func=0x7fa59871d090 <__func__.0> "ser_free", line=326, mname=0x7fa59871b6e0 "tlsa") at core/mem/q_malloc.c:1350
#7  0x00007fa59844d337 in ser_free (ptr=0x7fa58cf9ff68, fname=0x7fa598758da8 "providers/implementations/kdfs/hkdf.c", fline=123) at ../tls/tls_init.c:326
#8  0x00007fa5985d5537 in kdf_hkdf_reset () from /usr/local/lib64/kamailio/modules/tlsa.so
#9  0x00007fa5985d5b4e in kdf_hkdf_free () from /usr/local/lib64/kamailio/modules/tlsa.so
#10 0x00007fa598534bc4 in EVP_KDF_CTX_free () from /usr/local/lib64/kamailio/modules/tlsa.so
#11 0x00007fa5984a0758 in tls13_hkdf_expand () from /usr/local/lib64/kamailio/modules/tlsa.so
#12 0x00007fa5984a097d in derive_secret_key_and_iv () from /usr/local/lib64/kamailio/modules/tlsa.so
#13 0x00007fa5984a15c5 in tls13_change_cipher_state () from /usr/local/lib64/kamailio/modules/tlsa.so
#14 0x00007fa5984c988a in ossl_statem_server_post_work () from /usr/local/lib64/kamailio/modules/tlsa.so
#15 0x00007fa5984b82db in state_machine.part () from /usr/local/lib64/kamailio/modules/tlsa.so
#16 0x00007fa59846cab2 in tls_accept (c=0x7fa58cf03ef0, error=0x7ffe4cbd4ca4) at ../tls/tls_server.c:468
#17 0x00007fa598477e81 in tls_h_read_f (c=0x7fa58cf03ef0, flags=0x7ffe4cbd50e0) at ../tls/tls_server.c:1173
#18 0x00000000006ac4ae in tcp_read_headers (c=0x7fa58cf03ef0, read_flags=0x7ffe4cbd50e0) at core/tcp_read.c:441
#19 0x00000000006b4686 in tcp_read_req (con=0x7fa58cf03ef0, bytes_read=0x7ffe4cbd50e4, read_flags=0x7ffe4cbd50e0) at core/tcp_read.c:1469
#20 0x00000000006b97b5 in handle_io (fm=0x7fa59897d728, events=1, idx=-1) at core/tcp_read.c:1780
#21 0x00000000006a720f in io_wait_loop_epoll (h=0x979f00 <io_w>, t=2, repeat=0) at core/io_wait.h:1070
#22 0x00000000006bc539 in tcp_receive_loop (unix_sock=23) at core/tcp_read.c:1976
#23 0x000000000069c18b in tcp_init_children (woneinit=0x7ffe4cbd549c) at core/tcp_main.c:5227
#24 0x000000000042fab9 in main_loop () at main.c:1849
#25 0x000000000043995c in main (argc=13, argv=0x7ffe4cbd5b98) at main.c:3078
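
The abort in BT1 and the "beginning overwritten" message in BT2 and BT4 come from qm_debug_check_frag, which, judging by the frames above, is a consistency check the q_malloc allocator runs on a fragment before freeing it. The general technique can be sketched in C as below: each block carries a known marker before and after the payload, and the check verifies both. This is only an illustration under my own assumptions; the struct layout, the marker values and the names (guarded_alloc, guarded_check, guarded_free, GUARD_HEAD, GUARD_TAIL) are hypothetical and not taken from Kamailio's source.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker values, chosen only for this sketch. */
#define GUARD_HEAD 0xF0F0F0F0u
#define GUARD_TAIL 0xC0C0C0C0u

/* Hypothetical header stored in front of every payload. */
struct guarded_hdr {
    size_t   size;  /* payload size in bytes */
    uint32_t head;  /* marker stored before the payload */
};

/* Allocate size bytes with a marker before and after the payload. */
void *guarded_alloc(size_t size)
{
    uint32_t tail = GUARD_TAIL;
    struct guarded_hdr *h = malloc(sizeof(*h) + size + sizeof(tail));

    if (h == NULL)
        return NULL;
    h->size = size;
    h->head = GUARD_HEAD;
    /* The tail marker may be unaligned, so copy it byte-wise. */
    memcpy((unsigned char *)(h + 1) + size, &tail, sizeof(tail));
    return h + 1;
}

/* Return 0 if both markers are intact, -1 if the head marker was
 * overwritten, -2 if the tail marker was overwritten. */
int guarded_check(const void *p)
{
    const struct guarded_hdr *h = (const struct guarded_hdr *)p - 1;
    uint32_t tail;

    if (h->head != GUARD_HEAD)
        return -1;
    memcpy(&tail, (const unsigned char *)p + h->size, sizeof(tail));
    if (tail != GUARD_TAIL)
        return -2;
    return 0;
}

/* Verify the markers, abort on corruption, then release the block. */
void guarded_free(void *p)
{
    if (p == NULL)
        return;
    if (guarded_check(p) != 0)
        abort();
    free((struct guarded_hdr *)p - 1);
}
```

A check of this kind only fires when the damaged block is freed, not when the stray write happens. That would be consistent with the four backtraces ending in unrelated OpenSSL call paths: the frame that trips the check is not necessarily the one that corrupted the memory.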

@space88man
Contributor Author

BT3

(gdb) bt
#0  0x00007f6564051738 in WPACKET_close () from /usr/local/lib64/kamailio/modules/tlsa.so
#1  0x00007f6563faf1a1 in tls_construct_extensions () from /usr/local/lib64/kamailio/modules/tlsa.so
#2  0x00007f6563fc770e in tls_construct_server_hello () from /usr/local/lib64/kamailio/modules/tlsa.so
#3  0x00007f6563fb98e5 in state_machine.part () from /usr/local/lib64/kamailio/modules/tlsa.so
#4  0x00007f6563f6dab2 in tls_accept (c=0x7f6558953738, error=0x7ffc279a4854) at ../tls/tls_server.c:468
#5  0x00007f6563f78e81 in tls_h_read_f (c=0x7f6558953738, flags=0x7ffc279a4c90) at ../tls/tls_server.c:1173
#6  0x00000000006ac4ae in tcp_read_headers (c=0x7f6558953738, read_flags=0x7ffc279a4c90) at core/tcp_read.c:441
#7  0x00000000006b4686 in tcp_read_req (con=0x7f6558953738, bytes_read=0x7ffc279a4c94, read_flags=0x7ffc279a4c90) at core/tcp_read.c:1469
#8  0x00000000006b97b5 in handle_io (fm=0x7f656447e6f8, events=1, idx=-1) at core/tcp_read.c:1780
#9  0x00000000006a720f in io_wait_loop_epoll (h=0x979f00 <io_w>, t=2, repeat=0) at core/io_wait.h:1070
#10 0x00000000006bc539 in tcp_receive_loop (unix_sock=21) at core/tcp_read.c:1976
#11 0x000000000069c18b in tcp_init_children (woneinit=0x7ffc279a504c) at core/tcp_main.c:5227
#12 0x000000000042fab9 in main_loop () at main.c:1849
#13 0x000000000043995c in main (argc=13, argv=0x7ffc279a5748) at main.c:3078

@space88man
Contributor Author

BT4

#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00007fb4296c2228 in __vfprintf_internal (s=s@entry=0x1d2f980, format=format@entry=0x881ca0 "%s: %.*s%s%s%sBUG: qm: fragm. %p (address %p) beginning overwritten (%lx)! Memory allocator was called from %s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n", 
    ap=ap@entry=0x7ffffe23d280, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1647
#2  0x00007fb429785c4a in __vsyslog_internal (pri=<optimized out>, fmt=0x881ca0 "%s: %.*s%s%s%sBUG: qm: fragm. %p (address %p) beginning overwritten (%lx)! Memory allocator was called from %s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n", ap=0x7ffffe23d280, 
    mode_flags=0) at syslog.c:229
#3  0x00007fb42978612a in __syslog (pri=<optimized out>, fmt=<optimized out>) at syslog.c:109
#4  0x00000000006fac27 in qm_debug_check_frag (qm=0x7fb4170e7000, f=0x7fb41b2b2cd8, file=0x7fb4274bf6e5 "tlsa: ../tls/tls_init.c", line=326, efile=0x881b19 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:123
#5  0x00000000006fee76 in qm_free (qmp=0x7fb4170e7000, p=0x7fb41b2b2d10, file=0x7fb4274bf6e5 "tlsa: ../tls/tls_init.c", func=0x7fb4274c1090 <__func__.0> "ser_free", line=326, mname=0x7fb4274bf6e0 "tlsa") at core/mem/q_malloc.c:511
#6  0x000000000070997c in qm_shm_free (qmp=0x7fb4170e7000, p=0x7fb41b2b2d10, file=0x7fb4274bf6e5 "tlsa: ../tls/tls_init.c", func=0x7fb4274c1090 <__func__.0> "ser_free", line=326, mname=0x7fb4274bf6e0 "tlsa") at core/mem/q_malloc.c:1350
#7  0x00007fb4271f1337 in ser_free (ptr=0x7fb41b2b2d10, fname=0x7fb4274e1030 "crypto/err/err_local.h", fline=88) at ../tls/tls_init.c:326
#8  0x00007fb4272cb666 in ERR_pop_to_mark () from /usr/local/lib64/kamailio/modules/tlsa.so
#9  0x00007fb42734b129 in ossl_prov_digest_load_from_params () from /usr/local/lib64/kamailio/modules/tlsa.so
#10 0x00007fb42738aca2 in hmac_set_ctx_params () from /usr/local/lib64/kamailio/modules/tlsa.so
#11 0x00007fb4272dde13 in EVP_Q_mac () from /usr/local/lib64/kamailio/modules/tlsa.so
#12 0x00007fb427379e75 in kdf_tls1_3_derive () from /usr/local/lib64/kamailio/modules/tlsa.so
#13 0x00007fb427244e31 in tls13_generate_secret () from /usr/local/lib64/kamailio/modules/tlsa.so
#14 0x00007fb42726d733 in ossl_statem_server_post_work () from /usr/local/lib64/kamailio/modules/tlsa.so
#15 0x00007fb42725c2db in state_machine.part () from /usr/local/lib64/kamailio/modules/tlsa.so
#16 0x00007fb427210ab2 in tls_accept (c=0x7fb41b2527a8, error=0x7ffffe25dd74) at ../tls/tls_server.c:468
#17 0x00007fb42721be81 in tls_h_read_f (c=0x7fb41b2527a8, flags=0x7ffffe25e1b0) at ../tls/tls_server.c:1173
#18 0x00000000006ac4ae in tcp_read_headers (c=0x7fb41b2527a8, read_flags=0x7ffffe25e1b0) at core/tcp_read.c:441
#19 0x00000000006b4686 in tcp_read_req (con=0x7fb41b2527a8, bytes_read=0x7ffffe25e1b4, read_flags=0x7ffffe25e1b0) at core/tcp_read.c:1469
#20 0x00000000006b97b5 in handle_io (fm=0x7fb4277216f8, events=1, idx=-1) at core/tcp_read.c:1780
#21 0x00000000006a720f in io_wait_loop_epoll (h=0x979f00 <io_w>, t=2, repeat=0) at core/io_wait.h:1070
#22 0x00000000006bc539 in tcp_receive_loop (unix_sock=21) at core/tcp_read.c:1976
#23 0x000000000069c18b in tcp_init_children (woneinit=0x7ffffe25e56c) at core/tcp_main.c:5227
#24 0x000000000042fab9 in main_loop () at main.c:1849
#25 0x000000000043995c in main (argc=13, argv=0x7ffffe25ec68) at main.c:3078

@miconda
Member

miconda commented Aug 22, 2022

Based on various reports on the tracker and mailing lists, support for OpenSSL v3.0 probably has to be reviewed against its public API and internal changes.

At this moment you should use OpenSSL 1.1.x.

space88man changed the title from "tlsa + OpenSSL 3.0.5: multiple unpredictable shm corruption" to "tlsa + OpenSSL 3.0.x: multiple unpredictable shm corruption" on Dec 20, 2022

@miconda
Member

miconda commented Jan 23, 2023

Can you try the git master branch with the tls modparam lock_mode set to 1?

@linuxmaniac
Member

@space88man, can you please try master now? We recently introduced some changes related to OpenSSL 3.0.

@space88man
Contributor Author

space88man commented Jul 6, 2023

Update — I have had good results with 5.7.1!

Tested EL9 + kamailio-tls (packaged module, i.e., not locally compiled):

  • kamailio-5.7.1-0.el9.centos.x86_64
  • openssl-3.0.7-16.el9_2.x86_64
  • concurrent 500 UAs, 500 connections/s

Same load test as with OpenSSL 1.1.1, and tls_wolfssl.

I will continue testing on Debian 12; after that I think this ticket can be closed.

[Update Debian 12] Working correctly with:

  • kamailio 5.7.1+bpo12
  • openssl 3.0.9-1

@space88man
Contributor Author

space88man commented Jul 6, 2023

> Can you try the git master branch with the tls modparam lock_mode set to 1?

I did not need lock_mode = 1 to get this to work on OpenSSL 3.0.x + kamailio 5.7.1. Are there other mitigations in place now?

OK when using modparam("tls", "init_mode", 1)

@dilyanpalauzov
Contributor

Do you use this: https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared ? The documentation says in some places that it is needed, and in others that it is no longer needed.

@space88man
Contributor Author

> Do you use this: https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared ? The documentation says in some places that it is needed, and in others that it is no longer needed.

@dilyanpalauzov: in recent versions of kamailio this is already included.

@miconda
Member

miconda commented Jul 7, 2023

@space88man: thanks for testing and providing feedback so far!

I would rather close all the old open issues related to OpenSSL 3.0 because the code base has changed. If you get new crashes with the master branch or 5.7.1+ versions, it is better to open a new issue with traces relevant to the latest versions. This issue is almost one year old anyhow, and many other parts of the code have changed.

If you disagree and feel it is still better to continue here, I am fine with reopening it.

miconda closed this as completed on Jul 7, 2023

@dilyanpalauzov
Contributor

> @dilyanpalauzov: in recent versions of kamailio this is already included.

Yes, but it is unclear whether this is still necessary. That is what I asked: is openssl_mutex_shared needed by your setup?

By "included", do you mean that you start kamailio with LD_PRELOAD=/usr/local/lib64/kamailio/openssl_mutex_shared/openssl_mutex_shared.so /usr/local/sbin/kamailio -f /usr/local/etc/kamailio/kamailio.cfg?

@space88man
Contributor Author

> By "included", do you mean that you start kamailio with LD_PRELOAD=/usr/local/lib64/kamailio/openssl_mutex_shared/openssl_mutex_shared.so /usr/local/sbin/kamailio -f /usr/local/etc/kamailio/kamailio.cfg?

@dilyanpalauzov: the functionality of openssl_mutex_shared.so has been built into kamailio's src/main.c since 2019, conditional on the macro KSR_PTHREAD_MUTEX_SHARED. This macro is set when libssl ≥ 1.1 is detected.

See this excerpt from src/Makefile.defs:

# libssl version greater or equal than 1.1
ifeq ($(shell [ $(LIBSSL_VERNUM) -ge 1001000 ] && echo libssl11plus), libssl11plus)
LIBSSL_SET_MUTEX_SHARED := 1
endif

endif

endif

# dlopen requires -ldl on some systems, but not others.  Until there
# is clarity on which require -ldl, add just enough ifeq to fix
# systems known not to use it.
ifeq ($(OS), netbsd)
LIBDL=""
else
LIBDL="-ldl"
endif

ifeq ($(LIBSSL_SET_MUTEX_SHARED), 1)
CC_PMUTEX_OPTS = -pthread -DKSR_PTHREAD_MUTEX_SHARED
LD_PMUTEX_OPTS = -pthread -rdynamic $(LIBDL) -Wl,-Bsymbolic-functions
else
CC_PMUTEX_OPTS =
LD_PMUTEX_OPTS =
endif
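
For context on what KSR_PTHREAD_MUTEX_SHARED is about: as far as I understand it, the goal is for pthread mutexes created inside libssl to be process-shared, because the OpenSSL objects live in kamailio's shared memory (see ser_free calling qm_shm_free in the backtraces above) and are used by several worker processes. The C sketch below shows only the underlying POSIX mechanism, PTHREAD_PROCESS_SHARED, and is not kamailio's implementation; shared_counter_demo, struct shared_state and MAX_PROCS are names made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS 16 /* arbitrary limit for this sketch */

/* Hypothetical state placed in memory shared by all processes. */
struct shared_state {
    pthread_mutex_t lock;
    long counter;
};

/* Fork nprocs children that each increment a shared counter iters times
 * under a process-shared mutex. Returns the final counter, -1 on error. */
long shared_counter_demo(int nprocs, int iters)
{
    pthread_mutexattr_t attr;
    struct shared_state *st;
    pid_t pids[MAX_PROCS];
    long result;
    int i, j, forked = 0, failed = 0;

    if (nprocs < 0 || nprocs > MAX_PROCS)
        return -1;

    /* Anonymous shared mapping: inherited by every child after fork(). */
    st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    st->counter = 0;

    /* Without PTHREAD_PROCESS_SHARED, locking from several processes is
     * undefined behavior even though the mutex sits in shared memory. */
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(st, sizeof(*st));
        return -1;
    }
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0
            || pthread_mutex_init(&st->lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        munmap(st, sizeof(*st));
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    for (i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            /* Child: contend for the lock with its siblings. */
            for (j = 0; j < iters; j++) {
                pthread_mutex_lock(&st->lock);
                st->counter++;
                pthread_mutex_unlock(&st->lock);
            }
            _exit(0);
        }
        pids[forked++] = pid;
    }

    /* Parent: reap exactly the children started above. */
    for (i = 0; i < forked; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0
                || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }

    result = failed ? -1 : st->counter;
    pthread_mutex_destroy(&st->lock);
    munmap(st, sizeof(*st));
    return result;
}
```

Calling shared_counter_demo(4, 1000) from a small main() and compiling with cc -pthread should return 4000, since every increment happens under the shared lock.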
