Kamailio 5.2.2 - Segmentation fault in libcrypto.so.1.1 #1860
Comments
|
I've looked at similar situations; apparently this can be caused by multi-threaded code accessing the socket from different threads at the same time. The solution is to make sure you're not writing to it on one thread and closing it on another, but I'm at a loss as to how to check this in Kamailio. Does anyone have any suggestions? |
|
We've been testing this further.
Are there any significant drawbacks setting 'children' to 1? Thus far everything appears to be working as expected. |
|
Is libcrypto using threads and dealing with sockets itself, from the point of view of closing them? Kamailio is multi-process and, at a given moment, only one process is supposed to write to the socket. Moreover, afaik, writing to a closed socket does not cause SIGSEGV. From the log messages, it seems to be an error related to the random number generator in libcrypto. You should install debugging symbols for libssl/libcrypto and see if you can get more details on the related frames from gdb. The last frame related to kamailio code is no. 11; frames 0..10 point to libssl/libcrypto, but those have no symbols. |
|
We tried changing children to 8 and the problem still occurs; 1 is the only safe value thus far. Not sure exactly about the relationship between Kamailio and libcrypto: does Kamailio need to interact with libcrypto in a specific order? For example, if multiple messages sent to Kamailio both require some form of interaction with libcrypto but one thread takes slightly longer than another, does the order in which the ready threads in Kamailio interact with libcrypto matter? Will look at installing debug symbols for libssl/libcrypto. |
|
About the question "Are there any significant drawbacks setting 'children' to 1? Thus far everything appears to be working as expected." |
|
Have you had the chance to install the debug symbols for libssl and libcrypto? That will allow tracking where inside those libraries the crash happened and seeing whether it is related to kamailio in some way. |
|
Not yet, I took a quick look at the process but will have to come back to it at some point later this week when I have more time. We currently have limited use of Kamailio in our production environment pending resolution of this issue. |
|
We've installed the latest debug symbols for libssl and libcrypto compiled from here: We've also updated Kamailio to 5.2.1. Unfortunately the problem remains the same, see above for the updated core dump. Please let me know if there's anything more we can do. |
|
Looking at the trace, it looks related to sending a reply. In 5.2.2 I pushed a fix for a case where reply processing happened at the same time as the transaction was supposed to be deleted, after the safety wait timer of 5 sec. Can you try with 5.2.2, just to rule out that this issue is a side effect of the same problem? If it still crashes, would you be able to make a sipp scenario and a docker image to reproduce it? I tried, but couldn't reproduce it; I guess it is a combination of events specific to your deployment. |
|
Unfortunately this is still a problem after the upgrade to 5.2.2. We will look at trying to re-produce this in SIPp. |
|
Haven't got around to reproducing this in SIPp yet as I've been tied up with other projects and I'm a newbie to SIPp. The problem only occurs when children in Kamailio is set to anything above 1, which points towards a multi-threaded issue; as it only occurs on TLS, not TCP or UDP, this would appear to be an OpenSSL multi-threaded issue. As far as I can tell OpenSSL isn't multi-threaded; using it with multi-threaded applications requires that at least two callback functions are set, locking_function and threadid_func. I searched for these functions in the Kamailio GitHub repository but couldn't find anything. Is this correct? |
|
Not sure if this helps: each crash follows a slightly different path within 'modules/tls/tls_server.c' but always crashes in 'aes_ecb_cipher' at 'crypto/evp/e_aes.c:2699'. Here's our most recent core dump. Here's the disas of aes_ecb_cipher; it was doing a move from the memory pointed to by the %rax register plus an offset of 0xf8, into the %rax register. If we look at the registers, the problem is that %rax is 0. |
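The register analysis above comes down to simple address arithmetic: an operand such as 0xf8(%rax) reads from the address held in %rax plus 0xf8, so with %rax equal to 0 the load targets address 0xf8. The following is a minimal illustrative sketch, not taken from the thread; the helper names and the 4096-byte page size are assumptions, and it relies on the first page of the address space normally being left unmapped.

```python
PAGE_SIZE = 4096  # assumption: typical page size on x86-64 Linux


def effective_address(base, displacement):
    """Address read by an operand like 0xf8(%rax): base + displacement (64-bit wrap)."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


def looks_like_null_deref(address):
    """True if the address lies in the first page, which is normally unmapped."""
    return address < PAGE_SIZE


# Values reported in the comment above: %rax = 0, displacement = 0xf8.
addr = effective_address(0x0, 0xF8)
print(hex(addr), looks_like_null_deref(addr))  # prints: 0xf8 True
```

A faulting address this close to zero usually suggests a field being read through a NULL structure pointer, which fits a crash that always lands on the same instruction even though the calling path varies.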
|
Thanks for troubleshooting further. Can you try with the pre-loaded library workaround that was pushed to git master yesterday? Respectively: It looks like a side effect of the pthread locks not being initialized by libssl 1.1 with the process-shared option. If it proves to work fine, then we can approach the openssl project to introduce a way to allow setting the process-shared option for locks. |
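To illustrate why a process-shared lock matters in a multi-process server: a lock only serializes worker processes if every process operates on the same lock object, visible to all of them. The sketch below is an analogy in Python, not Kamailio or OpenSSL code; the function names, the worker count, and the use of the "fork" start method are assumptions made for illustration. The equivalent pthread mechanism is the PTHREAD_PROCESS_SHARED mutex attribute.

```python
import multiprocessing as mp

# Assumption: the "fork" start method (Linux/BSD), mirroring a parent
# process that forks its worker children.
ctx = mp.get_context("fork")


def worker(counter, lock, iterations):
    """Increment a counter in shared memory while holding the cross-process lock."""
    for _ in range(iterations):
        with lock:
            counter.value += 1


def run_workers(children=4, iterations=2000):
    """Fork `children` workers that all update one shared counter."""
    counter = ctx.Value("i", 0, lock=False)  # raw shared memory, no built-in lock
    lock = ctx.Lock()                        # one lock object usable by every child
    procs = [
        ctx.Process(target=worker, args=(counter, lock, iterations))
        for _ in range(children)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

Because every increment happens under the shared lock, the final count equals children times iterations. A lock that is private to each process would give no such guarantee for state kept in shared memory.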
|
Unfortunately that didn't fix the problem; however, the core dump has changed. I've been looking at buffers and limits to see if we're perhaps hitting some kind of limit that could be triggering this, and have found that increasing tls_max_connections and tcp_max_connections to 32768 appears to raise the threshold at which the segmentation fault occurs. We're now able to get >20000 presence subscriptions without any segmentation faults. I have more testing to do before I can reach a conclusion, but it appears there may be a correlation between tls_max_connections and the segmentation fault. |
|
Can you get the output of |
|
I've re-tested this morning after making no changes and the problem is as before, tls_max_connections was a red herring. |
|
What was the command you used to start kamailio? |
|
Using systemd, 'service kamailio start'. This is our Kamailio systemd config: /lib/systemd/system/kamailio.service Here is the output of |
|
Here's another core dump, mostly we're seeing: aesni_ecb_encrypt () at crypto/aes/aesni-x86_64.s:624 |
|
I've gone back to our old vanilla configuration, which we used for initial testing with Kamailio, and have stripped a lot of it back for the purpose of this test. Unfortunately the problem is still present (using the same systemd script as above). I've included the configuration and backtrace below in case there's anything which helps us. Thus far I've been using multiple Jitsi softphones on Windows XP VMs with large contact lists loaded via a custom XML file. I will try to reproduce this with SIPp; if I can, I'll look at creating a docker image. /usr/local/kamailio/etc/kamailio/kamailio.cfg /usr/local/kamailio/etc/kamailio/kamctlrc /usr/local/kamailio/etc/kamailio/tls.cfg gdb backtrace disas aesni_ecb_encrypt i r |
|
Not sure when I'll get time to look deeper; it's the Easter holidays in most of Europe, so others are likely busy as well. What I can suggest for the moment is to compile using libssl 1.0.x and see if it works ok with that. That will confirm or rule out whether it is an issue with libssl/libcrypto 1.1.x. On some systems you can get libssl 1.0.x along with libssl 1.1, so if you have both you may need to tune the Makefile inside src/modules/tls to compile with 1.0.x. If you use Debian 8, for example, iirc it has only libssl 1.0.x (or at least this is the default installed version). |
|
I'm also having trouble with kamailio and OpenSSL 1.1, with segfaults in TLS. It's not quite the same issue as described here, but I suspect might be related:
I haven't tried with OpenSSL 1.0 yet. |
|
I vaguely remember seeing something recently on Slack regarding OpenSSL. You may want to check the archives of the Kamailio Slack channels.
On Sat, Apr 20, 2019 at 4:01 PM Nathan Whitehorn ***@***.***> wrote:
I'm also having trouble with kamailio and OpenSSL 1.1, with segfaults in
TLS. It's not quite the same issue as described here, but I suspect might
be related:
thread #1, name = 'kamailio', stop reason = signal SIGSEGV
frame #0: 0x00000008009c954d libc.so.7`__free + 749
frame #1: 0x00000008026c262b libthr.so.3`pthread_rwlock_destroy + 59
frame #2: 0x0000000802b8abf6 libcrypto.so.111`CRYPTO_THREAD_lock_free + 22
frame #3: 0x0000000802b784c6 libcrypto.so.111`BIO_free + 182
frame #4: 0x00000008028fb1a1 libssl.so.111`SSL_free + 193
frame #5: 0x0000000802810d48 tls.so`tls_h_tcpconn_clean + 1240
I haven't tried with OpenSSL 1.0 yet.
|
|
Changing the Makefile for the TLS module to use OpenSSL 1.0 fixes the problem, which at least provides us with a workaround. Just to be sure, I've repeated the process: building the TLS module against OpenSSL 1.1.1b makes the problem return, and if I then clean and rebuild the TLS module against OpenSSL 1.0.2r the problem is fixed. Thank you to all for your assistance. |
My skills are poor, but I can build from source. Could you explain in detail which file to edit before building? I would like to try using OpenSSL 1.0. |
|
These are roughly the steps that I've followed to switch to OpenSSL 1.0.2r and rebuild Kamailio TLS module. Hope this helps! First of all you need to make and install OpenSSL 1.0.2r from source. Add this line and save: Link binaries to path: Restart. Check OpenSSL path, should return '/usr/bin/openssl'. Check OpenSSL version, should return 'OpenSSL 1.0.2r 26 Feb 2019'. Modify '/usr/src/kamailio/src/modules/tls/makefile'. Changed this: To this: Make clean, make and make install: |
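The version check in the steps above can also be scripted. This is a small sketch only: the function names are hypothetical, and it assumes the banner has the same shape as the string quoted above ('OpenSSL 1.0.2r 26 Feb 2019').

```python
import re


def parse_openssl_version(banner):
    """Split a banner such as 'OpenSSL 1.0.2r  26 Feb 2019' into (1, 0, 2, 'r')."""
    match = re.match(r"OpenSSL (\d+)\.(\d+)\.(\d+)([a-z]*)", banner)
    if match is None:
        raise ValueError("unrecognised OpenSSL banner: %r" % banner)
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter


def is_1_0_series(banner):
    """True when the banner reports an OpenSSL 1.0.x release."""
    return parse_openssl_version(banner)[:2] == (1, 0)


print(parse_openssl_version("OpenSSL 1.0.2r  26 Feb 2019"))  # prints: (1, 0, 2, 'r')
print(is_1_0_series("OpenSSL 1.1.1b  26 Feb 2019"))          # prints: False
```

The banner string would come from running 'openssl version' on the host. Note that this only confirms which openssl binary is first on the path; it does not prove which library the rebuilt TLS module is linked against.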
|
@shaunjstokes - great description, would you mind adding this to our wiki page as well? This would be great! |
|
For OpenSSL >= 1.1.x there will also be a workaround in the upcoming 5.3 release (efdc141 and following commits). Closing this one, as the reporter also found another workaround. |
Description
This occurs more frequently when we have lots of presence subscriptions, which we are able to reproduce in a test environment. Thus far it appears this only occurs when using TLS, as opposed to UDP/TCP.
Troubleshooting
Appears to be an issue in libcrypto.so.1.1 but not sure where to start.
Reproduction
8000+ presence subscriptions via TLS.
Debugging Data
Log Messages
SIP Traffic
Don't currently have, can capture if required.
Possible Solutions
TBD
Additional Information
kamailio -v