Segmentation fault in rpc_mod_print function while shutting down server. #2433
Comments
This is a race that can show up in a multi-process application, and it is hard to avoid completely if you do not "cool down" the server (stop or reduce other interactions with kamailio): you are running RPC commands at the same time as triggering the shutdown. I pushed the commit referenced above (which depends on another commit to the core) to narrow the window for this specific case, but it is not completely avoidable, because the rpc command can be in the middle of iterating some internal lists when the kill is performed, and some structures can be destroyed by other processes. If someone wants to propose other solutions, please make PRs and they can be discussed there. |
Thanks Daniel. I'll add this fix as a patch to our kamailio build. |
Update: A few days ago we got the same segfault on a production server, but this time it happened without a shutdown. I think there is a race when calling the process_rpc_req function, and it is not related to server cooldown. Core dump: GNU gdb (Debian 7.12-6) 7.12.0.20161007-git Logs: 2020-08-18T12:36:03.755718-07:00 hpbx031-1.va /var/lib/ums/sbin/kamailio[6204]: INFO: <script>: sip_call_id=9bf05e23-bdab-4c7e-93fd-975f1d27d3a9@hpbx031-1.va; Received ACK in a dialog |
Was this a case of running the rpc command many times at the same time? |
I'll ask our LSA about it tomorrow. Do you think it could be a problem? I thought that RPC handlers have thread-safe (interprocess, in this case) access to shared memory. Is that not the case? |
This is the sequence of commands that a Python script runs every 10 seconds: /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 stats.get_statistics websocket: The script parses the output of every command and extracts the metrics it needs. I see at least one problem here: the script should use udp instead of tcp. |
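The thread does not establish whether runs of the stats script ever overlap. One way to rule that out is to make the script take a non-blocking file lock before issuing any kamcmd calls. This is only a sketch under assumptions: the lock path is hypothetical, `fcntl.flock` is Unix-only, and the lock serializes the collector script itself, not access to kamailio's shared memory.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run got the lock, False if another run holds it.

    lock_path is a hypothetical location chosen by the operator.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        yield True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A collector would wrap its kamcmd calls in `with single_instance(path) as got_lock:` and skip the cycle when `got_lock` is false.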
Bump! Any updates? |
Access to shared memory is unlocked once shutdown is triggered; otherwise a worker process could be killed while holding the lock, leaving the kamailio shutdown worker in a deadlock. So there are potential races if you run rpc commands while you trigger the shutdown. You could probably make your stats fetch script execute only if shutdown has not been triggered, by using a state file created by the shutdown command. Shutdown races on shared memory access in a multi-process application are hard to avoid completely; it is a trade-off between complexity and being able to do a fast restart. If anyone wants to work on this, they are more than welcome to propose a pull request. |
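The state-file idea could look roughly like the following. It is a minimal sketch, not a tested integration: the flag path is hypothetical and assumes the shutdown wrapper creates the file before signalling kamailio and removes it afterwards. The kamcmd path and address are the ones quoted earlier in the thread.

```python
import os
import subprocess

# Hypothetical path: assumed to be created by the shutdown wrapper before
# it stops kamailio, and removed once the server is running again.
SHUTDOWN_FLAG = "/var/run/kamailio/shutting_down"

# Command quoted in this thread; adjust the path and address as needed.
KAMCMD = ["/var/lib/ums/sbin/kamcmd", "-s", "tcp:localhost:2048"]


def shutdown_in_progress(flag_path=SHUTDOWN_FLAG):
    """Return True if the shutdown state file exists."""
    return os.path.exists(flag_path)


def fetch_stats(group, flag_path=SHUTDOWN_FLAG, runner=subprocess.run):
    """Run `stats.get_statistics <group>` unless a shutdown is flagged.

    Returns the command output as text, or None if the call was skipped.
    """
    if shutdown_in_progress(flag_path):
        return None
    result = runner(
        KAMCMD + ["stats.get_statistics", group],
        capture_output=True, text=True, timeout=5,
    )
    return result.stdout
```

This only narrows the window: a shutdown can still start between the check and the RPC call, and it does nothing for crashes that happen outside shutdown.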
I understand, but we've got a lot of such crashes during normal kamailio operation, NOT during shutdown. This is the issue I'm concerned about. |
The problem is not reproducible on the hosts where we disabled metrics gathering on kamailio. |
Somehow I failed to notice it happens at runtime as well, I thought it was only during shut down. Open a new issue adding the output of |
Created issue #2460 |
I pushed a few commits to the master branch trying to address this issue. I haven't implemented the memory mod stats, but from what I could see in the code, it didn't seem to be protected against races when accessing the shm fragments. You would need to test with the master branch or by using patches from the following commits: |
Description
We have a segfault in Kamailio v5.3.1, installed on Debian 9.x 64-bit, that occurred while kamailio was shutting down and our script tried to get metrics using the kamcmd utility at the same time.
Troubleshooting
No troubleshooting was done, since it happened on a production server. We simply restarted the server.
Reproduction
The problem happens periodically on production servers during restart. Kamailio crashes when one of our scripts tries to get statistics about the websocket and tls modules using kamcmd during a server restart. As far as I can see in the core dump, shared memory had already been freed when rpc_mod_print was called in the child process.
Debugging Data
Log Messages
No useful logs are available.
SIP Traffic
No SIP traffic available.
Possible Solutions
Additional Information
kamailio -v