Segmentation fault on tm:t_should_relay_response #1875
I am seeing segfaults in the tm module, on average 3 to 6 per day, always in t_should_relay_response.
No troubleshooting was done, since it happened on a production server.
The problem is random and happens a few times per day. My kamailio uses tm, dialog and htable. All calls use topoh. The server runs with an average of 1000-1200 concurrent calls, but I have seen this segfault happen with fewer than 400 concurrent calls too.
No logs available.
No SIP traffic available.
Likely to be the same issue I hunted recently and actually I just pushed a commit for it:
Not much testing so far, will do more tomorrow. Anyhow, hopefully it fixes the issue.
From what I troubleshot, the crash was caused by a race in accessing the transaction. A reply for a terminated transaction (one that had already received a final reply) arrived at the moment the wait timer fired for that transaction (5 seconds after the final reply). The timer process then destroyed the transaction while another process was still handling the late reply.
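To illustrate the kind of race described above, here is a minimal single-threaded sketch in C that replays the problematic ordering (late reply handler active while the wait timer fires) and shows how reference counting can defer destruction until the last user is done. This is not the actual tm code and not the pushed commit; `txn_t`, `txn_new`, `txn_ref`, `txn_unref` and `run_demo` are hypothetical names invented for the example, and the real module may guard the transaction differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified transaction record (NOT Kamailio's real struct). */
typedef struct txn {
    atomic_int refs;   /* holders: the lookup table plus any active handlers */
    int last_status;   /* last reply status code seen */
    bool *freed_flag;  /* demo only: set to true when the record is destroyed */
} txn_t;

/* Create a transaction owned by the (imaginary) lookup table. */
static txn_t *txn_new(bool *freed_flag)
{
    txn_t *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    atomic_init(&t->refs, 1); /* the table's own reference */
    t->last_status = 0;
    t->freed_flag = freed_flag;
    *freed_flag = false;
    return t;
}

/* Pin the transaction. In real code the lookup and this increment would
 * have to happen together under the table lock (assumption). */
static void txn_ref(txn_t *t)
{
    atomic_fetch_add(&t->refs, 1);
}

/* Drop one reference; only the last holder actually frees the record. */
static void txn_unref(txn_t *t)
{
    if (atomic_fetch_sub(&t->refs, 1) == 1) {
        *t->freed_flag = true;
        free(t);
    }
}

/* Replays the ordering from the report. Returns 0 if the transaction
 * stayed alive while the late reply handler was still using it. */
int run_demo(void)
{
    bool freed;
    txn_t *t = txn_new(&freed);
    if (t == NULL)
        return -1;

    t->last_status = 200; /* final reply received, wait timer is armed */

    txn_ref(t);           /* late reply handler finds and pins the txn */

    txn_unref(t);         /* wait timer fires and drops the table's ref */
    if (freed)
        return 1;         /* would be the use-after-free scenario */

    t->last_status = 200; /* handler can still touch the txn safely */
    txn_unref(t);         /* handler is done: the record is freed here */

    return freed ? 0 : 2;
}
```

Without the `txn_ref` call in the handler, the timer's `txn_unref` would free the record and the handler's later access would be a use-after-free, which matches the crash pattern described in this comment.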
Hello @miconda, I think this patch introduced a new bug in the tmx module.
Can you get from gdb the output for:
Does this crash happen often?
UPDATE: the backtrace shows that the crash happened during a retransmission timeout handling. I tried to reproduce locally with sipp, with lots of retransmission timeouts, but couldn't get any crash. Have you backported that patch or have you used the master branch?
After applying the patch I started to get this new crash, and it already occurred with only one transaction. When I removed the changes made for #1875, the segfault in tmx stopped happening.
Would you be able to try with latest master branch on a testbed or so?
The uac field inside the has an invalid value:
That is not something I touched, so it looks like an invalid write somewhere. It might not be related, or it could be caused by other changes made in the master branch in the meantime that are not part of the backport.
The person who reported a crash similar to the one in your initial post confirmed that in first tests with the master branch kamailio no longer crashes. Before, the crash could be reproduced after running specific stress tests for a rather short time.
However, your second crash could be due to your config: either it is unrelated to the previous crash and its fix, or it is a side effect of the fix that is specific to your config.