Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debugger: kamailio freezes when dynamically setting module level(using fifo/rpc) #463

Closed
smititelu opened this issue Jan 6, 2016 · 6 comments
Labels

Comments

@smititelu
Copy link
Contributor

... if the module level is not previously set in config file: modparam("debugger", "mod_level", "core=2")

This is not happening when setting the module facility, for the above, same conditions.
This is happening also before pull-request [1] using "kamcmd dbg.mod_level core 1".
After some debugging I've noticed that this is happening when trying to set a level for a module name whose idx = hid&(_dbg_mod_table_size-1); is even number?!, but not for one that reduces to an odd number (i.e. module name "core" reduces to an even index and "corex" to odd); the idx is always in the range _dbg_mod_table_size-1 as it should be.

Trying to solve this, I commented the lock_get/release in dbg_set_mod_debug_level() and saw it's working; kamailio doesn't freeze anymore. Thus, I tried to refactor the locks in struct _dbg_mod_slot to be dynamically allocated/deallocated using lock_alloc()/destroy() without success.

I'm out of ideas. Do you have any idea what might lead to this strange deadlock?

[1] #462

@smititelu
Copy link
Contributor Author

Also I've been comparing the lock_get()/release() for level vs facility without seeing any differences. But maybe I'm missing something.

@miconda
Copy link
Member

miconda commented Jan 6, 2016

Can you attach with gdb to the locked process (probably the ctl handler) and get a backtrace from it?

@smititelu
Copy link
Contributor Author

Sorry, I overlooked gdb:
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f600269d47c in futex_get (lock=0x7f5ffe91d748) at ../cfgt/../../parser/../mem/../futexlock.h:121
#2 0x00007f60026a6190 in dbg_get_mod_debug_level (mname=0x797349 "core", mnlen=4, mlevel=0x7ffc8bb3a9fc) at debugger_api.c:1479
#3 0x00000000004752c5 in get_debug_level (mname=0x797349 "core", mnlen=4) at dprint.c:164
#4 0x000000000063fc06 in fm_malloc (qmp=0x7f5ffe699000, size=37, file=0x7f60026b9b34 "debugger: debugger_api.c", func=0x7f60026bae40 <FUNCTION.8648> "dbg_set_mod_debug_level", line=1352,
mname=0x7f60026b8d3e "debugger") at mem/f_malloc.c:438
#5 0x000000000064633c in fm_shm_malloc (qmp=0x7f5ffe699000, size=37, file=0x7f60026b9b34 "debugger: debugger_api.c", func=0x7f60026bae40 <FUNCTION.8648> "dbg_set_mod_debug_level", line=1352,
mname=0x7f60026b8d3e "debugger") at mem/f_malloc.c:1088
#6 0x00007f60026a565a in dbg_set_mod_debug_level (mname=0x2aa07e0 "core", mnlen=4, mlevel=0x7ffc8bb3abfc) at debugger_api.c:1352
#7 0x00007f60026a3806 in dbg_rpc_set_mod_level (rpc=0x7f6003616660 <binrpc_callbacks>, ctx=0x7ffc8bb3acd0) at debugger_api.c:999
#8 0x00007f60033d2e77 in process_rpc_req (buf=0x2aa07c4 "\241\003\034\023\212\243\062\221\022dbg.set_mod_level", size=35, bytes_needed=0x7ffc8bb3ade8, sh=0x7ffc8bb3ae10, saved_state=0x2ab07c8)
at binrpc_run.c:675
#9 0x00007f6003404253 in handle_stream_read (s_c=0x2aa0790, idx=-1) at io_listener.c:511
#10 0x00007f6003405c8a in handle_io (fm=0x7f6006d84498, events=1, idx=-1) at io_listener.c:706
#11 0x00007f60033fd4df in io_wait_loop_epoll (h=0x7f6003617420 <io_h>, t=10, repeat=0) at ../../io_wait.h:1061
#12 0x00007f60034013c3 in io_listen_loop (fd_no=1, cs_lst=0x2a9e400) at io_listener.c:281
#13 0x00007f60033db58c in mod_child (rank=0) at ctl.c:320
#14 0x00000000005ae1cb in init_mod_child (m=0x7f6006d25060, rank=0) at sr_module.c:915
#15 0x00000000005adee8 in init_mod_child (m=0x7f6006d25660, rank=0) at sr_module.c:912
#16 0x00000000005adee8 in init_mod_child (m=0x7f6006d26d78, rank=0) at sr_module.c:912
#17 0x00000000005adee8 in init_mod_child (m=0x7f6006d272b8, rank=0) at sr_module.c:912
#18 0x00000000005adee8 in init_mod_child (m=0x7f6006d276e8, rank=0) at sr_module.c:912
#19 0x00000000005adee8 in init_mod_child (m=0x7f6006d28328, rank=0) at sr_module.c:912
#20 0x00000000005ae503 in init_child (rank=0) at sr_module.c:941
#21 0x00000000004b34c1 in main_loop () at main.c:1671
#22 0x00000000004b9c4c in main (argc=4, argv=0x7ffc8bb3b9a8) at main.c:2597

The deadlock makes sense. MDBG() in fm_malloc() issues dbg_get_mod_debug_level() which gets the same lock that dbg_set_mod_debug_level() already got.

The first option would be to never use locks for dbg_get_mod_debug_level()(because the list is just iterated through, no list mutations).
The other option is to create different get/set locks for level/facility in struct _dbg_mod_slot.

What do you think is the best option?

@miconda
Copy link
Member

miconda commented Jan 6, 2016

This is a recursive access to the same slot in debugger hash table:

  • lock the slot to update its list of items
  • do shm malloc which does internally a log message which needs to get the value of the log level of an item on that slot

Solutions:

  • make the locks reentrant (perhaps the safest in long term)
  • reorder the sequence of code so the shm malloc is not done under the lock (under lock do only insert/remove of the item, but allocation/free of item must be done outside of lock region)

@miconda miconda changed the title debugger: kamailio freezes when dynamically setting module level(using fifo/rpc), if the module level is not previously set in config file: modparam("debugger", "mod_level", "core=2") debugger: kamailio freezes when dynamically setting module level(using fifo/rpc) Jan 6, 2016
smititelu pushed a commit to smititelu/kamailio that referenced this issue Jan 7, 2016
Don't shm_malloc() while the lock is taken.
@smititelu
Copy link
Contributor Author

Added fix with the second solution on pull-request #469. Basically tested it and the freeze is not happening anymore. However, the first solution can still be logged as an enhancement issue.

If all is ok I will backport this to 4.X branches.

@miconda
Copy link
Member

miconda commented Jan 7, 2016

I am fine with second solution fix, just needs to be kept in mind that no direct or indirect or direct LOG/DBG statements insides locked regions of debugger hash table.

You can merge and close the related issues on tracker.

smititelu pushed a commit that referenced this issue Jan 7, 2016
smititelu pushed a commit that referenced this issue Jan 7, 2016
Don't shm_malloc() while the lock is taken.

(cherry picked from commit 3668618)
smititelu pushed a commit that referenced this issue Jan 7, 2016
Don't shm_malloc() while the lock is taken.

(cherry picked from commit 3668618)
smititelu pushed a commit that referenced this issue Jan 7, 2016
Don't shm_malloc() while the lock is taken.

(cherry picked from commit 3668618)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants