We are seeing strange behaviour with FreeSWITCH 1.8.7 running on Debian 9.
Even under very light load, the main process crashes or dies.
We have ruled out the OOM killer as the source of the signal 9 (SIGKILL) that kills it from time to time, so that is still a mystery. Any clue or hint as to why this is happening would be awesome.
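Signal 9 is SIGKILL. It cannot be caught, blocked or ignored, so FreeSWITCH gets no chance to log anything before it dies. The signal must come from the kernel or from another process, and a supervisor such as systemd counts as one. The small C sketch below checks this with POSIX `sigaction()`. Installing a handler succeeds for SIGABRT and SIGSEGV but fails with EINVAL for SIGKILL. The helper names `try_install_handler` and `report_signals` are made up for illustration. They are not part of FreeSWITCH.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* No-op handler: we only care whether the kernel accepts it. */
static void noop_handler(int sig) { (void)sig; }

/* Hypothetical helper: probe whether a handler can be installed for sig.
 * Returns 0 on success, or the errno reported by sigaction()
 * (EINVAL for SIGKILL and SIGSTOP, which can never be caught). */
int try_install_handler(int sig) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, &old) != 0)
        return errno;
    /* Restore the original disposition; this was only a probe. */
    sigaction(sig, &old, NULL);
    return 0;
}

/* Hypothetical helper: report the three signals seen in these crashes. */
void report_signals(void) {
    const int sigs[] = { SIGABRT, SIGKILL, SIGSEGV };
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        int err = try_install_handler(sigs[i]);
        printf("signal %d (%s): %s\n", sigs[i], strsignal(sigs[i]),
               err == 0 ? "handler can be installed" : strerror(err));
    }
}
```

The probe restores the previous disposition each time. Calling `report_signals()` therefore does not change how the process reacts to a real crash.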
Since we compiled FreeSWITCH with tcmalloc or jemalloc, we have not seen any crashes with signal 6 (SIGABRT).
The segfaults (signal 11, SIGSEGV) seem to point to memory buffer corruption during media recording, and any help would be gladly accepted. Note: the recording is from G.711 to WAV.
Example:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/freeswitch/bin/freeswitch -u root -g root -ncwait -nonat -rp -core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/../multiarch/memmove-vec-unaligned-erms.S:363
363 ../sysdeps/x86_64/multiarch/../multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
[Current thread is 1 (Thread 0x7f240d347700 (LWP 29527))]
(gdb) bt full
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/../multiarch/memmove-vec-unaligned-erms.S:363
No locals.
#1 0x00007f243099c808 in switch_buffer_read (buffer=0x7f240af76250, data=, datalen=datalen@entry=65536) at src/switch_buffer.c:247
reading = 65536
#2 0x00007f24309bcace in switch_core_file_write (fh=0x7f240af151b8, data=data@entry=0x7f240943c028, len=len@entry=0x7f240d345518) at src/switch_core_file.c:656
datalen_adj = 65536
status = SWITCH_STATUS_SUCCESS
asis = 0
rlen =
blen = 32768
orig_len =
PRETTY_FUNCTION = "switch_core_file_write"
func = "switch_core_file_write"
#3 0x00007f2430a5e12e in recording_thread (thread=, obj=) at src/switch_ivr_async.c:1281
bug =
session = 0x7f24207676e8
channel = 0x7f2421659110
rh = 0x7f240af15408
bsize = 8192
samples = 192
inuse = 384
data = 0x7f240943c028 "\340\002\346\004V\v\223\bJ\v\227\r\321\n\024\v\305\f\250\016Y\v\255\022\006\024\033\025\340\024\032\025J\033 \030B\031\263\035.\032\215\037\372!,\036/ \235!\253 2"\334$\313#\230&\254)4'p$\177*\222(\271&\033*\313(K)\177'\273'\371(\373(\005'~(\240)\212!\261%\006%\223!\255" \246 \004 ^\035\246\034\t\031#\026v\024\351\022\034\024(\020T\017\206\r\214\016\215\t\302\a-\bS\003\333\375\334\376w\367\310\364\327\366?\361\337\362\201\363)\364\375\356\272\357\351\356\361\357\350\354)\354$\361\343\356\325\351\004\355@\352v\341\313\345\301\341O\341\252\342\271\336W\335R\334\026\333\334", <incomplete sequence \327>...
channels = 1
read_impl = {codec_type = SWITCH_CODEC_TYPE_AUDIO, ianacode = 0 '\000', iananame = 0x0, fmtp = 0x0, samples_per_second = 0, actual_samples_per_second = 0,
bits_per_second = 0, microseconds_per_packet = 0, samples_per_packet = 0, decoded_bytes_per_packet = 0, encoded_bytes_per_packet = 0, number_of_channels = 0 '\000',
codec_frames_per_packet = 0, init = 0x0, encode = 0x0, decode = 0x0, encode_video = 0x0, decode_video = 0x0, codec_control = 0x0, destroy = 0x0, codec_id = 0,
impl_id = 0, modname = 0x0, next = 0x0}
func = "recording_thread"
#4 0x00007f2430cfd3d7 in dummy_worker (opaque=0x7f2409cb9160) at threadproc/unix/thread.c:151
thread = 0x7f2409cb9160
#5 0x00007f242fee34a4 in start_thread (arg=0x7f240d347700) at pthread_create.c:456
__res =
pd = 0x7f240d347700
now =
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139792817092352, -3982925440384300703, 139792690191582, 139792690191583, 139792816844800, 3, 3959178655632394593,
3959105760450644321}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call =
pagesize_m1 =
sp =
freesize =
PRETTY_FUNCTION = "start_thread"
#6 0x00007f242f520d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.
We use Lua (5.2.4 in this case) for the call-control logic.
The main FreeSWITCH process is controlled by systemd (systemctl).
When some of our FreeSWITCH instances crash and are restarted, they get stuck in limbo:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/freeswitch/bin/freeswitch'.
#0 0x00007ff03291b603 in select () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
[Current thread is 1 (Thread 0x7ff035397040 (LWP 28473))]
(gdb) bt full
#0 0x00007ff03291b603 in select () at ../sysdeps/unix/syscall-template.S:84
No locals.
#1 0x00007ff034100915 in apr_sleep (t=) at time/unix/time.c:246
tv = {tv_sec = 0, tv_usec = 383914}
#2 0x00007ff033e9571c in do_sleep (t=) at src/switch_time.c:164
No locals.
#3 0x00007ff033ddbf02 in switch_core_runtime_loop (bg=) at src/switch_core.c:1197
No locals.
#4 0x000055777b916de6 in main (argc=9, argv=) at src/switch.c:1208
pid_path = "/usr/local/freeswitch/run/freeswitch.pid", '\000' <repeats 4055 times>
pid_buffer = "28473", '\000' <repeats 26 times>
old_pid_buffer = '\000' <repeats 31 times>
pid_len = 5
old_pid_len = 0
err = 0x7ff03412c02d "Success"
nf =
do_wait =
runas_user =
runas_group =
reincarnate =
reincarnate_reexec =
fds = {3, -1}
nc = SWITCH_TRUE
pid =
i =
x =
opts =
opts_str = '\000' <repeats 1023 times>
local_argv = {0x7fff2e91de54 "/usr/local/freeswitch/bin/freeswitch", 0x7fff2e91de79 "-u", 0x7fff2e91de7c "root", 0x7fff2e91de81 "-g", 0x7fff2e91de84 "root",
0x7fff2e91de89 "-ncwait", 0x7fff2e91de91 "-nonat", 0x7fff2e91de98 "-rp", 0x7fff2e91de9c "-core", 0x0 <repeats 1015 times>}
local_argc =
arg_argv = {0x0 <repeats 128 times>}
alt_dirs =
alt_base =
log_set =
run_set =
do_kill =
priority =
flags =
ret = 0
destroy_status =
fd = 0x7ff02c04a0a0
pool = 0x7ff02c04a028
In this state fs_cli is, of course, unresponsive.
When this happens, we have to run 'systemctl restart freeswitch' manually.
We also have FreeSWITCH configured to attempt recovery of the Sofia SIP stack, which might have an impact.
If anybody has suggestions or tips, that would be great.
UPDATE:
Debugging under gdb and valgrind shows the cause. Suppose an attended transfer is done while the call is being recorded, and the recording is retained across the transfer. The media bug on channel A is then moved to channel B. It points at the same file, through a different buffer handle that refers to the same memory buffer. Once party C gets the call (A transferred to C by B), B hanging up makes FreeSWITCH crash with memory corruption, because the memory buffer the bug is still using has been destroyed.
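The failure described above is a classic use-after-free. One handle frees a buffer that another handle is still reading and writing. A common remedy for this pattern is reference counting on the shared buffer. The C11 sketch below is a minimal, hypothetical illustration only. `shared_buf` and its functions are made-up names, not FreeSWITCH's `switch_buffer_t` API. The actual fix in FreeSWITCH may look different, for example giving the moved bug its own buffer instead of sharing one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared audio buffer, for illustration only. */
typedef struct shared_buf {
    atomic_int refs;        /* number of handles still using the buffer */
    size_t len;
    unsigned char *data;
} shared_buf;

/* Allocate a zeroed buffer owned by a single handle (refs = 1). */
shared_buf *shared_buf_new(size_t len) {
    shared_buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = calloc(len, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = len;
    atomic_init(&b->refs, 1);
    return b;
}

/* A second handle starts sharing the buffer, e.g. when the recording
 * is carried over to another channel on transfer. */
shared_buf *shared_buf_retain(shared_buf *b) {
    int old = atomic_fetch_add_explicit(&b->refs, 1, memory_order_relaxed);
    assert(old > 0 && "retaining a buffer that was already freed");
    (void)old;
    return b;
}

/* A handle is done with the buffer (e.g. its channel hung up). Only the
 * last release frees the memory. Returns 1 if freed here, else 0. */
int shared_buf_release(shared_buf *b) {
    if (atomic_fetch_sub_explicit(&b->refs, 1, memory_order_acq_rel) == 1) {
        free(b->data);
        free(b);
        return 1;
    }
    return 0;
}
```

The counter is atomic because, in a setup like the one reported, the recording thread and the channel threads may drop their references concurrently. With this scheme, B hanging up only decrements the count. The recording handle on A keeps a valid buffer until it releases the last reference itself.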