I want to use gperf for memory leak detection, but SRS keeps crashing. #2247

Closed
286897655 opened this issue Mar 19, 2021 · 7 comments

@286897655

286897655 commented Mar 19, 2021

Description


  1. SRS version: 4.0.84 and it seems to be the same for the 4.0 series.

  2. The SRS log is as follows:

  3. The SRS configuration is as follows:

The simplest configuration.

Replay

Steps to reproduce the bug:

  1. Build SRS: ./configure --with-gb28181 --with-gperf --with-gmc && make
  2. Following the description in the SRS code, use kill -10 (reopen) to trigger GMC.
  3. On 4.0.84, SRS asserts here: src/app/srs_app_st.cpp:222: void SrsFastCoroutine::stop(): Assertion `!r0' failed.
  4. On 4.0.62, SRS asserts here: src/protocol/srs_service_st.cpp:96: void srs_close_stfd(void*&): Assertion `err != -1' failed.

Expected behavior:

Under GMC, SRS should exit normally so that the GMC memory check report can be inspected.



@286897655
Author

286897655 commented Mar 23, 2021

valgrind works, but gperf GMC does not.


@SU79840

SU79840 commented May 26, 2021

I ran into the same problem. When SrsServer calls destroy(), srs_freep(trd_) ends up calling st_thread_join, which fails with errno set to EDEADLK.


@winlinvip winlinvip self-assigned this Aug 21, 2021
@winlinvip winlinvip added the Bug It might be a bug. label Aug 21, 2021
@winlinvip winlinvip added this to the SRS 4.0 release milestone Aug 26, 2021
@winlinvip
Member

winlinvip commented Nov 16, 2021

Reproduced the issue, on Mac:

./configure --gperf=on --gmc=on --osx --jobs=10
make -j10

Then start SRS and enable GMC:

env PPROF_PATH=./objs/pprof HEAPCHECK=normal ./objs/srs -c conf/console.conf 2>gmc.log

Press CTRL+C to exit, and the crash occurs:

^C[2021-11-17 07:33:33.547][Trace][59267][86k8f079] gmc is on, main cycle will terminate normally, signo=2
[2021-11-17 07:33:33.547][Trace][59267][86k8f079] sig=2, user terminate program, fast quit
[2021-11-17 07:33:33.822][Trace][59267][86k8f079] cleanup for quit signal fast=1, grace=0
[2021-11-17 07:33:33.822][Warn][59267][86k8f079][35] start destroy server
Abort trap: 6

The system console log shows:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff2034a92e __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff203795bd pthread_kill + 263
2   libsystem_c.dylib             	0x00007fff202ce406 abort + 125
3   libsystem_c.dylib             	0x00007fff202cd7d8 __assert_rtn + 314
4   srs                           	0x0000000109383f53 SrsFastCoroutine::stop() + 483 (srs_app_st.cpp:215)
5   srs                           	0x0000000109384529 SrsFastCoroutine::~SrsFastCoroutine() + 25 (srs_app_st.cpp:150)
6   srs                           	0x0000000109383ac5 SrsFastCoroutine::~SrsFastCoroutine() + 21 (srs_app_st.cpp:149)
7   srs                           	0x0000000109383a4c SrsSTCoroutine::~SrsSTCoroutine() + 76 (srs_app_st.cpp:85)
8   srs                           	0x0000000109383ae5 SrsSTCoroutine::~SrsSTCoroutine() + 21 (srs_app_st.cpp:84)
9   srs                           	0x0000000109383b09 SrsSTCoroutine::~SrsSTCoroutine() + 25 (srs_app_st.cpp:84)
10  srs                           	0x000000010931e967 SrsServer::destroy() + 167 (srs_app_server.cpp:542)
11  srs                           	0x0000000109321fbe SrsServer::cycle() + 222 (srs_app_server.cpp:926)
12  srs                           	0x0000000109384774 SrsFastCoroutine::cycle() + 196 (srs_app_st.cpp:272)
13  srs                           	0x000000010938464d SrsFastCoroutine::pfn(void*) + 29 (srs_app_st.cpp:287)
14  srs                           	0x0000000109507ef8 _st_thread_main + 40 (sched.c:363)
15  srs                           	0x00000001095076c4 st_thread_create + 340 (sched.c:694)
16  ???                           	000000000000000000 0 + 0
17  ???                           	0x00007fb9d2304e70 0 + 140436072058480

The crashing code, at SrsFastCoroutine::stop() + 483 (srs_app_st.cpp:215):

    if (trd) {
        void* res = NULL;
        int r0 = st_thread_join((st_thread_t)trd, &res);
        if (r0) {
            // By st_thread_join
            if (errno == EINVAL) srs_assert(!r0);
            if (errno == EDEADLK) srs_assert(!r0);

The cause is that SrsServer::cycle(), which runs inside the server's own coroutine, calls destroy(), which then frees that same coroutine:

srs_error_t SrsServer::cycle() {
    err = do_cycle();

#ifdef SRS_GPERF_MC
    destroy();
    // ...

void SrsServer::destroy() {
    srs_freep(trd_);
    // ...

Of course, a coroutine cannot destroy itself, as it would result in trying to join itself, which would fail.
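
To illustrate why the self-join fails, here is a minimal, simplified model. The names `Coroutine`, `g_current`, `model_join`, and `run_inside` are hypothetical and are not part of ST or SRS. The sketch only models the two errno values that SrsFastCoroutine::stop() asserts on, under the assumption that a join rejects a non-joinable target with EINVAL and the caller itself with EDEADLK.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a coroutine handle; not the real ST type.
struct Coroutine {
    bool joinable = true;
};

// Hypothetical global that tracks which coroutine is currently running.
static Coroutine* g_current = nullptr;

// Simplified join: returns 0 on success, or -1 with errno set.
int model_join(Coroutine* target) {
    if (!target->joinable) {
        errno = EINVAL;   // a non-joinable coroutine cannot be joined
        return -1;
    }
    if (target == g_current) {
        errno = EDEADLK;  // waiting for yourself to finish can never complete
        return -1;
    }
    return 0;             // a real library would block here until target exits
}

// Runs `body` as if it were executing inside coroutine `co`.
template <typename Fn>
int run_inside(Coroutine* co, Fn body) {
    Coroutine* prev = g_current;
    g_current = co;
    int r = body();
    g_current = prev;
    return r;
}
```

In this model, calling `model_join(&server_trd)` while running inside `server_trd` fails with EDEADLK, which corresponds to the `r0 != 0` case that trips the assertion. The same call made from any other coroutine succeeds.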

SRS 3.0 is not affected because it did not start a coroutine for this; the function was called directly in the main coroutine.

After the fix, SRS exits cleanly:

^C[2021-11-17 08:20:03.266][Trace][63440][dp050lnh] gmc is on, main cycle will terminate normally, signo=2
[2021-11-17 08:20:03.266][Trace][63440][dp050lnh] sig=2, user terminate program, fast quit
[2021-11-17 08:20:03.272][Trace][63440][9f4ff122] cleanup when unpublish
[2021-11-17 08:20:03.272][Trace][63440][9f4ff122] cleanup when unpublish, created=1, deliver=1
[2021-11-17 08:20:03.273][Warn][63440][9f4ff122][4] 1 frames left in the queue on closing
[2021-11-17 08:20:03.273][Trace][63440][9f4ff122] TCP: before dispose resource(RtmpConn)(0x7fcd72f04190), conns=1, zombies=0, ign=0, inz=0, ind=0
[2021-11-17 08:20:03.273][Warn][63440][9f4ff122][32] client disconnect peer. ret=1009
[2021-11-17 08:20:03.273][Trace][63440][b12ru95i] TCP: clear zombies=1 resources, conns=1, removing=0, unsubs=0
[2021-11-17 08:20:03.273][Trace][63440][9f4ff122] TCP: disposing #0 resource(RtmpConn)(0x7fcd72f04190), conns=1, disposing=1, zombies=0
[2021-11-17 08:20:03.910][Trace][63440][dp050lnh] cleanup for quit signal fast=1, grace=0
[2021-11-17 08:20:03.910][Warn][63440][dp050lnh][32] start destroy server
[2021-11-17 08:20:03.910][Trace][63440][dp050lnh] fast stop all ingesters ok.
[2021-11-17 08:20:04.010][Trace][63440][dp050lnh] fast kill all ingesters ok.
[2021-11-17 08:20:04.010][Warn][63440][dp050lnh][3] ignore kill the process failed, pid=63441. err=code=1058 : kill
thread [63440][dp050lnh]: srs_kill_forced() [src/app/srs_app_utility.cpp:152][errno=3]
[2021-11-17 08:20:04.010][Trace][63440][dp050lnh] gmc is on, main cycle will terminate normally, signo=2
[2021-11-17 08:20:04.011][Warn][63440][dp050lnh][4] sleep a long time for system st-threads to cleanup.
[2021-11-17 08:20:04.904][Trace][63440][j6112653] Hybrid cpu=0.00%,0MB
[2021-11-17 08:20:07.011][Warn][63440][dp050lnh][4] system quit
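
The thread does not show the patch itself. As a rough sketch of one way such a fix can be structured, the coroutine body can simply return and leave the teardown to its owner, which runs outside that coroutine. Everything below (`Server`, `inside_own_coroutine`, `run_and_teardown`) is a hypothetical model for illustration and does not mirror the actual SRS change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a server that owns one coroutine.
struct Server {
    bool inside_own_coroutine = false;  // true while cycle() is running
    bool destroyed = false;
    std::vector<std::string> log;

    // Frees the coroutine; only safe when called from outside that coroutine.
    bool destroy() {
        if (inside_own_coroutine) {
            log.push_back("refused: would join self");
            return false;
        }
        destroyed = true;
        log.push_back("destroyed");
        return true;
    }

    // Body of the server coroutine: it does its work and returns,
    // without tearing itself down.
    void cycle() {
        inside_own_coroutine = true;
        log.push_back("cycle done");
        inside_own_coroutine = false;
    }
};

// The owner waits for cycle() to return, then tears the server down itself.
bool run_and_teardown(Server& s) {
    s.cycle();
    return s.destroy();
}
```

The design point is ownership: whoever created the coroutine joins and frees it, so the join is never issued from inside the coroutine being joined.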


@winlinvip
Member

winlinvip commented Nov 16, 2021

User Manual: GCP

GCP

To profile CPU usage for performance optimization, run:

# Build SRS with GCP
./configure --gperf=on --gcp=on && make

# Start SRS with GCP
./objs/srs -c conf/console.conf

# Or CTRL+C to stop GCP
killall -2 srs

# To analyze the CPU profile
./objs/pprof --text objs/srs gperf.srs.gcp*

Output:

[root@213bb18db226 srs]# ./objs/pprof --text objs/srs gperf.srs.gcp*
Using local file objs/srs.
Using local file gperf.srs.gcp.
Total: 10 samples
       3  30.0%  30.0%        3  30.0% 0x00007ffca65fb8f4
       2  20.0%  50.0%        2  20.0% __epoll_wait_nocancel
       2  20.0%  70.0%        2  20.0% __read_nocancel@e739
       1  10.0%  80.0%        1  10.0% SrsSharedPtrMessage::SrsSharedPtrMessage
       1  10.0%  90.0%        1  10.0% __read_nocancel@efa29
       1  10.0% 100.0%        1  10.0% heap_delete
       0   0.0% 100.0%        3  30.0% SrsFastCoroutine::cycle
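
Each row of the listing above has six columns: samples in the function itself, that count as a percentage of the total, the running percentage over the rows printed so far, samples in the function plus its callees, that count as a percentage of the total, and the symbol name. A small sketch that parses one such row follows; `PprofRow` and `parse_pprof_row` are hypothetical helper names, not part of gperftools.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One row of `pprof --text` output.
struct PprofRow {
    long flat = 0;        // samples in the function itself
    double flat_pct = 0;  // flat as a percentage of all samples
    double sum_pct = 0;   // running total of flat_pct down the listing
    long cum = 0;         // samples in the function and its callees
    double cum_pct = 0;   // cum as a percentage of all samples
    std::string symbol;   // function name or raw address
};

// Parses a data row; returns false for header lines such as "Total: 10 samples".
bool parse_pprof_row(const std::string& line, PprofRow& row) {
    std::istringstream in(line);
    char pct = 0;
    if (!(in >> row.flat >> row.flat_pct >> pct) || pct != '%') return false;
    if (!(in >> row.sum_pct >> pct) || pct != '%') return false;
    if (!(in >> row.cum)) return false;
    if (!(in >> row.cum_pct >> pct) || pct != '%') return false;
    in >> std::ws;                   // skip spaces before the symbol
    std::getline(in, row.symbol);    // the rest of the line is the symbol
    return !row.symbol.empty();
}
```

For the last row of the listing, this yields flat = 0 and cum = 3 for SrsFastCoroutine::cycle: the coroutine entry point took no samples itself, but 30% of the samples were taken in functions it called.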

Note: For usage details, refer to the gperftools cpu-profiler documentation.

GMD

To check for memory overflows and wild pointers, run:

# Build SRS with GMD.
./configure --gperf=on --gmd=on && make

# Start SRS with GMD.
env TCMALLOC_PAGE_FENCE=1 ./objs/srs -c conf/console.conf

If an overflow occurs (which indicates a bug), the program faults at the location of the overflow.

Note: For usage details, refer to the gperftools heap-defense documentation.

Note: GMD requires linking against libtcmalloc_debug.a and setting the environment variable TCMALLOC_PAGE_FENCE.


@winlinvip
Member

winlinvip commented Nov 17, 2021

GMC is still not working. For now, use valgrind to check for memory leaks.

GMC

To check for memory leaks, run:

# Build SRS with GMC
./configure --gperf=on --gmc=on && make

# Start SRS with GMC
env PPROF_PATH=./objs/pprof HEAPCHECK=normal ./objs/srs -c conf/console.conf 2>gmc.log 

# Or CTRL+C to stop gmc
killall -2 srs

# To analyze memory leaks
cat gmc.log

Output:

Thread finding failed with -1 errno=14
Thread finding callback was interrupted or crashed; can't fix this
Aborted (core dumped)
[root@213bb18db226 srs]# 

Program terminated with signal 6, Aborted.
#0  0x00007f626ec73387 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-324.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 libstdc++-4.8.5-44.el7.x86_64
(gdb) bt
#0  0x00007f626ec73387 in raise () from /lib64/libc.so.6
#1  0x00007f626ec74a78 in abort () from /lib64/libc.so.6
#2  0x0000000000814597 in LogPrintf (severity=-4, pat=<optimized out>, ap=ap@entry=0x7ffdef244aa8) at src/base/logging.h:209
#3  0x0000000000814639 in RAW_LOG (lvl=<optimized out>, pat=<optimized out>) at src/base/logging.h:228
#4  0x00000000008220e9 in HeapLeakChecker::IgnoreAllLiveObjectsLocked (self_stack_top=self_stack_top@entry=0x7ffdef244be0) at src/heap-checker.cc:1323
#5  0x0000000000822296 in HeapLeakChecker::DoNoLeaks (this=0x351a000, should_symbolize=should_symbolize@entry=HeapLeakChecker::SYMBOLIZE)
    at src/heap-checker.cc:1769
#6  0x0000000000822887 in HeapLeakChecker::NoGlobalLeaksMaybeSymbolize (should_symbolize=should_symbolize@entry=HeapLeakChecker::SYMBOLIZE)
    at src/heap-checker.cc:2147
#7  0x0000000000822925 in HeapLeakChecker::DoMainHeapCheck () at src/heap-checker.cc:2169
#8  0x0000000000822ad5 in HeapLeakChecker_AfterDestructors () at src/heap-checker.cc:2317
#9  0x00007f626ec76ce9 in __run_exit_handlers () from /lib64/libc.so.6
#10 0x00007f626ec76d37 in exit () from /lib64/libc.so.6
#11 0x00007f626ec5f55c in __libc_start_main () from /lib64/libc.so.6
#12 0x000000000041501b in _start ()

Note: For usage details, refer to the gperftools heap-checker documentation.


@winlinvip winlinvip modified the milestones: 4.0, 5.0 Dec 26, 2021
@winlinvip
Member

Postponed to 5.0 because it is not a blocking issue and appears to be an existing bug: gperftools/gperftools#265

@winlinvip
Member

We have now switched to ASAN; see #3216.

@winlinvip winlinvip changed the title 想用gperf做内存泄漏检测,但是srs一直崩溃 I want to use gperf for memory leak detection, but SRS keeps crashing. Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023