Threads may accumulate when Node's garbage collector hasn't run #614

Open
gev opened this issue Apr 29, 2020 · 19 comments
gev commented Apr 29, 2020

On the server side I get an offer without any media tracks and with two data channels.
For the first RTCPeerConnection, Node.js creates 8 child processes.
For every subsequent RTCPeerConnection, Node.js creates 7 child processes.
After closing an RTCPeerConnection, Node.js kills only 4 (for the first) or 3 (for every subsequent one) child processes.
So every open/close cycle leaves 4 child processes behind!

Is that OK?

savsofts commented

I have noticed the same thing; I don't have enough knowledge about the back end.
I noticed that when a new connection connects with a stream, memory consumption increases, but it does not decrease by nearly as much when the connection disconnects.
The effect doesn't show with 10-20 connections, but yesterday I tested with far more active connections.

My test:

  1. We started adding new connections (stream viewers) one by one, with no one disconnecting.
  2. After 20 minutes we reached up to 184 active connections (stream viewers), and memory usage reached about 2.4 GB.
  3. When some users started leaving, memory barely changed even once only 120 connections remained active (only 100-150 MB were freed).
  4. After that we were unable to add new connections; the node process stopped responding.

node-webrtc is good and fulfills our requirements, but this issue has stopped me from using it further; in parallel, our team is looking for another open source solution. They have started working on Kurento server.

@markandrus Mark Roberts, please help us fix this. I have just completed the application I have been working on for the last 30 days.

savsofts commented

@gev Is this useful for us?
"By default, pipes for stdin, stdout, and stderr are established between the parent Node.js process and the spawned child. These pipes have limited (and platform-specific) capacity. If the child process writes to stdout in excess of that limit without the output being captured, the child process will block waiting for the pipe buffer to accept more data. This is identical to the behavior of pipes in the shell. Use the { stdio: 'ignore' } option if the output will not be consumed."
Reference: https://nodejs.org/api/child_process.html
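
For illustration, a minimal sketch of the behavior that passage describes (generic Node code, not node-webrtc code; it assumes a Unix-like system where the yes command is available):

const { spawn } = require('child_process');

// 'yes' writes to stdout endlessly. With the default stdio pipes and no
// reader attached, the child eventually blocks once the pipe buffer fills;
// with { stdio: 'ignore' } its output is discarded and it never blocks.
const child = spawn('yes', { stdio: 'ignore' });

setTimeout(() => child.kill(), 1000); // clean up after one second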


gev commented Apr 30, 2020

@savsofts did you close the connection on the server side when the user disconnected?

savsofts commented

I checked with the REST API (provided by the examples); the connection closed automatically after about 4-5 seconds.


gev commented Apr 30, 2020

I've noticed that connections left unclosed on the server side keep all 7 or 8 child processes alive. Could you try closing the peer connections on the server after the user disconnects? Example:

        peer.onconnectionstatechange = () => {
          switch (peer.connectionState) {
            // connectionState is a string, so match string literals
            case 'failed':
            case 'disconnected':
              // your code to close the connection, e.g.
              peer.close();
              break;
          }
        };

savsofts commented

Yes, I captured connectionstatechange via console.log, and I am getting the failed and disconnected statuses.


gev commented Apr 30, 2020

Could you close the connection on the failed or disconnected status and check the memory usage?


gev commented Apr 30, 2020

@savsofts @markandrus this simple test shows the child process leakage:

const { RTCPeerConnection } = require('wrtc');
setInterval(() => {
  const peer = new RTCPeerConnection();
  setTimeout(() => {
    peer.close();
  }, 5000);
}, 10000);
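
While this runs, you can watch the thread count from another terminal (Linux-specific; substitute the node process's pid for YOUR_PID_HERE; on macOS, ps -M YOUR_PID_HERE gives a similar per-thread listing):

$ watch -n 1 'grep Threads /proc/YOUR_PID_HERE/status'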


gev commented Apr 30, 2020

this code adds 4 child processes every interval:

const { RTCPeerConnection } = require('wrtc');
setInterval(() => {
  const peer = new RTCPeerConnection();
  peer.close();
}, 10000);

and this one adds only 2:

const { RTCPeerConnection } = require('wrtc');
setInterval(() => {
  const peer = new RTCPeerConnection();
}, 10000);


markandrus commented May 1, 2020

Why do you claim there is a process leak? I do not believe the webrtc.org code spawns any processes, nor does node-webrtc itself. Moreover, pgrep doesn't show any child processes with the sample provided. See here (tested with node-webrtc v0.4.4 and node v10.16.0):

$ cat test.js 
const { RTCPeerConnection } = require('wrtc');
setInterval(() => {
  const peer = new RTCPeerConnection();
  peer.close();
}, 10000);

console.log('process.pid:', process.pid);

$ node test.js 
process.pid: 40086

Then, in another terminal:

$ pgrep -P 40086; echo $?
1

Per man pgrep:

EXIT STATUS
     The pgrep and pkill utilities return one of the following values upon exit:

     0       One or more processes were matched.

     1       No processes were matched.

Therefore, there should be no child processes being spawned.


gev commented May 1, 2020

@markandrus run htop and count the Node.js processes. You can press t to view them as a tree, and you will see:

[Screenshots: htop tree view, 2020-05-01, showing the node process with many entries]

Every one has its own PID.

or use

$ ps huH p YOUR_PID_HERE

“Therefore, there should be no child processes being spawned.”

They're threads. I think they are created for the ICE and DTLS/SCTP/SRTP transports.

@gev
Copy link
Author

gev commented May 1, 2020

in peer_connection_factory.cc:

[Screenshot: thread creation in peer_connection_factory.cc]

@markandrus
Copy link
Member

markandrus commented May 1, 2020

OK, you are referring to a thread leak, not a process leak. Indeed, you can observe in your example that the number of threads accumulates (until a garbage collection runs). I will update the issue title accordingly.

Analysis

In node-webrtc, we create our worker and signaling threads here and here in the PeerConnectionFactory, respectively. They both get stopped in its destructor here. We also create a thread for TestAudioDeviceModuleImpl, which has a 1:1 mapping with each PeerConnectionFactory. A PeerConnectionFactory (and its TestAudioDeviceModuleImpl) exists so long as at least one non-stopped RTCPeerConnection exists. In your example, you create an RTCPeerConnection and immediately close it (totally fine), which means you are creating a PeerConnectionFactory on demand each time and throwing it away (also totally fine). If node ran garbage collection continuously, these would continuously be cleaned up, as I'll show below. Instead, garbage collection does not run continuously, and so the threads appear to accumulate.

I made a small change locally to name these threads, and then re-ran the test, attached with lldb, and ran thread list. You can see the various threads:

$ lldb attach -p 45302
(lldb) thread list
Process 45302 stopped
* thread #1: tid = 0xd9503, 0x00007fff5faa278e libsystem_kernel.dylib`kevent + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0xd9504, 0x00007fff5faa278e libsystem_kernel.dylib`kevent + 10
  thread #3: tid = 0xd9505, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #4: tid = 0xd9506, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #5: tid = 0xd9507, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #6: tid = 0xd9508, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #7: tid = 0xd9509, 0x00007fff5fa9c266 libsystem_kernel.dylib`semaphore_wait_trap + 10
  thread #8: tid = 0xd953f, 0x00007fff5fa9dbfe libsystem_kernel.dylib`__workq_kernreturn + 10
  thread #9: tid = 0xd9541, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #10: tid = 0xd9542, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #11: tid = 0xd9543, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'
  thread #12: tid = 0xd95b8, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #13: tid = 0xd95b9, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #14: tid = 0xd95ba, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'
  thread #15: tid = 0xd9642, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #16: tid = 0xd9643, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #17: tid = 0xd9644, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'
  thread #18: tid = 0xd967b, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #19: tid = 0xd967c, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #20: tid = 0xd967d, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'
  thread #21: tid = 0xd96da, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #22: tid = 0xd96db, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #23: tid = 0xd96dc, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'
  thread #24: tid = 0xd9adc, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #25: tid = 0xd9add, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #26: tid = 0xd9ade, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'
  thread #27: tid = 0xd9b3b, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #28: tid = 0xd9b3c, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #29: tid = 0xd9b3d, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'

These threads are accumulating because node has not yet run a garbage collection. If I extend the example to explicitly run gc, we can see the threads do not accumulate:

$ cat test.js 
const { RTCPeerConnection } = require('wrtc');

setInterval(() => {
  if (typeof gc === 'function') {
    gc();
  }
  const peer = new RTCPeerConnection();
  peer.close();
}, 1000);

console.log('process.pid:', process.pid);

$ node --expose-gc test.js
process.pid: 45333

$ lldb attach -p 45333
(lldb) thread list
Process 45333 stopped
* thread #1: tid = 0xda9d8, 0x00007fff5faa278e libsystem_kernel.dylib`kevent + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0xda9d9, 0x00007fff5faa278e libsystem_kernel.dylib`kevent + 10
  thread #3: tid = 0xda9da, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #4: tid = 0xda9db, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #5: tid = 0xda9dc, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #6: tid = 0xda9dd, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #7: tid = 0xda9de, 0x00007fff5fa9c266 libsystem_kernel.dylib`semaphore_wait_trap + 10
  thread #8: tid = 0xda9e7, 0x00007fff5fa9dbfe libsystem_kernel.dylib`__workq_kernreturn + 10
  thread #9: tid = 0xda9f1, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #10: tid = 0xda9f2, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #11: tid = 0xda9f3, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #12: tid = 0xda9f4, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10
  thread #13: tid = 0xdaa44, 0x00007fff5faa361a libsystem_kernel.dylib`__select + 10, name = 'PeerConnectionFactory:workerThread'
  thread #14: tid = 0xdaa45, 0x00007fff5fa9f86a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'PeerConnectionFactory:signalingThread'
  thread #15: tid = 0xdaa46, 0x00007fff5fa9ff32 libsystem_kernel.dylib`__semwait_signal + 10, name = 'TestAudioDeviceModuleImpl'

Questions/Notes

  • Are there options that can force node to run garbage collection more frequently? (See the note below.)
  • Could we close threads earlier than the PeerConnectionFactory destructor, in PeerConnectionFactory::Release? Yes.
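
On the first question, one option that may be worth testing (a V8 testing flag, so behavior can vary across node versions): --gc-interval=<n> asks V8 to collect after roughly every <n> allocations.

$ node --v8-options | grep gc-interval   # check the flag exists on your build
$ node --gc-interval=100000 test.js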

@markandrus changed the title from “Nodejs processes leakage” to “Threads may accumulate when Node's garbage collector hasn't run” on May 1, 2020

gev commented May 1, 2020

And for this code

  const peer = new RTCPeerConnection();
  peer.close();

(closing the connection immediately after creating it) I get a segmentation fault!


gev commented May 3, 2020

@markandrus can I help?


savsofts commented May 4, 2020

@gev @markandrus pstree -p shows 648 node threads with 120 concurrent broadcast streams.
My question: do those 648 threads affect the main node thread? Is that why the webpage stops loading after some time? I mean, has the main thread reached its maximum capacity? I am using a 16-core CPU with 32 GB RAM, and I have also tried forcing gc.

[Screenshots: pstree output showing node thread counts]


savsofts commented May 4, 2020

This error message shows up when the process stops responding.
Sometimes this error has no effect, but sometimes the main process stops responding.
Not sure what the issue is.

[Screenshot: error message]


brubbel commented Feb 5, 2021

Calling global.gc() after closing the RTCPeerConnection does the trick. Otherwise, in my case, I end up with hundreds of threads and increasing CPU usage over time.

“Could we close threads earlier than the PeerConnectionFactory destructor”

This may be the correct solution.
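
For anyone applying this workaround, a minimal sketch (node must be started with --expose-gc, otherwise global.gc is undefined; closeAndCollect is just an illustrative helper name):

const { RTCPeerConnection } = require('wrtc');

// Close the connection, then nudge V8 to collect so node-webrtc can tear
// down the PeerConnectionFactory threads promptly. The peer must also no
// longer be referenced anywhere for GC to actually free it.
function closeAndCollect(peer) {
  peer.close();
  setImmediate(() => {
    if (typeof global.gc === 'function') { // only defined with --expose-gc
      global.gc();
    }
  });
}

Forcing a full collection this often can cost CPU, so this is a stopgap rather than a fix.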


temaivanoff commented Oct 1, 2021

@markandrus Hi, we also ran into the memory leak.
We decided to test the behavior with a call to global.gc, and the problem was solved.
We have been testing for two days with no problems so far. If something changes, I will report back.
