Scaling Node.js On Multi-Core Systems #625

Closed
mkhahani opened this issue Nov 26, 2016 · 5 comments

@mkhahani
Contributor

commented Nov 26, 2016

Yesterday we ran a large webinar using Licode (proudly) on a powerful server with a 16-core Intel Xeon 5470 CPU. There was one publisher (audio + video). When the number of participants reached 100, we started experiencing packet loss and connection drops!

I discovered that Node.js (and therefore Licode) uses only one of the CPU cores on a multi-core server! Fortunately, Node.js has the Cluster module (available since the v0.6 series), which takes advantage of multi-processor/multi-core environments. But sadly Licode does not support newer versions of Node.js. So is there a schedule for upgrading Node.js and adding support for clustering? Or is there a consideration I'm not aware of? This is a serious problem IMO.
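
For readers unfamiliar with the Cluster module mentioned above, here is a minimal sketch of how it is typically used: a master process forks one worker per CPU core, and the workers share a listening port. This is not Licode code. The HTTP server, port 3000, the `workerCount` and `startCluster` helpers, and the `START_CLUSTER` environment variable are all assumptions made for illustration, and the syntax assumes a reasonably modern Node.js.

```javascript
// Minimal sketch of the Node.js cluster module: one worker per CPU core.
// NOT Licode code; the HTTP server and port 3000 are placeholders.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// How many workers to fork: one per core, optionally leaving some cores free.
function workerCount(totalCores, reservedCores = 0) {
  return Math.max(1, totalCores - reservedCores);
}

function startCluster(port = 3000) {
  // `isMaster` is the long-standing name; newer Node.js also offers `isPrimary`.
  if (cluster.isMaster) {
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    // Replace a worker that dies so capacity stays constant.
    cluster.on('exit', (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} exited (${signal || code}), restarting`);
      cluster.fork();
    });
  } else {
    // Workers share the same listening port; connections are spread across them.
    http.createServer((req, res) => {
      res.end(`handled by pid ${process.pid}\n`);
    }).listen(port);
  }
}

// Hypothetical opt-in switch so the file can be loaded without forking anything.
if (process.env.START_CLUSTER === '1') {
  startCluster();
}
```

Running the file with `START_CLUSTER=1 node file.js` would start the workers. Note that clustering only helps a process whose work can be split across independent workers; whether that applies to any given Licode component is a separate question.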

Anyway, one solution is to create several virtual servers and distribute Licode across them, assigning one CPU core to each. If anyone has a better way to use the full power of a multi-core system, or useful experience on the subject, please share it here.

Thanks.

@zevarito

Contributor

commented Nov 26, 2016

@mkhahani

Contributor Author

commented Dec 2, 2016

@zevarito
I'm a big fan of the htop utility for process monitoring. It has many features and options that can be customized to match your needs.

The following image is a screenshot of htop taken while running a webinar on a server with 12 CPU cores. I've added the CPU column (first column), which shows the core number. As you can see, the master erizoJS process (highlighted) is using the most CPU time (44.9%) and is sitting on one of the cores (No. 3 here). The forked erizoJS processes are distributed well across all CPU cores. So yes, I think the problem is the erizoJS master process, which cannot use the full server capacity.

[Screenshot: htop-licode]

I have an idea in mind. Is it possible to distribute Licode on a single dedicated server (instead of creating several VPSs) but with multiple IPs, and then launch several erizoAgents, one per CPU core, each assigned its own IP? This way we could have 12 Licode instances (for example), each on a different CPU core, which would let us use the whole CPU power.
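
The "one process per core" part of this idea could be sketched as follows on Linux, using the `taskset` utility to pin each launched process to a single core. This is only an illustration under stated assumptions: `erizoAgent.js` as the entry script name and the `pinnedCommands` and `launchPinned` helpers are hypothetical, and assigning a separate IP to each agent is not covered here.

```javascript
// Sketch: build one `taskset`-pinned launch command per CPU core (Linux only).
// ASSUMPTION: `erizoAgent.js` is the agent entry script; adjust to your install.
const { spawn } = require('child_process');

function pinnedCommands(coreCount, script) {
  const commands = [];
  for (let core = 0; core < coreCount; core++) {
    // `taskset -c <core> <cmd>` restricts the launched process to that core.
    commands.push({ cmd: 'taskset', args: ['-c', String(core), 'node', script] });
  }
  return commands;
}

// Actually start the processes (not called here; shown for completeness).
function launchPinned(coreCount, script) {
  return pinnedCommands(coreCount, script).map(({ cmd, args }) =>
    spawn(cmd, args, { stdio: 'inherit' })
  );
}

// Example: print the commands for a 12-core machine without starting anything.
pinnedCommands(12, 'erizoAgent.js').forEach(({ cmd, args }) => {
  console.log([cmd, ...args].join(' '));
});
```

One caveat on the design: CPU affinity is inherited by child processes, so anything a pinned process spawns would be confined to the same core unless its affinity is changed afterwards.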

I also found PM2 useful for scaling a process across all the CPUs, but I'm not sure whether it can be used with Licode.

@jcague

Contributor

commented Dec 2, 2016

@mkhahani There are no ErizoJS master and ErizoJS slaves; all of them are isolated. ErizoJS uses multiple CPU cores, at least in its most CPU-intensive part (the C++ addon inside, which is called Erizo), and according to our tests in production we haven't seen many problems. Actually, we're currently using too many threads in cases like yours with tens of participants, so we're improving that in recent PRs.

Other components like Nuve, Erizo Controller and Erizo Agent use just a single CPU core each; we'd like to update them in the future to use workers if they become a bottleneck. Can you please check whether they're overloaded in your case?

IMO the bottleneck in the number of concurrent clients is inside Erizo: there's a known limitation in libnice, but they're working on fixing it at the moment.

@mkhahani

Contributor Author

commented Dec 3, 2016

@jcague
Thanks for the clarification. I'm just confused by the CPU load (44.9%) of the erizoJS process (highlighted in the image above) while all the CPU cores are only slightly busy, as the CPU meters show.

The other components take very little CPU time:

[Screenshot: htop3]
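
One possible explanation for the apparent mismatch, offered as a hedged aside: as far as I know, htop reports a process's CPU% relative to a single core (100% means one full core), and a multi-threaded process can spread that time over several cores. Under that assumption, 44.9% on a 12-core machine is only a small share of the total capacity, which would be consistent with lightly loaded per-core meters. The sketch below does that arithmetic and also samples per-core utilization from Node.js; the helper names `shareOfMachine` and `perCoreBusy` are made up for illustration.

```javascript
const os = require('os');

// Convert a per-process CPU% (where 100% = one full core) into a share
// of the whole machine's capacity.
function shareOfMachine(processPercent, coreCount) {
  return processPercent / coreCount;
}

// Per-core busy percentage between two os.cpus() snapshots.
function perCoreBusy(before, after) {
  return after.map((cpu, i) => {
    const prev = before[i].times;
    const cur = cpu.times;
    // Total elapsed CPU time on this core across all accounting buckets.
    const total = Object.keys(cur).reduce((sum, k) => sum + (cur[k] - prev[k]), 0);
    const idle = cur.idle - prev.idle;
    return total === 0 ? 0 : 100 * (1 - idle / total);
  });
}

// The figures from the thread: 44.9% on a 12-core server.
console.log(shareOfMachine(44.9, 12).toFixed(2) + '% of total capacity'); // 3.74% of total capacity

// Sample the local machine for one second and print each core's load.
const first = os.cpus();
setTimeout(() => {
  perCoreBusy(first, os.cpus()).forEach((busy, core) => {
    console.log(`core ${core}: ${busy.toFixed(1)}% busy`);
  });
}, 1000);
```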

@jcague

Contributor

commented Feb 1, 2017

ErizoJS is the MCU itself; all media packets go through its logic, so it's the most CPU-intensive part of Licode.

I think Discourse will be a better place to discuss this kind of topic (http://discourse.lynckia.com/), so I'm closing this issue here. Please feel free to continue the discussion there.

@jcague jcague closed this Feb 1, 2017
