100% CPU and server crashes after connecting 10k clients #66

Closed
eladnava opened this issue Sep 22, 2016 · 10 comments

@eladnava
Contributor

eladnava commented Sep 22, 2016

Hi @mcollina =)

Getting a strange crash and CPU halt with the latest Aedes (0.21.0):

/var/app/current/node_modules/aedes/node_modules/reusify/reusify.js:7
  function get () {
               ^

RangeError: Maximum call stack size exceeded
    at Object.get (/var/app/current/node_modules/aedes/node_modules/reusify/reusify.js:7:16)
    at SubscribeState.compiled (/var/app/current/node_modules/aedes/node_modules/fastfall/fall.js:38:25)
    at SubscribeState.doSubscribe (/var/app/current/node_modules/aedes/lib/handlers/subscribe.js:34:16)
    at makeCall (/var/app/current/node_modules/aedes/node_modules/fastseries/series.js:113:10)
    at SubscribeState.ResultsHolder.release (/var/app/current/node_modules/aedes/node_modules/fastseries/series.js:94:9)
    at work (/var/app/current/node_modules/aedes/node_modules/fastfall/fall.js:121:25)
    at sendRetained (/var/app/current/node_modules/aedes/lib/handlers/subscribe.js:107:5)
    at SubscribeState.subTopic (/var/app/current/node_modules/aedes/lib/handlers/subscribe.js:102:5)
    at work (/var/app/current/node_modules/aedes/node_modules/fastfall/fall.js:105:23)
    at /var/app/current/node_modules/aedes/lib/handlers/subscribe.js:64:5

Before the server crashes, the process consumes 100% CPU and is completely unresponsive.

This happens after about 10k clients connect. Keepalive interval is set at 5 minutes. Using MemoryPersistence.

Most clients do not subscribe to anything, by the way.

Oh, and I set the concurrency to 10,000. Didn't help.

@eladnava eladnava changed the title 100% cpu and server crashes after connecting 10k clients 100% CPU and server crashes after connecting 10k clients Sep 22, 2016
@mcollina
Collaborator

Yes, this is perfectly "normal", and described in #60.
Most of the code is highly synchronous for performance reasons, but it could use a process.nextTick in a couple of places (such as the memory persistence).

A PR is welcome.

@eladnava
Contributor Author

@mcollina It's normal that Aedes is unable to handle 10k connections? Am I doing something wrong?

Or is the MemoryPersistence unable to handle 10k connections?

@mcollina
Collaborator

It's normal that it crashes with RangeError: Maximum call stack size exceeded.
It's not looping; Aedes works heavily on the stack, especially with the in-memory mq and persistence.

node --stack-size=X solves this, where X is big enough.

Wrapping the callbacks in process.nextTick in the memory persistence will probably solve your issue.
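
To illustrate why deferring a callback helps, here is a minimal sketch, not Aedes code. The names syncStore, deferredStore and processAll are hypothetical stand-ins for a store that calls back synchronously, the same store with its callback wrapped in process.nextTick, and a chain of callbacks that handles one item after another. The item count of 100000 is an arbitrary value chosen to be deep enough to overflow a default-sized stack.

```javascript
// Hypothetical synchronous in-memory store: calls back on the caller's stack.
function syncStore (item, cb) {
  cb(null, item)
}

// Same store, but the callback is deferred with process.nextTick,
// so the caller's stack unwinds before the callback runs.
function deferredStore (item, cb) {
  process.nextTick(cb, null, item)
}

// Handles `count` items one after another; each callback starts the next item.
function processAll (store, count, done) {
  let i = 0
  function next (err) {
    if (err) return done(err)
    if (i === count) return done(null, i)
    store(i++, next)
  }
  next()
}

// With the synchronous store every item adds frames to the same stack.
try {
  processAll(syncStore, 100000, () => {})
} catch (err) {
  console.log('sync store failed:', err instanceof RangeError ? 'RangeError' : err)
}

// With the deferred store each item starts from a fresh, shallow stack.
processAll(deferredStore, 100000, (err, n) => {
  console.log('deferred store finished, items handled:', n)
})
```

The trade-off is that every deferred callback costs a trip through the nextTick queue, which is presumably why the code stays synchronous wherever it can.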

@eladnava
Contributor Author

I ended up going with mqtt-connection since I don't need any kind of persistence. Seems to work well so far for me.

@mcollina
Collaborator

Perfect, I'm closing this now.

@GavinDmello
Collaborator

Hey, I'm facing a similar issue after 15k connections. The process becomes unresponsive for quite some time and then starts working again. I'm not sure if GC is the culprit here, as it is synchronous and takes quite a bit of time.

@mcollina
Collaborator

@GavinDmello With the same error? What is the memory usage? Can you produce a script that reproduces it artificially?

@GavinDmello
Collaborator

The server doesn't crash in my case and memory is well under control. It's just the CPU, which cranks up to 100% and stays there with nothing happening (no logs), then comes back to normal after some time.

@mcollina
Collaborator

@GavinDmello If I can reproduce it, I can help fix it. Tools like http://npm.im/0x, dtrace, perf, etc. are built to solve these kinds of issues.

@cordovapolymer
Contributor

Related to #88.
