
Every queue has a consume rate of at best 25 messages per second. How can I increase this? #63

Closed
ppinel opened this issue Oct 17, 2016 · 26 comments

@ppinel

ppinel commented Oct 17, 2016

Hello,

I am using SenecaJS with this plugin and RabbitMQ. While benchmarking to understand why each queue handled at best 25 messages per second, I found that it takes 40 ms for a message to go and come back.
For example, one microservice asks another microservice for an object (basically a MongoDB query).
The act callback fires 40 ms later, even though the MongoDB query only takes 2 ms to complete.
So now I understand why the RabbitMQ admin interface shows at best 25 messages handled per second per queue: 25 × 40 ms = 1 second.
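
The arithmetic behind that cap is worth spelling out: with a single in-flight message per queue at a time, throughput is bounded by the round-trip time.

```javascript
// Throughput bound implied by the numbers above: one in-flight request
// at a time with a 40 ms round-trip caps each queue at 1000 / 40 msg/s.
const rttMs = 40;
const maxPerSecond = 1000 / rttMs;
console.log(`${maxPerSecond} messages per second`); // → 25 messages per second
```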

I use the default config for the AMQP plugin and SenecaJS. All microservices and RabbitMQ are on the same server.
Each microservice has only one instance running.

Is this normal behaviour?
Is it possible to increase the rate just by tuning the parameters of seneca-amqp-transport, SenecaJS and/or RabbitMQ?

@nfantone
Collaborator

nfantone commented Oct 17, 2016

Hi, @ppinel. Thanks for taking the time to write about your issue.

I've tested this on my side and got wildly different numbers from the ones you brought up. For instance, running the microservices from /examples, both client.js and listener.js locally (with RabbitMQ 3.6.4 on localhost), I got up to 820+ msg/s.

See screenshot below:

screen shot 2016-10-17 at 5 03 41 pm

And while this isn't a thorough benchmark by any means, our results are so different that I'm inclined to say something in your topology or .act function is putting a lid on the throughput.

Are you connecting to the broker locally (ie.: using localhost)? Are you behind a proxy or a load balancer? Was the broker under heavy load when you measured your numbers? How many queues are declared on your vhost? How many messages are on those queues, on average? Is your callback doing anything else that could amount to a significant delay? How are you measuring the RTT?

Finally, could you run the examples and share your results? Or maybe (if this isn't too much to ask), could you profile your script and tell me exactly where exec time is spent? That would be very helpful.

Thanks again!


EDIT
I forgot to mention that I made one small modification to the examples/client.js script: I set the second setInterval argument to 0.

Also, all my tests were run using Node 6.8.1, seneca@3.2.1 and seneca-amqp-transport@2.1.0 on a MacBook Pro (early 2015), 3.1 GHz Intel Core i7, 16 GB 1867 MHz DDR3.
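
A self-contained sketch of that tweak (the example's act() call is replaced here by a simple counter, so this runs without Seneca):

```javascript
// setInterval with a 0 ms delay fires on (roughly) every event-loop
// iteration, maximizing the publish rate instead of pacing messages.
let sent = 0;
const timer = setInterval(() => {
  sent++; // in examples/client.js this is where the act() call happens
  if (sent >= 5) {
    clearInterval(timer);
    console.log(`sent ${sent} messages`);
  }
}, 0); // second argument set to 0, as described above
```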

@ppinel
Author

ppinel commented Oct 18, 2016

Thanks for your response.

Yes, I am connecting to the broker locally, without any load balancer or proxy. The broker wasn't under heavy load: it's my dev environment and nobody uses it but me.
There are 121 queues declared: 104 correspond to responses and 17 to actions.
Traffic on the queues is generally low. The microservices sit behind an API gateway and talk to each other through SenecaJS.

To understand why my action was taking so long, I set a variable to the current time at the beginning of the action and, at different points in the logic, printed the difference between now and that variable.
For example, at some point MicroserviceA acts on MicroserviceB. I print the time diff right before calling act, and again in the act callback.
I do the same on MicroserviceB to see how many milliseconds its logic takes to execute.
I end up with 2 ms on MicroserviceB and 40 ms on MicroserviceA, so I was wondering where the remaining 38 ms were going. When I load-tested, I hit a maximum of 25 messages per second.
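
The instrumentation described above can be sketched without Seneca; fakeAct below is a hypothetical stand-in for a remote action:

```javascript
// Record a timestamp before the "act" call and compute the delta in its
// callback — the same round-trip measurement described in the comment.
function timeRoundTrip(requestFn, done) {
  const start = process.hrtime.bigint();
  requestFn(() => {
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    done(elapsedMs);
  });
}

// A fake "act" that invokes its callback immediately (no broker involved).
const fakeAct = cb => cb();

let measured = -1;
timeRoundTrip(fakeAct, ms => { measured = ms; });
console.log(`RTT: ${measured.toFixed(3)} ms`);
```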

I am not saying it's SenecaJS's, RabbitMQ's or anyone's fault. I just want to understand these results so I can improve whatever needs improving.
So your quick answer is much appreciated.

I will run the tests ASAP and get back to you with the results.

EDIT:

I am looking here for stats; maybe it's not the right place.
screen shot 2016-10-18 at 15 56 49

@nfantone
Collaborator

nfantone commented Oct 18, 2016

Awesome. Your setup LGTM. Could you also include the output of RabbitMQ's web admin? In particular, I'm interested in the "Acknowledge" rate.

@ppinel
Author

ppinel commented Oct 18, 2016

Here is the example running on my server. I am lost oO

screen shot 2016-10-18 at 16 14 02

screen shot 2016-10-18 at 16 13 51

@ppinel
Author

ppinel commented Oct 18, 2016

On my macbook pro:
screen shot 2016-10-18 at 16 40 03
screen shot 2016-10-18 at 16 39 24

@ppinel
Author

ppinel commented Oct 18, 2016

I just reinstalled RabbitMQ on the server and still have the same issue.
I updated Node to 6.8.1; same issue.

@nfantone
Collaborator

nfantone commented Oct 18, 2016

@ppinel All right. The numbers on your MacBook look a lot more like what I'd expect. What other differences can you think of between your local env and your server?

Consider:

  • seneca, seneca-amqp-transport versions (latest).
  • Node version (6.8.1 in both).
  • RabbitMQ version (3.6.1 in both).
  • Is your server running inside a VM or a container? (dedicated server)

Also, you did change the setInterval argument to 0 in both cases, right?

@ppinel
Author

ppinel commented Oct 18, 2016

Yes, I changed setInterval in both cases; otherwise messages come in too slowly and I never reach the mysterious limit.
RabbitMQ is 3.6.1 on my server too, and the seneca and seneca-amqp-transport versions are the same.

The server is from the French hosting provider OVH:
https://www.soyoustart.com/fr/offres/e3-ssd-2.xml or
https://www.soyoustart.com/fr/offres/e3-ssd-3.xml
So it's a dedicated server.

@ppinel
Author

ppinel commented Oct 18, 2016

It could be at the OS layer. Is there any limit you had to change, or have you seen anyone with issues on Debian?
I am looking at the Overview tab and nothing has a crazy value (fd, sockets, memory, etc.).

@nfantone
Collaborator

nfantone commented Oct 18, 2016

I'm perplexed. And out of ideas.

Could you please read and check this out? Use the rabbitmqctl command on your server to check whether the connection is being stalled.

@nfantone
Collaborator

It could be at the OS layer. Is there any limit you had to change, or have you seen anyone with issues on Debian?

Although rare, it could be. Check the Process statistics panel on Overview -> (More about this node) and take a look at the thresholds in each graph. For example, mine look like this:

screen shot 2016-10-18 at 5 23 02 pm

Also, is your RabbitMQ node Disc or RAM based?

@nfantone nfantone changed the title Every queue have a consume rate with at best 25 messages per second. How can I increase this? Every queue has a consume rate of at best 25 messages per second. How can I increase this? Oct 18, 2016
@ppinel
Author

ppinel commented Oct 18, 2016

The first one is while running the example; the second one is from before.

I can't tell whether my RabbitMQ node is disc or RAM based. How can I check?

screen shot 2016-10-18 at 18 40 01

screen shot 2016-10-18 at 18 39 03

@nfantone
Collaborator

Ok, so no problems there.

I can't tell whether my RabbitMQ node is disc or RAM based. How can I check?

There's a tag under the node's name on its description page. It also appears on the Overview tab.

screen shot 2016-10-18 at 5 49 35 pm

Did you manage to check the flow status of your connection?

@ppinel
Author

ppinel commented Oct 18, 2016

Ok, disc everywhere.
I also tested on a fresh AWS EC2 instance running Debian; same issue.

screen shot 2016-10-18 at 18 52 05

@ppinel
Author

ppinel commented Oct 18, 2016

What's the 48MB low watermark?
screen shot 2016-10-18 at 18 54 54

@ppinel
Author

ppinel commented Oct 18, 2016

I checked the Connections tab during a test:
screen shot 2016-10-18 at 18 58 22

@nfantone
Collaborator

nfantone commented Oct 18, 2016

What's the 48MB low watermark?

The minimum free disk space the node needs to keep running; when free space drops below it, RabbitMQ raises an alarm and blocks publishers.


Ok, so no throttling on the connection. This looks like a non-arbitrary fixed cap of sorts: 25 is too round a number to be a coincidence.

Anything in the RabbitMQ log? It should be under /var/log.

@ppinel
Author

ppinel commented Oct 18, 2016

Also, it's the same number across servers, even on the AWS EC2 instance.
I created a Stack Overflow question: http://stackoverflow.com/questions/40114532/every-queue-has-a-consume-rate-of-at-best-25-messages-per-second-how-can-i-incr

@ppinel
Author

ppinel commented Oct 18, 2016

Thank you for your time, I'll post an update here as soon as I have the solution.

@nfantone
Collaborator

Thanks! Yes, if you come up with a solution, please share it. I'm intrigued now.

@ppinel
Author

ppinel commented Oct 19, 2016

I hardcoded the noDelay option in amqplib to true and now get 930 messages delivered per second with the example.
I understand why the delay is useful, but maybe I could just decrease it.

screen shot 2016-10-19 at 09 25 39

@ppinel
Author

ppinel commented Oct 19, 2016

I found out how to set the noDelay option: `{ amqp: { socketOptions: { noDelay: true } } }`.
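
For reference, a sketch of the options object from the line above; how exactly it is handed to the plugin is an assumption here and may differ between plugin versions:

```javascript
// Options sketch (taken from the comment above): the object used when
// configuring seneca-amqp-transport. Treat the exact wiring as an
// assumption, not the canonical plugin API.
const transportOptions = {
  amqp: {
    socketOptions: {
      noDelay: true // sets TCP_NODELAY on the AMQP connection's socket
    }
  }
};
console.log(transportOptions.amqp.socketOptions);
```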

@nfantone
Collaborator

I had no idea about that noDelay flag. And I assume that if it's kept false, the delay in socket communication is OS dependent? That would explain the behaviour you saw on Debian.

You learn something new every day. Thanks for your teachings 🐐! Perhaps I can include this in a FAQ section in the README.md or the wiki.

Feel free to close the issue if you think there's nothing else you can do to improve your message rate.

@ppinel
Author

ppinel commented Oct 19, 2016

Yes, it's OS dependent. It would be great if this flag were documented.
Also, this might not be the right solution for everyone.
Thanks again for your help!

@ppinel ppinel closed this as completed Oct 19, 2016
@nfantone
Collaborator

nfantone commented Oct 19, 2016

@ppinel After reading the whole thread on that Google Groups forum, I'm inclined to say that your solution with noDelay: true is very reasonable. And it is platform dependent, as you mentioned above. Nagle's algorithm throttles communication at the TCP level to avoid network congestion by coalescing small packets rather than sending many tiny segments. More on that here.

Glad you could work things out. Good work.
