
Aedes stops responding after 99 messages when a client has overlapping subscriptions #88

Closed
chrismoorhouse opened this issue Feb 22, 2017 · 19 comments


@chrismoorhouse

If I create an Aedes broker with two clients, one publishing and one subscribing, and the subscribing client has overlapping subscriptions such as:

client.subscribe('000001/021/#');
client.subscribe('000001/021/000B/0001/01');

the broker stops responding after 99 messages.

This is not the case when I use mosca as the broker. I have attached my test files.

Archive.zip
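For anyone who wants to reproduce this without the archive, a minimal sketch along these lines (the topic names are taken from the report above; the single-file layout, QoS level and publish loop are assumptions, not the actual attached test files) triggers the hang once roughly 100 messages are in flight:

// repro.js — a sketch: broker, one subscriber with overlapping subscriptions,
// and one publisher hammering a topic that both subscriptions match
const aedes = require('aedes')()
const net = require('net')
const mqtt = require('mqtt')

net.createServer(aedes.handle).listen(1883, () => {
  // subscriber: the two overlapping subscriptions from the report
  const sub = mqtt.connect('mqtt://localhost:1883')
  sub.on('connect', () => {
    sub.subscribe('000001/021/#')
    sub.subscribe('000001/021/000B/0001/01')
  })
  sub.on('message', (topic) => console.log('received', topic))

  // publisher: publishes continuously to a topic matched by both subscriptions
  const pub = mqtt.connect('mqtt://localhost:1883')
  let n = 0
  pub.on('connect', function loop () {
    pub.publish('000001/021/000B/0001/01', 'message ' + ++n, { qos: 1 }, loop)
  })
})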

@cordovapolymer
Contributor

cordovapolymer commented Feb 22, 2017

I can confirm the issue: your test reaches 99 messages and hangs.
When I set {concurrency: 1000}, it hung at 1000. I think this may be related to #74 and #77.
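For context, concurrency here is the option the Aedes constructor hands to its internal MQEmitter; its default is 100, which would line up with the hang after ~99 deliveries, and raising it only moves the point where the broker stalls:

// default concurrency is 100; raising it just delays the hang
const aedes = require('aedes')({ concurrency: 1000 })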

@GavinDmello
Collaborator

@cordovapolymer is right. You might have to bump the concurrency value to suit your requirements.
You can set it to 0, but I don't think that's a good idea.

@chrismoorhouse
Author

Thanks for confirming it and testing against concurrency. If I get time I will try to debug it this weekend.

@cordovapolymer
Contributor

cordovapolymer commented Feb 23, 2017

@GavinDmello the issue is that once it hits the concurrency limit, it hangs completely. The correct behavior when the concurrency limit is hit would be to deliver the queued messages and then continue accepting new ones.
Today I ran into this in production with concurrency set to 1000; I've increased it to 5000 and hope @yutter will find a way to handle this situation.

@GavinDmello
Collaborator

@cordovapolymer Have you tried {concurrency: 0}?
Again, not the best of ideas, as I think it may lead to a leak.
I think @mcollina would suggest a better solution.

@cordovapolymer
Contributor

cordovapolymer commented Feb 23, 2017

@GavinDmello, of course it will lead to a leak. As we can see from @yutter's test, once the setMaxListeners limit is hit, it stops accepting new messages and doesn't flush the ones already sent or reset the count.
Update: it hangs with {concurrency: 0} as well. Does anyone use Aedes in production?

@cordovapolymer
Contributor

@mcollina @GavinDmello @pkinney @dpatekar @eladnava @renatoc @behrad @ovhemert @ralphtheninja @wuhkuh
Any hope that someone will look deeper into this? The issue is very old and it blocks us from using Aedes in production and development.

@wuhkuh
Contributor

wuhkuh commented Feb 28, 2017

@cordovapolymer
Yeah, after some debugging it seems that there is a massive desync when the concurrency limit is reached. Packets are still being 'processed' but don't seem to get published.

Can you confirm that the publish events don't work either? They start working again once I reset the server, at which point it flushes a large chunk of the missing packets.

I might have more time to look into this bug later this week. It's actually quite interesting at first sight.

@cordovapolymer
Contributor

cordovapolymer commented Mar 1, 2017

Can you confirm that the publish events don't work either? They start working again once I reset the server, at which point it flushes a large chunk of the missing packets.

What do you mean by resetting the server?

When I restart the server, or restart the publisher (client2.js in the @yutter example, which I changed to publish at a 1 ms rate), it recovers only to hang again in the same way. If I restart client1, the broker accepts its connection, then lags for about a minute, sends a bunch of packets without interruption, and hangs again. If it is restarted while receiving QoS 2 packets, it throws Error: no such packet.
I have described this in #87

@GavinDmello
Collaborator

@cordovapolymer Have you tried the solution I mentioned in the PR with { concurrency: 0 }?
Also, it would be great if you could tell me at what point (i.e. after how many messages) it becomes unresponsive.

@wuhkuh
Contributor

wuhkuh commented Mar 1, 2017

@cordovapolymer
With resetting I mean shutting down the process and restarting it using the console.

@GavinDmello
While it seems like a fun benchmark, it probably is not going to fix the underlying cause.

So here's what I'm interested in:

  • the server works 'kind of' normally (Aedes publish events don't work?)
  • once the number of published packets exceeds the concurrency limit, no more packets are delivered to the subscribing client
  • the server still processes the incoming data
  • handlers are still being called
  • ?
  • clients drop their connections because the keepalive is exceeded

So a logical next step would be to check if the queue works properly. I have a hunch that the problem starts there.

I'm currently on my phone and therefore unable to test this. You could continue my research so we can get to a working solution sooner rather than later.

@GavinDmello
Collaborator

GavinDmello commented Mar 1, 2017

The issue is not the listeners (at least in this case it isn't; we might need more than the default number of listeners for drain).
The messages are still published, and they are queued by Aedes. Using 0 with my solution doesn't queue; it delivers directly. The problem with using { concurrency: 0 } before was that it could have created a memory leak because of the tight coupling.

@wuhkuh
Contributor

wuhkuh commented Mar 1, 2017

@GavinDmello
I can confirm. The problem is that the MQEmitter isn't being drained when it starts to fill.
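For anyone unfamiliar with MQEmitter's flow control, a rough sketch of its API (not Aedes internals): every emitted message occupies one of the concurrency slots until the listener invokes its callback, so a delivery path that never calls it eventually pins the queue at the limit.

const mq = require('mqemitter')({ concurrency: 100 })

// a listener receives the message and a callback; the slot is only freed when done() runs
mq.on('000001/021/#', function (message, done) {
  console.log('delivering', message.topic)
  done() // if a code path forgets this, delivery stops once `concurrency` messages are in flight
})

// emit() queues the message; its callback fires after all matching listeners have finished
mq.emit({ topic: '000001/021/000B/0001/01', payload: 'hello' }, function (err) {
  if (err) console.error(err)
})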

@cordovapolymer
Contributor

@GavinDmello yes, try it yourself with @yutter's demo; you can see it doesn't work. Also, I've reviewed your latest PR.
@wuhkuh is it a bug in MQEmitter?

@wuhkuh
Contributor

wuhkuh commented Mar 2, 2017

@cordovapolymer
At this moment I'm not sure, as I have yet to dig deeper. The problem could be more complex, as this might also be the reason the events are broken.
It is possible though, because at first sight it seems that the emitter should drain itself to the subscribers.

More on this issue later today.

@wuhkuh
Contributor

wuhkuh commented Mar 2, 2017

Yes, the bug is in MQEmitter. I will send a PR over to the MQEmitter package.
Check the PR over there and copy and paste the code to get it working on your end.

There's more to fix, events in particular, so it's not a production-ready branch yet.

@GavinDmello
Collaborator

@wuhkuh Nice catch

@mcollina
Collaborator

mcollina commented Mar 3, 2017

Aedes is fast because it respects backpressure, i.e. it does not accept new messages if its internal queue is full. But that means we must clear the queue in every case. It seems there is a problem with overlapping subscriptions where things are not properly cleared.
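To illustrate the idea (a generic sketch of bounded-concurrency backpressure, not Aedes' actual code): every in-flight message holds a slot until its completion callback runs, and a path that never completes pins the queue at its limit.

// generic bounded-concurrency queue, for illustration only
function boundedQueue (limit, worker) {
  const pending = []
  let inFlight = 0

  function release () {
    inFlight--
    if (pending.length > 0) run(pending.shift())
  }

  function run (task) {
    inFlight++
    worker(task, release) // if release() is never called, the queue stalls at `limit`
  }

  return function push (task) {
    if (inFlight < limit) run(task)
    else pending.push(task) // backpressure: new work waits for a free slot
  }
}

// usage sketch: forgetting to call done() in the worker reproduces the hang
const push = boundedQueue(100, function (msg, done) {
  setImmediate(done) // pretend delivery
})

That would match the symptom here: with overlapping subscriptions, something in the duplicate delivery path apparently never completes, so the broker stalls exactly at the concurrency limit.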

@wuhkuh
Contributor

wuhkuh commented Mar 3, 2017

Yeah, only to find out this is intended. The queue has to be emptied on Aedes' side.
Why did it work? Because it limited the queue to 150 instead of limiting the concurrency.
