Slow clients slow down the whole broker #95

Closed
alexsporn opened this issue Sep 1, 2022 · 6 comments · Fixed by #97
Labels: discussion (Something to be discussed)

Comments

@alexsporn (Contributor) commented Sep 1, 2022

We are using the MQTT broker and publishing messages directly to all clients via the broker's Publish() func.
This func adds a new publish packet to the inlineMessages.pub buffered channel (size 1024), and the inlineClient() loop publishes those packets to all subscribed clients.
For each subscribed client this calls client.WritePacket(), which ultimately calls Write() on the client's writer.

If a single subscribed client is too slow, that client's write buffer fills up and the whole inlineClient() loop hangs until the buffer has space again (see awaitEmpty inside Write()). Shortly after, the inlineMessages.pub buffered channel fills up as well, and further calls to Publish() hang.

This means a single slow client (even one using QoS 0 with no guarantees of receiving packets) can make the whole broker wait indefinitely and not deliver any more packets to any client.

A possible workaround would be, instead of waiting for the buffer to be freed, to return a "client buffer full" error and skip sending the packet to that client. If the client is using QoS 1/2, the inflight message retry mechanism should re-deliver the message.
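To illustrate the idea, here is a toy sketch of a fail-fast write. TryWrite, ErrBufferFull, and the ring type are hypothetical stand-ins, not the broker's actual circular-buffer API:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBufferFull is the proposed "client buffer full" error (name hypothetical).
var ErrBufferFull = errors.New("client buffer full")

// ring is a toy stand-in for the client's circular write buffer.
type ring struct {
	buf  []byte
	used int
}

// TryWrite fails fast when the buffer lacks space, instead of blocking
// (as awaitEmpty inside Write() does) until the reader drains it.
func (r *ring) TryWrite(p []byte) (int, error) {
	if r.used+len(p) > len(r.buf) {
		return 0, ErrBufferFull
	}
	r.used += copy(r.buf[r.used:], p)
	return len(p), nil
}

func main() {
	r := &ring{buf: make([]byte, 8)}
	for i := 0; i < 3; i++ {
		if _, err := r.TryWrite([]byte("abcde")); err != nil {
			// Skip this client; QoS 1/2 retry logic re-delivers later.
			fmt.Println("packet", i, "->", err)
			continue
		}
		fmt.Println("packet", i, "-> written")
	}
}
```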

What do you think? I can write a PR with these changes. Or do you have a better solution to this problem?

@mochi-co (Collaborator) commented Sep 2, 2022

Hi @alexsporn! This is very interesting - the possibility never occurred to me.

Currently I am inclined to think the best solution is the one you have described:

  1. If the buffer is full, then writing the message should fail and return an error to the embedding platform.
  2. The QOS of the inline-publisher is always 2 (exactly once), so we don't have to modify how this is handled.
  3. If the QOS of the receiving client subscription is 1/2, then the message should be added to the client's inflight messages queue.

Perhaps we should also make the buffer size for inline publish a value in server.Options. @alexsporn, what's the use case that triggered this?

In the meantime I have increased the buffer to 4096 in v1.3.2 👍🏻
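For illustration, exposing that size through server.Options could look roughly like this; the InlinePubBufferSize field and the surrounding types are hypothetical, not the actual API:

```go
package main

// Packet stands in for the broker's publish packet type (illustrative).
type Packet struct{ Payload []byte }

// Options sketches a configurable inline-publish buffer size; the
// InlinePubBufferSize field is hypothetical, not the real server.Options.
type Options struct {
	InlinePubBufferSize int // 0 falls back to the default
}

// Server holds the inline publish channel, as inlineMessages.pub does.
type Server struct {
	inlinePub chan Packet
}

func New(opts Options) *Server {
	size := opts.InlinePubBufferSize
	if size <= 0 {
		size = 4096 // the default mentioned above (raised in v1.3.2)
	}
	return &Server{inlinePub: make(chan Packet, size)}
}

func main() {
	s := New(Options{InlinePubBufferSize: 8192})
	_ = s
}
```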

@alexsporn (Contributor, Author)

Hi @mochi-co, thanks for looking into the issue.

We are using MQTT over WebSocket as a Pub/Sub mechanism to listen to messages processed by our node software.
We faced some issues on one of the nodes running the MQTT broker, which has a JavaScript client (QoS 0) that is always connected and receives all messages unfiltered (between 50 and 300 a second). Because this client slowed down the broker and blocked the Publish() function from enqueueing any more messages, the node itself slowed down and stopped processing messages.

Initially I thought it could be an issue in how we handle and publish the incoming messages, so I set out to reproduce the bug. Using a JavaScript client (https://github.com/mqttjs/MQTT.js), publishing about 2000 packets a second, and forcing the client to sleep between incoming messages to simulate slow processing of each packet, I could reproduce the MQTT broker lockup. Normally I'd say this would be no issue, but it can be used as a denial-of-service attack on public brokers.

With the proposed change, a slow QoS 0 client will no longer affect other connected clients or slow down the broker. As soon as the slow client frees up enough buffer space, it will start receiving messages again.

If the slow client is using QoS 1/2, this opens up another "attack vector" against the broker: with a long InflightTTL (it defaults to 24 hours), a couple of slow clients can quickly drive up the broker's memory usage, since all pending packets stay in the inflight messages queue.

I totally understand that QoS 1/2 give certain guarantees on how MQTT behaves, but a slow client should not influence the broker's performance. Maybe we need a maximum number of inflight messages per client?
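For instance, a capped per-client inflight store might look roughly like this; the max field and boolean Set are hypothetical, not the actual mochi-mqtt structures:

```go
package main

import (
	"fmt"
	"sync"
)

// Message stands in for a stored QoS 1/2 publish packet (illustrative).
type Message struct{ Payload []byte }

// Inflight sketches a per-client inflight store with a size cap, so a
// handful of slow clients cannot grow the broker's memory without bound.
type Inflight struct {
	sync.Mutex
	internal map[uint16]Message
	max      int // 0 = unlimited
}

// Set stores a message, refusing new entries once the cap is reached.
func (i *Inflight) Set(key uint16, msg Message) bool {
	i.Lock()
	defer i.Unlock()
	if _, exists := i.internal[key]; !exists && i.max > 0 && len(i.internal) >= i.max {
		return false // cap reached; caller must drop or reject the packet
	}
	i.internal[key] = msg
	return true
}

func main() {
	q := &Inflight{internal: map[uint16]Message{}, max: 2}
	for id := uint16(1); id <= 3; id++ {
		fmt.Printf("packet %d stored: %v\n", id, q.Set(id, Message{}))
	}
}
```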

What do you think?

@mochi-co (Collaborator) commented Sep 7, 2022

Hi @alexsporn, thanks for your comprehensive reply :) My apologies for not replying to this earlier; I have been very busy lately...

I absolutely agree with all of the issues you've highlighted here, and have been trying to think about the best way to handle this and ensure we don't create any unintended consequences.

I plan to look into it more thoroughly between now and the weekend if I get some time, but tentatively I think the correct (even expected) behaviour would be to drop the packet if the QOS is 0 and the client buffer is full, and otherwise to add it to the inflight queue. This should apply both to inline-message publishing by the embedding service and to messages a client publishes to the broker which are then delegated out to subscribing clients.
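As a toy sketch of that delegation step (hypothetical types; the real broker writes through its circular buffer rather than a channel):

```go
package main

import "fmt"

// packet and client are illustrative stand-ins for the broker's types.
type packet struct {
	ID  uint16
	Qos byte
}

type client struct {
	out      chan packet       // buffered write queue
	inflight map[uint16]packet // pending QoS 1/2 messages
}

// deliver drops QoS 0 packets when the client's buffer is full, and
// parks QoS 1/2 packets in the inflight queue for re-delivery.
func deliver(c *client, pk packet) {
	select {
	case c.out <- pk:
	default: // buffer full: the client is too slow
		if pk.Qos > 0 {
			c.inflight[pk.ID] = pk
			return
		}
		fmt.Printf("dropping QoS 0 packet %d\n", pk.ID)
	}
}

func main() {
	c := &client{out: make(chan packet, 1), inflight: map[uint16]packet{}}
	deliver(c, packet{ID: 1, Qos: 0}) // fills the buffer
	deliver(c, packet{ID: 2, Qos: 0}) // dropped
	deliver(c, packet{ID: 3, Qos: 1}) // parked inflight
	fmt.Println("inflight size:", len(c.inflight))
}
```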

A brief review of the code suggests that writing to clients is blocking (inasmuch as we wait to write to the client's buffer if it's full). This makes me suspect that a client publishing to a topic with many subscribers could theoretically block until all clients are iterated, which is not ideal. I will have a think about how we might alleviate this bottleneck.
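One possible direction, sketched below with hypothetical types, is to give each client a dedicated writer goroutine draining its own queue, so the fan-out loop only ever pays for an enqueue rather than a socket write:

```go
package main

import (
	"fmt"
	"time"
)

// subscriber is an illustrative stand-in for a connected client: a
// dedicated goroutine drains its queue so slow sockets never block
// the fan-out loop. This is one possible direction, not the broker's
// actual design.
type subscriber struct {
	id    string
	queue chan []byte
}

func newSubscriber(id string) *subscriber {
	s := &subscriber{id: id, queue: make(chan []byte, 64)}
	go func() {
		for pkt := range s.queue {
			time.Sleep(10 * time.Millisecond) // stand-in for a slow network write
			fmt.Printf("%s <- %d bytes\n", s.id, len(pkt))
		}
	}()
	return s
}

func main() {
	subs := []*subscriber{newSubscriber("a"), newSubscriber("b")}
	for _, s := range subs {
		s.queue <- []byte("hello") // fan-out returns immediately
	}
	time.Sleep(50 * time.Millisecond) // demo only: let the writers drain
}
```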

@mochi-co (Collaborator)

@alexsporn I merged your recent PR, can you try pulling down master and seeing if the problem still exists? :) Thank you!

@mochi-co (Collaborator)

@alexsporn I've reverted #97 and reopened this issue as the solution for #97 causes the broker to stall (as per #101) under heavy load. I believe this may be related to the broker dropping acks if the queue is full rather than waiting.

@mochi-co (Collaborator)

This issue has been resolved in v2.0.0
