Allow MQTT QoS 0 subscribers to reconnect #10244
Merged
The solution in #10203 has the following issues:
An alternative is to delete the old queue via `rabbit_amqqueue:internal_delete(Q, User, missing_owner)` and subsequently declare the new queue via `rabbit_amqqueue:internal_declare(Q, false)`.
However, even then, it suffers from a race condition between `rabbit_amqqueue:on_node_down/1` and `rabbit_mqtt_qos0_queue:declare/2`: `rabbit_amqqueue:on_node_down/1` could first read the queue records that need to be deleted, thereafter `rabbit_mqtt_qos0_queue:declare/2` could re-create the queue owned by the new connection PID, and `rabbit_amqqueue:on_node_down/1` could subsequently delete the re-created queue.

Unfortunately, `rabbit_amqqueue:on_node_down/1` does not delete transient queues in one isolated transaction. Instead, it first reads queues and subsequently deletes them in batches, which makes it prone to race conditions.

Ideally, this commit would delete all `rabbit_mqtt_qos0_queue` queues of the crashed node, including their bindings. However, doing so in one transaction is risky: there may be millions of such queues, and the current code path applies the same logic on all live nodes, resulting in conflicting transactions and therefore a long database operation.
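The read-then-delete interleaving can be reproduced with a toy model. This is a minimal Python sketch, not RabbitMQ code: `MetadataStore`, `cleanup_read`, `cleanup_delete`, `redeclare`, the queue name `q-client1` and the node names are all hypothetical stand-ins for the metadata store, the two phases of `rabbit_amqqueue:on_node_down/1`, and `rabbit_mqtt_qos0_queue:declare/2`. Each function call models one separate, non-isolated step.

```python
# Toy model of the node-down cleanup vs. re-declare race.
# Not RabbitMQ code: every name here is hypothetical, and each method
# stands in for one separate (non-isolated) metadata-store step.
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueRecord:
    name: str
    node: str       # node hosting the owning connection
    owner_pid: str  # connection process that owns the queue


class MetadataStore:
    """A plain dict standing in for the cluster metadata store."""

    def __init__(self):
        self.queues = {}

    def put(self, record):
        self.queues[record.name] = record

    def delete(self, name):
        self.queues.pop(name, None)

    def queues_on_node(self, node):
        return [q.name for q in self.queues.values() if q.node == node]


def cleanup_read(store, node):
    # First phase of the non-isolated cleanup: collect names to delete.
    return store.queues_on_node(node)


def cleanup_delete(store, names):
    # Later phase: delete by name, without re-checking who owns the record now.
    for name in names:
        store.delete(name)


def redeclare(store, name, node, pid):
    # Reconnect path: delete the old record, then create one for the new PID.
    store.delete(name)
    store.put(QueueRecord(name, node, pid))


def run(interleaved):
    store = MetadataStore()
    store.put(QueueRecord("q-client1", "rabbit@a", "<old>"))  # rabbit@a then crashes
    if interleaved:
        stale = cleanup_read(store, "rabbit@a")
        redeclare(store, "q-client1", "rabbit@b", "<new>")  # client reconnects to rabbit@b
        cleanup_delete(store, stale)  # removes the freshly re-created queue
    else:
        cleanup_delete(store, cleanup_read(store, "rabbit@a"))
        redeclare(store, "q-client1", "rabbit@b", "<new>")
    return store.queues
```

With `run(interleaved=False)` the store ends up holding the queue owned by `<new>`; with `run(interleaved=True)` it ends up empty, i.e. the reconnected client has lost its queue even though every individual step was correct on its own.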
Hence, this commit uses the simplest approach, which should still be safe:
Do not remove `rabbit_mqtt_qos0_queue` queues if a node crashes. Other live nodes will continue to route to these dead queues. That should be okay, given that the `rabbit_mqtt_qos0_queue` clients auto confirm. Continuing to route, however, means that these dead queues count as a routing result for the AMQP 0.9.1 `mandatory` property (so such a message is not returned to the publisher). If an MQTT client reconnects to a live node with the same client ID, the new node will delete and then re-create the queue. Once the crashed node comes back online, it will clean up its leftover queues and bindings.
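The lifecycle described above (keep dead queues on crash, delete and re-create on reconnect, clean up on restart) can be sketched as a small state model. This is a hypothetical Python illustration, not RabbitMQ's implementation: `Cluster`, its methods, the routing keys and queue names are assumptions made for the example, and `publish_mandatory` only models the AMQP 0.9.1 rule that a `mandatory` message is returned when it matches no queue.

```python
# Toy model of the approach taken for QoS 0 queues on node failure.
# Not RabbitMQ code: Cluster and its methods are hypothetical stand-ins.
from dataclasses import dataclass


@dataclass(frozen=True)
class Qos0Queue:
    name: str
    node: str
    owner_pid: str


class Cluster:
    def __init__(self):
        self.queues = {}    # queue name -> Qos0Queue
        self.bindings = {}  # routing key -> set of bound queue names
        self.down = set()   # crashed nodes (routing deliberately ignores this)

    def _delete(self, name):
        # Deleting a queue also removes its bindings.
        self.queues.pop(name, None)
        for names in self.bindings.values():
            names.discard(name)

    def declare(self, name, node, pid, routing_key):
        # Same client ID reconnecting: delete the old queue, then re-create it.
        self._delete(name)
        self.queues[name] = Qos0Queue(name, node, pid)
        self.bindings.setdefault(routing_key, set()).add(name)

    def node_down(self, node):
        # Chosen approach: keep the crashed node's QoS 0 queues and bindings.
        self.down.add(node)

    def route(self, routing_key):
        # Dead queues are still returned as destinations; delivery to them is lost.
        return set(self.bindings.get(routing_key, set()))

    def publish_mandatory(self, routing_key):
        # AMQP 0.9.1 mandatory: return the message only if it matched no queue.
        return "routed" if self.route(routing_key) else "returned"

    def node_up(self, node):
        # A restarted node cleans up the leftover queues (and bindings) it still hosts.
        self.down.discard(node)
        leftovers = [q.name for q in self.queues.values() if q.node == node]
        for name in leftovers:
            self._delete(name)
```

In this model, after `node_down("rabbit@a")` a `mandatory` publish to a key bound only to a dead queue still reports `"routed"`. A client that reconnects to `rabbit@b` gets a fresh queue there, and `node_up("rabbit@a")` removes only the queues whose records still point at `rabbit@a`, so the re-created queue survives.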