Overwrite rabbit_mqtt_qos0_queue record from crashed node (backport #10203) #10205

Merged
michaelklishin merged 2 commits into v3.12.x from mergify/bp/v3.12.x/pr-10203 on Dec 21, 2023

@mergify mergify bot commented Dec 21, 2023

This is an automatic backport of pull request #10203 done by Mergify.
Cherry-pick of 9487189 has failed:

On branch mergify/bp/v3.12.x/pr-10203
Your branch is up to date with 'origin/v3.12.x'.

You are currently cherry-picking commit 9487189dc6.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   deps/rabbitmq_mqtt/src/rabbit_mqtt_qos0_queue.erl

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   deps/rabbitmq_mqtt/test/shared_SUITE.erl

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


When a node is shut down cleanly, the rabbit_mqtt_qos0_queue record is
removed from Mnesia.
When a node crashes and subsequently reboots, the new node incarnation
removes the old rabbit_mqtt_qos0_queue record from Mnesia (via
rabbit_mqtt_qos0_queue:recover/2).

However, when a node crashes, the rabbit_mqtt_qos0_queue will be removed
from Mnesia table rabbit_queue, but will still be present in table
rabbit_durable_queue on the other live nodes.
Prior to this commit, when the same MQTT client (i.e. same MQTT client
ID) re-connects from the crashed node to another live node and
re-subscribes, the following error occurred:
```
[info] <0.43155.0> Accepted MQTT connection 10.105.0.18:60508 -> 10.105.0.10:1883 for client ID nodered_24e214feb018a232
[debug] <0.43155.0> Received a SUBSCRIBE for topic(s) [{mqtt_topic,
[debug] <0.43155.0>                                        <<"as923/gateway/+/command/#">>,0}]
[error] <0.43155.0> Failed to declare queue 'mqtt-subscription-nodered_24e214feb018a232qos0' in vhost '/': {absent,
[error] <0.43155.0>                                                                                         {amqqueue,
[error] <0.43155.0>                                                                                          {resource,
[error] <0.43155.0>                                                                                           <<"/">>,
[error] <0.43155.0>                                                                                           queue,
[error] <0.43155.0>                                                                                           <<"mqtt-subscription-nodered_24e214feb018a232qos0">>},
[error] <0.43155.0>                                                                                          true,
[error] <0.43155.0>                                                                                          false,
[error] <0.43155.0>                                                                                          <15486.32690.0>,
[error] <0.43155.0>                                                                                          [],
[error] <0.43155.0>                                                                                          <15486.32690.0>,
[error] <0.43155.0>                                                                                          [],
[error] <0.43155.0>                                                                                          [],
[error] <0.43155.0>                                                                                          [],
[error] <0.43155.0>                                                                                          [{vhost,
[error] <0.43155.0>                                                                                            <<"/">>},
[error] <0.43155.0>                                                                                           {name,
[error] <0.43155.0>                                                                                            <<"ha-all-mqtt">>},
[error] <0.43155.0>                                                                                           {pattern,
[error] <0.43155.0>                                                                                            <<"^mqtt-">>},
[error] <0.43155.0>                                                                                           {'apply-to',
[error] <0.43155.0>                                                                                            <<"all">>},
[error] <0.43155.0>                                                                                           {definition,
[error] <0.43155.0>                                                                                            [{<<"ha-mode">>,
[error] <0.43155.0>                                                                                              <<"all">>}]},
[error] <0.43155.0>                                                                                           {priority,
[error] <0.43155.0>                                                                                            0}],
[error] <0.43155.0>                                                                                          undefined,
[error] <0.43155.0>                                                                                          [],
[error] <0.43155.0>                                                                                          undefined,
[error] <0.43155.0>                                                                                          live,
[error] <0.43155.0>                                                                                          0,
[error] <0.43155.0>                                                                                          [],
[error] <0.43155.0>                                                                                          <<"/">>,
[error] <0.43155.0>                                                                                          #{user =>
[error] <0.43155.0>                                                                                             <<"iottester">>},
[error] <0.43155.0>                                                                                          rabbit_mqtt_qos0_queue,
[error] <0.43155.0>                                                                                          #{}},
[error] <0.43155.0>                                                                                         nodedown}
[error] <0.43155.0> MQTT protocol error on connection 10.105.0.18:60508 -> 10.105.0.10:1883: subscribe_error
```

This commit fixes this error, allowing an MQTT client that connects with CleanSession=true and
subscribes with QoS 0 to re-connect and re-subscribe to another live
node if the original Rabbit node crashes.
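
The scenario above can be illustrated with a toy model. This is only a sketch: plain Python dicts stand in for the Mnesia tables `rabbit_queue` and `rabbit_durable_queue`, the `Cluster` class and the `overwrite_stale` switch are hypothetical names invented for this illustration, and none of it mirrors the actual Erlang code in `rabbit_mqtt_qos0_queue.erl`. The queue naming pattern is taken from the log output above.

```python
# Toy model only: dicts stand in for the Mnesia tables named in this PR.
# Cluster, declare() and overwrite_stale are hypothetical, not RabbitMQ APIs.

QOS0_TYPE = "rabbit_mqtt_qos0_queue"


def qos0_queue_name(client_id):
    # Naming pattern as it appears in the log output of this PR.
    return f"mqtt-subscription-{client_id}qos0"


class Cluster:
    def __init__(self, nodes):
        self.live_nodes = set(nodes)
        self.rabbit_queue = {}          # records of queues on live nodes
        self.rabbit_durable_queue = {}  # records that survive a node crash

    def crash(self, node):
        self.live_nodes.discard(node)
        # A crash removes the node's records from rabbit_queue only;
        # rabbit_durable_queue keeps the leftover record.
        self.rabbit_queue = {
            name: rec for name, rec in self.rabbit_queue.items()
            if rec["node"] != node
        }

    def declare(self, client_id, node, overwrite_stale):
        name = qos0_queue_name(client_id)
        stale = self.rabbit_durable_queue.get(name)
        owner_down = stale is not None and stale["node"] not in self.live_nodes
        if owner_down and not overwrite_stale:
            # Behaviour before the fix: the leftover record blocks the declare.
            return ("absent", name, "nodedown")
        # Fresh declare, or behaviour after the fix: replace the leftover record.
        record = {"node": node, "type": QOS0_TYPE}
        self.rabbit_queue[name] = record
        self.rabbit_durable_queue[name] = record
        return ("created", name, node)


cluster = Cluster(["node_a", "node_b"])
client = "nodered_24e214feb018a232"
cluster.declare(client, "node_a", overwrite_stale=False)
cluster.crash("node_a")

before = cluster.declare(client, "node_b", overwrite_stale=False)
after = cluster.declare(client, "node_b", overwrite_stale=True)
print(before)
print(after)
```

In this model the first re-declare on `node_b` fails because the record owned by the crashed node is still found, which corresponds to the `{absent, ..., nodedown}` term in the log. The second re-declare succeeds because the leftover record is overwritten, which is the behaviour the PR title describes.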

Reported in https://groups.google.com/g/rabbitmq-users/c/pxgy0QiwilM/m/LkJQ-3DyBgAJ

(cherry picked from commit 9487189)

# Conflicts:
#	deps/rabbitmq_mqtt/test/shared_SUITE.erl
@mergify mergify bot added the conflicts label Dec 21, 2023
@mergify mergify bot assigned ansd Dec 21, 2023
@michaelklishin michaelklishin added this to the 3.12.11 milestone Dec 21, 2023
@ansd ansd force-pushed the mergify/bp/v3.12.x/pr-10203 branch from 7259941 to d73f3cc Compare December 21, 2023 17:23
@michaelklishin michaelklishin merged commit d0ccd4f into v3.12.x Dec 21, 2023
13 of 14 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v3.12.x/pr-10203 branch December 21, 2023 18:24
ansd added a commit that referenced this pull request Dec 21, 2023
Follow up of #10205

Branch v3.12.x is currently red because test
```
bazel test //deps/rabbitmq_mqtt:shared_SUITE-mixed -t- --test_sharding_strategy=disabled --test_env FOCUS="-group [mqtt,cluster_size_3] -case rabbit_mqtt_qos0_queue_kill_node"
```

fails because the old node will error out with:
```
[info] <0.1962.0> accepting MQTT connection <0.1962.0> (127.0.0.1:61899 -> 127.0.0.1:21059, client id: subscriber)
[debug] <0.1962.0> Received a SUBSCRIBE for topic(s) [{mqtt_topic,
[debug] <0.1962.0>                                        "rabbit_mqtt_qos0_queue_kill_node",0}]
[error] <0.1977.0> Channel error on connection <0.1965.0> (127.0.0.1:61899 -> 127.0.0.1:21059, vhost: '/', user: 'guest'), channel 1:
[error] <0.1977.0> operation queue.declare caused a channel exception resource_locked: cannot obtain exclusive access to locked queue 'mqtt-subscription-subscriberqos0' in vhost '/'. It could be originally declared on another connection or the exclusive property value does not match that of the original declaration.
```

A classic mirrored queue could be used instead, as described in https://groups.google.com/g/rabbitmq-users/c/pxgy0QiwilM/m/LkJQ-3DyBgAJ

PR #10205 specifically allows for clients to re-subscribe to a live node
for queue type rabbit_mqtt_qos0_queue.

So it is okay to skip the test when it runs with feature flag rabbit_mqtt_qos0_queue disabled,
which causes a classic queue to be created.
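
The skip condition can be sketched as follows. This is a hedged Python illustration, not the Erlang Common Test code in `shared_SUITE.erl`: the function names and the `"classic"` label are hypothetical, and the only facts taken from the text above are that a disabled feature flag leads to a classic queue and that the test only applies to queue type rabbit_mqtt_qos0_queue.

```python
import unittest

QOS0_TYPE = "rabbit_mqtt_qos0_queue"


def queue_type_for_qos0_subscription(qos0_flag_enabled):
    # Hypothetical helper: with the feature flag disabled,
    # a classic queue is created instead of the QoS 0 queue type.
    return QOS0_TYPE if qos0_flag_enabled else "classic"


def run_kill_node_case(qos0_flag_enabled):
    # Skip unless the subscription is backed by the QoS 0 queue type,
    # since the fix only covers that queue type.
    if queue_type_for_qos0_subscription(qos0_flag_enabled) != QOS0_TYPE:
        raise unittest.SkipTest("feature flag rabbit_mqtt_qos0_queue is disabled")
    return "ran"
```

Calling `run_kill_node_case(False)` raises `unittest.SkipTest`, which models the mixed version case where the old node has the flag disabled.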
ansd added a commit that referenced this pull request Dec 21, 2023
* Skip rabbit_mqtt_qos0_queue_kill_node in mixed version mode

* Ensure test is skipped