Exchange federation links not automatically restarted after a rolling upgrade to 3.8.15 or later release #3148
Comments
Has any investigation into this been conducted by CloudAMQP? The only changes relevant to Federation after
and the upgrade of Ranch to 2.0 which I doubt can matter here.
No more investigation has been done apart from confirming the issue and constructing cases to test it. I take it that one way to debug would be to build
Reverting those is one option but seeing what mirrored supervisor messages are produced at
OK, attaching debug level logs ( Node
I cannot reproduce this with three
@johanrhodin when you say
does this mean this is a mixed-version cluster? In a mixed version cluster, This is expected that during a rolling upgrade to As explained in #3080, there isn't much our team can do about this. For federation users, clearing and re-enabling the policy should be sufficient to bring back the links on the upgraded post-
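The workaround mentioned above (clearing and re-enabling the policy) could be scripted roughly as follows. This is a minimal sketch rather than an official procedure: the function name `reset_federation_policy` and the `CURL` override are hypothetical, and it assumes the management HTTP API is reachable with credentials embedded in the base URL (as in the `curl` commands elsewhere in this issue) and that the vhost name is already URL-encoded.

```shell
#!/bin/sh
# Sketch: drop a policy and re-create it through the management HTTP API.
# CURL may be overridden (e.g. with a stub) for dry runs; hypothetical knob.
CURL="${CURL:-curl}"

# reset_federation_policy BASE_URL VHOST POLICY_NAME DEFINITION_JSON
reset_federation_policy() {
  base_url="$1"; vhost="$2"; policy="$3"; body="$4"
  url="$base_url/api/policies/$vhost/$policy"

  # 1. Clear the existing policy.
  $CURL -sS -X DELETE "$url" || return 1

  # 2. Re-create it with the same definition.
  $CURL -sS -X PUT -H 'Content-Type: application/json' -d "$body" "$url"
}
```

With the variables used in this issue, a call might look like `reset_federation_policy "$HTTPS_DOWNSTREAM" "$DOWNSTREAM_VHOST" fedit '{"pattern":".", "definition": {"federation-upstream-set":"all"}, "priority":0, "apply-to": "exchanges"}'`.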
I could not reproduce this with All the symptoms from #3080 were present: this is a side-effect of a changed process group module used by plugins The bad news is that we cannot do anything about this without dropping (and never re-introducing) Erlang 24 support.
… changes in the logs. References #3148.
@michaelklishin I should have been more explicit with I still see this with all nodes involved running Erlang 24.0.2 and RabbitMQ 3.8.19. I will try and create a minimal working example for reproduction.
OK I can't reproduce with I can reproduce with 3.8.16, with the following:
# 1. Two cluster definitions
UPSTREAM_URL="amqps://xcjvgoyg:PASSWD@test-myrtle-green-stingray.rmq2.cloudamqp.com/xcjvgoyg" # 1 node 3.8.21
DOWNSTREAM_VHOST=kjdjuxhr
HTTPS_DOWNSTREAM="https://kjdjuxhr:PASSWD@test-exotic-blond-duckbill.rmq2.cloudamqp.com" # 3 nodes 3.8.21
# 2. Create federation-upstream on downstream
curl -i -XPUT -H "content-type:application/json" -d'{"value":{"uri":"'$UPSTREAM_URL'","expires":3600000}}' $HTTPS_DOWNSTREAM/api/parameters/federation-upstream/$DOWNSTREAM_VHOST/upstream
# 3. Create a federation policy on downstream
curl -i -X PUT -H 'Content-Type: application/json' $HTTPS_DOWNSTREAM/api/policies/$DOWNSTREAM_VHOST/fedit/ -d '{"pattern":".", "definition": {"federation-upstream-set":"all"}, "priority":0, "apply-to": "exchanges"}'
# 4. Stop RabbitMQ on the node that has the link running on downstream.
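After step 4, whether the link comes back on another node can be checked by polling instead of watching the management UI. A rough sketch, assuming the federation management plugin's `/api/federation-links/<vhost>` endpoint and that a healthy link reports `"status":"running"` in its JSON (both worth verifying against your version); `wait_for_running_link`, `fetch_downstream_links` and their arguments are hypothetical names.

```shell
#!/bin/sh
# wait_for_running_link FETCH_CMD [ATTEMPTS] [DELAY_SECONDS]
# FETCH_CMD is the name of a command or function that prints the links JSON.
# Returns 0 as soon as one link reports "running", 1 if none does in time.
wait_for_running_link() {
  fetch_cmd="$1"; attempts="${2:-12}"; delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$fetch_cmd" | grep -q '"status"[[:space:]]*:[[:space:]]*"running"'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example fetcher built on the variables defined in step 1.
fetch_downstream_links() {
  curl -sS "$HTTPS_DOWNSTREAM/api/federation-links/$DOWNSTREAM_VHOST"
}
```

For example, `wait_for_running_link fetch_downstream_links 12 5 || echo "no running federation link after about 60s"` would flag the behavior described in this issue.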
Hi, we seem to have the same situation with RabbitMQ 3.13.2 on Erlang 26.2.4. It is not consistent, but sometimes during a cluster restart (one node at a time) we lose the federation link. In our case federation is used to transport messages from one vhost to another. I have uploaded logs from our 2-node cluster. The last time anything about the federation link appears in them is at 2024-06-17 14:48:13.975211+02:00 and in the lines immediately following. The federation upstream points to a load balancer address (x17-rabbit-ha.stuttgart.de). This cluster is not yet operational. We have another, older cluster with RabbitMQ 3.8.2 on Erlang 22.2.7 with the same configuration, except for the cluster names, the address of the load balancer, and the certificates added to the upstream URI, which are needed since Erlang 26. Certificate data to show this should not be an issue: Is there any solution to this? Thank you very much in advance for your help!
@lfstuttgart start a new discussion and provide exact steps to reproduce (even if it only happens sometimes). We can then take a look at this (potential) issue.
@johanrhodin thanks! Here is the new discussion: https://github.com/rabbitmq/rabbitmq-server/issues/11492
Here is the new discussion: #11495
Up to and including RabbitMQ 3.8.14, restarting one node in a multi-node cluster would cause the associated exchange federation links to jump to another node in the cluster. (This is the documented behavior in https://www.rabbitmq.com/federation.html#clustering: "Exchange federation links will start on any node in the downstream cluster. They will fail over to other nodes if the node they are running on crashes or stops.")
In 3.8.15 and later, if the node that hosts the links goes down, the federation link is removed and the policy needs to be recreated for the links to reappear. While the node serving the link is stopping, the management interface shows a status of "Starting", followed by no links at all once the node has fully stopped.
In my example scenario I used a one-node cluster (3.8.16 with Erlang 24.0.2) as upstream and a three-node cluster (3.8.14/15 with Erlang 23.2.3) as downstream. After noticing which node ran the link I stopped that node.
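To note which node currently runs a link without going through the management UI, something like the following could be used. It is a sketch under assumptions: it relies on the `/api/federation-links/<vhost>` endpoint of the federation management plugin, assumes each returned link object carries a `node` field, and uses a crude `grep`/`sed` extraction rather than a real JSON parser. The function names are hypothetical.

```shell
#!/bin/sh
# CURL may be overridden (e.g. with a stub) for dry runs; hypothetical knob.
CURL="${CURL:-curl}"

# fetch_federation_links BASE_URL VHOST: print the raw JSON list of links.
fetch_federation_links() {
  $CURL -sS "$1/api/federation-links/$2"
}

# link_nodes: read that JSON on stdin and print each distinct node name once.
link_nodes() {
  grep -o '"node"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/' |
    sort -u
}
```

A possible invocation is `fetch_federation_links "$HTTPS_DOWNSTREAM" "$DOWNSTREAM_VHOST" | link_nodes`; empty output would correspond to the "no links at all" state described above.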
The following policy was used:
curl -i -X PUT -H 'Content-Type: application/json' $HTTPS_DOWNSTREAM/api/policies/$DOWNSTREAM_VHOST/fedit/ -d '{"pattern":".", "definition": {"federation-upstream-set":"all"}, "priority":0, "apply-to": "exchanges"}'
and the upstream is defined as:
curl -i -XPUT -H "content-type:application/json" -d'{"value":{"uri":"'$UPSTREAM_URL'","expires":3600000}}' $HTTPS_DOWNSTREAM/api/parameters/federation-upstream/$DOWNSTREAM_VHOST/upstream
This issue was reported both to us (CloudAMQP) and to RabbitMQ Slack: https://rabbitmq.slack.com/archives/C1EDN83PA/p1623755314386000