GET /api/exchanges fails with a 500 after upgrading to 4.2.0 with rabbitmq_delayed_message_exchange #14977
Describe the bug

We deploy RabbitMQ using the official cluster operator. When we upgrade the cluster (by changing the Docker image tag from 4.1.4 to 4.2.0), the cluster upgrades properly and all AMQP operations work. However, the GET /api/exchanges management endpoint then fails with a 500 error.

We can see the following error logs related to the management interface when the issue happens:

We can recover from the issue by restarting the cluster nodes one by one, but that breaks our automation around upgrades. We've had the issue in the past on other upgrades, but we've lost the history of the exact upgrade paths that triggered it.

Reproduction steps

The endpoint returns a 500 error with no more details. We reproduce the issue reliably: it happens every time we try the upgrade.

Expected behavior

The endpoint works and the list of exchanges is returned.

Additional context

Here's our definition of RabbitmqCluster:

We use a private Docker image to add plugins; here is the Dockerfile:
Replies: 7 comments
I'm assuming this... ...is really this: ...because there is no 4.1.4 release of that plugin.
FWIW, I can't reproduce what you report using this docker compose project. Steps:
The above enables feature flags, builds new containers using the Then:
UPDATE: @mgarstecki since you didn't provide a set of definitions, I was just going with my base docker compose project. However, as soon as I add a delayed exchange, I CAN reproduce the issue. I just pushed my updated code.
@mgarstecki my guess is that you are running into the issue explicitly called out in the release notes: https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/releases/tag/v4.2.0
During the upgrade to 4.2.0, all feature flags are enabled by the cluster operator, which means that Khepri will be enabled. As the release notes clearly state, you will have to do this:
I confirmed this in my project by NOT doing anything with feature flags (thus keeping the Mnesia metadata store), and the upgrade did not cause this issue to occur.
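Since the behaviour depends on whether Khepri got enabled during the upgrade, it can help to check the feature flag state before deciding whether the plugin needs attention. Below is a minimal sketch, assuming the input is the tab-separated output of `rabbitmqctl list_feature_flags name state`. The helper names are hypothetical, and the sample string in the usage note is illustrative input, not output captured from a real cluster.

```python
def parse_feature_flags(output):
    """Parse tab-separated 'name<TAB>state' lines into a dict."""
    flags = {}
    for line in output.splitlines():
        parts = line.strip().split("\t")
        # Skip blank lines, informational banners, and the header row.
        if len(parts) != 2 or parts == ["name", "state"]:
            continue
        name, state = parts
        flags[name] = state
    return flags


def khepri_enabled(output):
    """Return True if the khepri_db feature flag is reported as enabled."""
    return parse_feature_flags(output).get("khepri_db") == "enabled"
```

For example, `khepri_enabled("name\tstate\nkhepri_db\tenabled\n")` returns `True`, which in this thread's scenario would mean the release-notes step for the plugin applies.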
@mgarstecki I confirmed that restarting the plugin on all nodes (as the release notes state) does indeed fix this issue - https://github.com/lukebakken/docker-rabbitmq-cluster/blob/rabbitmq-server-14973/upgrade.sh#L27-L33
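For a cluster managed by the operator, the plugin restart could be scripted roughly as follows. This is a sketch under stated assumptions, not the contents of the linked upgrade.sh: it assumes pods are named `<cluster>-server-<n>`, that "restarting" means `rabbitmq-plugins disable` followed by `rabbitmq-plugins enable`, and that doing both on one node before moving to the next is acceptable. The cluster name `my-rabbit`, the namespace, and the function name are placeholders. It only builds and prints the commands so they can be reviewed before anything runs.

```python
import shlex

PLUGIN = "rabbitmq_delayed_message_exchange"


def plugin_restart_commands(cluster, replicas, namespace="default"):
    """Build kubectl commands that disable, then re-enable, the plugin on each node."""
    commands = []
    for index in range(replicas):
        pod = f"{cluster}-server-{index}"  # assumed pod naming, adjust if yours differs
        for action in ("disable", "enable"):
            commands.append(
                ["kubectl", "exec", "-n", namespace, pod, "--",
                 "rabbitmq-plugins", action, PLUGIN]
            )
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for command in plugin_restart_commands("my-rabbit", replicas=3):
        print(shlex.join(command))
```

Check the plugin's release notes for the exact procedure and ordering before wiring something like this into upgrade automation.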
@mgarstecki you should consider reading the release notes of the 3rd party/community plugins you use before upgrading them. And another mandatory mention: the A future Tanzu RabbitMQ version will include a distributed alternative as a new queue type. Possibly starting with
Indeed, we had missed this part of the release notes. Thank you for the investigation; I'll improve our operational docs around RabbitMQ.