GET /api/exchanges fails with a 500 after upgrading to 4.2.0 with rabbitmq_delayed_message_exchange #14977

mgarstecki · 2025-11-19T16:24:26Z

Describe the bug

We deploy RabbitMQ using the official cluster operator.

When we upgrade the cluster (by changing the Docker image tag from 4.1.4 to 4.2.0), the cluster upgrades properly, and all AMQP operations work. However, the /api/exchanges endpoint starts returning 500 errors.

We can see the following error logs related to the management interface when the issue happens:

Ranch listener {acceptor,{0,0,0,0,0,0,0,0},15672}, connection process <0.284866.0>, stream 1 had its request process <0.284867.0> exit with reason {{aborted,{no_exists,['rabbit_delayed_messagerabbit@rabbitmq-cluster-server-1.rabbitmq-cluster-nodes.staging-sv1',[{{delay_entry,{delay_key,'_',{exchange,{resource,<<"pigment.events">>,exchange,<<"RuntimeConfigOrganizationDeletion_republish_exchange">>},'_','_','_','_','_','_','_','_','_','_'}},'_','_'},[],[true]}]]}},[{mnesia,abort,1,[{file,"mnesia.erl"},{line,683}]},{rabbit_delayed_message,messages_delayed,1,[{file,"rabbit_delayed_message.erl"},{line,136}]},{rabbit_exchange_type_delayed_message,info,2,[{file,"rabbit_exchange_type_delayed_message.erl"},{line,107}]},{rabbit_exchange,info,1,[{file,"rabbit_exchange.erl"},{line,348}]},{lists,map_1,2,[{file,"lists.erl"},{line,2082}]},{lists,map,2,[{file,"lists.erl"},{line,2077}]},{rabbit_mgmt_util,'-all_or_one_vhost/2-lc$^0/1-0-',2,[{file,"rabbit_mgmt_util.erl"},{line,979}]},{rabbit_mgmt_util,all_or_one_vhost,2,[{file,"rabbit_mgmt_util.erl"},{line,979}]}]}

  crasher:
    initial call: cowboy_stream_h:request_process/3
    pid: <0.284867.0>
    registered_name: []
    exception exit: {{aborted,
                      {no_exists,
                       ['rabbit_delayed_messagerabbit@rabbitmq-cluster-server-1.rabbitmq-cluster-nodes.staging-sv1',
                        [{{delay_entry,
                           {delay_key,'_',
                            {exchange,
                             {resource,<<"pigment.events">>,exchange,
                              <<"RuntimeConfigOrganizationDeletion_republish_exchange">>},
                             '_','_','_','_','_','_','_','_','_','_'}},
                           '_','_'},
                          [],
                          [true]}]]}},
                     [{mnesia,abort,1,[{file,"mnesia.erl"},{line,683}]},
                      {rabbit_delayed_message,messages_delayed,1,
                       [{file,"rabbit_delayed_message.erl"},{line,136}]},
                      {rabbit_exchange_type_delayed_message,info,2,
                       [{file,"rabbit_exchange_type_delayed_message.erl"},
                        {line,107}]},
                      {rabbit_exchange,info,1,
                       [{file,"rabbit_exchange.erl"},{line,348}]},
                      {lists,map_1,2,[{file,"lists.erl"},{line,2082}]},
                      {lists,map,2,[{file,"lists.erl"},{line,2077}]},
                      {rabbit_mgmt_util,'-all_or_one_vhost/2-lc$^0/1-0-',2,
                       [{file,"rabbit_mgmt_util.erl"},{line,979}]},
                      {rabbit_mgmt_util,all_or_one_vhost,2,
                       [{file,"rabbit_mgmt_util.erl"},{line,979}]}]}
      in function  mnesia:abort/1 (mnesia.erl, line 683)
      in call from rabbit_delayed_message:messages_delayed/1 (rabbit_delayed_message.erl, line 136)
      in call from rabbit_exchange_type_delayed_message:info/2 (rabbit_exchange_type_delayed_message.erl, line 107)
      in call from rabbit_exchange:info/1 (rabbit_exchange.erl, line 348)
      in call from lists:map_1/2 (lists.erl, line 2082)
      in call from lists:map/2 (lists.erl, line 2077)
      in call from rabbit_mgmt_util:'-all_or_one_vhost/2-lc$^0/1-0-'/2 (rabbit_mgmt_util.erl, line 979)
      in call from rabbit_mgmt_util:all_or_one_vhost/2 (rabbit_mgmt_util.erl, line 979)
    ancestors: [<0.284866.0>,<0.3608.0>,<0.3599.0>,<0.3598.0>,<0.3596.0>,
                  rabbit_web_dispatch_sup,<0.3551.0>]
    message_queue_len: 0
    messages: []
    links: [<0.284866.0>]
    dictionary: [{{xtype_to_module,fanout},rabbit_exchange_type_fanout},
                  {{khepri,can_skip_fence_preliminary_query,rabbitmq_metadata},
                   true},
                  {{xtype_to_module,'x-delayed-message'},
                   rabbit_exchange_type_delayed_message}]
    trap_exit: false
    status: running
    heap_size: 46422
    stack_size: 29
    reductions: 2471
  neighbours:

We can recover from the issue by restarting the cluster nodes one by one, but it breaks our automation around upgrades.

We've had the issue in the past on other upgrades, but we've lost the history of the exact upgrade paths that triggered the issue.

Reproduction steps

1. Install a three-node 4.1.4 RabbitMQ cluster using the RabbitMQ operator.
2. Upgrade it to 4.2.0 by changing the deployed image.
3. Call /api/exchanges on the cluster.

The endpoint returns a 500 error with no more details.

The issue reproduces reliably: it happens every time we try the upgrade.
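For anyone who wants to script the last reproduction step, here is a minimal sketch of a probe for the endpoint, using only the Python standard library. The function names, base URL, and credentials are assumptions for illustration and do not come from the report.

```python
import base64
import urllib.error
import urllib.request


def build_exchanges_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for /api/exchanges with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/exchanges",
        headers={"Authorization": f"Basic {token}"},
    )


def exchanges_status(base_url: str, user: str, password: str) -> int:
    """Return the HTTP status code of GET /api/exchanges.

    urllib raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case (500 when the reported bug is present).
    """
    request = build_exchanges_request(base_url, user, password)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Calling `exchanges_status("http://localhost:15672", "guest", "guest")` against a port-forwarded management listener (hypothetical address and credentials) would be expected to return 500 on an affected node and 200 otherwise.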

Expected behavior

The endpoint works and the list of exchanges is returned.

Additional context

Here's our definition of RabbitmqCluster:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-cluster
spec:
  rabbitmq:
    advancedConfig: |
      [
        {rabbit, [
          {consumer_timeout, undefined}
        ]}
      ].
    additionalPlugins:
      - rabbitmq_delayed_message_exchange
      - rabbitmq_stream
      - rabbitmq_prometheus
    additionalConfig: |
      log.console = true
      log.console.formatter = json

      prometheus.return_per_object_metrics = true

      # Replicate max message size of RabbitMQ v3.x as we have code that
      # expects this limit to be in place.
      max_message_size = 134217728
  replicas: 3
  autoEnableAllFeatureFlags: true
  service:
    type: ClusterIP
  persistence:
    storageClassName: pd-ssd-lazy
    storage: 30Gi
  image: $privateregistry/rabbitmq-with-plugins:4.1.4
  resources:
    requests:
      cpu: 200m
      memory: 1Gi
    limits:
      memory: 1Gi

We use a private Docker image to add plugins. Here is the Dockerfile:

FROM rabbitmq:4.1.4-management-alpine

RUN apk update && apk upgrade

ARG RABBITMQ_DELAYED_MESSAGE_PLUGIN_VERSION=4.1.4

RUN <<EOF
set -e
cd /opt/rabbitmq/plugins
wget "https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/releases/download/v${RABBITMQ_DELAYED_MESSAGE_PLUGIN_VERSION}/rabbitmq_delayed_message_exchange-${RABBITMQ_DELAYED_MESSAGE_PLUGIN_VERSION}.ez"
rabbitmq-plugins enable --offline rabbitmq_delayed_message_exchange
rabbitmq-plugins enable --offline rabbitmq_stream
rabbitmq-plugins enable --offline rabbitmq_prometheus
EOF

lukebakken · 2025-11-19T16:51:14Z

lukebakken
Nov 19, 2025
Maintainer

I'm assuming this...

ARG RABBITMQ_DELAYED_MESSAGE_PLUGIN_VERSION=4.1.4

...is really this:

ARG RABBITMQ_DELAYED_MESSAGE_PLUGIN_VERSION=4.1.0

...because there is no 4.1.4 release of that plugin.

lukebakken · 2025-11-19T17:16:56Z

lukebakken
Nov 19, 2025
Maintainer

FWIW, I can't reproduce what you report using this docker compose project.

Steps:

make DOCKER_FRESH=true up

Wait for the cluster to come up and PerfTest to start.
Browse to localhost:15672 to verify the version and that the x-delayed-message exchange type is available.

Upgrade:

make upgrade

The above enables feature flags, builds new containers using the 4.2-management-alpine base, and restarts the cluster.

Then:

make apicall

lukebakken · 2025-11-19T17:30:25Z

lukebakken
Nov 19, 2025
Maintainer

UPDATE: @mgarstecki since you didn't provide a set of definitions, I was just going with my base docker compose project. However, as soon as I add a delayed exchange, I CAN reproduce the issue. I just pushed my updated code.

lukebakken · 2025-11-19T17:47:16Z

lukebakken
Nov 19, 2025
Maintainer

@mgarstecki my guess is that you are running into the issue explicitly called out in the release notes: https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/releases/tag/v4.2.0

During the upgrade to 4.2.0, all feature flags are enabled by the cluster operator, which means that Khepri will be enabled. As the release notes clearly state, you will have to do this:

Important: if the cluster uses Mnesia, then the plugin is enabled, and then Khepri is enabled, the plugin must be disabled and re-enabled, or the node must be restarted. Then it will start Mnesia and works as in scenario 2

I confirmed this in my project by NOT doing anything with feature flags (thus keeping the mnesia metadata store), and the upgrade did not cause this issue to occur.
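To make the condition above easier to detect in upgrade automation, here is a rough sketch that compares feature flag state before and after an upgrade. It assumes the input is the tab-separated `name<TAB>state` output of `rabbitmqctl list_feature_flags` and that the Khepri flag is named `khepri_db`. The function names are hypothetical, and the check only approximates the scenario the release notes describe.

```python
def parse_feature_flags(output: str) -> dict[str, str]:
    """Parse `name<TAB>state` lines into a dict.

    Lines without exactly two tab-separated fields (such as a banner line)
    and the `name<TAB>state` header row are skipped.
    """
    flags = {}
    for line in output.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2 and parts[0] != "name":
            flags[parts[0]] = parts[1]
    return flags


def needs_plugin_restart(flags_before: dict[str, str], flags_after: dict[str, str]) -> bool:
    """Return True when khepri_db was not enabled before but is enabled after.

    Assumption: this transition is the case in which the delayed message
    exchange plugin has to be disabled and re-enabled, or the node restarted.
    """
    was_enabled = flags_before.get("khepri_db") == "enabled"
    is_enabled = flags_after.get("khepri_db") == "enabled"
    return is_enabled and not was_enabled
```

Capturing the flag listing once before changing the image and once after the rollout gives the two inputs. A `True` result suggests the plugin restart described in the release notes is still pending.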

lukebakken · 2025-11-19T17:55:11Z

lukebakken
Nov 19, 2025
Maintainer

@mgarstecki I confirmed that restarting the plugin on all nodes (as the release notes state) does indeed fix this issue - https://github.com/lukebakken/docker-rabbitmq-cluster/blob/rabbitmq-server-14973/upgrade.sh#L27-L33
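For a Kubernetes deployment like the one in the original report, the same plugin restart could be sketched as below. This is not a copy of the linked script. It assumes pods are named `<cluster>-server-<n>` (consistent with the node name in the error log), that `kubectl` is configured for the right cluster, and that `rabbitmq-plugins` is on the container's PATH. The function names and the dry-run default are illustrative choices.

```python
import subprocess

PLUGIN = "rabbitmq_delayed_message_exchange"


def plugin_restart_commands(cluster_name: str, replicas: int, namespace: str = "default") -> list[list[str]]:
    """Build one disable and one enable command per pod, in pod order."""
    commands = []
    for index in range(replicas):
        # Assumed pod naming scheme: <cluster>-server-<index>.
        pod = f"{cluster_name}-server-{index}"
        for action in ("disable", "enable"):
            commands.append(
                ["kubectl", "-n", namespace, "exec", pod, "--",
                 "rabbitmq-plugins", action, PLUGIN]
            )
    return commands


def restart_plugin(cluster_name: str, replicas: int, namespace: str = "default", dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one by one, stopping on the first failure."""
    for command in plugin_restart_commands(cluster_name, replicas, namespace):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Running `restart_plugin("rabbitmq-cluster", 3, "staging-sv1")` only prints the six commands. Passing `dry_run=False` executes them, so review the printed output first.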

michaelklishin · 2025-11-19T23:27:08Z

michaelklishin
Nov 19, 2025
Maintainer

@mgarstecki you should consider reading the release notes of the 3rd party/community plugins you use before upgrading them.

And another mandatory mention: the rabbitmq_delayed_message_exchange plugin is no longer being developed. Its updates will likely stop sooner rather than later.

A future Tanzu RabbitMQ version will include a distributed alternative as a new queue type, possibly starting with 4.3.0, although this is not a promise of delivery of any kind.

mgarstecki · 2025-11-20T16:20:11Z

mgarstecki
Nov 20, 2025
Author

Indeed, we had missed this part of the release notes. Thank you for the investigation. I'll improve our operational docs around RabbitMQ.
