Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x-death header count is no longer incremented from RabbitMQ version 3.13.0+ #11331

Closed
cressie176 opened this issue May 27, 2024 · 14 comments
Closed
Milestone

Comments

@cressie176
Copy link

cressie176 commented May 27, 2024

Describe the bug

Prior to v13.13.0, when a message is rejected and routed to a dead letter queue, the broker maintains an "x-death" header added similar to the following...

[
  {
    "count": 1,
    "reason": "rejected",
    "queue": "q1",
    "time": {
      "!": "timestamp",
      "value": 1716827089
    },
    "exchange": "",
    "routing-keys": [
      "q1"
    ]
  }
]

if the message is moved from the dead letter queue back the work queue, and again rejected, the count property is updated by the broker, i.e.

[
  {
    "count": 2,
    "reason": "rejected",
    "queue": "q1",
    "time": {
      "!": "timestamp",
      "value": 1716827089
    },
    "exchange": "",
    "routing-keys": [
      "q1"
    ]
  }
]

From RabbitMQ v13.13.0 the count is no longer updated. I'm author of a RabbitMQ client for Node.js called Rascal, and was about to rely on the incrementing count for this feature.

There isn't any mention of this breaking change in the v13.13.0 release notes, but I see there is a related issue. Similar to the author of this issue, I am also not editing the x-death header, but am relying on the broker maintaining the count. I am unable to maintain this in a custom header since I would need to modify the message on rejection, which is not possible. If the decision is for x-death to be "broker only" and to ignore this header when published from the client, that's no problem, but to maintain backwards compatibility, the broker must count the rejections.

In #10709 you also advise that RabbitMQ v4 is likely to make header's starting with 'x-' broker only. Thankfully none of Rascal's custom headers take this format, but wouldn't this mean that any messages with custom 'x-' headers that are in flight / enqueued when the broker is upgraded might be broken?

Reproduction steps

  1. Configure a work queue with a dead letter exchange / dead letter queue
  2. Publish a message to the work queue
  3. Reject the message with requeue false
  4. Move the message to the work queue
  5. Rerject the message with requeue false again

The message should have an 'x-death' record with a count of 2, but instead it has a count of 1

Expected behavior

The x-death record's count property should be incremented whenever a message is rejected.

Additional context

No response

@cressie176 cressie176 added the bug label May 27, 2024
@michaelklishin
Copy link
Member

michaelklishin commented May 27, 2024

There is a set of changes related to dead lettering that have shipped in 3.13.2 or will ship in 3.13.3: #10709,
#11173, #11160, #11159, #11174

@michaelklishin
Copy link
Member

As for x-* headers, nothing has been decided yet. There's a set of headers such as x-deaths that were never meant to be modified by applications.

@cressie176
Copy link
Author

cressie176 commented May 27, 2024

@michaelklishin thank you for the link to #11174, it provides a lot more context, however I think it is still a bug and should not have been closed as a duplicate.

There's a set of headers such as x-deaths that were never meant to be modified by applications.

The application is not modifying these, other than when they move the message (by copying it) to a different queue.

Irrespective it's very confusing behaviour to have a "count" property set by the broker, which until recently was incremented with each rejection, but now if fixed at 1 no matter how many times the message is rejected. This was an undocumented change in v3.13.0, that breaks the current documented behaviour.

count: how many times this message was dead-lettered in this queue for this reason

cressie176 added a commit to onebeyond/rascal that referenced this issue May 27, 2024
@michaelklishin
Copy link
Member

@cressie176 since 3.13.0 there were four changes directly related to dead lettering, including at least one related to the conditions under which the counter is incremented. They are all mentioned in release notes (including those pending release under release-notes in this repository).

Use https://hub.docker.com/layers/pivotalrabbitmq/rabbitmq/v3.13.x/images/sha256-b32c328236bec5ec2b94bd4afd290687844a957919fe838457d10858c620bfc4?context=explore in order to compare.

If this is something distinct from all those behaviours, please put together an executable way to reproduce and we will consider reopening this issue or filing a new one.

@cressie176
Copy link
Author

cressie176 commented May 27, 2024

I searched the v3.13.0 release notes (the version which introduced the bug) prior to reporting it. They only mentioned dead lettering once and give no indication there was a breaking change relating to the way rejections are counted...

Messages are now internally stored using a new common heavily AMQP 1.0-influenced container format. This is a major step towards a protocol-agnostic core: a common format that encapsulates a sum of data types used by the protocols RabbitMQ supports, plus annotations for routing, dead-lettering state, and other purposes.

@cressie176
Copy link
Author

cressie176 commented May 27, 2024

@michaelklishin

since 3.13.0 there were four changes directly related to dead lettering, including at least one related to the conditions under which the counter is incremented

#10709 - This is the same bug, although it manifests with message expiry. The OP was able to workaround, but noted how this would likely cause issues for other users. @lukebakken acknowledges it is a regression and breaking change, but the issue was closed regardless.

#11173 - This issue is about adding the x-death header to streamed messages

#11160 - This issue is about a warning message when dead letter messages are dropped by the broker

#11159 - This issue is about dead letter messages being incorrectly dropped by the broker

#11174 - This discusses approaches for avoiding infinite dead letter cycles, and the introduction of deaths_v2 annotation

None of the above, other than the first which was closed without resolving the bug, are concerned with v3.13.0 of the rabbitmq broker failing to increment the x-death count field.

If this is something distinct from all those behaviours, please put together an executable way to reproduce and we will consider reopening this issue or filing a new one.

https://github.com/cressie176/rabbitmq-11331 demonstrates the bug

@michaelklishin
Copy link
Member

michaelklishin commented May 27, 2024

@cressie176 apparently it was an intentionally decision (in the same discussion as #10709 (comment)) by several team members to keep the current behavior instead of reverting it only to change it back for 4.x.

Thank you for taking the time to put together an executable example to reproduce!

@michaelklishin
Copy link
Member

The above decision was taken after investigating what would a message container-based 3.12.x era solution would be like #11043.

Time will tell how many paying users will ask for a temporary restoration of this behavior. Most seemingly can use a custom header when publishing, which is a fundamental solution which avoids modifying a header used internally by RabbitMQ to collect a series of dead lettering-related events.

@ansd
Copy link
Member

ansd commented May 28, 2024

Thank you @cressie176 for the detailed issue and reproduction steps.

In https://github.com/cressie176/rabbitmq-11331/blob/7372dc0e7d69dc78dc52c27d8ad8f6b0f2889440/index.js#L17 you are publishing a new (i.e. 2nd) message. The broker won't interpret this x-death header from this new (2nd) message in v3.13.x.

The solution for you is one of the following:

  1. Use a custom count header as described in x-death count not incremented when message expired #10709 (comment), or
  2. Let RabbitMQ itself move the message back to the original queue, for example by configuring x-message-ttl on the dead letter queue.

I am unable to maintain this in a custom header since I would need to modify the message on rejection, which is not possible.

No, you don't need to modify the message on rejection, you can instead modify the message header when you publish your new (2nd) message.

@cressie176
Copy link
Author

cressie176 commented May 28, 2024

Thanks @michaelklishin and @ansd - It's good to know a fix was back ported to the v3 branch.

No, you don't need to modify the message on rejection, you can instead modify the message header when you publish your new (2nd) message.

This is what I was trying to avoid by using the x-death count attribute. Without a standard rejection count header, users won't be able to use the shovel plugin (or other 3rd party software) to move the messages from the dead letter queue back to the work queue in scenarios where they need to treat replayed messages differently from regular ones.

Rascal's scenario is that it needs to remove the immediate nack custom header, but given x-death count has been widely adopted by other frameworks to track rejections, there are certain to be other examples. Without the x-death count users will need to write their own shovel application. For Rascal, I've worked around this by using the x-death time value as well. Prior to v3.13.0 this was not updated by the broker. Ironically the same change that broke the count means a the timestamp is now set to that of the last rejection rather than the first.

For v4 I assume it won't be possible for the broker to keep a count of rejections of republished messages since there isn't a guaranteed message id. If not, can the x-death count attribute be removed so that it isn't misleading, and the removal listed as a breaking change in the release notes please?

@ansd
Copy link
Member

ansd commented May 28, 2024

It's good to know a fix was back ported to the v3 branch

Do you mean the v3.13.x branch? Which fix are you referring to?

Without a standard rejection count header, users won't be able to use the shovel plugin (or other 3rd party software) to move the messages from the dead letter queue back to the work queue in scenarios where they need to treat replayed messages differently from regular ones.

Shovels can be configured to set certain headers such as dest-publish-properties, dest-add-forward-headers, dest-add-timestamp-header. Hence, users can treat messages replayed via the shovel plugin differently from other messages.

Ironically the same change that broke the count means a the timestamp is now set to that of the last rejection rather than the first.

Note that all these timestamps are wall clock time, and therefore an approximate point in time.

@cressie176
Copy link
Author

Do you mean the v3.13.x branch? Which fix are you referring to?

From #11043

"This commit will only be packported to v3.13.x branch. In other words, 4.0 won't interpret x-headers as done in 3.13.0 and 3.13.1".

Have I misunderstood?

Shovels can be configured to set certain headers such as dest-publish-properties, dest-add-forward-headers, dest-add-timestamp-header

Thank you. I was not aware of these. The documentation says that dest-publish-properties is unsupported though, and there still doesn't appear to be the ability to increment the number of times the message was rejected.

Note that all these timestamps are wall clock time, and therefore an approximate point in time.

Thanks for the clarification. This will be sufficient for Rascal's use case.

ansd added a commit that referenced this issue May 28, 2024
Fixes #10709
Fixes #11331

This commit fixes the following regression which worked in 3.12.x, but
stopped working in 3.13.0 - 3.13.3:

```
AMQP 0.9.1 client    --publish-->
Q                    --dead-letter-->
DLQ                  --consume-->
AMQP 0.9.1 client (death count is now 1) --republish-same-message-with-headers-as-just-received-->
Q                    --dead-letter-->
DLQ                  --consume -->
AMQP 0.9.1 (death count is now 1, but should be 2)
```

The reason this behaviour stopped to work in 3.13.0 is that the broker
won't specially interpret x-headers in general, and the x-death header
specifically in this case anymore.

In other words, the new desired 3.13 behaviour with message containers
is that "x-headers belong to the broker".

While this is correct, it does break client applications which depended
on the previous use case.
One simple fix is that the client application does not re-publish with
the x-death header, but instead sets its own custom count header to
determine the number of times it retries.

This commit will only be packported to v3.13.x branch.
In other words, 4.0 won't interpret x-headers as done in 3.13.0 - 3.13.3.

The reason we backport this commit to v3.13.x is that the Spring
documentation expliclity recommends re-publishing the message with
x-death header being set:
https://cloud.spring.io/spring-cloud-static/spring-cloud-stream-binder-rabbit/3.0.6.RELEASE/reference/html/spring-cloud-stream-binder-rabbit.html#_retry_with_the_rabbitmq_binder
ansd added a commit that referenced this issue May 28, 2024
Fixes #10709
Fixes #11331

This commit fixes the following regression which worked in 3.12.x, but
stopped working in 3.13.0 - 3.13.2:

```
AMQP 0.9.1 client    --publish-->
Q                    --dead-letter-->
DLQ                  --consume-->
AMQP 0.9.1 client (death count is now 1) --republish-same-message-with-headers-as-just-received-->
Q                    --dead-letter-->
DLQ                  --consume -->
AMQP 0.9.1 (death count is now 1, but should be 2)
```

The reason this behaviour stopped to work in 3.13.0 is that the broker
won't specially interpret x-headers in general, and the x-death header
specifically in this case anymore.

In other words, the new desired 3.13 behaviour with message containers
is that "x-headers belong to the broker".

While this is correct, it does break client applications which depended
on the previous use case.
One simple fix is that the client application does not re-publish with
the x-death header, but instead sets its own custom count header to
determine the number of times it retries.

This commit will only be packported to v3.13.x branch.
In other words, 4.0 won't interpret x-headers as done in 3.13.0 - 3.13.2.

The reason we backport this commit to v3.13.x is that the Spring
documentation expliclity recommends re-publishing the message with
x-death header being set:
https://cloud.spring.io/spring-cloud-static/spring-cloud-stream-binder-rabbit/3.0.6.RELEASE/reference/html/spring-cloud-stream-binder-rabbit.html#_retry_with_the_rabbitmq_binder
@michaelklishin michaelklishin added this to the 3.13.3 milestone May 28, 2024
@michaelklishin
Copy link
Member

To clarify: #11339, which to some extent helps projects that publish messages with x-deaths, is v3.13.x-specific and will not go into main for 4.0.

@ansd
Copy link
Member

ansd commented May 29, 2024

Have I misunderstood?

Yes, #11043 is closed. Closed means not merged.
However, we merged #11339 yesterday which fixes the issue in 3.13.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants