@mergify mergify bot commented Dec 4, 2025

Why

We use a Khepri projection to compute a graph for bindings that have a topic exchange as their source. This allows more efficient queries during routing. This graph is not stored in Khepri, only in the projection ETS table.
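To make the trie-of-edges idea concrete, here is a minimal Python model of a topic trie kept as a flat edge table, loosely analogous to rows in the projection ETS table. The names and layout are illustrative assumptions, not RabbitMQ's actual record format (the real code is Erlang):

```python
# Each edge is keyed by (parent_node, word); the value is the child node.
# Bindings for "a.b" and "a.c" share the prefix edge (root, "a").
trie_edges = {}

def add_binding(routing_key):
    """Insert the edges for one binding's routing key, reusing shared prefixes."""
    node = "root"
    for word in routing_key.split("."):
        child = trie_edges.setdefault((node, word), f"{node}/{word}")
        node = child
    return node  # leaf node where the binding's destination would be recorded

leaf_ab = add_binding("a.b")
add_binding("a.c")
# trie_edges now holds three edges: (root, "a") once, shared by both bindings,
# plus one leaf edge per binding.
```

Routing then walks this edge table word by word, which is what makes queries efficient compared to scanning every binding.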

When a binding is deleted, we need to clean up the graph. However, the pattern used to match the trie edges to delete was incorrect, leading to "orphaned" trie edges. The accumulation of these leftovers caused a memory leak.

How

The pattern was fixed to correctly match the appropriate trie edges.
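The failure mode can be sketched in the same toy model as above. This is a hypothetical illustration of how a delete pattern that matches too narrowly leaves "orphaned" edges behind, and what a path-matching cleanup looks like; it is not the actual Erlang match specification that was fixed, and a real cleanup would also check that no other binding still uses an edge before removing it:

```python
def delete_binding_buggy(trie_edges, routing_key):
    # Wrong pattern: only matches edges hanging off the root, so deeper
    # edges such as ("root/a", "b") survive every deletion and accumulate.
    words = routing_key.split(".")
    trie_edges.pop(("root", words[0]), None)

def delete_binding_fixed(trie_edges, routing_key):
    # Correct pattern: match and remove every edge along the binding's path.
    node = "root"
    for word in routing_key.split("."):
        child = trie_edges.pop((node, word), None)
        if child is None:
            break
        node = child
```

With the buggy variant, every bind/unbind cycle leaves at least one stale edge in the table, which is exactly the kind of slow, unbounded growth that shows up as a memory leak.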

However, this fix alone is effective only for new deployments of RabbitMQ, where the projection function is registered for the first time. We also need to handle the update of projections that are already registered in existing clusters.

To achieve that, we first renamed the projection from `rabbit_khepri_topic_trie` to `rabbit_khepri_topic_trie_v2` to distinguish the bad one from the good one. Updated RabbitMQ nodes in an existing cluster will use this new projection, while out-of-date nodes will continue to use the old one. Because both projections continue to exist, the cluster is still affected by the memory leak.

Then, on startup, each node verifies whether all other cluster members support the new projection. If they do, it unregisters the old projection. Therefore, once all nodes in a cluster are up-to-date and use the new projection, the old one goes away and the leaked memory is reclaimed.
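The startup cleanup described above can be sketched as follows. This is a hedged model: the projection names mirror the text, but the function name and the cluster-capability query are stand-in assumptions, not RabbitMQ's API:

```python
OLD_PROJECTION = "rabbit_khepri_topic_trie"
NEW_PROJECTION = "rabbit_khepri_topic_trie_v2"

def maybe_unregister_old_projection(members_supported_projections, unregister):
    """Unregister the old projection only once every member supports the new one.

    members_supported_projections: node name -> set of projection names it supports
    unregister: callback invoked with the projection name to remove
    """
    if all(NEW_PROJECTION in supported
           for supported in members_supported_projections.values()):
        unregister(OLD_PROJECTION)
        return True
    # An out-of-date node keeps the old projection alive, so the leaked
    # memory is only reclaimed after the last node is upgraded.
    return False
```

Each node running this check at boot gives the eventual, cluster-wide convergence the text describes: the old projection disappears exactly once the whole cluster can live without it.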

This startup check could have been made simpler with a feature flag. We went with a custom check instead, in case a user tries to upgrade from, for instance, a 4.1.x release that has the fix to a 4.2.x release that does not. A feature flag would have prevented that upgrade path.

Fixes #15024.


This is an automatic backport of pull request #15025 done by Mergify.

(cherry picked from commit 76dcd92)
@michaelklishin michaelklishin added this to the 4.2.2 milestone Dec 4, 2025
@michaelklishin michaelklishin merged commit 73e5d93 into v4.2.x Dec 4, 2025
291 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v4.2.x/pr-15025 branch December 4, 2025 15:10