rabbit_khepri: Fix topic binding deletion leak (backport #15025) #15063
Why
We use a Khepri projection to compute a graph for bindings that have a topic exchange as their source. This allows more efficient queries during routing. This graph is not stored in Khepri, only in the projection ETS table.
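To make the mechanism concrete, here is a minimal sketch of registering such a projection, assuming the simple-projection shape of `khepri_projection:new/2` and `khepri:register_projection/3`. The module name, record fields, payload shape and path pattern are hypothetical and only illustrate the idea; they are not the actual `rabbit_khepri` / `rabbit_db_binding` code.

```erlang
-module(topic_projection_sketch).
-export([register_topic_projection/2]).

%% Hypothetical row format for the ETS table backing the projection.
-record(trie_edge_row, {source_exchange, word, child_node}).

register_topic_projection(StoreId, PathPattern) ->
    %% A "simple" Khepri projection maps each matching tree node to one ETS
    %% record; Khepri then keeps the table in sync with the store.
    ProjectionFun =
        fun(_Path, #{source := Src, routing_key := RK}) ->
                %% The real code splits the routing key into words and emits
                %% several trie edges; one row per binding is enough here.
                #trie_edge_row{source_exchange = Src,
                               word            = RK,
                               child_node      = {node, RK}}
        end,
    Projection = khepri_projection:new(rabbit_khepri_topic_trie_v2,
                                       ProjectionFun),
    khepri:register_projection(StoreId, PathPattern, Projection).
```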
When a binding is deleted, we need to clean up the graph. However, the pattern used to match the trie edges to delete was incorrect, leading to "orphaned" trie edges. The accumulation of these leftovers caused a memory leak.
How
The pattern was fixed to correctly match the appropriate trie edges.
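To illustrate the class of bug (the record and field names below are hypothetical, not the actual trie-edge schema), a deletion via `ets:match_delete/2` only removes rows that the pattern matches exactly, and the call returns `true` either way, so an incorrect pattern fails silently and the edges accumulate:

```erlang
-module(trie_cleanup_sketch).
-export([delete_binding_edges/3]).

-record(trie_edge_row, {source_exchange, word, child_node}).

delete_binding_edges(Table, SrcExchange, Word) ->
    %% Fully specify the fields the rows are keyed on and wildcard the rest,
    %% so every edge belonging to the deleted binding is matched. A pattern
    %% that binds the wrong field (or a value that never matches) removes
    %% nothing and leaves the edges behind.
    Pattern = #trie_edge_row{source_exchange = SrcExchange,
                             word            = Word,
                             child_node      = '_'},
    true = ets:match_delete(Table, Pattern),
    ok.
```

Because `ets:match_delete/2` gives no indication of how many rows it removed, such a mismatch shows up as slowly growing ETS memory rather than as an error.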
However, this fix alone is only effective for new RabbitMQ deployments, where the projection function is registered for the first time. We also need to handle updating projections that are already registered in existing clusters.
To achieve that, first, we renamed the projection from `rabbit_khepri_topic_trie` to `rabbit_khepri_topic_trie_v2` to distinguish the bad one from the good one. Updated RabbitMQ nodes in an existing cluster will use this new projection, while existing out-of-date nodes will continue to use the old one. Because both projections continue to exist, the cluster will still be affected by the memory leak.

Then, each node will verify on startup whether all other cluster members support the new projection. If that is the case, it will unregister the old projection. Therefore, once all nodes in a cluster are up-to-date and use the new projection, the old one will go away and the leaked memory will be reclaimed.
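A rough sketch of that startup check is below. The `has_v2_projection/0` helper, the way cluster members are passed in, and the use of `khepri:unregister_projections/2` are assumptions made for illustration; the real rabbit_khepri code may differ.

```erlang
-module(projection_upgrade_sketch).
-export([maybe_drop_old_projection/2, has_v2_projection/0]).

%% Hypothetical helper each node exposes so peers can ask whether its code
%% registers the fixed projection.
has_v2_projection() -> true.

%% On boot, ask every other cluster member whether it supports the v2
%% projection; only when they all do is it safe to drop the old, leaky one.
maybe_drop_old_projection(StoreId, OtherMembers) ->
    AllSupportV2 =
        lists:all(
          fun(Node) ->
                  try
                      erpc:call(Node, ?MODULE, has_v2_projection, [])
                  catch
                      _:_ ->
                          %% An unreachable or older node cannot confirm
                          %% support, so keep the old projection for now.
                          false
                  end
          end, OtherMembers),
    case AllSupportV2 of
        true ->
            %% Assumed khepri API for removing a registered projection by
            %% name; the leaked ETS memory is reclaimed once it is gone.
            khepri:unregister_projections(StoreId,
                                          [rabbit_khepri_topic_trie]);
        false ->
            ok
    end.
```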
This startup check could have been made simpler with a feature flag. We decided to go with a custom check in case a user tries to upgrade, for instance, from a 4.1.x release that has the fix to a 4.2.x release that does not. A feature flag would have prevented that upgrade path.
Fixes #15024.
This is an automatic backport of pull request #15025 done by Mergify.