
rabbit_maintenance: Replicate rabbit_node_maintenance_states Mnesia table #9005

Merged
1 commit merged into main on Aug 4, 2023

Conversation

@dumbbell (Member) commented Aug 3, 2023

Why

So far, the code only ensured the table existed. Because it is a non-local Mnesia table, its presence on a single node was enough. This is not what we want here: we want the table to be replicated to all nodes across the cluster.

This was detected while working on the integration of Khepri. In our work in progress, the Mnesia table was declared differently and replicated. This caused mixed-version testing to fail: nodes hung forever while trying to force-load that Mnesia table. The hang happened because the node holding the single copy of the table had been stopped or restarted and was therefore unavailable, which prevented the table from being loaded.

How

After the table is declared, we use rabbit_table:ensure_table_copy/3 to make sure the table is replicated to the local node. Because all nodes call that boot step, each of them takes care of configuring its copy. In the end, the table is replicated everywhere.

We also try to add replicas on remote nodes that don't have one yet. This reduces the risk of a node waiting forever for the table to become available on another node. Failures to add remote replicas are ignored because they should not be fatal or prevent the current node from starting.
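For illustration, here is a minimal sketch of that boot-step logic written against plain Mnesia calls. The actual code goes through `rabbit_table:ensure_table_copy/3` rather than calling Mnesia directly, and both the `ram_copies` storage type and the use of `nodes/0` to approximate cluster membership are assumptions made for this sketch.

```erlang
%% Minimal sketch of the boot-step logic described above (not the actual
%% RabbitMQ code, which uses rabbit_table:ensure_table_copy/3).
%% Assumptions: the table uses ram_copies, and nodes/0 approximates the
%% cluster membership.
-module(maintenance_table_replication_sketch).
-export([ensure_replicas/0]).

-define(TABLE, rabbit_node_maintenance_states).

ensure_replicas() ->
    %% The local node must end up holding a copy of the table.
    ok = ensure_copy_on(node()),
    %% Best effort for the other members: failures are ignored so a remote
    %% problem cannot prevent the current node from starting.
    _ = [catch ensure_copy_on(Node) || Node <- nodes()],
    ok.

ensure_copy_on(Node) ->
    case mnesia:add_table_copy(?TABLE, Node, ram_copies) of
        {atomic, ok}                              -> ok;
        {aborted, {already_exists, ?TABLE, Node}} -> ok;
        {aborted, Reason}                         -> {error, Reason}
    end.
```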

@dumbbell added this to the 3.13.0 milestone on Aug 3, 2023
@dumbbell self-assigned this on Aug 3, 2023
@dumbbell force-pushed the replicate-rabbit_node_maintenance_states-table branch 2 times, most recently from 6ba5596 to 31d8bfe on August 4, 2023 12:09
rabbit_maintenance: Replicate `rabbit_node_maintenance_states` Mnesia table

[Why]
So far, the code only ensured the table existed. Because it is a
non-local Mnesia table, its presence on a single node was enough. This
is not what we want here: we want the table to be replicated to all
nodes across the cluster.

This was detected while working on the integration of Khepri. In our
work in progress, the Mnesia table was declared differently and
replicated. This caused mixed-version testing to fail because nodes were
hanging forever while trying to force-load that Mnesia table. The hang
was explained by the fact that the node having that single table copy
was stopped or restarted and thus was unavailable, preventing the load
of the table.

[How]
After the table is declared, we use `rabbit_table:ensure_table_copy/3` to
make sure the table is replicated to the local node. Because all nodes
call that boot step, each of them takes care of configuring its copy. In
the end, the table is replicated everywhere.

V2: We also try to add replicas on remote nodes that don't have one yet.
    This reduces the risk of a node waiting forever for the table to
    become available on another node. Failures to add remote replicas are
    ignored because they should not be fatal or prevent the current node
    from starting.
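Not part of the change itself, but as a quick way to see its effect once every node has run the boot step, one could list the table's replica nodes; with the fix they should cover the whole cluster. A hypothetical snippet, assuming a RAM-only table:

```erlang
%% Hypothetical check, e.g. from an Erlang shell on one of the nodes (or via
%% `rabbitmqctl eval`): list the nodes holding a copy of the table. Assumes a
%% RAM-only table; for a disc-backed table, inspect disc_copies instead.
Replicas = mnesia:table_info(rabbit_node_maintenance_states, ram_copies),
io:format("replica nodes: ~p~n", [Replicas]).
```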
@dumbbell force-pushed the replicate-rabbit_node_maintenance_states-table branch from 31d8bfe to b82ff37 on August 4, 2023 15:12
@dumbbell marked this pull request as ready for review on August 4, 2023 15:42
@dumbbell merged commit d489fc9 into main on Aug 4, 2023
16 checks passed
@dumbbell deleted the replicate-rabbit_node_maintenance_states-table branch on August 4, 2023 15:42
dumbbell added a commit that referenced this pull request Aug 7, 2023
rabbit_maintenance: Replicate `rabbit_node_maintenance_states` Mnesia table (backport #9005) (backport #9010)
dumbbell added a commit that referenced this pull request Aug 7, 2023
mergify bot pushed a commit that referenced this pull request Aug 7, 2023
…plicated

See #9005 for an explanation of the bug.

(cherry picked from commit ada57c0)
acogoluegnes added a commit that referenced this pull request Apr 25, 2024
The x_jms_topic_table Mnesia table must be on all nodes
for messages to be published to JMS topic exchanges
and routed to topic subscribers.

The table used to be only in RAM on one node, so it would
be unavailable when the node was down and empty
when it came back up, losing the state for subscribers
still online because connected to other nodes.

References #9005
mergify bot pushed a commit that referenced this pull request Apr 25, 2024
The x_jms_topic_table Mnesia table must be on all nodes
for messages to be published to JMS topic exchanges
and routed to topic subscribers.

The table used to be only in RAM on one node, so it would
be unavailable when the node was down and empty
when it came back up, losing the state for subscribers
still online because connected to other nodes.

References #9005

(cherry picked from commit df9fec8)
mergify bot pushed a commit that referenced this pull request Apr 25, 2024
The x_jms_topic_table Mnesia table must be on all nodes
for messages to be published to JMS topic exchanges
and routed to topic subscribers.

The table used to be only in RAM on one node, so it would
be unavailable when the node was down and empty
when it came back up, losing the state for subscribers
still online because connected to other nodes.

References #9005

(cherry picked from commit df9fec8)

# Conflicts:
#	deps/rabbitmq_jms_topic_exchange/src/rabbit_db_jms_exchange.erl
acogoluegnes added a commit that referenced this pull request Apr 26, 2024
The x_jms_topic_table Mnesia table must be on all nodes for messages to
be published to JMS topic exchanges and routed to topic subscribers.

The table used to be only in RAM on one node, so it would be
unavailable when the node was down and empty when it came back up,
losing the state for subscribers still online because connected to other nodes.

Inspired by a similar change for the node maintenance status table in #9005.
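A rough sketch of the same idea applied to the JMS topic exchange table follows. It is illustrative only: the real change lives in rabbit_db_jms_exchange and uses RabbitMQ's own table helpers, and the function name, `ClusterNodes` argument and `ram_copies` storage type are assumptions.

```erlang
%% Illustrative only: replicate x_jms_topic_table to every cluster member so
%% its contents survive the loss of any single node. The helper name,
%% ClusterNodes argument and ram_copies storage type are assumptions.
replicate_jms_topic_table(ClusterNodes) ->
    lists:foreach(
      fun(Node) ->
              case mnesia:add_table_copy(x_jms_topic_table, Node, ram_copies) of
                  {atomic, ok} ->
                      ok;
                  {aborted, {already_exists, x_jms_topic_table, Node}} ->
                      ok;
                  {aborted, Reason} ->
                      %% Log and keep going: one unreachable node should not
                      %% block the others from getting a copy.
                      error_logger:warning_msg(
                        "cannot add copy of x_jms_topic_table on ~p: ~p~n",
                        [Node, Reason])
              end
      end, ClusterNodes).
```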