
rabbit_maintenance: Replicate rabbit_node_maintenance_states Mnesia table #9005

Merged
1 commit merged into main on Aug 4, 2023

Conversation

@dumbbell (Member) commented Aug 3, 2023

Why

So far, the code only ensured the table existed. Because it is a non-local Mnesia table, its presence on a single node was enough. This is not what we want here: we want the table to be replicated to all nodes across the cluster.

This was detected while working on the integration of Khepri. In our work in progress, the Mnesia table was declared differently and replicated. This caused mixed-version testing to fail: nodes hung forever while trying to force-load that Mnesia table. The hang happened because the node holding the single copy of the table had been stopped or restarted and was therefore unavailable, which prevented the table from being loaded.

How

After the table is declared, we use rabbit_table:ensure_table_copy/3 to make sure the table is replicated to the local node. Because all nodes call that boot step, each of them takes care of configuring its copy. In the end, the table is replicated everywhere.

We also try to add replicas on remote nodes that don't have one yet. This reduces the risk of a node waiting forever for the table to become available on another node. Failures to add remote replicas are ignored because they should not be fatal or prevent the current node from starting.
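For illustration, here is a minimal sketch of that boot-step logic written against plain Mnesia calls. The actual code goes through `rabbit_table:ensure_table_copy/3` rather than calling Mnesia directly, and both the `ram_copies` storage type and the use of `nodes/0` to approximate cluster membership are assumptions made for this sketch.

```erlang
%% Minimal sketch of the boot-step logic described above (not the actual
%% RabbitMQ code, which uses rabbit_table:ensure_table_copy/3).
%% Assumptions: the table uses ram_copies, and nodes/0 approximates the
%% cluster membership.
-module(maintenance_table_replication_sketch).
-export([ensure_replicas/0]).

-define(TABLE, rabbit_node_maintenance_states).

ensure_replicas() ->
    %% The local node must end up holding a copy of the table.
    ok = ensure_copy_on(node()),
    %% Best effort for the other members: failures are ignored so a remote
    %% problem cannot prevent the current node from starting.
    _ = [catch ensure_copy_on(Node) || Node <- nodes()],
    ok.

ensure_copy_on(Node) ->
    case mnesia:add_table_copy(?TABLE, Node, ram_copies) of
        {atomic, ok}                              -> ok;
        {aborted, {already_exists, ?TABLE, Node}} -> ok;
        {aborted, Reason}                         -> {error, Reason}
    end.
```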

@dumbbell added this to the 3.13.0 milestone on Aug 3, 2023
@dumbbell self-assigned this on Aug 3, 2023
@dumbbell force-pushed the replicate-rabbit_node_maintenance_states-table branch 2 times, most recently from 6ba5596 to 31d8bfe on August 4, 2023 12:09
rabbit_maintenance: Replicate `rabbit_node_maintenance_states` Mnesia table

[Why]
So far, the code only ensured the table existed. Because it is a
non-local Mnesia table, its presence on a single node was enough. This
is not what we want here: we want the table to be replicated to all
nodes across the cluster.

This was detected while working on the integration of Khepri. In our
work in progress, the Mnesia table was declared differently and
replicated. This caused mixed-version testing to fail because nodes were
hanging forever while trying to force-load that Mnesia table. The hang
was explained by the fact that the node having that single table copy
was stopped or restarted and thus was unavailable, preventing the load
of the table.

[How]
After the table is declared, we use `rabbit_table:ensure_table_copy/3` to
make sure the table is replicated to the local node. Because all nodes
call that boot step, each of them takes care of configuring its copy. In
the end, the table is replicated everywhere.

V2: We also try to add replicas on remote nodes that don't have one yet.
    This reduces the risk of a node waiting forever for the table to
    become available on another node. Failures to add remote replicas are
    ignored because they should not be fatal or prevent the current node
    from starting.
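Not part of the change itself, but as a quick way to see its effect once every node has run the boot step, one could list the table's replica nodes; with the fix they should cover the whole cluster. A hypothetical snippet, assuming a RAM-only table:

```erlang
%% Hypothetical check, e.g. from an Erlang shell on one of the nodes (or via
%% `rabbitmqctl eval`): list the nodes holding a copy of the table. Assumes a
%% RAM-only table; for a disc-backed table, inspect disc_copies instead.
Replicas = mnesia:table_info(rabbit_node_maintenance_states, ram_copies),
io:format("replica nodes: ~p~n", [Replicas]).
```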
@dumbbell force-pushed the replicate-rabbit_node_maintenance_states-table branch from 31d8bfe to b82ff37 on August 4, 2023 15:12
@dumbbell marked this pull request as ready for review on August 4, 2023 15:42
@dumbbell merged commit d489fc9 into main on Aug 4, 2023
16 checks passed
@dumbbell deleted the replicate-rabbit_node_maintenance_states-table branch on August 4, 2023 15:42
dumbbell added a commit that referenced this pull request Aug 7, 2023
rabbit_maintenance: Replicate `rabbit_node_maintenance_states` Mnesia table (backport #9005) (backport #9010)
dumbbell added a commit that referenced this pull request Aug 7, 2023
mergify bot pushed a commit that referenced this pull request Aug 7, 2023
…plicated

See #9005 for an explanation of the bug.

(cherry picked from commit ada57c0)
acogoluegnes added a commit that referenced this pull request Apr 25, 2024
The x_jms_topic_table Mnesia table must be on all nodes
for messages to be published to JMS topic exchanges
and routed to topic subscribers.

The table used to be only in RAM on one node, so it would
be unavailable when the node was down and empty
when it came back up, losing the state for subscribers
still online because connected to other nodes.

References #9005
mergify bot pushed a commit that referenced this pull request Apr 25, 2024
The x_jms_topic_table Mnesia table must be on all nodes
for messages to be published to JMS topic exchanges
and routed to topic subscribers.

The table used to be only in RAM on one node, so it would
be unavailable when the node was down and empty
when it came back up, losing the state for subscribers
still online because connected to other nodes.

References #9005

(cherry picked from commit df9fec8)
mergify bot pushed a commit that referenced this pull request Apr 25, 2024
The x_jms_topic_table Mnesia table must be on all nodes
for messages to be published to JMS topic exchanges
and routed to topic subscribers.

The table used to be only in RAM on one node, so it would
be unavailable when the node was down and empty
when it came back up, losing the state for subscribers
still online because connected to other nodes.

References #9005

(cherry picked from commit df9fec8)

# Conflicts:
#	deps/rabbitmq_jms_topic_exchange/src/rabbit_db_jms_exchange.erl
acogoluegnes added a commit that referenced this pull request Apr 26, 2024
The x_jms_topic_table Mnesia table must be on all nodes for messages to
be published to JMS topic exchanges and routed to topic subscribers.

The table used to be only in RAM on one node, so it would be
unavailable when the node was down and empty when it came back up,
losing the state for subscribers still online because connected to other nodes.

Inspired by a similar change for the node maintenance status table in #9005.
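A rough sketch of the same idea applied to the JMS topic exchange table follows. It is illustrative only: the real change lives in rabbit_db_jms_exchange and uses RabbitMQ's own table helpers, and the function name, `ClusterNodes` argument and `ram_copies` storage type are assumptions.

```erlang
%% Illustrative only: replicate x_jms_topic_table to every cluster member so
%% its contents survive the loss of any single node. The helper name,
%% ClusterNodes argument and ram_copies storage type are assumptions.
replicate_jms_topic_table(ClusterNodes) ->
    lists:foreach(
      fun(Node) ->
              case mnesia:add_table_copy(x_jms_topic_table, Node, ram_copies) of
                  {atomic, ok} ->
                      ok;
                  {aborted, {already_exists, x_jms_topic_table, Node}} ->
                      ok;
                  {aborted, Reason} ->
                      %% Log and keep going: one unreachable node should not
                      %% block the others from getting a copy.
                      error_logger:warning_msg(
                        "cannot add copy of x_jms_topic_table on ~p: ~p~n",
                        [Node, Reason])
              end
      end, ClusterNodes).
```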