Skip to content

Commit

Permalink
Remove randomized startup delays
Browse files Browse the repository at this point in the history
On initial cluster formation, only one node in a multi node cluster
should initialize the Mnesia database schema (i.e. form the cluster).
To ensure that for nodes starting up in parallel,
RabbitMQ peer discovery backends have used
either locks or randomized startup delays.

Locks work great: When a node holds the lock, it either starts a new
blank node (if there is no other node in the cluster), or it joins
an existing node. This makes it impossible to have two nodes forming
the cluster at the same time.
Consul and etcd peer discovery backends use locks. The lock is acquired
in the consul and etcd infrastructure, respectively.

For other peer discovery backends (classic, DNS, AWS), randomized
startup delays were used. They work good enough in most cases.
However, in rabbitmq/cluster-operator#662 we
observed that in 1% - 10% of the cases (the more nodes or the
smaller the randomized startup delay range, the higher the chances), two
nodes decide to form the cluster. That's bad since it will end up in a
single Erlang cluster, but in two RabbitMQ clusters. Even worse, no
obvious alert got triggered or error message logged.

To solve this issue, one could increase the randomized startup delay
range from e.g. 0m - 1m to 0m - 3m. However, this makes initial cluster
formation very slow since it will take up to 3 minutes until
every node is ready. In rare cases, we still end up with two nodes
forming the cluster.

Another way to solve the problem is to name a dedicated node to be the
seed node (forming the cluster). This was explored in
rabbitmq/cluster-operator#689 and works well.
Two minor downsides to this approach are: 1. If the seed node never
becomes available, the whole cluster won't be formed (which is okay),
and 2. it doesn't integrate with existing dynamic peer discovery backends
(e.g. K8s, AWS) since nodes are not yet known at deploy time.

In this commit, we take a better approach: We remove randomized startup
delays altogether. We replace them with locks. However, instead of
implementing our own lock implementation in an external system (e.g. in K8s),
we re-use Erlang's locking mechanism global:set_lock/3.

global:set_lock/3 has some convenient properties:
1. It accepts a list of nodes to set the lock on.
2. The nodes in that list connect to each other (i.e. create an Erlang
cluster).
3. The method is synchronous with a timeout (number of retries). It
blocks until the lock becomes available.
4. If a process that holds a lock dies, or the node goes down, the lock
held by the process is deleted.

The list of nodes passed to global:set_lock/3 corresponds to the nodes
the peer discovery backend discovers (lists).

Two special cases worth mentioning:

1. That list can be all desired nodes in the cluster
(e.g. in classic peer discovery where nodes are known at
deploy time) while only a subset of nodes is available.
In that case, global:set_lock/3 still sets the lock not
blocking until all nodes can be connected to. This is good since
nodes might start sequentially (non-parallel).

2. In dynamic peer discovery backends (e.g. K8s, AWS), this
list can be just a subset of desired nodes since nodes might not startup
in parallel. That's also not a problem as long as the following
requirement is met: "The peer disovery backend does not list two disjoint
sets of nodes (on different nodes) at the same time."
For example, in a 2-node cluster, the peer discovery backend must not
list only node 1 on node 1 and only node 2 on node 2.

Existing peer discovery backends fullfil that requirement because the
resource the nodes are discovered from is global.
For example, in K8s, once node 1 is part of the Endpoints object, it
will be returned on both node 1 and node 2.
Likewise, in AWS, once node 1 started, the described list of instances
with a specific tag will include node 1 when the AWS peer discovery backend
runs on node 1 or node 2.

Removing randomized startup delays also makes cluster formation
considerably faster (up to 1 minute faster if that was the
upper bound in the range).
  • Loading branch information
ansd committed Jun 2, 2021
1 parent 2a217f0 commit 08b5aca
Show file tree
Hide file tree
Showing 17 changed files with 327 additions and 249 deletions.
36 changes: 21 additions & 15 deletions deps/rabbit/priv/schema/rabbit.schema
Original file line number Diff line number Diff line change
Expand Up @@ -945,31 +945,37 @@ fun(Conf) ->
end}.

%% Cluster formation: Randomized startup delay
%%
%% DEPRECATED: This is a no-op. Old configs are still allowed, but a warning will be printed.

{mapping, "cluster_formation.randomized_startup_delay_range.min", "rabbit.cluster_formation.randomized_startup_delay_range",
[{datatype, integer}]}.
{mapping, "cluster_formation.randomized_startup_delay_range.max", "rabbit.cluster_formation.randomized_startup_delay_range",
[{datatype, integer}]}.
{mapping, "cluster_formation.randomized_startup_delay_range.min", "rabbit.cluster_formation.randomized_startup_delay_range", []}.
{mapping, "cluster_formation.randomized_startup_delay_range.max", "rabbit.cluster_formation.randomized_startup_delay_range", []}.

{translation, "rabbit.cluster_formation.randomized_startup_delay_range",
fun(Conf) ->
Min = cuttlefish:conf_get("cluster_formation.randomized_startup_delay_range.min", Conf, undefined),
Max = cuttlefish:conf_get("cluster_formation.randomized_startup_delay_range.max", Conf, undefined),

case {Min, Max} of
{undefined, undefined} ->
cuttlefish:unset();
{undefined, Max} ->
%% fallback default
{5, Max};
{Min, undefined} ->
%% fallback default
{Min, 60};
{Min, Max} ->
{Min, Max}
end
{undefined, undefined} ->
ok;
_ ->
cuttlefish:warn("cluster_formation.randomized_startup_delay_range.min and "
"cluster_formation.randomized_startup_delay_range.max are deprecated")
end,
cuttlefish:unset()
end}.

%% Cluster formation: lock acquisition retries as passed to https://erlang.org/doc/man/global.html#set_lock-3
%%
%% Currently used in classic, k8s, and aws peer discovery backends.

{mapping, "cluster_formation.erlang_lock_retries", "rabbit.cluster_formation.erlang_lock_retries",
[
{datatype, integer},
{validators, ["non_zero_positive_integer"]}
]}.

%% Cluster formation: discovery failure retries

{mapping, "cluster_formation.lock_retry_limit", "rabbit.cluster_formation.lock_retry_limit",
Expand Down
15 changes: 6 additions & 9 deletions deps/rabbit/src/rabbit_mnesia.erl
Original file line number Diff line number Diff line change
Expand Up @@ -105,21 +105,18 @@ init_with_lock(Retries, Timeout, RunPeerDiscovery) ->
rabbit_log:debug("rabbit_peer_discovery:lock returned ~p", [LockResult]),
case LockResult of
not_supported ->
rabbit_log:info("Peer discovery backend does not support locking, falling back to randomized delay"),
%% See rabbitmq/rabbitmq-server#1202 for details.
rabbit_peer_discovery:maybe_inject_randomized_delay(),
RunPeerDiscovery(),
rabbit_peer_discovery:maybe_register();
{error, _Reason} ->
timer:sleep(Timeout),
init_with_lock(Retries - 1, Timeout, RunPeerDiscovery);
{ok, Data} ->
try
RunPeerDiscovery(),
rabbit_peer_discovery:maybe_register()
after
rabbit_peer_discovery:unlock(Data)
end
end;
{error, _Reason} ->
timer:sleep(Timeout),
init_with_lock(Retries - 1, Timeout, RunPeerDiscovery)
end.

-spec run_peer_discovery() -> ok | {[node()], node_type()}.
Expand Down Expand Up @@ -178,7 +175,7 @@ join_discovered_peers(TryNodes, NodeType) ->
join_discovered_peers_with_retries(TryNodes, NodeType, RetriesLeft, DelayInterval).

join_discovered_peers_with_retries(TryNodes, _NodeType, 0, _DelayInterval) ->
rabbit_log:warning(
rabbit_log:info(
"Could not successfully contact any node of: ~s (as in Erlang distribution). "
"Starting as a blank standalone node...",
[string:join(lists:map(fun atom_to_list/1, TryNodes), ",")]),
Expand All @@ -193,7 +190,7 @@ join_discovered_peers_with_retries(TryNodes, NodeType, RetriesLeft, DelayInterva
rabbit_node_monitor:notify_joined_cluster();
none ->
RetriesLeft1 = RetriesLeft - 1,
rabbit_log:error("Trying to join discovered peers failed. Will retry after a delay of ~b ms, ~b retries left...",
rabbit_log:info("Trying to join discovered peers failed. Will retry after a delay of ~b ms, ~b retries left...",
[DelayInterval, RetriesLeft1]),
timer:sleep(DelayInterval),
join_discovered_peers_with_retries(TryNodes, NodeType, RetriesLeft1, DelayInterval)
Expand Down
61 changes: 1 addition & 60 deletions deps/rabbit/src/rabbit_peer_discovery.erl
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,7 @@
-export([maybe_init/0, discover_cluster_nodes/0, backend/0, node_type/0,
normalize/1, format_discovered_nodes/1, log_configured_backend/0,
register/0, unregister/0, maybe_register/0, maybe_unregister/0,
maybe_inject_randomized_delay/0, lock/0, unlock/1,
discovery_retries/0]).
lock/0, unlock/1, discovery_retries/0]).
-export([append_node_prefix/1, node_prefix/0, locking_retry_timeout/0,
lock_acquisition_failure_mode/0]).

Expand All @@ -28,9 +27,6 @@
%% default node prefix to attach to discovered hostnames
-define(DEFAULT_PREFIX, "rabbit").

%% default randomized delay range, in seconds
-define(DEFAULT_STARTUP_RANDOMIZED_DELAY, {5, 60}).

%% default discovery retries and interval.
-define(DEFAULT_DISCOVERY_RETRY_COUNT, 10).
-define(DEFAULT_DISCOVERY_RETRY_INTERVAL_MS, 500).
Expand Down Expand Up @@ -159,61 +155,6 @@ discovery_retries() ->
{?DEFAULT_DISCOVERY_RETRY_COUNT, ?DEFAULT_DISCOVERY_RETRY_INTERVAL_MS}
end.


-spec maybe_inject_randomized_delay() -> ok.
maybe_inject_randomized_delay() ->
Backend = backend(),
case Backend:supports_registration() of
true ->
rabbit_log:info("Peer discovery backend ~s supports registration.", [Backend]),
inject_randomized_delay();
false ->
rabbit_log:info("Peer discovery backend ~s does not support registration, skipping randomized startup delay.", [Backend]),
ok
end.

-spec inject_randomized_delay() -> ok.

inject_randomized_delay() ->
{Min, Max} = randomized_delay_range_in_ms(),
case {Min, Max} of
%% When the max value is set to 0, consider the delay to be disabled.
%% In addition, `rand:uniform/1` will fail with a "no function clause"
%% when the argument is 0.
{_, 0} ->
rabbit_log:info("Randomized delay range's upper bound is set to 0. Considering it disabled."),
ok;
{_, N} when is_number(N) ->
rand:seed(exsplus),
RandomVal = rand:uniform(round(N)),
rabbit_log:debug("Randomized startup delay: configured range is from ~p to ~p milliseconds, PRNG pick: ~p...",
[Min, Max, RandomVal]),
Effective = case RandomVal < Min of
true -> Min;
false -> RandomVal
end,
rabbit_log:info("Will wait for ~p milliseconds before proceeding with registration...", [Effective]),
timer:sleep(Effective),
ok
end.

-spec randomized_delay_range_in_ms() -> {integer(), integer()}.

randomized_delay_range_in_ms() ->
Backend = backend(),
Default = case erlang:function_exported(Backend, randomized_startup_delay_range, 0) of
true -> Backend:randomized_startup_delay_range();
false -> ?DEFAULT_STARTUP_RANDOMIZED_DELAY
end,
{Min, Max} = case application:get_env(rabbit, cluster_formation) of
{ok, Proplist} ->
proplists:get_value(randomized_startup_delay_range, Proplist, Default);
undefined ->
Default
end,
{Min * 1000, Max * 1000}.


-spec register() -> ok.

register() ->
Expand Down
62 changes: 42 additions & 20 deletions deps/rabbit/src/rabbit_peer_discovery_classic_config.erl
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@
-export([list_nodes/0, supports_registration/0, register/0, unregister/0,
post_registration/0, lock/1, unlock/1]).

%% TODO ansd: see todo below
-define(DEFAULT_LOCK_RETRIES, 80).
%%
%% API
%%
Expand All @@ -26,12 +28,46 @@ list_nodes() ->
Nodes when is_list(Nodes) -> {ok, {Nodes, disc}}
end.

-spec lock(Node :: atom()) -> {ok, {ResourceId :: term(), LockRequesterId :: term()}} | {error, Reason :: string()}.

lock(Node) ->
%% TODO ansd: instead of below lines, ideally we reuse functions rabbit_peer_discovery_util:lock_id/1 and
%% rabbit_peer_discovery_util:lock_retries/0
%% When using these two functions, executing tests failed since node couldn't boot: undef rabbit_peer_discovery_util:lock_id/1
%% Can the rabbit app depend on the rabbit_peer_discovery_common_app?
%% If not, should we move these two functions to some other place (e.g. rabbit_common)?
LockId = {rabbit_nodes:cookie_hash(), Node},
Retries = case application:get_env(rabbit, cluster_formation) of
{ok, PropList} ->
proplists:get_value(erlang_lock_retries, PropList, ?DEFAULT_LOCK_RETRIES);
undefined ->
?DEFAULT_LOCK_RETRIES
end,

Nodes = node_list(),
case lists:member(Node, Nodes) of
false when Nodes =/= [] ->
rabbit_log:warning("Local node ~s is not part of configured nodes ~p. "
"This might lead to incorrect cluster formation.", [Node, Nodes]);
_ -> ok
end,
case global:set_lock(LockId, Nodes, Retries) of
true ->
{ok, LockId};
false ->
{error, io_lib:format("Acquiring lock taking too long, bailing out after ~b retries", [Retries])}
end.

-spec unlock(Data :: term()) -> ok.

unlock(LockId) ->
global:del_lock(LockId, node_list()),
ok.

-spec supports_registration() -> boolean().

supports_registration() ->
%% If we don't have any nodes configured, skip randomized delay and similar operations
%% as we don't want to delay startup for no reason. MK.
has_any_peer_nodes_configured().
false.

-spec register() -> ok.

Expand All @@ -48,28 +84,14 @@ unregister() ->
post_registration() ->
ok.

-spec lock(Node :: atom()) -> not_supported.

lock(_Node) ->
not_supported.

-spec unlock(Data :: term()) -> ok.

unlock(_Data) ->
ok.

%%
%% Helpers
%%

has_any_peer_nodes_configured() ->
node_list() ->
case application:get_env(rabbit, cluster_nodes, []) of
{[], _NodeType} ->
false;
{Nodes, _NodeType} when is_list(Nodes) ->
true;
[] ->
false;
Nodes;
Nodes when is_list(Nodes) ->
true
Nodes
end.
17 changes: 8 additions & 9 deletions deps/rabbit/test/config_schema_SUITE_data/rabbit.snippets
Original file line number Diff line number Diff line change
Expand Up @@ -540,23 +540,22 @@ tcp_listen_options.exit_on_close = false",
{cluster_formation_randomized_startup_delay_both_values,
"cluster_formation.randomized_startup_delay_range.min = 10
cluster_formation.randomized_startup_delay_range.max = 30",
[{rabbit, [{cluster_formation, [
{randomized_startup_delay_range, {10, 30}}
]}]}],
[],
[]},

{cluster_formation_randomized_startup_delay_min_only,
"cluster_formation.randomized_startup_delay_range.min = 10",
[{rabbit, [{cluster_formation, [
{randomized_startup_delay_range, {10, 60}}
]}]}],
[],
[]},

{cluster_formation_randomized_startup_delay_max_only,
"cluster_formation.randomized_startup_delay_range.max = 30",
[{rabbit, [{cluster_formation, [
{randomized_startup_delay_range, {5, 30}}
]}]}],
[],
[]},

{cluster_formation_erlang_lock_retries,
"cluster_formation.erlang_lock_retries = 10",
[{rabbit,[{cluster_formation,[{erlang_lock_retries,10}]}]}],
[]},

{cluster_formation_dns,
Expand Down
Loading

0 comments on commit 08b5aca

Please sign in to comment.