1107 fix cluster conf sync wait loop #11897
Merged
Fixes EMQX-11329
When EMQX boots, it tries to fetch the latest config from its peer (core-type) nodes. If none of the nodes replies, the node decides to boot with its local config (and replay the committed changes) if the commit table was loaded from local disk (an indication that the data is the latest). Otherwise, it sleeps for 1-2 seconds and retries.
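The decision described above can be summarized as a small function. This is a minimal sketch, assuming the three outcomes are "use the peer's config", "boot locally" and "retry". EMQX itself is written in Erlang, and the name `decide_boot_old` and its parameters are hypothetical, not part of the EMQX codebase.

```python
from typing import Optional


def decide_boot_old(peer_conf: Optional[dict], commit_table_local: bool) -> str:
    """Hypothetical model of the pre-fix boot decision (not EMQX's actual code).

    peer_conf: config returned by a core peer, or None if no peer replied.
    commit_table_local: True if the commit table was loaded from local disk.
    """
    if peer_conf is not None:
        # At least one core peer replied: use its config.
        return "use_peer_conf"
    if commit_table_local:
        # Local-disk load is taken as a sign the data is the latest:
        # boot with local config and replay the committed changes.
        return "boot_local"
    # No reply and no local-disk evidence: the caller sleeps 1-2 s and retries.
    return "retry"
```

A node whose commit table was not loaded from local disk can only leave the `"retry"` state once some peer answers.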
This led to a race condition, e.g. in a two-node cluster where both node1 and node2 end up with the mnesia `load_node` pointing to each other (i.e. not a local disk load).
Prior to this fix, the nodes would wait for each other in an endless loop.
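The two-node race can be replayed with a bounded simulation. This is a hypothetical Python model under two stated assumptions: neither node loaded its commit table from local disk, and a node only answers config requests after it has booted itself. The function name and round limit are illustrative only.

```python
import random
import time


def simulate_two_nodes_old(max_rounds: int = 5, real_sleep: bool = False) -> dict:
    """Replay the pre-fix rule for a two-node cluster, bounded to max_rounds.

    Assumptions (hypothetical model): both commit tables were loaded from the
    peer, not from local disk, and an un-booted node cannot serve its config.
    """
    booted = {"node1": False, "node2": False}
    commit_table_local = {"node1": False, "node2": False}
    peers = {"node1": "node2", "node2": "node1"}
    for _ in range(max_rounds):
        for node, peer in peers.items():
            if booted[node]:
                continue
            peer_replies = booted[peer]
            if peer_replies or commit_table_local[node]:
                booted[node] = True  # sync from peer, or boot with local config
            elif real_sleep:
                time.sleep(random.uniform(1.0, 2.0))  # back off before retrying
    # The real wait loop has no round limit, so it would keep spinning here.
    return booted
```

Calling `simulate_two_nodes_old()` leaves both nodes un-booted no matter how many rounds are allowed, because each is waiting for the other to answer first.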
This commit fixes the issue by allowing a node to boot with its local config if it is not lagging behind.
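The fixed rule can be sketched the same way. This is a minimal Python sketch, assuming "not lagging" means the node's last applied transaction ID equals the latest committed one, and assuming the new check is added alongside the existing local-disk check. The function names, parameters and the transaction ID 42 are hypothetical, not EMQX's actual code or data.

```python
from typing import Optional


def decide_boot_new(
    peer_conf: Optional[dict],
    commit_table_local: bool,
    my_tnx_id: int,
    latest_tnx_id: int,
) -> str:
    """Hypothetical model of the post-fix boot decision (not EMQX's actual code)."""
    if peer_conf is not None:
        # A core peer replied: take its config.
        return "use_peer_conf"
    # Assumption: "lagging" means this node has not applied the newest
    # committed transaction it knows about.
    lagging = my_tnx_id < latest_tnx_id
    if commit_table_local or not lagging:
        # Local config is up to date: boot with it and replay committed changes.
        return "boot_local"
    # Still behind and nobody answered: the caller sleeps and retries.
    return "retry"


def simulate_two_nodes_fixed(max_rounds: int = 5) -> dict:
    """Same two-node scenario, neither commit table loaded from local disk."""
    booted = {"node1": False, "node2": False}
    # Assumed data: both nodes already applied transaction 42, the latest one.
    tnx = {"node1": (42, 42), "node2": (42, 42)}
    peers = {"node1": "node2", "node2": "node1"}
    for _ in range(max_rounds):
        for node, peer in peers.items():
            if booted[node]:
                continue
            # Assumption: a peer only serves its config after it has booted.
            peer_conf = {"from": peer} if booted[peer] else None
            my_id, latest = tnx[node]
            if decide_boot_new(peer_conf, False, my_id, latest) != "retry":
                booted[node] = True
    return booted
```

In this model node1 boots locally in the first round because it is not lagging, and node2 then syncs from node1, so the wait loop terminates. A node that really is behind still retries rather than booting with stale config.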
Summary
🤖 Generated by Copilot at 7bfad34
This pull request improves the cluster configuration synchronization and booting process by using a new module `emqx_cluster_rpc` that provides better functions to query and update the cluster state. It also refactors and cleans up the code in `emqx_conf_app` and enhances the logging messages.

PR Checklist
Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:
`changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.md` files

Checklist for CI (.github/workflows) changes

`changes/` dir for user-facing artifacts update