Split-brain handling #29

Open
gsvarovsky opened this issue Oct 28, 2020 · 1 comment
Labels
integrity Something that might lose data investigate Extra attention is needed

Comments

@gsvarovsky
Member

gsvarovsky commented Oct 28, 2020

A clone is prompted to do the recovery dance when the remotes object revokes its 'live' status, for example if the network is down and the remotes object detects it is partitioned.

In the event of a split-brain, in which one set of clones is partitioned from another (e.g. when using MQTT bridges), the remotes object may have no idea that this has occurred. The only clue comes after the split heals, when a message arrives for which the prev message is missing. All clones in receipt of such a message will then re-connect.

We probably need a much smarter way to detect and recover from this case specifically.
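
For illustration, the "missing prev" clue could be checked roughly as follows. This is only a sketch under assumptions: the `OpMessage` shape, its `from`/`tick`/`prev` fields and the `GapDetector` class are hypothetical names invented here, not the m-ld wire format or API.

```typescript
// Hypothetical message shape, for illustration only (not the m-ld wire format).
interface OpMessage {
  from: string; // id of the originating clone
  tick: number; // sender's counter value for this message
  prev: number; // sender's counter value for its previous message (0 if none)
}

type Verdict = 'accept' | 'duplicate' | 'gap';

class GapDetector {
  // Highest tick accepted so far from each sender
  private readonly lastSeen = new Map<string, number>();

  check(msg: OpMessage): Verdict {
    const seen = this.lastSeen.get(msg.from) ?? 0;
    if (msg.tick <= seen) {
      return 'duplicate'; // already processed
    }
    if (msg.prev > seen) {
      return 'gap'; // the predecessor never arrived: possible healed split
    }
    this.lastSeen.set(msg.from, msg.tick);
    return 'accept';
  }
}
```

A `'gap'` verdict is the point at which a clone would currently re-connect. Note that it only fires once the split has healed and the other side sends something, which is why it is a late signal.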

@gsvarovsky gsvarovsky added the integrity Something that might lose data label Oct 28, 2020
@gsvarovsky gsvarovsky added this to To do in m-ld backlog via automation Oct 28, 2020
@gsvarovsky gsvarovsky added the investigate Extra attention is needed label Aug 1, 2021
@gsvarovsky
Member Author

gsvarovsky commented Oct 3, 2021

Hypothesis: All of this is solvable inside the Remotes implementation (in JS, a PubSubRemotes or a plain MeldRemotes).

  • Split-brain detection, e.g. by ping messages, or by anomaly detection (e.g. 5 clones suddenly disappear)
  • Recovery methods that proactively seek out peers across a healed split-brain divide
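
A rough sketch of the ping/anomaly idea, assuming each clone can record when it last heard from each peer. `PartitionMonitor`, `timeoutMs` and `threshold` are hypothetical names made up for this example; nothing here is part of `MeldRemotes` or `PubSubRemotes`.

```typescript
// Hypothetical helper: flags a suspected split when several peers fall silent.
class PartitionMonitor {
  // Time (ms) at which each peer was last heard from
  private readonly lastHeard = new Map<string, number>();
  private readonly timeoutMs: number; // silence after which a peer counts as lost
  private readonly threshold: number; // number of lost peers that suggests a split

  constructor(timeoutMs: number, threshold: number) {
    this.timeoutMs = timeoutMs;
    this.threshold = threshold;
  }

  // Record a ping (or any other message) from a peer at time `now`
  heard(peer: string, now: number): void {
    this.lastHeard.set(peer, now);
  }

  // Peers that have been silent for longer than the timeout
  silentPeers(now: number): string[] {
    return Array.from(this.lastHeard.entries())
      .filter(([, at]) => now - at > this.timeoutMs)
      .map(([peer]) => peer);
  }

  // True if enough peers are silent to suspect a partition
  suspectSplit(now: number): boolean {
    return this.silentPeers(now).length >= this.threshold;
  }
}
```

The list returned by `silentPeers` could also serve the second bullet: once the network recovers, those are the peers a clone might proactively contact. Passing `now` in explicitly keeps the sketch testable; a real implementation would presumably use timers.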

@gsvarovsky gsvarovsky moved this from To do to MVP / launch in m-ld backlog Aug 17, 2023
Projects
m-ld backlog
Beta / launch