network splits #5

Closed
michielbdejong opened this issue May 14, 2024 · 4 comments

Comments

@michielbdejong
Contributor

https://github.com/gotchoices/ChipNet/blob/master/doc/cluster.md#dead-referees

Dead referees

If a majority of referees go offline indefinitely, the participants are stuck with locked resources. Note that this requires all of said majority referees to not be reachable through any potential path from any of the participants, in perpetuity; which essentially means they are forever lost from the network. If even a single participant is able to reach a given referee, it will share the received record with the others.

"Go offline" doesn't need to be an all-or-nothing predicate. Suppose there are 9 referees, so 5 referees make a majority. You would say 5 referees need to be offline or misbehaving before "dead referees" becomes a problem. But what about network splits caused by failure of some of the links between nodes? Suppose the referees are split 3 in the Americas, 3 in Europe and 3 in Asia, and each node can only talk to the referees on its own continent. Then no node can reach a majority of 5, and everybody's money will also be tied up for as long as the split persists.
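The 3 / 3 / 3 scenario can be checked with a small sketch. It assumes one vote per referee and a strict-majority threshold; the function name `island_with_majority` is hypothetical and not part of ChipNet.

```python
def island_with_majority(referees_per_island, total_referees):
    """Return the index of the island that reaches a strict majority, or None.

    Assumption: every referee carries exactly one vote.
    """
    needed = total_referees // 2 + 1  # e.g. 5 of 9
    for index, reachable in enumerate(referees_per_island):
        if reachable >= needed:
            return index
    return None


# Three continents with 3 referees each: nobody can reach 5 of 9.
print(island_with_majority([3, 3, 3], 9))  # None

# A 5 / 4 split: the first island can still reach a majority.
print(island_with_majority([5, 4], 9))  # 0
```

With at most one island able to hold a majority, every node outside that island waits until connectivity returns.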

@michielbdejong
Contributor Author

Put another way though, the network damage needs to be pretty severe before this happens. In a 9 x 9 connected mesh there are 9 * 8 = 72 links (counting each direction separately). A topology where no consensus is reached would be 4 x 4 + 4 x 4 + 1, so that would be 4 * 3 + 4 * 3 = 24 links (i.e. 48 links down), or 3 x 3 + 3 x 3 + 3 x 3, which has only 3 * 3 * 2 = 18 links (i.e. 54 links down).
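The link counts can be reproduced with a few lines of arithmetic. This sketch counts each direction of a link separately, so a fully connected group of n nodes has n * (n - 1) links; the helper name `directed_links` is made up for illustration.

```python
def directed_links(group_sizes):
    """Count directed links when each group is fully connected internally
    and there are no links between groups."""
    return sum(n * (n - 1) for n in group_sizes)


full_mesh = directed_links([9])  # 9 * 8 = 72

for split in ([4, 4, 1], [3, 3, 3]):
    remaining = directed_links(split)
    print(split, "links left:", remaining, "links down:", full_mesh - remaining)
# [4, 4, 1] links left: 24 links down: 48
# [3, 3, 3] links left: 18 links down: 54
```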

In this sense, mesh communication is way more robust than the ring communication which LedgerLoops uses. There, a ring of 9 participants would have 9 links, and only 2 of those 9 need to go down simultaneously to cut off some of the participants from the loop initiator.
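A brute-force sketch can confirm how few link failures split a ring. It models links as undirected pairs of neighbours and treats node 0 as the loop initiator, both of which are assumptions made for illustration. The mesh comparison uses only 5 nodes so the exhaustive search stays fast.

```python
from itertools import combinations


def reachable_from(start, node_count, links):
    """Return the set of nodes reachable from `start` over undirected links."""
    neighbours = {node: set() for node in range(node_count)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def min_failures_to_split(node_count, links, initiator=0):
    """Smallest number of failed links that cuts some node off from the initiator."""
    for k in range(1, len(links) + 1):
        for down in combinations(links, k):
            alive = [link for link in links if link not in down]
            if len(reachable_from(initiator, node_count, alive)) < node_count:
                return k
    return None


ring = [(i, (i + 1) % 9) for i in range(9)]
small_mesh = [(a, b) for a in range(5) for b in range(a + 1, 5)]

print(min_failures_to_split(9, ring))        # 2
print(min_failures_to_split(5, small_mesh))  # 4
```

In the ring, any single failure still leaves a path the other way around; in the fully connected mesh, a node has to lose every one of its links before it is cut off.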

@n8allan
Collaborator

n8allan commented May 14, 2024

A ring definitely reduces this problem, because a node has to be cut off on both ends, but you are correct that it is still possible. There are a few potential ways to address this, including:

  • Route around the disconnect(s). By asking for a direct route with a communications intent (rather than a lift intent), a communication channel can be established through whatever nodes are reachable. This isn't part of our implementation yet, but the components are there for it.
  • Opt for at least one physically reachable external (non-participant) referee in addition to the participants. We anticipate that a likely pattern will be for participants to select one or two well-known service providers to be included as referees. With such referees, the topology becomes hub and spoke, greatly decreasing the chance of being cut off.

@michielbdejong
Contributor Author

> A ring definitely reduces this problem

Compared to a tree, yes. I would say a ring is more connected than a tree, but a mesh is more connected than a ring.

> Opt for at least one physically reachable external (non-participant) referee

Right, I hadn't taken into account the differences in uptime of different servers. Of course network splits between peer-to-peer software running on user devices are much more likely than when a professionally hosted server is involved. Still, the theoretical analysis is the same: nodes that are in a "minority" island during a network split will not get a final answer until the network comes back up ("minority" being defined as any group that does not have majority voting power).

I opened this issue because maybe you want to add "Network Splits" as an additional Exception, since a situation with two islands of communication is different from a situation with Dead Referees. If you want, I can create a PR to add it?

@n8allan
Collaborator

n8allan commented May 25, 2024

Great suggestion Michiel, I added a paragraph for the split network scenario.

n8allan closed this as completed Jun 12, 2024