Autodiscovery - autosetup of cluster members #24
Is it necessary for every peer to be aware of every other peer?
Cluster shares state (the list of pinned things) using a consensus algorithm (Raft), so all nodes need to know each other.
@hsanjuan When starting a bunch at once, why not seed with one node? Here's a system I was thinking about for autodiscovery. Suppose we have three nodes: A, B, and C. Discovery takes place as follows:

Step 1: Seeding. In order for the system to function, all nodes must be "seeded" with at least one other node. In this diagram, node A is joining the cluster by contacting C.

Step 2: Pass-through. In this step, peers wishing to join the cluster contact a current member of the cluster. In this diagram, B wishes to join the cluster; thus, B contacts C.

Step 3: Repeat. C returns a list of peers it is immediately connected to, and B goes to each of those peers individually and requests a list of peers. In this diagram, B contacts A, but since the only peer A will return is C, and B is already connected to C, this "trail" has gone cold.
This mechanism works really well when considering a more complicated model, like so:
Let's assume that nodes are seeded with node B. No matter what order nodes connect in, they should all end up in the same cluster.
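The seed-and-crawl discovery described in the steps above is essentially a graph search. A minimal sketch in Python (illustrative only, not the actual ipfs-cluster code; `topology` is a made-up stand-in for the real "give me your peer list" RPC):

```python
def discover(seed, topology):
    """Starting from one seed node, crawl peer lists until no new
    peers turn up (the "trail has gone cold")."""
    known = {seed}
    to_visit = [seed]
    while to_visit:
        peer = to_visit.pop()
        for neighbor in topology[peer]:
            if neighbor not in known:   # a warm trail: keep following it
                known.add(neighbor)
                to_visit.append(neighbor)
    return known

# Three nodes as in the example: A was seeded with C, and C knows A and B.
topology = {"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}
print(sorted(discover("C", topology)))  # ['A', 'B', 'C']
```

As long as the union of everyone's peer lists forms a connected graph, every node reaches the same final set regardless of crawl order.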
Join() takes a peer's multiaddress. It then sends a "PeerAdd()" RPC request with that peer and the local multiaddress by which the remote peer is reached. The remote peer uses the PeerAdd() functionality to a) tell the rest of the cluster about the new member and b) send us the list of cluster members. Join() has no REST API support; it is provided instead via "ipfs-cluster-service --join". A "--leave" flag has been included too, by which a peer triggers "PeerRemove()" on itself when shutting down. This allows a node to join an existing cluster and leave it at the end of its life. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>
@20zinnm thanks for the comments. For the moment I'm not supporting the case where everyone bootstraps from someone else at the same time.
This is a graph search, but in a cluster C should know all its nodes, and if it doesn't yet, it should tell B and everyone else about any modifications (trails may warm up). Since it's not just about having a graph, but about having everyone connected to everyone else, this grows very complex. An easier solution is to have nodes join running clusters, where they can receive the full list of peers from whoever they connect to directly, and where it is possible to push the new update to all peers. After joining, the full peer list is saved to the configuration for future use on startup. This also re-uses most of the PeerAdd() code path, as B is basically requesting the bootstrap node to PeerAdd(B).
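The simpler flow described here can be sketched like so (illustrative Python, not the real implementation; the `Node` class and its methods are invented for the example, standing in for the PeerAdd()/Join() RPCs):

```python
class Node:
    """Toy cluster member; its peer set stands in for the shared state."""
    def __init__(self, name):
        self.name = name
        self.peers = {self}          # every node knows at least itself

    def join_via(self, bootstrap):
        """Join a running cluster by contacting one member directly:
        that member pushes the update to all peers and hands back the
        full peer list."""
        for member in list(bootstrap.peers):
            member.peers.add(self)   # push the new member to everyone
        self.peers = set(bootstrap.peers)  # receive the full list
        # (a real node would now persist this list for future startups)

a, b, c = Node("A"), Node("B"), Node("C")
b.join_via(a)        # B joins A's one-node cluster
c.join_via(b)        # C joins via B and learns about A too
print(sorted(n.name for n in c.peers))  # ['A', 'B', 'C']
```

Note how the newcomer only needs one working entry point, and afterwards every member's peer list agrees.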
@hsanjuan Why would it need to announce modifications if the seeker needs to contact all nodes anyway? In order to establish connections, it needs to contact the nodes regardless, so it could announce its own "joining" at the same time. Also, requiring a running cluster means that the ability of a node to join is dependent on (a) how up-to-date the cluster is, and (b) whether or not there is already a cluster. You also make it impossible to create a new cluster automatically without pre-filling peer lists, so my solution is the same as yours in the abstract, except the peers are discovered at runtime.
ok @20zinnm I'm gonna give the way you propose a try. It is not much more complicated and the result is better.
The only difference in practice is that mine only mandates the seeding of one node, which could be done by a Kubernetes service or whatnot. It requires no persistence and only one entry point. If you would like, I could put together a distributed demo cluster (obviously not IPFS, just a simple demo cluster).
This reworks PeerAdd() and adds a Join() operation. Join(multiaddress) uses the given bootstrap multiaddress to register itself with it and fetch its list of cluster peers. Then it proceeds to register itself with every peer and fetch their lists of cluster peers. If a cluster peer is not known, then we register ourselves with it again and so on. PeerAdd() is now "simply" registering the new node and sending a Join() request to it. The new node then will fetch the list of peers and register itself with everyone like explained above. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>
@20zinnm I know, my original solution would also work with one seeding node which everyone uses. The new one would theoretically work even when using several seeding nodes and everyone connects at the same time. For this, the "newcomer checks in with every node" approach that you propose is mentally simpler to me than adapting my original one. I have a more or less working solution but I wanna give it another round.
This is the third implementation attempt. This time, rather than broadcasting PeerAdd/Join requests to the whole cluster, we use the consensus log to broadcast new peers joining. This makes it easier to recover from errors and to know who exactly is a member of a cluster and who is not. The consensus is, after all, meant to agree on things, and the list of cluster peers is something everyone has to agree on. Raft itself uses a special log operation to maintain the peer set. The tests are almost unchanged from the previous attempts, so behavior should be the same, except that it doesn't seem possible to bootstrap a bunch of nodes at the same time using different bootstrap nodes; it works when using the same one. I'm not sure this worked before either, but the code is simpler than recursively contacting peers, and it scales better for larger clusters. Nodes have to be careful about joining clusters while keeping the state from a different cluster (disjoint logs); this may cause problems with Raft. License: MIT Signed-off-by: Hector Sanjuan <hector@protocol.ai>
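The idea of deriving membership from the consensus log can be sketched as follows (illustrative Python; a real Raft implementation handles leader election, terms, and replication, all of which this toy append-only log ignores):

```python
def peer_set(log):
    """Every node replays the same log in the same order, so every
    node derives the same member list: the log IS the agreement."""
    peers = set()
    for op, peer in log:
        if op == "peer_add":
            peers.add(peer)
        elif op == "peer_rm":        # what a --leave shutdown would commit
            peers.discard(peer)
    return peers

# Membership changes are just entries committed to the replicated log.
log = []
log.append(("peer_add", "A"))
log.append(("peer_add", "B"))
log.append(("peer_add", "C"))
log.append(("peer_rm", "B"))         # B leaves on shutdown
print(sorted(peer_set(log)))         # ['A', 'C']
```

This also hints at the disjoint-logs caveat above: two nodes replaying different logs will derive different, irreconcilable peer sets.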
@20zinnm went a third way in the end: committing new members to the distributed log rather than broadcasting and recursively contacting peers. I have realized it is very easy to mess up the consensus when someone does not cooperate perfectly, or when peers have disjoint logs, different leadership terms, etc. Despite locking sections of the code, I wasn't very happy with the results and the added complexity. With the latest approach, a bunch of nodes should be able to bootstrap from the same node at the same time.
@hsanjuan What do you mean by distributed log? What implementation is the log using that wouldn't suffer the same faults? I think it's fine to have everyone join through the same node, but it's not scalable in my opinion. My check-in method means you can bootstrap a thousand nodes by giving each a pointer to the one before it, without having to spam a single member for a list.
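The chain-bootstrap claim can be checked with a small simulation of the check-in method (illustrative Python; `check_in` and the `peers` dict are made up here, standing in for the register-and-recurse RPC exchange):

```python
def check_in(new, seed, peers):
    """Node `new` is seeded only with `seed`. It registers itself with
    every peer it learns about and recurses on unknown ones.
    `peers` maps node -> set of known peers (a stand-in for RPC state)."""
    peers[new] = {new}
    pending = [seed]
    while pending:
        p = pending.pop()
        if p in peers[new]:
            continue
        peers[new].add(p)
        peers[p].add(new)            # announce our own "joining" to p
        pending.extend(peers[p])     # fetch p's peer list and recurse

# Bootstrap a chain: each node is given only the node before it.
peers = {"n0": {"n0"}}
for i in range(1, 5):
    check_in(f"n{i}", f"n{i-1}", peers)
print(all(len(s) == 5 for s in peers.values()))  # True
```

No single node is ever asked for the full list by everyone; each newcomer spreads its requests across the members it discovers.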
Fix #24: Auto-join and auto-leave operations for Cluster
Given a single existing cluster member, a new cluster node should be able to set itself up, retrieve and connect to all members of the cluster.
Note the trickiness of this: