
Autodiscovery - autosetup of cluster members #24

Closed
hsanjuan opened this issue Jan 2, 2017 · 10 comments

@hsanjuan
Collaborator

hsanjuan commented Jan 2, 2017

Given a single existing cluster member, a new cluster node should be able to set itself up, retrieve and connect to all members of the cluster.

Note the trickiness of this:

  • Some components (e.g. consensus) are not ready, or should not attempt, to work before receiving the list of peers.
  • It's not as easy as connecting to someone and retrieving the current cluster members (e.g. that would fail when starting a bunch of new members at the same time). It also involves broadcasting that a new node is available.
  • This needs more thought. We should aim for something simple and straightforward; errors during automated setup are the worst.
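
As a rough illustration of the goal, here is a minimal Go sketch (all types and names are hypothetical, not the ipfs-cluster API) of a newcomer joining through a single known member. Note that it ignores the simultaneous-start race flagged in the list above:

```go
package main

import "fmt"

// Peer is a hypothetical stand-in for a cluster member reachable over
// RPC; the real ipfs-cluster types are different.
type Peer struct {
	ID    string
	Known map[string]*Peer // other members this peer knows about
}

// PeerList returns every member this peer knows about, itself included.
func (p *Peer) PeerList() []*Peer {
	out := []*Peer{p}
	for _, q := range p.Known {
		out = append(out, q)
	}
	return out
}

// Join bootstraps a newcomer from one existing member: fetch the
// current peer set, then connect to each peer and announce ourselves.
// It deliberately ignores several newcomers joining at once.
func (p *Peer) Join(bootstrap *Peer) {
	for _, q := range bootstrap.PeerList() {
		if q.ID == p.ID {
			continue
		}
		p.Known[q.ID] = q // connect to the existing member...
		q.Known[p.ID] = p // ...and broadcast that a new node is available
	}
}

func main() {
	b := &Peer{ID: "B", Known: map[string]*Peer{}}
	c := &Peer{ID: "C", Known: map[string]*Peer{"B": b}}
	b.Known["C"] = c

	a := &Peer{ID: "A", Known: map[string]*Peer{}}
	a.Join(c) // A only needs to know C to end up connected to B as well
	fmt.Println(len(a.Known), len(b.Known), len(c.Known)) // 2 2 2
}
```
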
@meyerzinn

Is it necessary for every peer to be aware of every other peer in ipfs-cluster? I was imagining something like a private DHT that assigns blocks to nodes.

@hsanjuan
Collaborator Author

Cluster shares state (list of pinned things) using a consensus algorithm (Raft) so all nodes need to know each other.

@meyerzinn

meyerzinn commented Jan 30, 2017

@hsanjuan When starting a bunch at once, why not seed with one node like so:

Here's a system I was thinking about for autodiscovery. Suppose we have three nodes: A, B, and C. Discovery takes place as follows:

Step 1: Seeding

In order for the system to function, all nodes must be "seeded" with at least one other node. In this diagram, node A is joining the cluster by contacting C.

                            +--------+
                            |        |
                            | Node B |
                            |        |
+--------+                  +--------+
|        |
| Node A |
|        |
+----+---+
     |
     |
     |                        +--------+
     |                        |        |
     +------------------------> Node C |
                              |        |
                              +--------+

Step 2: Pass-through

In this step, peers wishing to join the cluster contact a current member of the cluster. In this diagram, B wishes to join the cluster; thus, B contacts C.

                            +--------+
                            |        |
                            | Node B |
                            |        |
+--------+                  +----+---+
|        |                       |
| Node A |                       |
|        |                       |
+----+---+                       |
     |                           |
     |                           |
     |                        +--v-----+
     |                        |        |
     +------------------------> Node C |
                              |        |
                              +--------+

Step 3: Repeat

C returns a list of peers it is immediately connected to, and B goes to each of those peers individually and requests a list of peers. In this diagram, B contacts A, but since the only peer A will return is C, and B is already connected to C, this "trail" has gone cold.

                            +--------+
                            |        |
     +----------------------+ Node B |
     |                      |        |
+----v---+                  +----+---+
|        |                       |
| Node A |                       |
|        |                       |
+----+---+                       |
     |                           |
     |                           |
     |                        +--v-----+
     |                        |        |
     +------------------------> Node C |
                              |        |
                              +--------+

This mechanism works really well when considering a more complicated model, like so:

                       +--------+
                       |        |
                       | Node A |
                       |        |
                       +---+----+
                           |
+--------+             +---v----+           +--------+
|        |             |        |           |        |
| Node D +-------------> Node B |           | Node F |
|        |             |        |           |        |
+---^----+             +----^---+           +---+----+
    |                       |                   |
    |                  +----+---+               |
    |                  |        |               |
    |                  | Node C <---------------+
    |                  |        |
+---+----+             +--------+
|        |
| Node E |
|        |
+--------+

Let's assume that nodes are seeded with Node B. No matter what order nodes connect in, they should all end up in the same cluster.
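
A minimal sketch of this crawl in Go (the adjacency map peersOf and the discover function are invented stand-ins for real RPC calls; the graph matches the diagram above once links are bidirectional):

```go
package main

import "fmt"

// peersOf simulates asking a node for the peers it is immediately
// connected to, mirroring the seeding diagram above.
var peersOf = map[string][]string{
	"A": {"B"},
	"B": {"A", "C", "D"},
	"C": {"B", "F"},
	"D": {"B", "E"},
	"E": {"D"},
	"F": {"C"},
}

// discover is the crawl from steps 1-3: starting from one seed, keep
// asking each newly discovered peer for its own peer list until every
// trail has gone cold.
func discover(seed string) []string {
	visited := map[string]bool{}
	queue := []string{seed}
	var found []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if visited[cur] {
			continue // this trail has gone cold
		}
		visited[cur] = true
		found = append(found, cur)
		// In the real proposal this request would double as a check-in,
		// announcing the newcomer to cur.
		queue = append(queue, peersOf[cur]...)
	}
	return found
}

func main() {
	// A newcomer seeded with any single reachable node finds everyone.
	fmt.Println(discover("E")) // [E D B A C F]
}
```
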

@hsanjuan hsanjuan added status/in-progress In progress and removed status/ready Ready to be worked labels Feb 2, 2017
hsanjuan added a commit that referenced this issue Feb 2, 2017
Join() takes a peer's multiaddress. It then sends a "PeerAdd()"
RPC request with that peer and the local multiaddress by which
the remote peer is reached. Then the remote peer uses the PeerAdd()
functionality to a) Tell the rest of the cluster about the new member
and b) send us the list of cluster members.

Join() has no REST API support, but it is provided instead via
"ipfs-cluster-service --join". A "--leave" flag has been included too,
by which a peer triggers "PeerRemove()" on itself when shutting down.

This allows a node to join an existing cluster and to leave it at the
end of its life.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
@hsanjuan hsanjuan self-assigned this Feb 2, 2017
@hsanjuan hsanjuan added need/review Needs a review and removed status/in-progress In progress labels Feb 2, 2017
@hsanjuan
Collaborator Author

hsanjuan commented Feb 2, 2017

@20zinnm thanks for the comments. For the moment I'm not supporting the case where every node bootstraps from a different node at the same time.

> C returns a list of peers it is immediately connected to, and B goes to each of those peers individually and requests a list of peers

This is a graph search, but in cluster, C should already know all of its nodes, and if it doesn't yet, it should tell B and everyone else about any modifications (trails may warm up again). Since it's not just about having a graph, but about having everyone connected to everyone else, this grows very complex.

An easier solution is to have nodes join running clusters, where they can receive the full list of peers from whoever they connect to directly, and where it is possible to push the new update to all peers. After joining, the full peer list is saved to the configuration for future use on startup. This also re-uses most of the PeerAdd() code path, as B is basically requesting the bootstrap node to PeerAdd(B).
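
A toy model of that flow (hypothetical names; the real PeerAdd() runs over RPC against the cluster components):

```go
package main

import "fmt"

// cluster is a toy model of a running cluster where every member keeps
// the full, shared peer list.
type cluster struct {
	peers []string
}

// peerAdd registers the newcomer with the whole cluster (the update is
// "pushed" to everyone, since the list is shared) and returns the full
// peer list for the newcomer to save in its configuration.
func (c *cluster) peerAdd(newcomer string) []string {
	c.peers = append(c.peers, newcomer)
	return c.peers
}

func main() {
	c := &cluster{peers: []string{"A", "B", "C"}}
	// D joining is basically D asking its bootstrap node to PeerAdd(D):
	saved := c.peerAdd("D")
	fmt.Println(saved) // [A B C D], persisted for future startups
}
```
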

@meyerzinn

meyerzinn commented Feb 2, 2017

@hsanjuan Why would it need to announce modifications if the seeker needs to contact all nodes anyway? In order to establish connections, it needs to contact the nodes anyway, so it could announce its own "joining" at the same time.

Also, the running-cluster requirement means that the ability of a node to join is dependent on (a) how up-to-date the cluster is, and (b) whether or not there is already a cluster. You also make it impossible to create a new cluster automatically without pre-filling peer lists, so my solution is the same as yours in the abstract, except the peers are discovered at runtime.

@hsanjuan
Collaborator Author

hsanjuan commented Feb 3, 2017

OK @20zinnm, I'm going to give the approach you propose a try. It is not much more complicated and the result is better.

@meyerzinn

The only difference in practice is that mine only mandates the seeding of one node, which could be done by a Kubernetes service or whatnot. It requires no persistence and only one entry point. If you would like, I could put together a distributed demo cluster (obviously not IPFS, just a simple demo cluster).

@hsanjuan hsanjuan added the status/in-progress In progress label Feb 3, 2017
hsanjuan added a commit that referenced this issue Feb 3, 2017
This reworks PeerAdd() and adds a Join() operation.

Join(multiaddress) uses the given bootstrap multiaddress to register
itself with it and fetch its list of cluster peers. Then it proceeds
to register itself with every peer and fetch their lists of cluster peers.
If a fetched cluster peer is not yet known to us, we register ourselves
with it as well, and so on.

PeerAdd() is now "simply" registering the new node and sending a
Join() request to it. The new node then will fetch the list
of peers and register itself with everyone like explained above.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
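
The Join() loop this commit describes could be sketched like this (Go, with toy in-process types standing in for the real RPC plumbing):

```go
package main

import "fmt"

// node is a toy stand-in for a cluster peer.
type node struct {
	id    string
	known map[string]*node
}

// register tells remote about n and returns remote's current peer list.
func (n *node) register(remote *node) []*node {
	remote.known[n.id] = n
	out := []*node{remote}
	for _, p := range remote.known {
		out = append(out, p)
	}
	return out
}

// join registers with the bootstrap peer, fetches its peers, and keeps
// registering with any peer not yet known, and so on, until no unknown
// peers remain.
func (n *node) join(bootstrap *node) {
	pending := []*node{bootstrap}
	for len(pending) > 0 {
		remote := pending[0]
		pending = pending[1:]
		if remote.id == n.id {
			continue
		}
		if _, ok := n.known[remote.id]; ok {
			continue // already registered with this peer
		}
		n.known[remote.id] = remote
		pending = append(pending, n.register(remote)...)
	}
}

func main() {
	a := &node{id: "A", known: map[string]*node{}}
	b := &node{id: "B", known: map[string]*node{"A": a}}
	a.known["B"] = b

	c := &node{id: "C", known: map[string]*node{}}
	c.join(a) // C bootstraps from A and also registers with B
	fmt.Println(len(a.known), len(b.known), len(c.known)) // 2 2 2
}
```
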
@hsanjuan hsanjuan added need/review Needs a review and removed need/review Needs a review labels Feb 3, 2017
@hsanjuan
Collaborator Author

hsanjuan commented Feb 3, 2017

@20zinnm I know, my original solution would also work with one seeding node which everyone uses. The new one should theoretically work even when using several seeding nodes and everyone connects at the same time. For this, the "newcomer checks in with every node" approach that you propose is mentally simpler to me than adapting my original approach.

I have a more or less working solution but I want to give it another round.

@hsanjuan hsanjuan removed the status/in-progress In progress label Feb 3, 2017
@hsanjuan hsanjuan added status/in-progress In progress and removed need/review Needs a review labels Feb 6, 2017
hsanjuan added a commit that referenced this issue Feb 7, 2017
This is the third implementation attempt. This time, rather than
broadcasting PeerAdd/Join requests to the whole cluster, we use the
consensus log to broadcast new peers joining.

This makes it easier to recover from errors and to know exactly who
is a member of the cluster and who is not. The consensus is, after all,
meant to agree on things, and the list of cluster peers is something
everyone has to agree on.

Raft itself uses a special log operation to maintain the peer set.

The tests are almost unchanged from the previous attempts, so coverage
should be the same, except it doesn't seem possible to bootstrap a bunch
of nodes at the same time using different bootstrap nodes. It works when
they all use the same one. I'm not sure this worked before either, but
the code is simpler than recursively contacting peers, and it scales
better for larger clusters.

Nodes have to be careful about joining clusters while keeping the state
from a different cluster (disjoint logs). This may cause problems with
Raft.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
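
The gist of the log-based approach, as a simplified sketch (illustrative types only; real Raft membership changes are handled by the Raft library itself):

```go
package main

import "fmt"

// logOp is a simplified stand-in for an entry in the replicated
// consensus log.
type logOp struct {
	op   string // "peerAdd" or "peerRm"
	peer string
}

// member applies committed entries in order, so every member derives
// the same peer set — the thing everyone has to agree on.
type member struct {
	peers map[string]bool
}

func (m *member) apply(e logOp) {
	switch e.op {
	case "peerAdd":
		m.peers[e.peer] = true
	case "peerRm":
		delete(m.peers, e.peer)
	}
}

func main() {
	// A joining peer results in a single committed log entry instead
	// of a broadcast to every member.
	log := []logOp{{"peerAdd", "A"}, {"peerAdd", "B"}, {"peerAdd", "C"}, {"peerRm", "B"}}

	a := &member{peers: map[string]bool{}}
	b := &member{peers: map[string]bool{}}
	for _, e := range log { // every replica replays the same log...
		a.apply(e)
		b.apply(e)
	}
	fmt.Println(a.peers) // map[A:true C:true] — ...and agrees on membership
}
```
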
@hsanjuan hsanjuan added need/review Needs a review and removed status/in-progress In progress labels Feb 7, 2017
@hsanjuan
Collaborator Author

hsanjuan commented Feb 7, 2017

@20zinnm I went a third way in the end, committing new members to the distributed log rather than broadcasting and recursively contacting peers. I realized it is very easy to mess up the consensus when someone does not cooperate perfectly, or when peers have disjoint logs, different leadership terms, etc. Despite locking sections of the code, I wasn't very happy with the results and the added complexity.

With the latest approach, several nodes should be able to bootstrap from the same node at the same time.

@meyerzinn

@hsanjuan What do you mean by distributed log? What implementation is the log using that wouldn't suffer the same faults?

I think it's fine to have everyone join through the same node, but it's not scalable in my opinion. My check-in method means you can bootstrap a thousand nodes by giving each a pointer to the one before it, without having to spam a single member for a list.
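
A toy illustration of that daisy-chain seeding (hypothetical; it shows only the pointer structure, with no networking):

```go
package main

import "fmt"

// Each node i is seeded only with node i-1, so no single member gets
// spammed with a thousand join requests.
func main() {
	const n = 1000
	bootstrapOf := make([]int, n)
	for i := 1; i < n; i++ {
		bootstrapOf[i] = i - 1 // each node points to the one before it
	}
	// Any node can still reach node 0 (and, by crawling, the whole
	// cluster) by following its bootstrap pointer chain.
	hops := 0
	for cur := n - 1; cur != 0; cur = bootstrapOf[cur] {
		hops++
	}
	fmt.Println(hops) // 999 hops from the last node back to the first
}
```
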

hsanjuan added a commit that referenced this issue Feb 8, 2017
Fix #24: Auto-join and auto-leave operations for Cluster
@hsanjuan hsanjuan removed the need/review Needs a review label Feb 8, 2017