Issues with leader election when peers go down #160

Closed
ccampbell opened this issue Oct 1, 2017 · 79 comments

@ccampbell

commented Oct 1, 2017

I have been playing around with ipfs-cluster to try to set up a dynamic cluster. I have read through the docs and whatever information I could find in other tickets. What I am noticing is that the cluster only seems to work and stay healthy if the setup happens like this (assume three nodes A, B, and C, with leave_on_shutdown set to true; a command sketch follows the list):

  1. Start up ipfs service on node A with cluster secret and no specified peers (A becomes the leader)
  2. Start up ipfs service on node B with cluster secret and --bootstrap set to node A (B joins the cluster as a new peer)
  3. Start up ipfs service on node C with cluster secret and --bootstrap set to node A (C joins the cluster as a new peer)
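A minimal sketch of that sequence as commands (the multiaddress and peer ID are placeholders; I am assuming the daemon subcommand and --bootstrap flag of ipfs-cluster-service):

# on node A (no peers configured, so it starts as a single-node cluster):
ipfs-cluster-service daemon
# on nodes B and C, joining through A's cluster multiaddress:
ipfs-cluster-service daemon --bootstrap /ip4/<node-A-ip>/tcp/9096/ipfs/<node-A-peer-id>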

So that is all fine and works great. The issue has to do with what happens when peers have problems. If I kill ipfs-cluster-service on node B, it is correctly removed, and when it starts up again (bootstrapping to A) it comes back, which is expected.

Now, if I kill the ipfs-cluster-service on node A (the leader), that is where there is a problem. Since peer A has been elected the leader, when it goes down there is no longer a leader present. I would expect either peer B or C to become the leader in this case, but I could not find anything specific in the documentation about what the expected behavior is here. They start logging that there is no leader present, but neither one takes over as the leader. The second issue is that as soon as A comes back (bootstrapping to either B or C), it does not become the leader again according to all peers. Once the cluster gets into this state, the only way to get things back to normal is to kill the service on peers B and C and then restart them bootstrapping to A. You can understand why this is not an ideal situation for a peer-to-peer network.

Another issue which seems like it might be related is that I have noticed when I bootstrap a node D (running on my Mac - the other 3 are running on Linux), only nodes A and C pick up the new peer. I can manually add it using the peer add command on node B, and it will pick it up, but when it goes down it is only automatically removed from A and C, so B never picks up on the changes. I am not sure if that actually leads to any problems or not.

It seems like it could be some kind of race condition. I have not had a chance to dig through the code and since this is kind of a complicated issue to reproduce, I don’t have code to reproduce it right now, but I am hoping you can provide some guidance or idea about why this is happening, what might be causing it, and what I might be able to do to get around it. Not sure if it is related at all, but node A is running in Amsterdam, B in New Jersey, C in Tokyo.

Also I am curious what the expected behavior is in this situation since I couldn’t find it exactly outlined in the docs. I am happy to help debug this further. Thanks so much.

@hsanjuan

Collaborator

commented Oct 2, 2017

Hi!

I am going to assume that you are soft-killing your nodes, so they have time to perform the leave_on_shutdown procedures and remove themselves from the cluster.

In an A, B, C cluster, removing the leader (A) from it will leave a 2-peer cluster and trigger an election. I think it might then become dead-locked, as neither of the candidate nodes (B, C) obtains a majority of votes to proclaim itself leader. I am not sure if you plan to remove peers more or less permanently, but if you are just simulating unexpected downtime, or a reboot, you should rather set leave_on_shutdown to false: this makes sure that the cluster size remains at 3, and avoids the problems that come with updating the cluster's topology (https://github.com/ipfs/ipfs-cluster/blob/master/docs/ipfs-cluster-guide.md#dynamic-cluster-membership-considerations)
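If you go that route, a minimal sketch for flipping the setting, assuming the default config at ~/.ipfs-cluster/service.json, that jq is available, and that the option lives in the cluster section of the config:

jq '.cluster.leave_on_shutdown = false' ~/.ipfs-cluster/service.json > /tmp/service.json \
  && mv /tmp/service.json ~/.ipfs-cluster/service.json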

Regarding adding a D node, make sure you are in a healthy cluster first (I am not sure that was the case). If B did not pick it up, that would mean that B is effectively split from the A-C cluster.

Last of all, there is a real possibility that Raft latency/timeout options are not tuned for multi-region/high latency operation. As pointed out in #157, this is something we aim to address in the short term, as it is clearly a pain point for some users.

@ccampbell

Author

commented Oct 4, 2017

Thanks for your reply. I am testing using SIGTERM for the killing. I am going to play around with trying leave_on_shutdown set to false for the leader to see if that makes a difference.

I am noticing very strange behavior, and I can’t figure out a pattern. Sometimes just the process of bootstrapping a new node to the cluster (node D) will cause some of the existing nodes to lose track of the leader and start logging

error broadcasting metric: leader unknown or not existing yet

After that, pinning to the cluster will fail, which is expected if it thinks it does not have enough peers. I have noticed that sometimes when that happens, killing the service on the node that had just joined and then bootstrapping it again can trigger a re-election. Killing the service on the node that is logging errors and starting it again can trigger one as well.

I am definitely still seeing cases where, when a specific node is removed, some peers will remove it and some will not; the ones that keep it still fail to connect to it, and peers ls will still list it with:

Qmdr4GTYHCa7BGWE4jtVUpqwN6MKWZWiHZj618xDDYo6Ys | ERROR: dial backoff

I am just trying to simulate the condition of nodes being added and removed. The use case here will eventually be a peer to peer network where anyone can add their own peer to offer up storage, so it has to be very fault tolerant, and it worries me that sometimes just the process of bootstrapping a new node to the cluster can cause the whole cluster to end up in an unhealthy state in my testing.

@hsanjuan

Collaborator

commented Oct 5, 2017

It would be super useful if you could pin down a way to reproduce it reliably. Alternatively, are you saying that bootstrapping a 4th node to an otherwise working cluster [sometimes] fails?

@hsanjuan

Collaborator

commented Oct 5, 2017

Also @ccampbell, maybe you can send the output of ipfs-cluster-ctl peers ls for each of the A, B, C, D peers?
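For example, something like this (just a sketch, assuming shell access to each box; the hostnames are placeholders):

for h in node-a node-b node-c node-d; do
  echo "== $h =="
  ssh "$h" ipfs-cluster-ctl peers ls
done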

@hsanjuan hsanjuan added this to the User feedback and bugs [Q4O3] milestone Oct 10, 2017

@hsanjuan

Collaborator

commented Oct 25, 2017

@ccampbell, just a quick word that I'm working on #131 and I will attempt to reproduce your problem with latest 0.2.0 and with the new raft this week. Hopefully I can get a clearer picture of what's failing.

@hsanjuan hsanjuan self-assigned this Oct 25, 2017

@hsanjuan

Collaborator

commented Oct 27, 2017

@ccampbell I have been doing some testing with the latest master today.

I have discovered and fixed a few issues with saving and reading the configuration.

I can run 3 nodes (A, B, C) with leave_on_shutdown set to true. Killing the leader causes it to leave cleanly. Bootstrapping that node back to the others works fine too.

I think you might have been affected by a misconfiguration. We recently made a change that saves all existing cluster peers as bootstrap peers when leaving the cluster and empties cluster_peers. Otherwise, restarting the peer would ignore bootstrap and try to use the former list of cluster peers without letting everyone know that it wishes to re-join.

This was non-obvious behaviour. A peer that has left the cluster should only rejoin with bootstrap. It should not be started on its own (as a single-peer cluster), as its state should not be allowed to diverge. If that has happened, it will be necessary to fully remove the ~/.ipfs-cluster/ipfs-cluster-data folder beforehand, so that it can simply download the full state from the other peers.
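In other words, the rejoin path for a peer that has left looks roughly like this (a sketch with the default paths; the multiaddress is a placeholder):

# with ipfs-cluster-service stopped on the peer that left:
rm -rf ~/.ipfs-cluster/ipfs-cluster-data
ipfs-cluster-service daemon --bootstrap /ip4/<other-peer-ip>/tcp/9096/ipfs/<other-peer-id>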

I will be publishing version 0.2.2 soon (0.2.0 and 0.2.1 have configuration reading/writing bugs). In the meantime, I think using master should solve the issues you were having.

Also, since your nodes are very far apart, make sure you tune the consensus.raft options in the configuration and increase the timeouts.
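As a rough sketch of what that tuning could look like (the key names below are from the consensus.raft section of service.json and the values are only examples; double-check them against your version):

jq '.consensus.raft.heartbeat_timeout = "10s"
  | .consensus.raft.election_timeout = "10s"
  | .consensus.raft.wait_for_leader_timeout = "2m"
  | .consensus.raft.network_timeout = "20s"' \
  ~/.ipfs-cluster/service.json > /tmp/service.json && mv /tmp/service.json ~/.ipfs-cluster/service.json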

@hsanjuan hsanjuan removed the in progress label Oct 27, 2017

@ccampbell

Author

commented Oct 31, 2017

Thanks so much! I will test it out again using master and let you know what I find

@ccampbell

Author

commented Nov 2, 2017

Okay, now I am seeing some new issues. For one, when I start up a single node A, it fails to elect itself as the leader and shuts down after a timeout. The output is:

16:01:13.454  INFO    cluster: IPFS Cluster v0.2.1 listening on: cluster.go:90
16:01:13.455  INFO    cluster:         /ip4/127.0.0.1/tcp/9096/ipfs/QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz cluster.go:92
16:01:13.455  INFO    cluster:         /ip4/127.0.0.2/tcp/9096/ipfs/QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz cluster.go:92
16:01:13.455  INFO    cluster:         /ip4/80.209.230.17/tcp/9096/ipfs/QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz cluster.go:92
16:01:13.456  INFO  consensus: starting Consensus and waiting for a leader... consensus.go:68
16:01:13.459  INFO    restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
16:01:13.459  INFO   ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:158
16:01:43.459 ERROR    cluster: consensus start timed out cluster.go:361
16:01:43.460  INFO    cluster: shutting down IPFS Cluster cluster.go:415
16:01:43.460 WARNI    cluster: Attempting to leave Cluster. This may take some seconds cluster.go:419
16:01:43.490  INFO  consensus: Raft Leader elected: QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz raft.go:132
16:01:43.492  INFO    cluster: removing Cluster peer QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz peer_manager.go:55
16:01:43.494  INFO  consensus: peer removed from global state: <peer.ID VDrbRY> consensus.go:385
16:01:45.494  INFO  consensus: stopping Consensus component consensus.go:161
16:01:45.496  INFO     config: Saving configuration config.go:289
16:01:45.496  INFO    monitor: stopping Monitor peer_monitor.go:161
16:01:45.497  INFO    restapi: stopping Cluster API restapi.go:284
16:01:45.497  INFO   ipfshttp: stopping IPFS Proxy ipfshttp.go:532
16:01:45.497  INFO pintracker: stopping MapPinTracker maptracker.go:116

And the process kills itself.

I have increased the raft heartbeat timeout and election timeout to 10s.

When I start up another node bootstrapping to the first while the first is booting up, the A node returns:

16:04:48.614  INFO    cluster: new Cluster peer /ip4/80.209.230.16/tcp/9096/ipfs/QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF peer_manager.go:41
16:04:59.938  INFO  consensus: Raft Leader elected: QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz raft.go:132
16:05:14.184 ERROR    cluster: consensus start timed out cluster.go:361
16:05:14.184  INFO    cluster: shutting down IPFS Cluster cluster.go:415
16:05:14.184 WARNI    cluster: Attempting to leave Cluster. This may take some seconds cluster.go:419
16:05:14.186  INFO  consensus: Raft Leader elected: QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz raft.go:132
16:05:14.186  INFO  consensus: Raft Leader elected: QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz raft.go:132
16:05:14.187  INFO    cluster: removing Cluster peer QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz peer_manager.go:55
16:05:14.189 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:14.201 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:14.214 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:14.308 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:14.447 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:14.595 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:14.696 ERROR  consensus: raft cannot remove peer: leadership lost while committing log raft.go:250
16:05:14.697 ERROR  consensus: raft cannot add peer: leadership lost while committing log raft.go:233
16:05:14.830 ERROR       raft: Failed to AppendEntries to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:29.697 ERROR  p2p-gorpc: context deadline exceeded client.go:125
16:05:29.697 ERROR  p2p-gorpc: context deadline exceeded client.go:125
16:05:29.897 ERROR  p2p-gorpc: context deadline exceeded client.go:125
16:05:32.147 ERROR       raft: Failed to make RequestVote RPC to QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF: operation not supported with current protocol version logging.go:28
16:05:44.698 ERROR    cluster: context deadline exceeded cluster.go:547
16:05:44.698  INFO    cluster: removing Cluster peer QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF peer_manager.go:55
16:05:44.898 ERROR    cluster: leaving cluster: context deadline exceeded cluster.go:422
16:05:44.898  INFO  consensus: stopping Consensus component consensus.go:161
16:05:44.899  INFO    monitor: stopping Monitor peer_monitor.go:161
16:05:44.899  INFO    restapi: stopping Cluster API restapi.go:284
16:05:44.899  INFO   ipfshttp: stopping IPFS Proxy ipfshttp.go:532
16:05:44.900  INFO pintracker: stopping MapPinTracker maptracker.go:116
16:05:44.900  INFO     config: Saving configuration config.go:289

The B node returns

16:04:48.502  INFO    cluster: IPFS Cluster v0.2.1 listening on: cluster.go:92
16:04:48.502  INFO    cluster:         /ip4/127.0.0.1/tcp/9096/ipfs/QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF cluster.go:94
16:04:48.502  INFO    cluster:         /ip4/127.0.0.2/tcp/9096/ipfs/QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF cluster.go:94
16:04:48.502  INFO    cluster:         /ip4/80.209.230.16/tcp/9096/ipfs/QmYVdQTKmbyz1nU4bR9wCzfqLfmYpnV7zjyPdnno5c8QZF cluster.go:94
16:04:48.503  INFO  consensus: starting Consensus and waiting for a leader... consensus.go:60
16:04:48.504  INFO  consensus: raft cluster is already bootstrapped raft.go:116
16:04:48.505  INFO   ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:158
16:04:48.505  INFO    restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
16:04:48.505  INFO    cluster: Bootstrapping to /ip4/80.209.230.17/tcp/9096/ipfs/QmVDrbRYZckmBaNYexBw8Kwkhz9fPKGgccN5AARkPSKmrz cluster.go:404
16:05:44.702 ERROR    cluster: context deadline exceeded cluster.go:671
16:05:44.702 ERROR    cluster: context deadline exceeded cluster.go:409
16:05:44.702 ERROR    cluster: Bootstrap unsuccessful cluster.go:132
16:05:44.702  INFO    cluster: shutting down Cluster cluster.go:430
16:05:44.702  INFO  consensus: stopping Consensus component consensus.go:158
16:05:44.702  INFO    monitor: stopping Monitor peer_monitor.go:161
16:05:44.702  INFO    restapi: stopping Cluster API restapi.go:284
16:05:44.702  INFO   ipfshttp: stopping IPFS Proxy ipfshttp.go:532
16:05:44.702  INFO pintracker: stopping MapPinTracker maptracker.go:116
error starting cluster: bootstrap unsuccessful

This is using latest master

@ccampbell

Author

commented Nov 2, 2017

One thing… I believe node A is using the code from a couple days ago, and node B is using the latest. I am going to rebuild node A with latest master so they are both using the same code…

@ccampbell

Author

commented Nov 2, 2017

Okay now it fails immediately on node A with:

16:31:04.481 ERROR  consensus: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! raft.go:162
16:31:04.481 ERROR  consensus: error creating raft: cluster peers do not match raft peers consensus.go:64
16:31:04.481 ERROR    cluster: error creating consensus: cluster peers do not match raft peers cluster.go:184
16:31:04.481  INFO    cluster: shutting down Cluster cluster.go:430
16:31:04.481  INFO    monitor: stopping Monitor peer_monitor.go:161
16:31:04.481  INFO    restapi: stopping Cluster API restapi.go:284
16:31:04.481  INFO    restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
16:31:04.481  INFO   ipfshttp: stopping IPFS Proxy ipfshttp.go:532
16:31:04.481  INFO pintracker: stopping MapPinTracker maptracker.go:116
error starting cluster: cluster peers do not match raft peers

I see the raft code has been updated. Is there something special that has to be done in order to start a single-node cluster? I would expect it to be able to start up on its own and then other nodes to be able to bootstrap to it. I have peers and bootstrap both set to empty arrays in the config.

@hsanjuan

Collaborator

commented Nov 2, 2017

hey @ccampbell please update both nodes to latest master and remove ~/.ipfs-cluster/ipfs-cluster-data. With the latest changes it's better if you start fresh.

@ccampbell

Author

commented Nov 2, 2017

Okay I did all of that. I am still having issues. Node A starts up fine, elects itself as the leader with no other peers.

16:50:35.018  INFO    cluster: IPFS Cluster v0.2.1 listening on: cluster.go:92
16:50:35.018  INFO    cluster:         /ip4/127.0.0.1/tcp/9096/ipfs/QmZyixbZdEn4qHj5JBjYP5wnid5ir1yWSoYTsVHFRyC3LM cluster.go:94
16:50:35.018  INFO    cluster:         /ip4/127.0.0.2/tcp/9096/ipfs/QmZyixbZdEn4qHj5JBjYP5wnid5ir1yWSoYTsVHFRyC3LM cluster.go:94
16:50:35.018  INFO    cluster:         /ip4/80.209.230.17/tcp/9096/ipfs/QmZyixbZdEn4qHj5JBjYP5wnid5ir1yWSoYTsVHFRyC3LM cluster.go:94
16:50:35.019  INFO  consensus: starting Consensus and waiting for a leader... consensus.go:60
16:50:35.020  INFO  consensus: raft cluster is already bootstrapped raft.go:116
16:50:35.021  INFO   ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfshttp.go:158
16:50:35.021  INFO    restapi: REST API: /ip4/127.0.0.1/tcp/9094 restapi.go:266
16:50:37.021  INFO  consensus: Raft Leader elected: QmZyixbZdEn4qHj5JBjYP5wnid5ir1yWSoYTsVHFRyC3LM raft.go:261
16:50:37.021  INFO  consensus: Raft state is catching up raft.go:273
16:50:37.021  INFO  consensus: Consensus state is up to date consensus.go:115
16:50:37.021  INFO    cluster: Cluster Peers (not including ourselves): cluster.go:384
16:50:37.021  INFO    cluster:     - No other peers cluster.go:387
16:50:37.021  INFO    cluster: IPFS Cluster is ready cluster.go:394

But when I try to bootstrap Node B to Node A I get this on Node B:

16:52:23.922 ERROR    cluster: dial attempt failed: misdial to <peer.ID VDrbRY> through /ip4/80.209.230.17/tcp/9096 (got <peer.ID ZyixbZ>): read tcp 80.209.230.16:9096->80.209.230.17:9096: use of closed network connection cluster.go:671
16:52:23.922 ERROR    cluster: dial attempt failed: misdial to <peer.ID VDrbRY> through /ip4/80.209.230.17/tcp/9096 (got <peer.ID ZyixbZ>): read tcp 80.209.230.16:9096->80.209.230.17:9096: use of closed network connection cluster.go:409
16:52:23.922 ERROR    cluster: Bootstrap unsuccessful cluster.go:132
16:52:23.922  INFO    cluster: shutting down Cluster cluster.go:430
16:52:23.922  INFO  consensus: stopping Consensus component consensus.go:158
16:52:23.922  INFO    monitor: stopping Monitor peer_monitor.go:161
16:52:23.922  INFO    restapi: stopping Cluster API restapi.go:284
16:52:23.922  INFO   ipfshttp: stopping IPFS Proxy ipfshttp.go:532
16:52:23.922  INFO pintracker: stopping MapPinTracker maptracker.go:116

I am happy to give you the cluster secret if you want to try to join

@hsanjuan

Collaborator

commented Nov 2, 2017

@ccampbell can you check that you are bootstrapping to the right multiaddress? From that message, it seems the peer ID in it is wrong.

@ccampbell

Author

commented Nov 2, 2017

Ugh sorry! I forgot I deleted the ~/.ipfs-cluster directory and initialized again. Trying again

@ccampbell

Author

commented Nov 2, 2017

Okay it all seems to be working now! I will do some more testing and let you know if I discover anything. 🤞

I tried an A-B cluster, killed each node one at a time, brought them back up, and they successfully rejoined.

@ccampbell

Author

commented Nov 7, 2017

Okay, now I am seeing a different issue. This one may be a problem between the chair and keyboard, but I am seeing that after adding a file and pinning it to the cluster, requests for it hang and eventually return a 504 gateway timeout from any node in the cluster other than the one I added it on.

In this scenario I have my original cluster A, B, C (one of them is the leader). I joined the cluster with a new node D, then ran the following from node D:

ipfs add file.txt
ipfs-cluster-ctl pin add [hash of file]

The ipfs-cluster logs show:

  • Node A: nothing
  • Node B: pin committed to global state: [hash]
  • Node C: nothing
  • Node D: IPFS cluster pinning [hash] on [<peer.ID WdLkGA> <peer.ID ZyixbZ>]: cluster.go:854

I was under the impression that you could request pinned content from any node in the cluster and it should be able to return it and cache it locally? Previously, when I tested this, it seemed to work fine. Am I doing something wrong?

To clarify, Node D returns the content correctly, but A, B, and C all hang

Thanks

@hsanjuan

Collaborator

commented Nov 7, 2017

Hmm, it seems that this is on the ipfs side of things. Do your ipfs daemons have regular connectivity? Are they connected directly to the daemons of the other cluster peers (ipfs swarm peers should show the multiaddresses of the others)? Cluster is not involved in fetching content from ipfs; it just tells ipfs to pin. If A, B, C hang, it looks like those nodes cannot find providers for the file you just added. Sometimes, just restarting the ipfs daemon does the trick.
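A quick way to check that from one of the hanging nodes (just a sketch; the multiaddress and hash are placeholders, and 4001 is only the default ipfs swarm port):

ipfs swarm peers    # should list the other nodes' ipfs multiaddresses
ipfs swarm connect /ip4/<other-node-ip>/tcp/4001/ipfs/<other-node-ipfs-id>    # connect manually if it does not
ipfs cat <hash> > /dev/null    # then retry fetching the pinned content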

@ccampbell

Author

commented Nov 7, 2017

Hmm they definitely have regular connectivity. I may have messed up the config. I am trying some things, but now when I run ipfs swarm peers the cluster service immediately crashes after logging:

13:11:18.915 ERROR    cluster: consensus start timed out cluster.go:375
13:11:18.915  INFO    cluster: shutting down Cluster cluster.go:430
13:11:18.915  INFO  consensus: stopping Consensus component consensus.go:158
13:11:18.915  INFO    monitor: stopping Monitor peer_monitor.go:161
13:11:18.915  INFO    restapi: stopping Cluster API restapi.go:284
13:11:18.915  INFO   ipfshttp: stopping IPFS Proxy ipfshttp.go:532
13:11:18.915  INFO pintracker: stopping MapPinTracker maptracker.go:116
13:11:18.915 ERROR    cluster: connection reset cluster.go:605

This is a side note, but is there a way to make ipfs nodes only discoverable to nodes within the same cluster but not to the public internet?


Also the ipfs daemon is logging this when attempting to fetch the file

ERROR core/serve: ipfs resolve -r /ipfs/[hash]: Failed to get block for [hash]: context canceled gateway_handler.go:621

Which happens to be the file that I tried to pin earlier to the cluster

@ccampbell

Author

commented Nov 7, 2017

ipfs swarm peers is no longer crashing after restarting everything, but it is still returning nothing. All the peers are there when I run ipfs-cluster-ctl peers ls from each node

@hsanjuan

Collaborator

commented Nov 8, 2017

Hey @ccampbell, ipfs swarm peers and cluster have no relation. The log messages you posted seem to be from a cluster peer which failed to start properly (maybe because other peers weren't ready).

If you add something to ipfs and other ipfs nodes cannot see it, you can debug it with cluster turned off, as cluster plays no part in that.

Did you remove the bootstrap nodes from the ipfs configuration? If ipfs swarm peers comes back empty, it means that ipfs does not see any other nodes, and therefore DHT lookups will fail unless the content is pinned locally.

This is a side note, but is there a way to make ipfs nodes only discoverable to nodes within the same cluster but not to the public internet?

See https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#private-networks
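Roughly, that boils down to sharing a swarm.key between the ipfs daemons; a sketch assuming the default IPFS_PATH of ~/.ipfs (see the linked document for the authoritative steps):

(echo '/key/swarm/psk/1.0.0/'; echo '/base16/'; openssl rand -hex 32) > ~/.ipfs/swarm.key
# copy the same ~/.ipfs/swarm.key to every node, then restart the ipfs daemons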

@ccampbell

Author

commented Nov 8, 2017

Thanks. I did in fact remove the bootstrap nodes from the ipfs configuration. For some reason, I expected that joining a cluster using bootstrap would also tell the ipfs daemon to connect to the ipfs daemons of the other cluster peers. Would that behavior be possible?

I suppose it would be possible to set up something similar using the private network key you just linked to, but it would be cool if this was supported out of the box.

Anyway I will try some more things, and let you know if I run into anything else. Sorry for all the back and forth!

@hsanjuan

Collaborator

commented Apr 23, 2018

So just to summarize. When I kill -HUP on this node and the other nodes remove it from the cluster (leave_on_shutdown is set to true), the ipfs-cluster-data directory (which has a snapshot) gets renamed to ipfs-cluster-data.old.0 and an upgrade is not needed to start up using the new version?

Yes, correct!

@ccampbell

Author

commented Apr 23, 2018

Okay great. I will give it a go. My plan is something like this:

  • current cluster: peer A - peer B - peer C (all running 0.3.4)
  • remove peer C from cluster -> upgrade to 0.3.5 -> start peer C on its own
  • remove peer B from cluster -> upgrade to 0.3.5 -> bootstrap to peer C
  • stop peer A -> upgrade to 0.3.5 -> bootstrap to peer C

Does that sound right? And all the pinned content should remain?

@hsanjuan

Collaborator

commented Apr 23, 2018

NO

Do this (a rough command sketch follows the list):

  • Remove peer C -> upgrade to 0.3.5
  • Remove peer B -> upgrade to 0.3.5
  • Stop peer A -> upgrade to 0.3.5 -> state upgrade -> start
  • Bootstrap peer C to peer A
  • Bootstrap peer B to peer A

If you want to keep the state, you should not bootstrap to a peer which has cleaned it after being removed.
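As a sketch of that sequence for peer A (command names as in the 0.3.x-era tooling; treat the multiaddress as a placeholder):

ipfs-cluster-service state upgrade    # after installing 0.3.5, migrate the on-disk state
ipfs-cluster-service daemon           # start peer A with the upgraded state
# then on peer C, and afterwards on peer B:
ipfs-cluster-service daemon --bootstrap /ip4/<peer-A-ip>/tcp/9096/ipfs/<peer-A-id>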

@ccampbell

Author

commented Apr 23, 2018

Ahh so the final node needs the state upgrade, but that means that for some period of time the entire cluster has to be down? Is there any way to maintain 100% uptime during an upgrade?

I suppose, at the least, I can keep the IPFS daemon running, which means delivery can continue to work.

@ccampbell

Author

commented Apr 23, 2018

Ideally there would be a way to remove one node, do a state upgrade on it, and then bootstrap all the other nodes to it one at a time. If the network has 100 nodes, having to remove 99 just to get to the final one to update the state and then bootstrap again seems a bit cumbersome, unless I am missing something.

Also, in a peer-to-peer network there is no guarantee that all of the nodes will be under my control, in which case this may not even be possible.

@hsanjuan

Collaborator

commented Apr 23, 2018

Is there any way to maintain 100% uptime during an upgrade?

Not right now. But not having remove_on_shutdown would allow you to restart all your peers with --upgrade at the same time, and the process is really fast. Your expected approach, if it worked, and even if you always keep some peer running, requires more leader elections, so in terms of real availability it might not be the best either.
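In other words, each peer would simply be restarted like this after installing the new version (a sketch; the daemon --upgrade form is the one listed in the ipfs-cluster-service daemon help output):

ipfs-cluster-service daemon --upgrade    # run the necessary state migrations, then start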

@ccampbell

Author

commented Apr 23, 2018

Hmm. Well I believe the issue I was seeing with that was that the peer list does not get updated. That means as soon as a node is removed with remove_on_shutdown: false, any other node that tries to bootstrap into the cluster will fail since the cluster itself will be in an “unhealthy” state. So I would have to manually go in and remove the nodes that are timing out with ipfs-cluster-ctl peers rm [hash] on the bootstrap node.

Why would it require more leader elections to remove one node and then slowly bootstrap the old nodes to the new one, one at a time? It seems like the current suggestion would have to swap out leaders each time a node is removed, until the cluster is down to the final node, at which point the state can be upgraded and the other nodes can bootstrap back in.

Setting remove_on_shutdown: false sounds more appealing in terms of the upgrade process, but is there a way to do that while also allowing nodes to join and making sure the cluster state is up to date and healthy?

@hsanjuan

Collaborator

commented Apr 23, 2018

Setting remove_on_shutdown: false sounds more appealing in terms of the upgrade process, but is there a way to do that while also allowing nodes to join and making sure the cluster state is up to date and healthy?

Cluster will be healthy as long as a majority of peers is still up. After all, remove_on_shutdown only has effect on clean shutdowns of the peer. So in the case of peers dying suddenly, it does not help in keeping the cluster healthy.

I do agree things are not optimal here. Next round of iteration over raft should improve this (3rd point here: #384). The underlying problem is that Raft imposes many constraints on how things should work in terms of peerset modifications in order to keep the consensus working.

@ccampbell

Author

commented Apr 23, 2018

Mmm. Well glad it is on the radar. Thanks for all the fast responses. I will let you know if I run into any other hiccups.

@hsanjuan hsanjuan removed this from the User feedback and bugs milestone Jun 26, 2018

@NatoBoram


commented Jan 20, 2019

Adding an ipfs-cluster-service shutdown command would allow remove_on_shutdown to be performed when the cluster is running in the background.

@hsanjuan

Collaborator

commented Jan 21, 2019

@NatoBoram this would be roughly the equivalent of ipfs-cluster-ctl peers rm <peer-id>. Peers shut down when removed. And you can shut down/remove any peer, not just yourself.

@NatoBoram


commented Jan 22, 2019

If I understand correctly, the correct way to cleanly shut down the service when it's running as a service is to have it remove itself from the cluster?

@hsanjuan

Collaborator

commented Jan 22, 2019

You can shut down any peer at any time. But depending on why you shut it down and on the state of your cluster, you may want to remove it from the consensus peerset so that the rest of the peers don't assume that the peer is going through an abnormal situation and keep trying to contact it. remove_on_shutdown does that automatically. Also, removing a peer with peers rm will cause it to shut down afterwards.

But in all cases, ctrl-c'ing or stopping the service is a clean shutdown from the local point of view.

@hsanjuan

Collaborator

commented Apr 25, 2019

Since we merged #685, I'm closing this as fixed. With --consensus crdt, peers can now join and leave without impacting other peers.

@hsanjuan hsanjuan closed this Apr 25, 2019

@NatoBoram


commented Apr 27, 2019

Where can we use --consensus crdt?

@kishansagathiya

Member

commented Apr 28, 2019

ipfs-cluster-service daemon --consensus crdt

@NatoBoram


commented Apr 28, 2019

$ ipfs-cluster-service daemon --consensus crdt

Incorrect Usage: flag provided but not defined: -consensus

NAME:
   ipfs-cluster-service daemon - run the IPFS Cluster peer (default)

USAGE:
   ipfs-cluster-service daemon [command options] [arguments...]

OPTIONS:
   --upgrade, -u                run necessary state migrations before starting cluster service
   --bootstrap value, -j value  join a cluster providing an existing peers multiaddress(es)
   --alloc value, -a value      allocation strategy to use [disk-freespace,disk-reposize,numpin]. (default: "disk-freespace")
   --stats                      enable stats collection
   --tracing                    enable tracing collection

Version 0.10.1, built from master. What branch should I use?


Oh, I got the build method wrong.

go get -u -v github.com/ipfs/ipfs-cluster
cd ~/go/src/github.com/ipfs/ipfs-cluster
GO111MODULE=on make install

Now that works. It took a few hours to find, but I'm glad I figured out how to build it, because this new consensus looks much more stable than raft at first sight.

@hsanjuan

Collaborator

commented Apr 28, 2019

The official way is documented here: https://cluster.ipfs.io/download/#installing-from-source

@NatoBoram


commented Apr 28, 2019

I couldn't build it using the old method, and that's where I got the idea of building it like that. I followed those instructions but ended up with the wrong version, 0.10.1, so I had to search again for how to build it.

@hsanjuan

Collaborator

commented Apr 28, 2019

Actually, you're right: it doesn't set the commit information if you build directly, but you should get a master build if you're on master. The version will say 0.10.1 because that's the last thing we released.

@NatoBoram


commented Apr 28, 2019

But when I ran ipfs-cluster-service daemon --consensus crdt, I got #160 (comment), so I'm pretty sure I didn't get the right version. I'm not sure how that happened though, since I was building in the master branch.

@eiTanLaVi


commented Apr 30, 2019

Is there a simple way to check which peer is the current leader?

@lanzafame

Collaborator

commented Apr 30, 2019

@eiTanLaVi no. But for what purpose?

@eiTanLaVi


commented Apr 30, 2019

To check that the leader election process has settled, as part of cluster health checks, etc.

@eiTanLaVi


commented Apr 30, 2019

Q: If we use the --upgrade flag on v0.10.1, will the --consensus crdt flag take effect when restarting the daemon (systemd)? What is the safe way to switch to this flag on a running cluster?

@hsanjuan

Collaborator

commented Apr 30, 2019

Answered in #685. Please don't double-post.

@hsanjuan

Collaborator

commented Apr 30, 2019

To check that the leader election process has settled, as part of cluster health checks, etc.

It will sometimes appear in the logs, if an operation is waiting for a leader. https://github.com/ipfs/ipfs-cluster/blob/master/consensus/raft/raft.go#L280
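As a crude check, you could watch for those messages; a sketch assuming the peer runs under a systemd unit named ipfs-cluster (the unit name is a placeholder):

journalctl -u ipfs-cluster -f | grep -i leader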
