
My thoughts on IPFS-Cluster after using it for lineageos-on-ipfs.com #783

Closed
NatoBoram opened this issue May 20, 2019 · 4 comments
Labels
topic/meta Topic meta topic/user-story User story

Comments

@NatoBoram

NatoBoram commented May 20, 2019

I used IPFS-Cluster to host LineageOS for a few months. I've learned many things about IPFS and IPFS-Cluster and I've seen many patches resolving problems I was currently having. However, it's time for me to say goodbye to lineageos-on-ipfs.com. This is the full story of my project, including what I wanted from IPFS-Cluster at the beginning.

LineageOS

LineageOS has limited storage space for their builds. Every 4 weeks, they delete old builds. The problem is that some devices stop receiving new builds for lack of a maintainer, so they become unsupported and their builds eventually vanish. However, for most devices abandoned by their manufacturer, running LineageOS, even after its support was dropped, is still more advantageous and more secure than running the latest stock firmware.

With that in mind, I set out to store the latest build of every device on IPFS, so that old but useful builds would stop vanishing.

Using IPFS to store LineageOS' builds seemed like a good idea, and from my perspective it still does. However, the amount of effort it requires warrants a programmer working on it full-time. Hence, I quickly ran into problems.

Setup

Here's the hardware that I connected:

  • Droplet on Digital Ocean, named droplet
  • Laptop running Ubuntu Server, named hp
  • Desktop running Ubuntu, named helion
  • Laptop running Ubuntu, named asus

Droplet was 100% online, but had very limited storage space: 50 GiB, to be precise. Thus, whenever the cluster died, the node would fill up and choke to death. It ran the web interface lineageos-on-ipfs.com, which displayed the builds to the public, and LOSGoI, which fetched the builds from LineageOS' website and put them on IPFS.

HP was running most of the time. However, my router would reboot every night at 4 AM. This brought down my home network and decimated the cluster every single time.

Helion was mostly running, but it dual-booted Windows and I was actively using that device.

Asus was offline almost all the time, and only occasionally came online.

Raft

At first, raft was the only consensus available in IPFS-Cluster. Under raft, 50%+1 of the nodes had to be online at all times for the cluster to stay alive. However, achieving 100% uptime is nearly impossible on consumer hardware. After many days of trying, I realized the project was doomed to fail, but I was still determined to make it work.

At first, I had leave_on_shutdown turned off. However, this made the cluster irrecoverable pretty much every night. Nodes would go down, and bootstrapping them would fail: Droplet still considered them part of the cluster, yet they had been removed when they came back online with the --bootstrap option. Moreover, because fewer than 50%+1 of the nodes were online, no other node could join the cluster.

Then I tried enabling leave_on_shutdown. The option was pretty much useless, since nodes would drop off when the network went down, without warning Droplet, which resulted in the same errors as before.
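For reference, leave_on_shutdown is a boolean in the cluster section of service.json; this is the fragment I was toggling (a minimal sketch, surrounding keys omitted):

```json
{
  "cluster": {
    "leave_on_shutdown": true
  }
}
```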

Overall, the raft consensus was too strict to be usable at all. Nodes couldn't reconnect after the first time they left the cluster, so I had to clean the state pretty much every day, which is quite a bad idea when you want to store critical data.

CRDT

Then the crdt consensus arrived. It was a better fit because my nodes could reconnect to the cluster even after leaving, which was a huge improvement. However, when the router rebooted at 4 AM, every single member of the cluster would become its own one-node cluster and never reconnect to the others once network connectivity was restored. To rejoin the cluster, I would have needed a daemon that checks whether the node is still part of the cluster and restarts IPFS-Cluster when it isn't.
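Such a daemon could be sketched as a systemd timer plus a oneshot service (the unit names, the service name being restarted, and the line-count heuristic are all my assumptions; `ipfs-cluster-ctl peers ls` is Cluster's peer-listing command):

```ini
# /etc/systemd/system/cluster-watchdog.service (hypothetical unit)
[Unit]
Description=Restart ipfs-cluster when this node can no longer see its peers

[Service]
Type=oneshot
# Rough heuristic: "peers ls" prints output for every peer this node
# sees. If the output is suspiciously short (node only sees itself) or
# the call fails, restart the daemon so it bootstraps back in.
ExecStart=/bin/sh -c 'test "$(ipfs-cluster-ctl peers ls | wc -l)" -gt 1 || systemctl restart ipfs-cluster'

# /etc/systemd/system/cluster-watchdog.timer (hypothetical unit)
[Unit]
Description=Run the cluster watchdog every minute

[Timer]
OnCalendar=*-*-* *:*:00

[Install]
WantedBy=timers.target
```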

Conclusion

Essentially, I wanted IPFS-Cluster to create a cluster of small IPFS nodes working together to solve larger problems. In reality, IPFS-Cluster only shines with larger nodes in a production environment with 100% uptime.

Suggestions

What I want from IPFS-Cluster is the ability to create and join multiple logical clusters. Each logical cluster would be made of write nodes and read nodes. Write nodes would have the ability to write to the consensus; by contrast, read nodes would only be able to read it.

To obtain read or write permissions, an IPFS-Cluster node would join a logical cluster using a read key or a write key previously generated by that cluster. Read keys could be given out publicly so that random users could join a specific logical cluster and help it by donating storage. Write keys would be kept secret and used by trusted nodes to write to the consensus.

Whenever a random user goes offline or is disconnected from the network, the cluster should redistribute its lost data. Ideally, the nodes with the largest amount of free space should take over, up to the pin's replication_factor_max. Nodes that are still alive but disconnected should attempt to reconnect to other nodes every minute. Moreover, nodes that are almost full should also attempt to offload some data to other, larger nodes.
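For context, Cluster already exposes default replication bounds in the cluster section of service.json, which individual pins can override (the values here are illustrative):

```json
{
  "cluster": {
    "replication_factor_min": 2,
    "replication_factor_max": 5
  }
}
```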

This would allow the setup that I had to work regardless of network conditions or router reboots.

I dream of an IPFS Cluster that could run on servers, desktops, laptops, mobile devices and internet-of-things devices, and that wouldn't care about network conditions. Nodes could come and go, and the cluster would repair itself using the available peers and the rest of the IPFS network.

@NatoBoram NatoBoram added kind/enhancement A net-new feature or improvement to an existing feature need/review Needs a review labels May 20, 2019
@hsanjuan
Collaborator

hsanjuan commented May 20, 2019

Each logical cluster would be made of write nodes and read nodes

Last week we merged RPC auth, and CRDT consensus supports write peers (those in the "trusted_peers" config option) and read-only peers (everyone else).
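As a sketch, the trusted peers are declared in the crdt consensus section of service.json (the peer ID is a placeholder, and the exact nesting may differ between Cluster versions):

```json
{
  "consensus": {
    "crdt": {
      "cluster_name": "ipfs-cluster",
      "trusted_peers": ["QmExamplePeerID"]
    }
  }
}
```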

Whenever a random user comes offline or is disconnected from the network, the cluster should redistribute its lost data

This happens already, but note that Cluster cannot verify if the users are lying about the data they hold.

Nodes that are still alive but disconnected should attempt to reconnect to other nodes every minute.

Thanks for doing very early testing, we can surely improve this.

@hsanjuan hsanjuan added topic/meta Topic meta topic/user-story User story and removed kind/enhancement A net-new feature or improvement to an existing feature need/review Needs a review labels May 20, 2019
@NatoBoram
Author

NatoBoram commented May 20, 2019

Thanks for your replies!

CRDT consensus supports write peers (those part of the "trusted_peers" config option)

Would it be appropriate to consider the bootstrap target a trusted peer regardless of config?

Last week we merged RPC auth

Is there documentation on what it is, how to use it, and its use cases?

@hsanjuan
Collaborator

hsanjuan commented May 20, 2019

Would it be appropriate to consider the bootstrap target a trusted peer regardless of config?

That's a very good question. I am not sure what the best approach is.

Is there documentation on what it is, how to use it, and its use cases?

We will write docs for the stable release, but right now there is nothing. Other than that, the trusted_peers option in the crdt config section is the only thing the user sees, I think.

@hsanjuan
Collaborator

hsanjuan commented Oct 3, 2019

Would it be appropriate to consider the bootstrap target a trusted peer regardless of config?

That's a very good question. I am not sure what the best approach is.

This happens now

Is there documentation on what it is, how to use it, and its use cases?

We will write docs for the stable release, but right now there is nothing. Other than that, the trusted_peers option in the crdt config section is the only thing the user sees, I think.

We now have docs updated to the stable release. The trusted_peers part is mostly in https://cluster.ipfs.io/documentation/guides/security/

I will close this for the moment. Feel free to re-open with more questions!

@hsanjuan hsanjuan closed this as completed Oct 3, 2019