My thoughts on IPFS-Cluster after using it for lineageos-on-ipfs.com #783
Comments
Last week we merged RPC auth, and CRDT consensus now supports write peers (those listed in the "trusted_peers" config option) and read-only peers (everyone else).
This happens already, but note that Cluster cannot verify if the users are lying about the data they hold.
Thanks for doing very early testing; we can surely improve this.
Thanks for your replies!
Would it be appropriate to consider the bootstrap target a trusted peer regardless of config?
Is there documentation on what it is, how to use it, and its use cases?
That's a very good question. I am not sure what the best approach is.
We will write docs for the stable release, but right now there is nothing. Other than that, the
This happens now
We now have docs updated for the stable release. The trusted_peers part is mostly in https://cluster.ipfs.io/documentation/guides/security/ I will close this for the moment. Feel free to re-open with more questions!
NatoBoram commented May 20, 2019 (edited)
I used IPFS-Cluster to host LineageOS for a few months. I've learned many things about IPFS and IPFS-Cluster and I've seen many patches resolving problems I was currently having. However, it's time for me to say goodbye to lineageos-on-ipfs.com. This is the full story of my project, including what I wanted from IPFS-Cluster at the beginning.
LineageOS
LineageOS has limited storage space for its builds, so old builds are deleted every 4 weeks. The problem is that devices sometimes stop receiving new builds for lack of a maintainer, so they become unsupported and their builds eventually vanish. However, using LineageOS, even after its support for a device was dropped, is still advantageous and more secure than using the latest stock firmware for most devices that are unsupported by their manufacturer.
With that in mind, I set out to store the latest build of every device on IPFS, so that old but useful builds would stop vanishing.
Using IPFS to store LineageOS' builds seemed like a good idea, and it still seems like a good idea from my perspective. However, the amount of effort it requires warrants a programmer working full-time on this alone. Hence, I quickly ran into problems.
Setup
Here's the hardware that I connected: droplet, hp, helion, and asus.
Droplet was 100% online, but had very limited storage space: 50 GiB, to be precise. Thus, whenever the cluster died, the node would fill itself up and choke to death. It was running the web interface lineageos-on-ipfs.com, which was responsible for displaying the builds to the public, and LOSGoI, which was responsible for fetching the builds from LineageOS' website and putting them on IPFS.
HP was running most of the time. However, my router would reboot every night at 4 AM. This brought down my home network and decimated the cluster every single time.
Helion was mostly running, but I was dual-booting Windows and I was actively using that device.
Asus was offline pretty much all the time, and only sometimes came online.
Raft
At first, raft was the only consensus available on IPFS-Cluster. When using raft, 50%+1 of the nodes had to be online at all times for the cluster to be alive. However, achieving 100% uptime is nearly impossible on consumer hardware. After many days of trial, I realized that the project was doomed to fail, but I was still determined to make it work.
At first, I had leave_on_shutdown turned off. However, this caused the cluster to be irrecoverable pretty much every night. Nodes would go down, and bootstrapping them would fail because Droplet still thought the nodes were part of the cluster, but they had been removed when they came back online with the --bootstrap option. Moreover, because fewer than 50%+1 of the nodes were online, no other node could join the cluster.
Then I tried enabling leave_on_shutdown. This option was pretty much useless, since the nodes would leave when the network went down, without warning Droplet, which resulted in the same errors as before.
Overall, the raft consensus was too strict to be used at all. Nodes weren't able to reconnect after the first time they left the cluster, so I had to clean the state pretty much every day, which is quite a bad idea when you want to store critical data.
CRDT
Then, the crdt consensus came to be. It was a better consensus because my nodes could connect to the cluster even after leaving, which was a huge improvement. However, when the router rebooted at 4 AM, every single member of the cluster would become its own cluster, without reconnecting to any other node after network connectivity was restored. To rejoin the cluster, I would need to create a daemon that checks whether the node is still part of the cluster, then restarts IPFS-Cluster so that it joins again.
Conclusion
Essentially, I wanted IPFS-Cluster to create a cluster of small IPFS nodes working together to solve larger problems. In reality, IPFS-Cluster will only shine with larger nodes in a production environment and 100% uptime.
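To make the raft problem concrete, here is a small arithmetic sketch of the "50%+1" majority rule applied to the four machines from the Setup section. The function names are made up for illustration, and it assumes that the three home machines sat behind the router that rebooted, leaving only Droplet reachable.

```python
def raft_quorum(total_peers: int) -> int:
    """Smallest number of peers that forms a strict majority."""
    return total_peers // 2 + 1


def has_quorum(online_peers: int, total_peers: int) -> bool:
    """True when enough peers are online for the cluster to make progress."""
    return online_peers >= raft_quorum(total_peers)


# Four peers: droplet, hp, helion, asus.
TOTAL_PEERS = 4

for online in range(TOTAL_PEERS + 1):
    state = "alive" if has_quorum(online, TOTAL_PEERS) else "stalled"
    print(f"{online}/{TOTAL_PEERS} peers online: {state}")
```

With four peers the majority is three, so a cluster where only Droplet is reachable cannot accept changes, including new peers joining.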
Suggestions
What I want from IPFS-Cluster is the ability to create and join multiple logical clusters. Each logical cluster would be made of write nodes and read nodes. Write nodes would be nodes that have the ability to write to the consensus. By contrast, read nodes would be nodes that can only read the consensus.
To obtain read or write permission, an IPFS-Cluster node would join a logical cluster using a read key or a write key previously generated by the logical cluster. Read keys could be given out publicly so that random users could join a specific logical cluster and help it by donating storage. Write keys would be kept secret and used by trusted nodes to write to the consensus.
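The read-key and write-key idea could look roughly like the sketch below. This is not how IPFS-Cluster works; LogicalCluster, Role, join and can_write are hypothetical names used only to illustrate the proposal, and a real design would need proper authentication rather than shared secrets held in memory.

```python
import hmac
import secrets
from enum import Enum


class Role(Enum):
    READ = "read"
    WRITE = "write"


class LogicalCluster:
    """Hypothetical logical cluster that hands out roles based on a key."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Both keys are generated by the cluster itself.
        self.read_key = secrets.token_hex(16)   # can be published
        self.write_key = secrets.token_hex(16)  # must stay secret
        self.members = {}  # peer id -> Role

    def join(self, peer_id: str, key: str) -> Role:
        """Admit a peer and record the role that its key grants."""
        if hmac.compare_digest(key, self.write_key):
            role = Role.WRITE
        elif hmac.compare_digest(key, self.read_key):
            role = Role.READ
        else:
            raise PermissionError(f"{peer_id}: unknown key for {self.name}")
        self.members[peer_id] = role
        return role

    def can_write(self, peer_id: str) -> bool:
        """Only write nodes may change the shared state."""
        return self.members.get(peer_id) is Role.WRITE


cluster = LogicalCluster("lineageos")
cluster.join("droplet", cluster.write_key)   # trusted node
cluster.join("volunteer", cluster.read_key)  # donates storage only
print(cluster.can_write("droplet"), cluster.can_write("volunteer"))
```

A volunteer holding only the read key can follow the cluster and store data, but cannot change what the cluster pins.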
Whenever a random user goes offline or is disconnected from the network, the cluster should redistribute the data that was lost with that node. Ideally, the nodes with the most free space should take over, up to the pin's replication_factor_max. Nodes that are still alive but disconnected should attempt to reconnect to other nodes every minute. Moreover, nodes that are almost full should also attempt to offload some data to other, larger nodes.
This would allow the setup that I had to work regardless of network conditions or router reboots.
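The redistribution rule suggested here can be sketched as a small allocation function. This is only an illustration of the idea, not IPFS-Cluster's actual allocator: the Peer type, the reallocate function and the free-space figures are all invented for the example.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Peer:
    name: str
    free_bytes: int
    online: bool = True


def reallocate(pin_size, holders, peers, replication_factor_max):
    """Return the peers that should hold a pin after some holders vanish.

    Holders that are still online keep the pin. Missing replicas go to the
    online peers with the most free space, up to replication_factor_max.
    """
    online = {p.name: p for p in peers if p.online}
    survivors = [name for name in holders if name in online]
    missing = max(0, replication_factor_max - len(survivors))
    candidates = sorted(
        (p for p in online.values()
         if p.name not in holders and p.free_bytes >= pin_size),
        key=lambda p: p.free_bytes,
        reverse=True,
    )
    return survivors + [p.name for p in candidates[:missing]]


# Made-up free-space figures for the four machines; asus is offline.
peers = [
    Peer("droplet", 10 * GIB),
    Peer("hp", 500 * GIB),
    Peer("helion", 200 * GIB),
    Peer("asus", 100 * GIB, online=False),
]
print(reallocate(1 * GIB, ["hp", "asus"], peers, replication_factor_max=3))
```

In this example hp keeps its copy, and the replica lost with asus is handed first to helion and then to droplet, in order of free space.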
I dream of an IPFS Cluster that could be run on servers, desktops, laptops, mobile devices and Internet of Things devices, and that wouldn't care about network conditions. Nodes could come and go, and the cluster would repair itself using the available peers and the rest of the IPFS network.
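Until something like this exists, the "check every minute and rejoin" behaviour could be approximated by an external watchdog. The sketch below assumes the caller supplies two hypothetical callables: count_peers, which reports how many other cluster peers the node currently sees, and restart, which restarts the local IPFS-Cluster daemon. How those are implemented is deliberately left open.

```python
import time


def watchdog(count_peers, restart, interval_seconds=60.0,
             max_checks=None, sleep=time.sleep):
    """Restart the daemon whenever the node sees no other peers.

    Runs forever unless max_checks is given. Returns the number of restarts.
    """
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if count_peers() == 0:
            restart()
            restarts += 1
        checks += 1
        sleep(interval_seconds)
    return restarts


# Simulated run: the node is isolated during the second and third checks.
observed = iter([3, 0, 0, 2])
log = []
restarts = watchdog(
    count_peers=lambda: next(observed),
    restart=lambda: log.append("restart"),
    max_checks=4,
    sleep=lambda seconds: None,  # skip real waiting in the simulation
)
print(restarts, log)
```

Passing the sleep function as a parameter keeps the loop testable without waiting a real minute between checks.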