Use case: home directories in ipfs-cluster #3

Open
xelra opened this issue Jul 5, 2016 · 0 comments

I read in the meeting notes that you're asking for use cases. Here is my very personal use case and wishlist for ipfs-cluster. I'm not 100% sure whether this goes beyond the scope of ipfs-cluster, but I'll write it down here anyway.

I want to replace every distributed filesystem I currently use with ipfs. I'm using XtreemFS (because it works well over WAN) and have used AndrewFS, GlusterFS and HDFS (without WANdisco) in the past.

My first use case is to store home directories in ipfs clusters and these are the features that I would really like to have:

  • Auth: Every user gets authenticated and has access only to the parts of the cluster that he has permissions for (authorization).
  • Encryption: Yes, one could roll his own. But having it already integrated could come with a lot of goodies. Like per-user keys and key management in general. This could maybe be done with the excellent gocryptfs.
  • Admin management web interface: It would be great to have a management interface where you can set the above two things (auth and encryption) and also manage the cluster as a whole: which replication scheme to use, managing sub-clusters, and setting different replication schemes for different sub-clusters. Kinda like Amazon S3 -> Amazon Glacier: one cluster is the high-availability store, another is less well replicated (with an effect on speed through striping). The admin should also be able to control the traffic and disk space quotas of the cluster's participants (nodes, sub-clusters) from the interface.
  • Client-side quota limit: Although the actual quotas in use should be handled by the system/admin, each participant should be able to set a hard limit on traffic and disk space that cannot be exceeded.
  • Replication schemes: RAID schemes are well understood, but RAID5 or RAID6 isn't enough, especially when there are many participants who need high availability of the files on a cluster that spans a WAN. Replication isn't just about data safety, it's also about speed. So do we just keep going up the RAID level as the number of participants increases, RAIDX? It would be great to have replication schemes that take the popularity and age of a file into account. For example, old documents are replicated on only 5 nodes, while new and popular documents that get opened and used a lot are replicated on 20 nodes.
  • Balancer: Every distributed filesystem needs a balancer. Different levels of aggressiveness for the balancer would be great, especially since the cluster could span a WAN, with nodes or sub-clusters going offline a lot. So there should be balancing schemes for LAN and WAN.
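
To make the popularity/age idea from the replication bullet a bit more concrete, here is a minimal sketch in Python. It is not how ipfs-cluster works or is planned to work. The names `FileStats` and `replication_factor`, the 30-day freshness window, the 100-opens popularity threshold, and the choice to average the two signals are all assumptions made up for illustration. Only the 5 and 20 replica figures come from the wishlist above.

```python
from dataclasses import dataclass

MIN_REPLICAS = 5    # from the wishlist: old documents
MAX_REPLICAS = 20   # from the wishlist: new and popular documents


@dataclass
class FileStats:
    """Hypothetical per-file statistics a cluster might track."""
    age_days: float       # days since the file was last modified
    opens_last_30d: int   # how often the file was opened recently


def replication_factor(stats, fresh_days=30.0, hot_opens=100):
    """Return a target replica count between MIN_REPLICAS and MAX_REPLICAS.

    fresh_days and hot_opens are assumed tuning knobs, not real settings.
    """
    # Freshness is 1.0 for a brand-new file and falls linearly to 0.0
    # once the file is fresh_days old or older.
    freshness = min(1.0, max(0.0, 1.0 - stats.age_days / fresh_days))
    # Popularity is 0.0 for an unused file and saturates at 1.0
    # once the file reaches hot_opens opens.
    popularity = min(1.0, max(0.0, stats.opens_last_30d / hot_opens))
    # Assumption: average both signals, so only files that are both
    # new and popular reach the maximum replica count.
    score = (freshness + popularity) / 2.0
    return MIN_REPLICAS + round(score * (MAX_REPLICAS - MIN_REPLICAS))


if __name__ == "__main__":
    print(replication_factor(FileStats(age_days=400, opens_last_30d=0)))
    print(replication_factor(FileStats(age_days=0, opens_last_30d=250)))
```

A real scheme would probably also need hysteresis, so that a file hovering around a threshold does not get replicated and un-replicated over and over.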

So how would I use the above?
I would have a company-wide cluster where everyone can access their home directories from everywhere. The cluster would have sub-clusters which represent the different sites. Those would be connected over WAN. Inside each sub-cluster there would be nodes which are locally connected over LAN. I want the cluster to be made up not just of dedicated machines but also of each and every "client", which is why the client-side quota limit is important in my opinion.
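
As a rough illustration of how the admin quota and the client-side hard limit could interact on such a "client" node, here is a small Python sketch. Everything in it (`Quota`, `effective_quota`, `can_store`, the field names) is hypothetical and only meant to show the rule I have in mind: the admin assigns a quota, but the participant's own hard limit always wins when it is lower.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    """Hypothetical quota record; both fields are upper bounds."""
    disk_bytes: int
    traffic_bytes_per_day: int


def effective_quota(admin: Quota, client_limit: Optional[Quota]) -> Quota:
    """Combine the admin-assigned quota with the participant's hard limit.

    The result never exceeds either one. A participant that has not
    set a limit simply gets the admin quota.
    """
    if client_limit is None:
        return admin
    return Quota(
        disk_bytes=min(admin.disk_bytes, client_limit.disk_bytes),
        traffic_bytes_per_day=min(
            admin.traffic_bytes_per_day, client_limit.traffic_bytes_per_day
        ),
    )


def can_store(used_disk_bytes: int, block_bytes: int, quota: Quota) -> bool:
    """Return True if storing one more block stays within the disk quota."""
    return used_disk_bytes + block_bytes <= quota.disk_bytes


if __name__ == "__main__":
    admin = Quota(disk_bytes=100 * 2**30, traffic_bytes_per_day=50 * 2**30)
    laptop = Quota(disk_bytes=40 * 2**30, traffic_bytes_per_day=80 * 2**30)
    print(effective_quota(admin, laptop))
```

With a rule like this, a laptop that only wants to give up 40 GiB of disk is never asked for more, no matter what the admin configures for its sub-cluster.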

I hope this is the same vision that you have for ipfs-cluster. I think it's a pretty common use case for a distributed filesystem.
