Joshua is an attempt to write a (globally) distributed, resilient, read-through file server. Don't use it right now; I'm just trying it out.
The archetypal use case, and the one I'm attempting to cover, is a web app with servers in multiple data centers that accepts uploads from users. If you upload a file to one data center, it should be available for reading or deleting in any other DC.
- Apparent durability. Once data has been written to any node, it MUST be available from any node.
- Automatic eventual consistency. If you upload a dozen files to one node and wait a little while, all nodes MUST eventually be consistent without any further human action.
- HTTP read access. Consumers MUST NOT need to touch the file contents once the file is uploaded.
- Self-bootstrapping. If a new node comes online, it MUST notify the cluster and populate itself.
- No database dependency. For metadata, it SHOULD NOT be necessary to run a local database or data store outside the server process. (A database-backed datastore is allowed.)
- Protocol-based. It SHOULD be possible to implement the protocol in another language and join the cluster.
- Load balancing. Use anycast between DCs. Use a load balancer within a DC.
- Updating objects. Only create and delete operations will be supported.
- Web-based status and monitoring. View cluster status over the web.
- Pluggable storage backends. Starting with the file system, but MongoDB is an obvious place to go.
- Performance. Obviously.
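
To make the "create and delete only" and "pluggable storage backends" points above a little more concrete, here is a minimal sketch of what a backend interface could look like. Nothing here is Joshua's actual API: `StorageBackend`, `FileSystemBackend`, and the `create`/`read`/`delete` method names are all hypothetical, and the flat key scheme (no subdirectories) is an assumption made to keep the example short.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical backend interface: create, read, delete. No update."""

    @abstractmethod
    def create(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class FileSystemBackend(StorageBackend):
    """Stores each object as one file under a root directory (assumed layout)."""

    def __init__(self, root) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Reject keys that could escape the root directory.
        if not key or "/" in key or "\\" in key or key in (".", ".."):
            raise ValueError(f"invalid key: {key!r}")
        return self.root / key

    def create(self, key: str, data: bytes) -> None:
        # Mode "xb" raises FileExistsError if the object already exists,
        # so an existing object can never be silently overwritten.
        with open(self._path(key), "xb") as f:
            f.write(data)

    def read(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        # Raises FileNotFoundError if the object does not exist.
        self._path(key).unlink()
```

Because there is no update method, replacing an object would mean deleting it and creating a new one. A MongoDB-backed store could, in principle, implement the same three methods.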