
Support for multiple servers/mirrors #110

Closed
DavidJFelix opened this issue Feb 7, 2013 · 34 comments

@DavidJFelix

Possible enhancement idea I had when rolling out a second Seafile instance on a local machine (my current setup is on Amazon AWS): I thought that a client being able to connect to multiple mirrored servers might be beneficial for data durability and for download speed/network utilization. It's really just a basic idea; I haven't had much time to think about how it might be implemented or any downsides, just thought I'd bring it up for discussion.

@freeplant
Member

The libraries are easy to mirror, while other information stored in the database, such as user info and permission info, is hard to mirror.

For download speed/network utilization, we can mirror the file blocks, and the client can download file blocks from multiple block servers. This feature is implemented but not tested yet.
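The multi-server download described here could be pictured roughly as below. This is a minimal sketch, not Seafile's actual client code; the mirror objects and their `fetch_block` method are assumed stand-ins for the real protocol.

```python
# Sketch of downloading file blocks from multiple mirrored block servers in
# parallel. The server objects and fetch_block() are illustrative stand-ins,
# not Seafile's real API.
from concurrent.futures import ThreadPoolExecutor

def download_blocks(block_servers, block_ids):
    """Spread block requests round-robin across mirrors and fetch in parallel."""
    def fetch(indexed):
        i, block_id = indexed
        server = block_servers[i % len(block_servers)]  # round-robin assignment
        return block_id, server.fetch_block(block_id)

    with ThreadPoolExecutor(max_workers=len(block_servers)) as pool:
        results = dict(pool.map(fetch, enumerate(block_ids)))
    # Reassemble the file from its blocks, in the original order.
    return b"".join(results[b] for b in block_ids)
```

With two mirrors and four blocks, each mirror serves two blocks, so the transfer can speed up noticeably as long as the client's own downlink isn't the bottleneck.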

@blablup

blablup commented Feb 12, 2013

How can I test this feature? What will it do when only one server is reachable?

@freeplant
Member

It can't be tested yet; the config options haven't been added.

Seafile breaks files into blocks. In the future, the Seafile server will be split into one metadata server and multiple block servers, and you will be able to replicate blocks across multiple block servers. The data replication will be asynchronous. If one block server is unreachable, causing some blocks to be missing, the file syncing processes that require the missing blocks will be blocked until the server becomes reachable.

In the current version, the Seafile server acts as both metadata server and block server. The syncing process is already split into two phases, i.e., metadata syncing and block syncing.
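The two-phase process can be sketched as follows; this is only an illustration of the idea, with `fetch_metadata`/`fetch_block` as hypothetical stand-ins for the real protocol. Note how a block that is missing from every reachable mirror (because replication is asynchronous) blocks the sync until some server can serve it.

```python
# Illustrative two-phase sync: metadata first, then blocks.
# All names here are hypothetical, not Seafile's actual API.
import time

def sync_file(metadata_server, block_servers, file_id):
    # Phase 1: metadata syncing -- learn which blocks make up the file.
    block_ids = metadata_server.fetch_metadata(file_id)
    # Phase 2: block syncing -- fetch each block from any mirror that has it.
    blocks = {b: fetch_block_with_retry(block_servers, b) for b in block_ids}
    return [blocks[b] for b in block_ids]

def fetch_block_with_retry(block_servers, block_id, delay=1.0):
    # Replication is asynchronous, so a mirror may not have a block yet;
    # the sync blocks (retries) until some reachable server can serve it.
    while True:
        for server in block_servers:
            block = server.fetch_block(block_id)
            if block is not None:
                return block
        time.sleep(delay)  # wait for replication to catch up, then retry
```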

@DavidJFelix
Author

I understand this is still in the planning phase, but is there a plan for metadata mirrors? Having additional block servers is fine, but redundancy at that level could be accomplished with a local disk mirrored to an offsite NAS. Multiple servers mean a lower likelihood of outages; relying on a single metadata server defeats this purpose, as it's still a single point of failure.

@freeplant
Member

The Seafile metadata server stores its data in the database; the server itself is stateless. If you set up a high-availability solution for the database, the problem is solved.

@DavidJFelix
Author

So say I have a MySQL cluster and two Seafile servers. Is the DB the metadata server, or is it one of the Seafile servers? Would I have to designate one of the servers as a "metadata" server, or does it do that automatically? When the metadata server (if it's a Seafile server, per the answer above) goes down, do I need to detect that and tell the other server to stand in, or can the other server detect the event and handle it on its own? I'm just confused because above it felt like you were implying a master/slave style of configuration, so I'm wondering whether I wrongly inferred that, or whether there is some mechanism (which I don't understand) for a slave/block server to stand in as a master/metadata server.

@freeplant
Member

There are actually two problems, high availability and scalability.

Suppose we store the metadata in MySQL cluster and the file blocks in AWS S3 like distributed storage.

To address high availability, the simplest solution is to run two Seafile servers with different IP addresses. The Seafile client connects to the server via DNS. When one server is down, you can switch the DNS to the second server.

To address scalability, we separate Seafile servers into two groups: metadata servers and block servers. The client keeps a persistent connection with one metadata server and constantly checks whether something needs to be synced. The metadata server checks the database when the client asks.

During syncing, the metadata server reads metadata from the database and sends it to the client, then tells the client the addresses of the block servers. The client connects to the block servers to get blocks. When all file blocks are downloaded, the client updates the local files.

This is the planned architecture. We will implement a Seafile cluster on AWS in the next few months and work out the details.

@freeplant
Member

Another, harder problem is how to deploy Seafile in different data centers to improve availability and performance. Say your company has two offices in two different countries but wants one Seafile deployment that serves both places (appearing as one Seafile to users).

The metadata servers can only be deployed in one data center. The block servers can be deployed in different data centers to speed up block transfers, but blocks can only be replicated asynchronously. This is what I meant by "The data replication will be asynchronous. If one block server is unreachable, causing some blocks to be missing, the file syncing processes that require the missing blocks will be blocked until the server becomes reachable."

@DavidJFelix
Author

Can two metadata servers operate simultaneously, or do you have to use DNS switching? Say I have two metadata servers, cloud1.example.com and cloud2.example.com. My DNS record for cloud.example.com points to the primary metadata server, cloud1. If this server fails, I'll update the DNS record so that it points to cloud2 -- but while the update is propagating, users can still connect if their client knows to contact cloud2. Perhaps when clients connect to the primary metadata server, it could push a configuration update notifying them of the other active metadata servers; the client could then do load balancing and attempt to recover connections on its own.
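The client-side recovery proposed here could look roughly like the sketch below: the client keeps a list of known metadata server endpoints (seeded by the primary) and simply tries the next one when a connection fails. The `connect_any` helper and the endpoint list are hypothetical, not part of the actual client.

```python
# Sketch of client-side failover across several metadata servers.
# Endpoints are (host, port) pairs; the helper name is illustrative only.
import socket

def connect_any(endpoints, timeout=3.0):
    """Try each known metadata server in order; return the first live socket."""
    last_error = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:  # unreachable or refused -> try the next mirror
            last_error = exc
    raise ConnectionError(f"no metadata server reachable: {last_error}")
```

With this approach the client keeps working through a DNS propagation window, since it no longer depends on a single name resolving correctly.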

@freeplant
Member

Two metadata servers can operate simultaneously.

@DavidJFelix
Author

Sweet. So is there anything that's missing or should I just close this?

Recap:

  • Database can be clustered for durability/availability
  • Metadata servers can be run on multiple machines for durability/availability
  • Block servers can be mirrored via LVM RAID 1/NAS for durability
  • Block servers can operate in a redundant fashion for availability (existing feature that needs a config setting)

Anything I missed?

@blablup

blablup commented Mar 5, 2013

I don't know how hard it would be to implement, but for the issue with two data centers, wouldn't it be possible:

  • for the block servers themselves to asynchronously replicate their data to another block server (it would still be asynchronous, I know);
  • to have different metadata servers, with all needed databases replicated to the other data center (I saw that you split things up into different databases; maybe replicate all databases except the config one);
  • for each metadata server to have its own DNS name settings?
    Then all clients would connect only to their local metadata and block server and download/upload files only there.
    Syncing of shared libraries might be stuck in one data center while the block servers replicate, but that is (IMHO) not a problem, because the transfer would succeed once the data is replicated.

If someone needs HA beyond that, it could be added on top in each data center. Or, if possible, maybe even allow a failover metadata/block server, so the client would only connect to the other data center if the local instance is down.

That would also address the scenario where a company needs to deploy some data to many locations: the data could be edited by one employee and read locally in all branch offices.

Just some thoughts I had (and some use cases in my head which could really use this, especially the first feature).

@freeplant freeplant reopened this Mar 6, 2013
@freeplant
Member

I forgot to mention MooseFS. You can store file blocks in MooseFS. It is an easy-to-set-up distributed file system and can provide a simple high-availability solution.

Note that MooseFS is not good at storing lots of small files, so it can't provide a scalability solution for Seafile.

@DavidJFelix
Author

I'm not familiar with MooseFS. Would it have any benefits over configuring your drives as a SAN, passing them to LVM, and then using whatever FS you want on top of an LVM RAID 1 (or 1+0)?

@0xarve

0xarve commented Mar 6, 2013

I don't think Seafile should handle content replication. There are various other good projects, such as OpenStack Swift, that focus purely on this task. I see it as complicating the service if you were to add such functionality into the product directly. Rather, the focus should be on adding more storage backends (local, Swift, S3, etc.).

For metadata, look at adding support for a key/value (NoSQL) based database that scales easily, rather than MySQL or any other relational database engine. Perhaps Couchbase (http://www.couchbase.com).

@freeplant
Member

With MooseFS, users can build a distributed file system for HA using 3-10 ordinary Linux machines. I think SAN is a high-end solution, while MooseFS is a low-end solution.

@freeplant
Member

@maxim We use MySQL because we need transactions and strong consistency; NoSQL is not good at either. I used Cassandra before: it is only eventually consistent and has no transactions.

@blablup

blablup commented Mar 6, 2013

Is it possible for two servers to access the same data pool (via a replicated filesystem like MooseFS/GlusterFS/Ceph...) without a problem?
Then my question in issue #113 (#issuecomment-13516103) should have been answered "yes", because if that is possible I can run another instance of seaf/ccnet on different ports with the same data pool and a shared database.

@killing
Member

killing commented Mar 6, 2013

I think Swift is the most promising open-source storage solution for Seafile. It's object storage with the same interface as S3, and it supports HA and is scalable. I believe Rackspace is using it for Cloud Files, but I'm not sure how reliable it is for production use outside of Rackspace.

MooseFS has only one metadata server, so there is a SPOF. But I think any NAS-like filesystem is good enough for in-house deployment. For internet-scale deployment, it's better to use S3 or Swift.

@killing
Member

killing commented Mar 6, 2013

@blablup Yes, it's completely possible to use shared storage and a shared DB.

@chenull

chenull commented Mar 17, 2013

Besides Swift, Ceph also has distributed block storage (RADOS), and it already has a metadata server and a monitoring server. Is it feasible for Seafile to use the library provided by Ceph? Let Seafile tackle the higher level (web, users, revisions, etc.) while Ceph/librados provides block storage.

@killing
Member

killing commented Mar 18, 2013

@chenull Yes, Seafile already supports rados as a block backend.

@bvleur

bvleur commented Apr 16, 2013

freeplant commented:

This is the planned architecture. We will implement a Seafile cluster on AWS in the next few months and work out the details.

For people watching this issue: it can be implemented successfully (as seen on Seacloud), but on the mailing list I got the disappointing answer:

Blocks and objects are stored in S3. We currently don't plan to open source the backend for S3.

So we will have to duplicate the effort and implement our own backends.

Is anyone else working on that or open to collaboration?

@jackloom

jackloom commented May 2, 2013

Interested!

@Deradon

Deradon commented Jan 19, 2014

Another, harder problem is how to deploy Seafile in different data centers to improve availability and performance. Say your company has two offices in two different countries but wants one Seafile deployment that serves both places (appearing as one Seafile to users).

Is this possible yet?

I'd like to set up Seafile in two different LANs (for sync performance), which themselves should sync with a Seafile instance on the web. Not sure if this is possible at the moment.

@freeplant
Member

Not possible yet.

@freeplant
Member

It is now possible to deploy Seafile in different data centers, by using the Swift storage backend and a MariaDB cluster.

@thorgbarth

Hello freeplant,

this is good news :-)

We are considering implementing this to synchronize three AFP (Mac OS X) file servers at different company sites in Germany, for about 40 users who are frequently mobile with their MacBooks but also frequently in one of the three offices. I like Seafile's ability to pause syncing of every checked-out library for bandwidth management while mobile... but Seafile lacks the LAN sync option found in Dropbox, so a multi-site installation is needed to save bandwidth when 40 users work with larger data sets where nearly a GB changes each day during work hours, over a 10 Mbit/s internet connection. :)

Is there a possibility of getting paid support for such a project? Is this really a good solution for production use? I am thinking of having a server instance on every Mac OS file server, plus one in a datacenter on the web, which would be the "master".

Regards
Thorsten

@xchardon

Also interested. Is there any documentation? And/or any possibility of paid support?
Anything new about the multiple block servers? I mean, without using a distributed file system?

@fossxplorer

Very interesting. Has anyone tested any of the mentioned/discussed solutions, or run them in production?
Thx.

@ftrojahn

I'm interested in this, too.

As a workaround, at the moment, one can tweak the DNS entry of the Seafile server locally.

Example: myserver.mydomain.com is the external address where the Seafile server is reachable online. In my local LAN, I change the IP of myserver.mydomain.com to the server's local, private IP, so synchronization bypasses the online route and uses the local LAN directly. This makes no sense if the Seafile server is only reachable online, e.g., in a datacenter.
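Concretely, this kind of local override is usually done in `/etc/hosts` on the LAN clients (the IP address below is illustrative):

```shell
# On each LAN client, pin the server's public name to its private LAN address
# by appending a line like this to /etc/hosts (illustrative IP):
#
#   192.168.1.20  myserver.mydomain.com
#
# Then check which address the client now resolves:
getent hosts myserver.mydomain.com
```

Remote clients keep using the public DNS record, e.g. the external IP of a firewall that port-forwards to the server, so no reconfiguration is needed when a client moves between networks.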

@geojanm

geojanm commented Nov 9, 2015

@ftrojahn How did you manage to sync different servers?
My infrastructure should have one server that all users work on, but I need to completely duplicate it to another server with a very slow internet connection. All devices connected (locally only) to that permanently slow (or temporarily offline) server should also stay up to date.

@ftrojahn

ftrojahn commented Nov 9, 2015

@geojanm No, my example needs only one server. If the client is local, it uses the IP of the Seafile server on the local LAN; if the client is remote, it gets the online-reachable IP, e.g., the external IP of a firewall that port-forwards to the local server. This does not provide syncing between different offices; it just speeds up local sync when the client is local, and it needs no reconfiguration when the client is remote, since the DNS name of the server is locally faked with a different IP.

Recently I came across the possibility of GlusterFS geo-replication, especially georepsetup. I haven't used it myself yet and don't know whether Seafile works on top of it, but perhaps you could give the combination a try.

@shoeper
Collaborator

shoeper commented Nov 11, 2015

The same should work with Ceph as well, but I'm not sure how well it copes with a slow link between the two locations (this is not specific to Ceph but applies to such a setup in general).
