Requesting delta-sync in longterm [$325.00] #417
Comments
I have looked into zsync (client-side rsync). If we ever implement such a feature, it will most likely use that, since you have to offload all the computation to the client side or else you will kill the server. I don't know about truecrypt/veracrypt, but most container formats (and encrypted ones are even worse) don't lend themselves particularly well to delta sync, since a small change often results in a lot of changed bytes. |
I can only speak for truecrypt (but I suppose that veracrypt does the same): if you change 1MB within the container, only a bit more than 1MB of the whole file changes as well (I am very sure about this). Not even a password change alters the whole file, only small parts of it; see https://news.ycombinator.com/item?id=6523286 and http://crypto.stackexchange.com/questions/18479/how-does-truecrypt-change-password-without-the-need-for-a-complete-re-encryption Thanks for considering the feature in any future release. |
That is an interesting feature for virtual images. Are there any experiences with encrypted containers and diff sync tools? One annoying thing about the sync-process is that you must transfer the files through the client. You can't just place them from a hard disk or use a faster transfer mechanism. Therefore many ask for a diff-sync feature but the ability to compare files (based on a hash-sum) would already help a lot of these people and it's much easier to implement. |
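For illustration, the hash-sum comparison mentioned above could look roughly like this. It is only a sketch: the function names and the idea that the server hands the client a SHA-256 hex digest are assumptions, not an existing Nextcloud API.

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_transfer(local_path, remote_hash):
    """Hypothetical check: skip the upload when the server already has
    a file with the same hash (e.g. copied there from a hard disk)."""
    return file_sha256(local_path) != remote_hash
```

This does not save anything when a file really changed, but it avoids re-transferring files that were already placed on the server by other means.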
I don't mind at all offloading all the computation to the client side, as long as such a feature is made available!!! We actually consider this a very important feature for business! To give us an idea, on a 0-10 scale, what would the cost (not monetary) of developing this be? Thanks |
Dropbox and Onedrive have delta sync. |
@rullzer How would the client have the previous file for calculating the delta? |
@cowai you either need to keep a copy of your last sync around (using file-system specific things like shadow copy seems out of the question for the broad range of platforms with sync clients) or you have to do block-level syncing instead, like "syncthing" does it. |
I think block-level syncing like syncthing is probably the easiest implementation in code, and perhaps the cheapest to write. I'm seriously interested in this, and I know some companies that are too (Quickbooks files man...ugly stuff). Like @Bigpet said, you'd need a copy of the file before changes on hand, or put some hooks into writes that go into that specific directory, but the latter sounds very messy and dangerous. I wish I knew how to write code better because I would 100% do this. I'm definitely a Kindergarten koder compared to a lot of people that put stuff on github. Thought I'd voice that there's interest on my end, and on the end of local companies I know. |
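To make the block-level idea concrete, here is a rough sketch. It assumes fixed-size blocks and that the client keeps the list of block hashes from the last sync instead of a full copy of the file; the names and the block size are made up for illustration and this is not how syncthing is actually implemented.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size, purely illustrative

def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def changed_blocks(old_hashes, new_hashes):
    """Indices of blocks that are new or differ from the last sync."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]

# Usage: flip one byte in a 1 KiB buffer and compare with 256-byte blocks.
old = bytes(1024)
new = bytearray(old)
new[300] = 1
print(changed_blocks(block_hashes(old, 256), block_hashes(bytes(new), 256)))
```

Only the listed blocks would have to be uploaded. Note that fixed offsets only help with in-place changes (VM images, containers); inserting a byte near the start shifts every later block, which is the case the rsync rolling checksum is designed for.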
Are there concrete plans when delta syncing will be available? May I hope to see this implemented in Nextcloud 13 already? |
There is some progress on owncloud: |
@gschenck Please feel free to try out the latest code, the core implementation should be complete now. |
@ahmedammar any plans on submitting the PR against NC as well further down the road? |
@jkaberg once the work is complete and merged in oC I can have a look, assuming the code-base isn't too different at the core ... |
@ahmedammar can you give us an update about the feature? (If possible a probable ETA?) |
@maverick74 no ETA for nextcloud, if someone is willing to open a bounty for it I could look into it more urgently, otherwise, for reference: |
The bounty is already at 115 dollar now. It should not be terribly hard to get this merged in Nc client and server, I think, but it won't make it for 13 |
I won’t be looking into this until oC actually merges first, since that saves me any duplicated effort. Unless this bounty gets so big that I can ignore oC altogether :) |
FWIW I guess there is some news at owncloud/core#29404 |
@maverick74 So It can be merged... @rullzer @jospoortvliet |
That's the server side. The client side is still on a development branch and subject to testing (https://github.com/owncloud/client/labels/Delta-sync). Until that is finished, it doesn't make a lot of sense to merge anything, so for now you can only help by testing it. |
I think nextcloud should hurry up, delta sync will be released in the next owncloud update: |
@petrk94 yeah, it could in theory be merged - but ownCloud notes it'll be in testing until 2019, let's see. @ahmedammar can make a PR for the server - the client will get it as we sync upstream actively still. |
I'm wondering why I get so many thumbs down, I just want to keep the thread updated :/ |
If I'm understanding stuff correctly it sounds like NextCloud won't be having this feature any time soon, correct? |
Chipped in on the bounty because this is high-priority for me, and I would much rather stay with NC than convert (back) to OC. I'd be happy to help test as well. |
+1 for me. Delta sync is hugely important. I can only hypothesize that the reason it's low on your priority list is that you are chasing cool new features vs what everyone can benefit from and maybe the voice of this need just isn't being heard (he who shouts loudest?) I need to sync VM images and huge PST files daily. |
Is it possible to use Nextcloud server 15 with the ownCloud client 2.6.0+ (featuring DeltaSync)? I migrated from ownCloud to Nextcloud and would rather not risk migrating back. This feature seems important. What drives the prioritization of Drive and E2EE ahead of DeltaSync? |
I'll have a stab at this in the next few days. Looking at ownCloud: they use zsync, with a lot of code to integrate it with the ownCloud APIs, plus this .zsync metadata file, meh. I'd rather go full rsync. It should be possible for the server to shell out to an rsync daemon or client and connect its stdin and stdout through an HTTP tunnel to the nextcloud-clients. |
FWIW, Windows has (had?) Remote Differential Compression built-in and there was some technical documentation on it that might have been useful, but I cannot find it anymore. |
@iskradelta reinventing the wheel sounds like a great plan! |
@ahmedammar, you're not making it easier... |
@ahmedammar reinventing the wheel? That's the opposite of my plan. "Reinventing the wheel" would mean "reimplement the rsync algorithm or another differential algorithm", and then "reimplement or make yourself a new protocol" or "now fit the existing wire protocol on top of your API"... The plan is to do the opposite: tunnel the existing rsync network wire protocol over a connection that the nextcloud-server already has to the client, the websocket connection, instead of the HTTP tunnels I wrote about above, since most people can't configure those correctly. A prototype is already working for me on the nextcloud-server side; it took one evening of "coding". The zsync implementation is self-mutilation ("oh, rsync can't be done over HTTP, let's modify the network protocol to do rsync over HTTP"), but you can tunnel anything over HTTP or websockets, and the ownCloud implementation of it is buggy and too large to maintain. |
@iskradelta will your implementation scale? If you have many users doing rsync you will make it do way more work than with zsync if I am not mistaken? |
@ariselseng rsync is only CPU-intensive on the sender side. The sender side can be the client or the server, depending on whether the user is uploading or downloading. There is a limit to how many users can be syncing their tree (initial download) at the same time: that limit is the CPU available to the server, if the bandwidth limit is not hit before that. It only gets hit when the files in a user's tree have a changed timestamp or size, so once synced, many users can keep "syncing" without causing high CPU load. If this ever becomes a problem, there is a solution: consider caching to avoid the expensive checksumming. But I don't like it, since it means we just assume that syncing "is always initial sync", that users don't have any of their data on their phones/clients. The zsync pre-calculated metadata file is really a benefit when all the users are downloading the same tree (files); again, zsync makes sense when it's made for public data like ISO images. There is a reason even Dropbox is using librsync. It's the best tool, the best. |
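For readers unfamiliar with why the sender does the heavy lifting, here is a minimal sketch of rsync-style matching. Assumptions: tiny 8-byte blocks, SHA-256 as the strong hash, and illustrative function names; real rsync and librsync use different parameters, hashes and a wire format that is not shown here. The receiver only hashes its old blocks once (`signatures`), while the sender slides a rolling checksum over every byte offset of the new file (`delta`).

```python
import hashlib

M = 1 << 16

def weak_checksum(block):
    """rsync-style weak checksum: (sum of bytes, position-weighted sum)."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def signatures(old, n):
    """Receiver side: map weak checksum -> {strong hash: block index}."""
    table = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        strong = hashlib.sha256(block).digest()
        table.setdefault(weak_checksum(block), {})[strong] = start // n
    return table

def delta(new, table, n):
    """Sender side: emit ('copy', block_index) or ('data', bytes) ops."""
    ops, literal, pos = [], bytearray(), 0
    a, b = weak_checksum(new[:n])
    while pos + n <= len(new):
        candidates = table.get((a, b))
        match = None
        if candidates:  # weak hit: confirm with the strong hash
            match = candidates.get(hashlib.sha256(new[pos:pos + n]).digest())
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            pos += n
            a, b = weak_checksum(new[pos:pos + n])
        else:
            literal.append(new[pos])
            if pos + n < len(new):
                a, b = roll(a, b, new[pos], new[pos + n], n)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def apply_delta(old, ops, n):
    """Receiver side: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * n:(value + 1) * n] if kind == "copy" else value
    return bytes(out)
```

With `old = b"the quick brown fox jumps over the lazy dog"` and `new = b"XX" + old`, `delta(new, signatures(old, 8), 8)` should copy the five full blocks and send only the two inserted bytes plus the unaligned tail as literal data, even though every block boundary has shifted.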
Good luck. |
@iskradelta I look forward to trying out your experiment ;-) wrt others asking about priorities - we prioritize things that benefit more users or that are paid for by customers. While everyone here cares deeply about deltasync, 99% of the users don't handle very big files in which small parts are regularly changed - the only scenarios I can think of are VMs and encrypted filesystems, both of which are never used by the vast majority of computer users. The drive and E2E have big benefits for normal users, meanwhile, so we focus there. And finishing those is taking more than long enough, I hope you don't mind that we don't take on another huge task until we have those both done. Our team can actually barely handle the support load for customers, that's the main reason we are not making much progress. We have been trying to hire more people for 3 years already :( |
This comment was marked as off-topic.
I think it was explained before but:
So almost all common file types, including office documents (yes, they are compressed), images, music and large PSD files etc. do not benefit from it. A metadata change to a large movie might (not always, it depends on the file format) and sometimes to large images, too. But how often do you do that? Once a month? It is really almost exclusively nice for VM images and encrypted container formats. And yes, they matter, but they aren't the most important thing in the world for most of our users, sorry. Look, customers use Nextcloud in many ways. SIEMENS for example uses it only with HUGE files (minimum 30 gigabytes, typically 50-100 GB). Some media companies use it with PSD files of hundreds of MB. If we could make those cases much more efficient with deltasync, we would look into it, but it wouldn't make a difference so we don't. There is little point in discussing this further. We have a lot of work to do and until we have a larger team and have finished other tasks, we won't get to this. If somebody else wants to do it - please, go ahead, pull requests are welcome. If somebody wants to pay for it, get in contact with sales. |
Hi all,
Delta sync would be great for my huge truecrypt/veracrypt files (~30-100 GB). Without delta sync I have to stay away from this product.
Delta sync would also give you a feature that distinguishes you from ownCloud.
Couldn't there be an optional mechanism (maybe extension-, folder- or file-based) to perform delta sync? "Optional" because I agree that delta sync does not make sense for all kinds of files/folders. Maybe even using something existing like rsync?
There is an open bounty on this issue. Add to the bounty at Bountysource
