Requesting delta-sync in longterm [$325.00] #417

Open
exokkk opened this issue Jul 15, 2016 · 53 comments

Comments

@exokkk

@exokkk exokkk commented Jul 15, 2016

Hi all,

Delta sync would be great for my huge TrueCrypt/VeraCrypt container files (~30-100 GB). Without delta sync I have to stay away from this product.

Delta sync would also give you a feature that distinguishes you from ownCloud.

Couldn't there be an optional (maybe extension-, folder- or file-based) mechanism to perform delta sync ("optional" because I agree that delta sync does not make sense for all kinds of files/folders)? Maybe even using something existing like rsync?


There is an open bounty on this issue. Add to the bounty at Bountysource

@rullzer
Member

@rullzer rullzer commented Jul 15, 2016

I have looked into zsync (client-side rsync). If we ever implement such a feature, it will most likely use that, since you have to offload all the computation to the client side; otherwise you will kill the server.
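To make the client-side argument concrete, here is a minimal sketch of the rolling-checksum block matching that rsync and zsync are built on. One side computes per-block checksums of the data it already has (`signature`); the other slides a cheap weak checksum over its data one byte at a time and confirms candidates with a strong hash (`delta`), so only unmatched bytes travel. In rsync the two roles sit at the two ends of a live connection; zsync precomputes the checksums into a `.zsync` file served over plain HTTP and leaves the scanning to the client. The function names, the Adler-32-style weak checksum, SHA-256 as the strong hash and the tiny 4-byte block size are assumptions for illustration, not zsync's actual parameters or file format.

```python
import hashlib

MOD = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak(block: bytes) -> tuple[int, int]:
    """Adler-32-style weak checksum, returned as its two halves (a, b)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window by one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def signature(old: bytes, size: int) -> dict[int, list[tuple[int, bytes]]]:
    """Per-block checksums of the old data: weak key -> [(block index, SHA-256)]."""
    sig: dict[int, list[tuple[int, bytes]]] = {}
    for start in range(0, len(old) - size + 1, size):  # full blocks only
        block = old[start:start + size]
        a, b = weak(block)
        sig.setdefault((b << 16) | a, []).append((start // size, hashlib.sha256(block).digest()))
    return sig


def delta(new: bytes, sig: dict[int, list[tuple[int, bytes]]], size: int) -> list[tuple[str, object]]:
    """Scan new data; emit ('copy', block index) for reused blocks, ('data', bytes) otherwise."""
    ops: list[tuple[str, object]] = []
    literal = bytearray()
    i = 0
    a = b = -1  # -1 means "weak checksum not yet computed for this window"
    while i + size <= len(new):
        if a < 0:
            a, b = weak(new[i:i + size])
        match = None
        for index, strong in sig.get((b << 16) | a, []):
            if hashlib.sha256(new[i:i + size]).digest() == strong:  # rule out weak collisions
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            i += size
            a = -1
        else:
            literal.append(new[i])
            if i + size < len(new):
                a, b = roll(a, b, new[i], new[i + size], size)
            i += 1
    literal.extend(new[i:])  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list[tuple[str, object]], size: int) -> bytes:
    """Rebuild the new data from the old data plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog!"
new = b"The quick brown cat jumps over the lazy dog!"
ops = delta(new, signature(old, 4), 4)
print(ops)
assert patch(old, ops, 4) == new
```

Only the 4-byte literal `cat ` has to be sent here; the other ten blocks are referenced by index. The expensive part is the byte-by-byte scan in `delta`, which is the work rullzer wants kept off the server.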

I don't know about TrueCrypt/VeraCrypt, but most container formats (and encrypted ones are even worse) don't lend themselves particularly well to delta sync, since a small change often results in a lot of changed bytes.

@exokkk
Author

@exokkk exokkk commented Jul 15, 2016

I can only speak for TrueCrypt (but I suppose VeraCrypt does the same): if you change 1 MB within the container, only a bit more than 1 MB of the whole file changes (I am very sure about this). Not even a password change rewrites the whole file, only small parts of it; see https://news.ycombinator.com/item?id=6523286 and http://crypto.stackexchange.com/questions/18479/how-does-truecrypt-change-password-without-the-need-for-a-complete-re-encryption
So delta sync + TrueCrypt (VeraCrypt) is a really perfect combination.
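A toy model can illustrate why the encryption mode decides this. TrueCrypt and VeraCrypt use XTS mode, which encrypts each sector independently under a per-sector tweak. The sketch below imitates that with a SHA-256-derived keystream per sector and contrasts it with a hypothetical chained scheme in which each sector's keystream depends on the previous ciphertext sector. Neither function is real cryptography or the actual TrueCrypt format; they only show how a one-byte edit propagates.

```python
import hashlib

SECTOR = 512  # bytes per sector in this toy model


def keystream(key: bytes, tweak: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256(key || tweak || counter). Not real cryptography."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + tweak + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(data, stream))


def encrypt_per_sector(key: bytes, data: bytes) -> bytes:
    """Each sector depends only on its own plaintext and its sector number (XTS-like)."""
    out = bytearray()
    for start in range(0, len(data), SECTOR):
        tweak = (start // SECTOR).to_bytes(8, "big")
        out += xor(data[start:start + SECTOR], keystream(key, tweak, SECTOR))
    return bytes(out)


def encrypt_chained(key: bytes, data: bytes) -> bytes:
    """Hypothetical chained mode: each keystream depends on the previous ciphertext sector."""
    out = bytearray()
    previous = b""
    for start in range(0, len(data), SECTOR):
        tweak = hashlib.sha256(previous).digest()
        previous = xor(data[start:start + SECTOR], keystream(key, tweak, SECTOR))
        out += previous
    return bytes(out)


def changed_sectors(a: bytes, b: bytes) -> int:
    """Count sectors whose bytes differ between two images."""
    return sum(a[i:i + SECTOR] != b[i:i + SECTOR] for i in range(0, max(len(a), len(b)), SECTOR))


key = b"demo key"
plain = bytes(8 * SECTOR)        # 8 zero-filled sectors
edited = b"\x01" + plain[1:]     # change one byte in sector 0
per_sector = changed_sectors(encrypt_per_sector(key, plain), encrypt_per_sector(key, edited))
chained = changed_sectors(encrypt_chained(key, plain), encrypt_chained(key, edited))
print(f"per-sector: {per_sector} sector(s) changed, chained: {chained} sector(s) changed")
```

With independent sectors the edit stays in one sector, which is the behaviour a block-based delta sync can exploit; with chaining, every later sector changes too.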
Although I can see that this feature will not be wanted by many people, there are some cases, like mine, where it would be great. There may be other cases that I/we cannot think of. I'm not sure, but VM images might profit from delta sync as well, for example. Also, for some files you might uncompress -> compare -> delta_sync -> compress_server_side_again [OK, this might be too costly an action, I do not know. This would work for e.g. *.pptx etc. as well]
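The "uncompress, compare" idea relies on formats such as .pptx being ZIP archives whose members are compressed individually, so in principle a comparison can be done per member instead of over the whole file. A minimal sketch using Python's `zipfile`, comparing the CRC-32 and uncompressed size that each archive records per member; `changed_members` and `make_zip` are hypothetical helpers, and the sketch says nothing about how a sync client would then transfer or recompress the result.

```python
import io
import zipfile


def changed_members(old_zip: bytes, new_zip: bytes) -> dict[str, list[str]]:
    """Compare two ZIP archives member by member using stored CRC-32 and size."""
    with zipfile.ZipFile(io.BytesIO(old_zip)) as zo, zipfile.ZipFile(io.BytesIO(new_zip)) as zn:
        old = {info.filename: (info.CRC, info.file_size) for info in zo.infolist()}
        new = {info.filename: (info.CRC, info.file_size) for info in zn.infolist()}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(name for name in old.keys() & new.keys() if old[name] != new[name]),
    }


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build an in-memory deflate-compressed archive, standing in for a .pptx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


v1 = make_zip({"ppt/slides/slide1.xml": b"<p>Hello</p>", "ppt/slides/slide2.xml": b"<p>World</p>"})
v2 = make_zip({"ppt/slides/slide1.xml": b"<p>Hello</p>", "ppt/slides/slide2.xml": b"<p>World!</p>"})
result = changed_members(v1, v2)
print(result)
```

CRC-32 is a quick change detector, not a cryptographic hash; a real implementation would probably want a stronger per-member hash.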
Client-side computation seems reasonable to me.

Thanks for considering the feature in any future release.

@tflidd
Contributor

@tflidd tflidd commented Jul 16, 2016

That is an interesting feature for virtual machine images. Are there any experiences with encrypted containers and diff-sync tools?

One annoying thing about the sync process is that you must transfer the files through the client. You can't just place them on the server from a hard disk or use a faster transfer mechanism. That is why many people ask for a diff-sync feature, but the ability to compare files (based on a hash sum) would already help a lot of them, and it's much easier to implement.
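The hash comparison described here could look like the following minimal sketch: hash the local file and skip the transfer when the server already reports the same hash. SHA-256 and the `needs_upload` helper are assumptions (the comment names no hash function or API), and how the server would obtain and report its hash is left out.

```python
import hashlib
import os
import tempfile
from typing import Optional


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so huge files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(path: str, server_hash: Optional[str]) -> bool:
    """Upload only if the server has no copy or its stored hash differs."""
    return server_hash is None or file_sha256(path) != server_hash


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "disk.img")
    with open(path, "wb") as f:
        f.write(bytes(4096))
    server_hash = file_sha256(path)  # pretend the server reported this hash
    unchanged = needs_upload(path, server_hash)
    with open(path, "r+b") as f:
        f.write(b"\x01")  # modify the first byte in place
    changed = needs_upload(path, server_hash)

print(unchanged, changed)
```

This only avoids re-sending identical files; a changed file is still uploaded in full.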

@maverick74

@maverick74 maverick74 commented Dec 13, 2016

I don't mind at all offloading all the computation to the client side, as long as such a feature is made available!!! We actually consider this a very important feature for business!

To get an idea: on a 0-10 scale, what would be the cost (not monetary) of developing this?

Thanks

@wudimenghuan

@wudimenghuan wudimenghuan commented Jan 19, 2017

Dropbox and OneDrive have delta sync.
Seafile has delta sync, but it causes broken files.
I hope you take a look at rsync. I really need delta sync.

@ariselseng
Member

@ariselseng ariselseng commented Feb 16, 2017

@rullzer How would the client have the previous file for calculating the delta?

@Bigpet

@Bigpet Bigpet commented Mar 23, 2017

@cowai you either need to keep a copy of your last sync around (using file-system-specific things like shadow copies seems out of the question for the broad range of platforms with sync clients), or you have to do block-level syncing instead, like Syncthing does.
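With block-level syncing, the client does not need to keep the previous file, only the list of block hashes from the last sync. A minimal sketch, assuming fixed 128 KiB blocks hashed with SHA-256; the block size and function names are illustrative, not Syncthing's actual implementation.

```python
import hashlib

BLOCK = 128 * 1024  # assumed block size for this sketch


def block_hashes(data: bytes, size: int = BLOCK) -> list[bytes]:
    """SHA-256 of each fixed-size block; this list is all the client has to remember."""
    return [hashlib.sha256(data[i:i + size]).digest() for i in range(0, len(data), size)]


def changed_blocks(stored: list[bytes], data: bytes, size: int = BLOCK) -> list[int]:
    """Indices of blocks that differ from the stored hashes (or did not exist before)."""
    current = block_hashes(data, size)
    return [i for i, h in enumerate(current) if i >= len(stored) or stored[i] != h]


original = bytes(1024 * 1024)                      # 1 MiB -> 8 blocks
stored = block_hashes(original)                    # kept after the last sync
edited = original[:300_000] + b"\x01" + original[300_001:]
print(changed_blocks(stored, edited))              # only block 2 needs uploading
```

Because blocks are compared by position, inserting a byte shifts every later block and makes them all look changed; rsync's rolling checksum exists to handle that case. A file that shrinks would also need its removed trailing blocks recorded, which this sketch skips.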

@stratacast

@stratacast stratacast commented Apr 13, 2017

I think block-level syncing like Syncthing's is probably the easiest implementation in code, and perhaps the cheapest to write. I'm seriously interested in this, and I know some companies that are too (QuickBooks files, man... ugly stuff). Like @Bigpet said, you'd need a copy of the file from before the changes on hand, or put some hooks into writes that go into that specific directory, but the latter sounds very messy and dangerous. I wish I knew how to write code better, because I would 100% do this. I'm definitely a Kindergarten koder compared to a lot of people who put stuff on GitHub. Thought I'd voice that there's interest on my end, and on the end of local companies I know.

@eglipeter

@eglipeter eglipeter commented Jun 5, 2017

Are there concrete plans for when delta syncing will be available? May I hope to see this implemented in Nextcloud 13 already?

@gschenck

@gschenck gschenck commented Oct 18, 2017

There is some progress on ownCloud:

owncloud/core#16162

@ahmedammar

@ahmedammar ahmedammar commented Oct 26, 2017

@gschenck Please feel free to try out the latest code, the core implementation should be complete now.

@jkaberg

@jkaberg jkaberg commented Oct 26, 2017

@ahmedammar any plans on submitting the PR against NC as well further down the road?

@ahmedammar

@ahmedammar ahmedammar commented Oct 26, 2017

@jkaberg once the work is complete and merged in oC I can have a look, assuming the code-base isn't too different at the core ...

@maverick74

@maverick74 maverick74 commented Nov 28, 2017

@ahmedammar can you give us an update on the feature (and, if possible, a probable ETA)?

@ahmedammar

@ahmedammar ahmedammar commented Nov 28, 2017

@maverick74 No ETA for Nextcloud. If someone is willing to open a bounty for it, I could look into it more urgently; otherwise, for reference:
owncloud/client#6131
owncloud/core#29404

@L00maca

@L00maca L00maca commented Dec 24, 2017

It's not much, and I'm not even sure I did this right since I've never done this before, but I don't mind chipping in on Bountysource to help get this done.

@jospoortvliet
Member

@jospoortvliet jospoortvliet commented Jan 2, 2018

The bounty is already at 115 dollars now. It should not be terribly hard to get this merged into the Nc client and server, I think, but it won't make it for 13 😄

@ahmedammar
Copy link

@ahmedammar ahmedammar commented Jan 2, 2018

I won’t be looking into this until oC actually merges it first, since that saves me any duplicated effort. Unless this bounty gets so big that I can ignore oC altogether :)

@maverick74

@maverick74 maverick74 commented Mar 12, 2018

FWIW, I guess there is some news at owncloud/core#29404

@wudimenghuan

@wudimenghuan wudimenghuan commented Mar 17, 2018

@maverick74 So it can be merged... @rullzer @jospoortvliet

@tflidd
Contributor

@tflidd tflidd commented Mar 21, 2018

FWIW i guess there are some news at owncloud/core#29404

That's the server side. The client side is still on a development branch and subject to testing (https://github.com/owncloud/client/labels/Delta-sync). Until that is finished, it doesn't make much sense to merge anything, so for now you can only help by testing it.

@petrk94

@petrk94 petrk94 commented Apr 9, 2018

I think Nextcloud should hurry up; delta sync will be released in the next ownCloud update:
https://owncloud.com/owncloud-implements-delta-sync-technology/

@jospoortvliet
Member

@jospoortvliet jospoortvliet commented Apr 12, 2018

@petrk94 yeah, it could in theory be merged, but ownCloud notes it'll be in testing until 2019, so let's see. @ahmedammar can make a PR for the server; the client will get it since we still actively sync with upstream.

@petrk94

@petrk94 petrk94 commented Apr 12, 2018

I'm wondering why I get so many thumbs down; I just want to keep the thread updated :/

@nextcloud-bot nextcloud-bot added the stale label Jun 20, 2018
@jcklpe

@jcklpe jcklpe commented Aug 9, 2018

If I'm understanding stuff correctly it sounds like NextCloud won't be having this feature any time soon, correct?

@fracture-point

@fracture-point fracture-point commented Apr 11, 2019

Chipped in on the bounty because this is high-priority for me, and I would much rather stay with NC than convert (back) to OC. I'd be happy to help test as well.

@leewsb

@leewsb leewsb commented Jul 7, 2019

+1 for me. Delta sync is hugely important. I can only hypothesize that the reason it's low on your priority list is that you are chasing cool new features rather than what everyone can benefit from, and maybe the voice of this need just isn't being heard (he who shouts loudest?). I need to sync VM images and huge PST files daily.

@lowlyocean

@lowlyocean lowlyocean commented Sep 25, 2019

Is it possible to use Nextcloud server 15 with the ownCloud 2.6.0+ client (featuring delta sync)? I migrated from ownCloud to Nextcloud and would rather not risk migrating back. This feature seems important. What drives the prioritization of Drive and E2EE ahead of delta sync?

@rubo77 rubo77 changed the title Requesting delta-sync in longterm Requesting delta-sync in longterm [$325.00] Oct 10, 2019
@rubo77 rubo77 added the bounty label Oct 10, 2019
@iskradelta

@iskradelta iskradelta commented Oct 18, 2019

I'll have a stab at this in the next few days. Looking at ownCloud: they use zsync, with a lot of code to integrate it with the ownCloud APIs, plus this .zsync metadata file, meh.

I'd rather go full rsync: it should be possible for the server to shell out to the rsync daemon or client and connect its stdin and stdout through an HTTP tunnel to the Nextcloud clients.

@nearwood

@nearwood nearwood commented Oct 18, 2019

FWIW, Windows has (had?) Remote Differential Compression built-in and there was some technical documentation on it that might have been useful, but I cannot find it anymore.
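RDC is generally described as a content-defined chunking scheme: chunk boundaries are chosen by a rolling hash over the content instead of at fixed offsets, so inserting bytes only disturbs the chunks near the edit. The sketch below shows that general idea, not RDC's own algorithm or parameters: a boundary is placed wherever a polynomial hash of the last 16 bytes has its low 10 bits equal to zero, which gives chunks of roughly 1 KiB on random data. All constants and names are assumptions for illustration.

```python
import random

WINDOW = 16                 # bytes of context that decide a boundary
BASE = 257
MOD = (1 << 61) - 1         # Mersenne prime modulus for the polynomial hash
MASK = (1 << 10) - 1        # boundary when the low 10 bits are zero: ~1 KiB average


def chunk_ends(data: bytes) -> list[int]:
    """End offsets of content-defined chunks, using a rolling polynomial hash."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    h = 0
    ends = []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # append the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            ends.append(i + 1)
    if not ends or ends[-1] != len(data):
        ends.append(len(data))
    return ends


def chunks(data: bytes) -> list[bytes]:
    ends = chunk_ends(data)
    return [data[start:end] for start, end in zip([0] + ends[:-1], ends)]


old = random.Random(42).randbytes(64 * 1024)
new = old[:1000] + b"!" + old[1000:]       # insert one byte near the start
known = set(chunks(old))
new_chunks = chunks(new)
fresh = [c for c in new_chunks if c not in known]
print(f"{len(new_chunks)} chunks, {len(fresh)} not already known")
```

Because a boundary depends only on the 16 bytes before it, chunking resynchronises shortly after the inserted byte, so at most `WINDOW + 1` chunks of the new data can be unknown; with fixed offsets every block after the insertion would shift.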

@ahmedammar

@ahmedammar ahmedammar commented Oct 19, 2019

@iskradelta reinventing the wheel sounds like a great plan!

@realies

@realies realies commented Oct 19, 2019

@ahmedammar, you're not making it easier... 😏

@iskradelta

@iskradelta iskradelta commented Oct 19, 2019

@ahmedammar reinventing the wheel? That's the opposite of my plan. "Reinventing the wheel" would mean reimplementing the rsync algorithm (or another differential algorithm), then reimplementing or inventing a new protocol, or fitting the existing wire protocol on top of your API. The plan is the opposite: tunnel the existing rsync network wire protocol over a connection the Nextcloud server and client already have, the WebSocket connection, instead of the HTTP tunnels I wrote about above, since most people can't configure those correctly.
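Tunnelling an unmodified rsync over an existing connection mostly means carrying rsync's stdin, stdout and stderr bytes, plus its exit status, inside messages. A minimal sketch of one possible framing, assuming a hypothetical one-byte channel id and a four-byte length prefix; the rsync wire protocol stays opaque inside the payloads, and nothing here is an existing Nextcloud or rsync API.

```python
import struct

STDIN, STDOUT, STDERR, EXIT = range(4)   # hypothetical channel ids
HEADER = struct.Struct(">BI")           # 1-byte channel id + 4-byte big-endian length


def encode(channel: int, payload: bytes) -> bytes:
    """Wrap one chunk of rsync's stdin/stdout/stderr (or its exit code) in a frame."""
    return HEADER.pack(channel, len(payload)) + payload


class Decoder:
    """Reassembles frames from a byte stream that may arrive split at arbitrary points."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list[tuple[int, bytes]]:
        self._buffer += data
        frames = []
        while len(self._buffer) >= HEADER.size:
            channel, length = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this frame
            frames.append((channel, bytes(self._buffer[HEADER.size:end])))
            del self._buffer[:end]
        return frames


stream = encode(STDOUT, b"opaque rsync protocol bytes") + encode(EXIT, b"\x00")
decoder = Decoder()
received = []
for i in range(len(stream)):                # deliver one byte at a time
    received += decoder.feed(stream[i:i + 1])
print(received)
```

Over a WebSocket each message is already delimited, so the channel byte alone would suffice; the length prefix lets the same code run over a plain byte stream, which the byte-at-a-time loop simulates.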

A prototype is already working for me on the Nextcloud server side; it took one evening of "coding".

The zsync implementation is self-mutilation: "oh, rsync can't be done over HTTP, let's modify the network protocol to do rsync over HTTP". But you can tunnel anything over HTTP or WebSockets, and the ownCloud implementation of it is buggy and too large to maintain.

@ariselseng
Member

@ariselseng ariselseng commented Oct 19, 2019

@iskradelta will your implementation scale? If many users are running rsync, the server will do way more work than with zsync, if I am not mistaken?

@iskradelta

@iskradelta iskradelta commented Oct 19, 2019

@ariselseng rsync is only CPU-intensive on the sender side. The sender can be the client or the server, depending on whether the user is uploading or downloading. There is a limit to how many users can sync their tree (initial download) at the same time: the CPU available to the server, if the bandwidth limit isn't hit first. And that limit only comes into play when files in the user's tree have changed timestamp or size, so once synced, many users can keep "syncing" without causing high CPU load.

When, if ever, this becomes a problem, there is a solution: caching, to avoid the expensive checksumming. But I don't like it, since it means we just assume that syncing "is always an initial sync", i.e. that users don't have any of their data on their phones/clients. The precalculated metadata file (zsync) is really a benefit when all the users are downloading the same tree (files); zsync makes sense there because it was made for public data like ISO images.
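The caching option mentioned here could be as simple as the minimal sketch below: block checksums are cached per path and reused while the file's size and modification time are unchanged, in the spirit of rsync's default quick check. The class name and 4 KiB block size are hypothetical.

```python
import hashlib
import os
import tempfile


class SignatureCache:
    """Caches per-block checksums, recomputing only when size or mtime changes."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self._entries: dict[str, tuple[tuple[int, int], list[bytes]]] = {}
        self.computed = 0  # how often a file was actually read and checksummed

    def signature(self, path: str) -> list[bytes]:
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]  # quick check passed: reuse the stored checksums
        with open(path, "rb") as f:
            sig = [hashlib.sha256(block).digest()
                   for block in iter(lambda: f.read(self.block_size), b"")]
        self.computed += 1
        self._entries[path] = (key, sig)
        return sig


cache = SignatureCache()
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "vm.img")
    with open(path, "wb") as f:
        f.write(bytes(16 * 1024))
    cache.signature(path)
    cache.signature(path)          # served from the cache
    after_repeat = cache.computed
    with open(path, "ab") as f:
        f.write(b"more data")      # size changes, so the entry is stale
    cache.signature(path)
    after_change = cache.computed

print(after_repeat, after_change)
```

A file rewritten with the same size within the same timestamp would get stale checksums; avoiding that at the cost of reading everything is the trade-off behind rsync's `--checksum` option.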

There is a reason even Dropbox is using librsync. It's the best tool, the best.

@ahmedammar

@ahmedammar ahmedammar commented Oct 19, 2019

Good luck.

@jospoortvliet
Member

@jospoortvliet jospoortvliet commented Dec 7, 2019

@iskradelta I look forward to try out your experiment ;-)

wrt others asking about priorities: we prioritize things that benefit more users or that are paid for by customers. While everyone here cares deeply about deltasync, 99% of users don't handle very big files in which small parts are regularly changed. The only scenarios I can think of are VMs and encrypted filesystems, both of which are never used by the vast majority of computer users. Drive and E2E, meanwhile, have big benefits for normal users, so we focus there. And finishing those is taking more than long enough; I hope you don't mind that we don't take on another huge task until we have both done. Our team can barely handle the support load for customers, which is the main reason we are not making much progress. We've been trying to hire more people for 3 years already :(

@nextcloud nextcloud deleted a comment from Ornias1993 Jan 31, 2020
@nextcloud nextcloud deleted a comment from Ornias1993 Jan 31, 2020
@kesselb

This comment was marked as off-topic.

@nextcloud nextcloud deleted a comment from Ornias1993 Jan 31, 2020
@kesselb

This comment was marked as off-topic.

@Ornias1993

This comment was marked as off-topic.

@realies

This comment was marked as off-topic.

@kesselb

This comment was marked as off-topic.

@Lordroran

This comment was marked as off-topic.

@RedKage

This comment was marked as off-topic.

@tehXor

This comment was marked as off-topic.

@nextcloud nextcloud locked as too heated and limited conversation to collaborators Feb 14, 2020
@jospoortvliet
Member

@jospoortvliet jospoortvliet commented Feb 20, 2020

I think it was explained before, but:

  • small files (under 5 or 10 MB) don't benefit from deltasync - the overhead is not worth it
  • files that are compressed and/or encrypted usually change everywhere when a small modification is made, so they don't benefit either
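The second point is easy to check with Python's standard `zlib`: make a one-byte edit near the start of some text, then count how many fixed 4 KiB blocks differ before and after compression. The sample data and block size are arbitrary choices; how much of the compressed stream changes depends on the data and on where the edit falls, but deflate output is bit-packed and later back-references can point into the edited region, so a local edit is not guaranteed to stay local.

```python
import zlib

BLOCK = 4096  # compare in fixed 4 KiB blocks, an arbitrary choice


def differing_blocks(a: bytes, b: bytes, size: int = BLOCK) -> int:
    """Count fixed-size blocks that differ between two byte strings."""
    return sum(a[i:i + size] != b[i:i + size] for i in range(0, max(len(a), len(b)), size))


original = "".join(f"row {i}: value={i * 7 % 1000}\n" for i in range(20_000)).encode()
edited = original[:100] + b"X" + original[101:]   # one-byte edit near the start

raw_changed = differing_blocks(original, edited)
packed_old, packed_new = zlib.compress(original, 9), zlib.compress(edited, 9)
packed_changed = differing_blocks(packed_old, packed_new)
packed_total = -(-max(len(packed_old), len(packed_new)) // BLOCK)  # ceiling division

print(f"uncompressed: {raw_changed} block(s) differ")
print(f"compressed:   {packed_changed} of {packed_total} block(s) differ")
```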

So almost all common file types, including office documents (yes, they are compressed), images, music, large PSD files, etc. do not benefit from it. A metadata change to a large movie might (not always, it depends on the file format), and sometimes one to a large image, too. But how often do you do that? Once a month? It is really almost exclusively nice for VM images and encrypted container formats. And yes, they matter, but they aren't the most important thing in the world for most of our users, sorry.

Look, customers use Nextcloud in many ways. SIEMENS, for example, uses it only with HUGE files (minimum 30 GB, typically 50-100 GB). Some media companies use it with PSD files of hundreds of MB. If we could make those cases much more efficient with deltasync, we would look into it, but it wouldn't make a difference, so we don't.

There is little point in discussing this further. We have a lot of work to do and until we have a larger team and have finished other tasks, we won't get to this. If somebody else wants to do it - please, go ahead, pull requests are welcome. If somebody wants to pay for it, get in contact with sales.
