Sync only the file change, not entire file [$1,755] #179

Open
wesleyhuang opened this Issue Dec 14, 2012 · 117 comments

Comments

@wesleyhuang

wesleyhuang commented Dec 14, 2012

Dropbox notes that only the changed parts of a file are synced, not the entire file: https://www.dropbox.com/help/8/en. It would be great if ownCloud could do the same. Especially for large files, syncing the entire file wastes a lot of bandwidth and time unnecessarily.

In my testing with the latest 4.5.4 and the sync client on Ubuntu 12.04, I prepared a 1GB text file, appended a few characters to the end, and monitored the traffic and the file on the server. I saw that the entire 1GB file was transferred to the server and that the server actually created a new file.

A reply from the forum indicates that librsync has this feature (http://librsync.sourceforge.net/); maybe csync could be switched to librsync.

@allo-

allo- commented Jan 26, 2013

I think this is ESSENTIAL for a sync client, as is detection of moved files. Maybe this could be combined with something like rsync -y (which tries to find a similar base file on the remote side to speed up the upload), but note that rsync -y can slow down (lib)rsync when working in large directories.

@rarspace01

rarspace01 Mar 14, 2013

any update on this?

@LoZio

LoZio commented Jul 11, 2013

+1 for this. There's a wealth of files that may slightly change each time and are big to sync. Outlook PST, truecrypt, databases, phone backups, ...
Any update on this?

@jmstacey

jmstacey Jul 14, 2013

I'd like to see this as well.

@fbeauvais

fbeauvais Jul 22, 2013

Waiting on this to install my first owncloud server.

@notDavid

notDavid commented Aug 3, 2013

I hope this will be implemented, seems essential..

  • for some reason my ownCloud client (1.3.0, OS X) has trouble syncing big files with small changes, and always re-uploads entire files...
@tityrus

tityrus commented Aug 4, 2013

I also see this as essential. Please.

@xeloader

xeloader Aug 14, 2013

I agree on this one as well, we need to be able to have some kind of incremental sync possibility before this is 100% usable.

@dragotin

Contributor

dragotin commented Aug 14, 2013

The problem is known, and we will get to it. One day.

Until then, please try to refrain from +1 comments ✌️

@Poelziminator

Poelziminator Sep 28, 2013

Can you give an estimate of when we will see the “one day”?
This functionality would open up whole new possibilities like syncing Truecrypt-volume files or Lightroom catalogs.

@danimo

Contributor

danimo commented Sep 28, 2013

@Poelziminator Not in any release this year. Note that most work for this will be in the server and the general design.

@zwo-bot

zwo-bot commented Oct 29, 2013

This is the key feature for everyone who works with TrueCrypt or bigger database files. Please make it happen.

@jancborchardt

Member

jancborchardt commented Jan 5, 2014

This as well seems related to the 1.6 »Sync Performance« milestone. @MTRichards @dragotin @danimo?

@MTRichards

MTRichards Jan 6, 2014

This is delta file syncing, and likely too complicated to get into 1.6 because it requires major server work too. The idea is to first improve performance on file level sync to get it more efficient, and then increase the granularity of the file comparisons (to file chunks), but going right to file chunks without first getting the file level sync comparisons would hurt performance more than help at this point because of the sheer volume of comparisons required.

@curtisz

curtisz commented Jan 27, 2014

I was about to choose ownCloud (with a very high probability of buying Enterprise) for use at our company but decided against it because of this particular issue. I'm a bit shocked that this isn't supported, considering how important this is for (potential) customers using TrueCrypt et al. This isn't so much of a +1 as an "at least one enterprise customer lost to the other guys".

@huksley

huksley commented Mar 5, 2014

This is very important for virtual machine disk image files. A single-byte change causes the upload of multi-gigabyte files to the remote server.

@beniroquai

beniroquai Mar 11, 2014

Yes! Great idea!

@powerpaul17

powerpaul17 Mar 11, 2014

As this is really the most important feature of a cloud service and it seems that nobody is interested in or working on it, I would like to offer some help with this issue. Is there already some information about what needs to be done, where to start, etc.?

@sagar-srivastava

sagar-srivastava Mar 24, 2014

I agree this is a very important feature. ownCloud is such a wonderful piece of software. I tested it today and found its quality up to the mark. Adding delta file sync alone would make it complete. Web interface, WebDAV, desktop sync and file sharing have all worked out great on my VPS, and it works with an ISPConfig3 implementation.

Please initiate this effort; I am willing to buy the enterprise edition for my company.

@sodetemplin

sodetemplin Mar 31, 2014

This point should be an ownCloud priority, for sure! Without this functionality, the EE version is "just" the community one with more support?

@kevincox

kevincox Mar 31, 2014

I remember hearing that ownCloud wanted to keep the files stored on disk in their entirety. Is this (still) true? Because if so, you could just generate a zsync signature file and a custom receiver that generates the entire file.

If the files are allowed to be broken up (maybe in a future version), then they can be chunked and a very efficient sync endpoint could be made.

What are the thoughts on this? I may consider working on this in my spare time.

@jancborchardt

Member

jancborchardt commented Apr 11, 2014

@dragotin @danimo @MTRichards what’s the plan on this one? I know it’s a big one, but it’s requested very often and seems to be important to improve performance.

@danimo

Contributor

danimo commented Apr 12, 2014

@jancborchardt As long as the server doesn't offer any delta syncs, we can't implement it.

@L0j1k

L0j1k commented Apr 12, 2014

@danimo Can you explain what you mean in a little more detail, please?

@danimo

Contributor

danimo commented Apr 12, 2014

@LoZio Currently, we use WebDAV as the communication protocol with the server. Additionally, we can upload in chunks, but only if we transfer the entire file. Delta-sync requires another protocol extension. Also, we have no guarantee that the server is holding the hash-wise same file, since the server does not store file hashes.

@jancborchardt

Member

jancborchardt commented Apr 12, 2014

So what’s the plan with the server-side regarding this one? @karlitschek @DeepDiver1975 @PVince81

@karlitschek

Member

karlitschek commented Apr 12, 2014

We should do this in the future. But it is more of a long-term feature.

@menelic

menelic commented Apr 14, 2014

@jospoortvliet @DeepDiver1975 has anyone followed up on @powerpaul17's offer to help with implementation? Just asking because this is indeed a very important feature; it would be great if it weren't too "long-term".

@ghost

ghost commented Apr 21, 2014

The lack of this feature is deal-breaking for me :( It may be suitable over LAN, but resyncing the entire file with offsite servers is no-go. I understand it requires server support, so who is the best person to contact about that? Is there a protocol extension proposal?

@powerpaul17

powerpaul17 Apr 23, 2014

@menelic Nobody has come back to me, but it seems that the first thing to do is implement the necessary features in the server. I tried looking at the sources but haven't quite found out where to tie in.

@jancborchardt

Member

jancborchardt commented Apr 24, 2014

@powerpaul17 and anyone who is willing to help, please also join our IRC channel at #owncloud-dev, as well as our developer mailing list. There you can ask questions if you need help.

Thanks!

@PVince81

Member

PVince81 commented Apr 28, 2014

@powerpaul17 the thing is that WebDAV is used for downloading and uploading files.
So somehow the connector might need to be extended (check out lib/private/connector/sabre) to support requesting/sending partial files.

But the more important question is first to find out how to diff files (xdelta?) and how the server/client can store an older version of the file somewhere to be able to create that diff in the first place, considering that there might be conflicts.

@ghost

ghost commented Apr 28, 2014

@PVince81 I wouldn't diff the files, because that would require keeping an entire copy of every sync directory to diff against. Instead, I would do something like keep a SHA3 hash of each 1MiB block in each file, as well as the SHA3 hash of the entire file (both as seen on the server). For files less than 1MiB, just sync the entire file whenever the file no longer matches the hash. Anything over 1MiB, only sync the blocks that no longer match the hash. Let the 1 MiB threshold be configurable.
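
The block-hash idea in the comment above can be sketched in a few lines of Python. This is only an illustration, not anything ownCloud implements: the function names `block_hashes` and `changed_blocks` are hypothetical, and SHA3-256 is an assumed choice (the comment only says "SHA3"). The 1 MiB default mirrors the configurable threshold the comment proposes.

```python
import hashlib
from typing import List

BLOCK_SIZE = 1024 * 1024  # 1 MiB, meant to be configurable


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return the SHA3-256 hex digest of each fixed-size block of data."""
    return [
        hashlib.sha3_256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(local: bytes, server_hashes: List[str],
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indices of local blocks whose hash differs from the server's.

    Blocks past the end of the server's list count as changed.
    """
    local_hashes = block_hashes(local, block_size)
    return [
        i for i, digest in enumerate(local_hashes)
        if i >= len(server_hashes) or digest != server_hashes[i]
    ]
```

Only the blocks returned by `changed_blocks` would need to be transferred. The sketch does not handle a file that shrank; the whole-file hash mentioned in the comment would be one way to detect that case.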

@kevincox

kevincox Apr 28, 2014

I agree. zsync is a simple solution; it is essentially a different way to use the rsync algorithm. Basically, the server keeps a static signature file (as @DarthAndroid suggested) and the client downloads it to figure out what it needs to download/upload.

Even if zsync is not used directly, it is a nice approach to look at, as there is very little logic on the server (just calculate a new signature file every time the file changes) and no "history" needs to be kept on either side. The only major downside is that the delta is not ideal (something like xdelta would get closer to it).

@PVince81

Member

PVince81 commented Apr 28, 2014

@DarthAndroid I thought about your idea while in the subway and came to the following questions/issues:

  1. Would your approach work well with file size changes (basically, new chunks that might be inserted/deleted)?

  2. You should be aware that some files might be stored on external storage on the server (e.g. an SMB server). If such a file changed remotely (not through ownCloud), recomputing its hashes might mean recomputing the hash of all chunks, which might be expensive and require redownloading the whole file to a temporary store (smbclient doesn't support partial file download).

  3. What file formats only have parts of them changing and would benefit from this approach? A few formats that come to mind are TXT, WAV, PNG and TIFF files. Compressed formats like JPG, MP3, OGG, AVI, ZIP, RAR, ODC, DOCX (a zip file) etc. will most likely change completely when working on them.

My concern with this is that it might introduce a very high level of complexity and maintenance costs where the benefit might not be that big (ratio between complexity/time to invest and overall benefit)

@kevincox are you talking about a full-file signature like an MD5/SHA hash, and syncing the whole file when that one changes? Note that the sync client already uses etags (hash-like identifiers that are not based on content) to detect changes. Just wanted to clarify 😄

I did some experiments in the past with xdelta (binary diff) for another project and noticed that diffing two ZIP files produced a patch that was almost the same size as the ZIP file itself. I had to first extract the ZIP file then do a xdelta on every file and even extract the JAR files inside (there were JAR files inside the ZIP) to have even better compression. Only then the patch file was much smaller. But doing this would require to have the compressing/decompressing logic on both sides. Complexity seems to also be quite high.

@kevincox

kevincox Apr 28, 2014

@PVince81

No. I don't claim to be an expert, but check the link for more details. Essentially, the signature file has a list of hashes, one for each block in the file. Then you download that file and figure out what blocks the server already has (using a rolling checksum algorithm à la rsync).

As I said, I'm not an expert, but if you want I can explain what I understand in more detail.
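
For readers unfamiliar with the rolling checksum mentioned above, here is a rough Python sketch of how a client could match local data against a block signature. It follows the general rsync-style scheme (a cheap weak checksum that can be slid one byte at a time, confirmed by a strong hash), but it is not zsync's or rsync's actual code: the names `weak_checksum`, `roll`, `signature` and `find_matches` are hypothetical, SHA-256 as the strong hash is an assumption, and a short trailing block is simply ignored.

```python
import hashlib
from typing import Dict, List, Tuple

M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Weak checksum: a is the byte sum, b weights earlier bytes more."""
    a = sum(block) % M
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def signature(data: bytes, n: int) -> Dict[Tuple[int, int], List[Tuple[bytes, int]]]:
    """Map weak checksum -> [(strong hash, block index)] for full blocks."""
    sig: Dict[Tuple[int, int], List[Tuple[bytes, int]]] = {}
    for index, start in enumerate(range(0, len(data) - n + 1, n)):
        block = data[start:start + n]
        sig.setdefault(weak_checksum(block), []).append(
            (hashlib.sha256(block).digest(), index))
    return sig


def find_matches(local: bytes, sig, n: int) -> List[Tuple[int, int]]:
    """Return (local offset, remote block index) for every block found."""
    matches: List[Tuple[int, int]] = []
    if len(local) < n:
        return matches
    pos = 0
    a, b = weak_checksum(local[:n])
    while True:
        hit = None
        for strong, index in sig.get((a, b), []):
            # Only pay for the strong hash when the weak checksum matches.
            if hashlib.sha256(local[pos:pos + n]).digest() == strong:
                hit = index
                break
        if hit is not None:
            matches.append((pos, hit))
            pos += n  # skip past the matched block
            if pos + n > len(local):
                break
            a, b = weak_checksum(local[pos:pos + n])
        else:
            if pos + n >= len(local):
                break
            a, b = roll(a, b, local[pos], local[pos + n], n)
            pos += 1
    return matches
```

Because the window slides byte by byte, blocks are still found after data has been inserted in front of them; only the bytes not covered by any match would have to be transferred.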

@kevincox

kevincox Apr 28, 2014

@PVince81

Regarding @DarthAndroid's idea.

  1. You probably won't want to hash/checksum fixed-size blocks, because adding or removing one byte at the beginning would force a full re-upload. Instead, you would use some sort of hash function to find chunks that are approximately a given size, so that adding or removing one byte only affects one block.

  2. This approach would probably be best if the files weren't stored as plain files on the server (this is why I was asking before), so the server would be the master of the file and modifications would have to be done through it. This isn't necessary, but otherwise I don't see how to get around this problem.

  3. This is a good point. And this approach is the "all in" method and might not be worth it.

About xdelta: I believe that it decompresses files before generating the delta (then compresses the delta) to help avoid this problem. But there are many formats that are not designed to keep similar data residing in a similar place in the file. The general solution is "Don't put a compressed jar in a compressed zip", but for an end-user solution you can't expect them to do the "sane" thing and should try to handle these situations as gracefully as possible.
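
Point 1 above describes what is usually called content-defined chunking. The following Python sketch shows the principle under simplifying assumptions: chunk boundaries are placed wherever a hash of the trailing `window` bytes has its low bits equal to zero, so boundaries depend on content rather than on absolute offsets. The function names and parameters are hypothetical, `zlib.crc32` stands in for a proper rolling hash (real implementations update the hash incrementally instead of rehashing the window), and no minimum or maximum chunk size is enforced.

```python
import zlib
from typing import List


def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F) -> List[int]:
    """Return the end offset of each content-defined chunk."""
    cuts = []
    for end in range(window, len(data)):
        # Cut where the hash of the last `window` bytes ends in zero bits.
        # With mask 0x3F this happens about once every 64 positions.
        if zlib.crc32(data[end - window:end]) & mask == 0:
            cuts.append(end)
    cuts.append(len(data))
    return cuts


def chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> List[bytes]:
    """Split data at the content-defined boundaries."""
    out = []
    start = 0
    for end in chunk_boundaries(data, window, mask):
        if end > start:
            out.append(data[start:end])
        start = end
    return out
```

If one byte is inserted at the start of a file, every later boundary shifts by one byte along with the content, so all chunks after the first boundary are byte-for-byte identical to the old ones and only the beginning has to be re-uploaded.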

ahmedammar added a commit to ahmedammar/client that referenced this issue Nov 25, 2017

Implementation of delta-sync support on client-side.
This commit adds client-side support for delta-sync, this adds a new 3rdparty submodule gh:ahmedammar/zsync. This zsync
tree is a modified version of upstream, adding some needed support for the upload path and other requirements.

If the server does not announce the required zsync capability then a full upload/download is fallen back to. Delta
synchronization can be enabled/disabled using command line, config, or gui options.

On both upload and download paths, a check is made for the existence of a zsync metadata file on the server for a given
path. This is provided by a dav property called `zsync`, found during discovery phase. If it doesn't exist the code
reverts back to a complete upload or download, i.e. previous implementations. In the case of upload, a new zsync
metadata file will be uploaded as part of the chunked upload and future synchronizations will be delta-sync capable.

Chunked uploads no longer use sequential file names for each chunk id, instead, they are named as the byte offset into
the remote file, this is a minimally intrusive modification to allow for delta-sync and legacy code paths to run
seamlessly. A new http header OC-Total-File-Length is sent, which informs the server of the final expected size of
the file not just the total transmitted bytes as reported by OC-Total-Length.

The seeding of the zsync metadata file is done in a separate thread since this is a cpu intensive task, ensuring main
thread is not blocked.

This commit closes owncloud/client#179.
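
As a reading aid for the commit message above, here is a small Python sketch of the upload planning it describes: chunks named by their byte offset into the remote file, the two length headers, and the fallback to a full transfer when the server offers no zsync support. The function `plan_upload`, its parameters, the plain decimal chunk names and the shape of the returned dictionary are all assumptions made for illustration; only the header names `OC-Total-Length` and `OC-Total-File-Length` come from the commit message, and the real client code may differ in every other respect.

```python
from typing import Dict, List, Tuple


def plan_upload(file_size: int,
                changed_ranges: List[Tuple[int, int]],
                server_has_zsync: bool) -> Dict[str, object]:
    """Plan chunk names and headers for an upload.

    changed_ranges holds (offset, length) pairs of the bytes that differ.
    """
    if not server_has_zsync:
        # Fallback from the commit message: transfer the complete file.
        changed_ranges = [(0, file_size)]
    # Each chunk is named after its byte offset into the remote file
    # (the exact name format is an assumption).
    chunk_names = [str(offset) for offset, _ in changed_ranges]
    transmitted = sum(length for _, length in changed_ranges)
    headers = {
        "OC-Total-Length": str(transmitted),     # bytes actually sent
        "OC-Total-File-Length": str(file_size),  # final size of the file
    }
    return {"chunks": chunk_names, "headers": headers}
```

For a 100-byte file with two changed 10-byte ranges at offsets 40 and 90, this plans chunks named `40` and `90`, reports 20 transmitted bytes and a final file length of 100.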

ahmedammar added a commit to ahmedammar/client that referenced this issue Nov 28, 2017

Implementation of delta-sync support on client-side.
This commit adds client-side support for delta-sync, this adds a new
3rdparty submodule gh:ahmedammar/zsync. This zsync tree is a modified
version of upstream, adding some needed support for the upload path and
other requirements.

If the server does not announce the required zsync capability then a
full upload/download is fallen back to. Delta synchronization can be
enabled/disabled using command line, config, or gui options.

On both upload and download paths, a check is made for the existence of
a zsync metadata file on the server for a given path. This is provided
by a dav property called `zsync`, found during discovery phase. If it
doesn't exist the code reverts back to a complete upload or download,
i.e. previous implementations. In the case of upload, a new zsync
metadata file will be uploaded as part of the chunked upload and future
synchronizations will be delta-sync capable.

Chunked uploads no longer use sequential file names for each chunk id,
instead, they are named as the byte offset into the remote file, this is
a minimally intrusive modification to allow for delta-sync and legacy
code paths to run seamlessly. A new http header OC-Total-File-Length is
sent, which informs the server of the final expected size of the file
not just the total transmitted bytes as reported by OC-Total-Length.

The seeding and generation of the zsync metadata file is done in a
separate thread since this is a cpu intensive task, ensuring main thread
is not blocked.

This commit closes owncloud/client#179.

@ogoffart ogoffart added this to the 2.5.0 milestone Dec 6, 2017

@ogoffart ogoffart added gold-ticket and removed gold-ticket labels Dec 6, 2017

ahmedammar added a commit to ahmedammar/client that referenced this issue Dec 16, 2017

Implementation of delta-sync support on client-side.
This commit adds client-side support for delta-sync, this adds a new
3rdparty submodule gh:ahmedammar/zsync. This zsync tree is a modified
version of upstream, adding some needed support for the upload path and
other requirements.

If the server does not announce the required zsync capability then a
full upload/download is fallen back to. Delta synchronization can be
enabled/disabled using command line, config, or gui options.

On both upload and download paths, a check is made for the existence of
a zsync metadata file on the server for a given path. This is provided
by a dav property called `zsync`, found during discovery phase. If it
doesn't exist the code reverts back to a complete upload or download,
i.e. previous implementations. In the case of upload, a new zsync
metadata file will be uploaded as part of the chunked upload and future
synchronizations will be delta-sync capable.

Chunked uploads no longer use sequential file names for each chunk id,
instead, they are named as the byte offset into the remote file, this is
a minimally intrusive modification to allow for delta-sync and legacy
code paths to run seamlessly. A new http header OC-Total-File-Length is
sent, which informs the server of the final expected size of the file
not just the total transmitted bytes as reported by OC-Total-Length.

The seeding and generation of the zsync metadata file is done in a
separate thread since this is a cpu intensive task, ensuring main thread
is not blocked.

This commit closes owncloud/client#179.

ahmedammar added a commit to ahmedammar/client that referenced this issue Dec 17, 2017

ahmedammar added a commit to ahmedammar/client that referenced this issue Dec 17, 2017

ahmedammar added a commit to ahmedammar/client that referenced this issue Dec 18, 2017

ahmedammar added a commit to ahmedammar/client that referenced this issue Dec 18, 2017

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 11, 2018

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 11, 2018

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 11, 2018

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 12, 2018

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 14, 2018

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 15, 2018

ahmedammar added a commit to ahmedammar/client that referenced this issue Jan 15, 2018

@PVince81

PVince81 Feb 7, 2018

Member

Does anyone have an idea for which file types delta sync will actually make sense?

@ahmedammar does delta sync / zsync also detect shifts in a file? For example, if I take a WAV file (no compression) and delete a chunk of audio in the middle, will it still detect this or will it resync the whole file?

@ahmedammar

ahmedammar Feb 7, 2018

@DeepDiver1975

DeepDiver1975 Feb 7, 2018

Member

> Technically zsync could detect moves, but my code doesn’t support that, all bytes moved will be resent, this was to make it easier to get something working quickly. I mentioned this as a future work area.

In which area are the adaptations necessary to make this work? I'm trying to find out the impact. If bigger refactorings are necessary, I question the current in-depth review ...

@PVince81

PVince81 Feb 7, 2018

Member

Does the current implementation at least help when appending data at the end of files? In this case I expect that, since the offset of the existing data did not change, it would only sync the appended block.

@PVince81

PVince81 Feb 7, 2018

Member

Did anyone here already test the feature? If you did, please report where you saw improvements (file types, use case, etc.).

@ahmedammar

ahmedammar Feb 7, 2018

Appending will only send appended bytes. Regarding where the work needs to be done:

  1. zsync - the code wasn’t designed with the upload path in mind. I implemented that and took the path of least resistance, which meant dropping moved-block support.
  2. oC - again, to support moved blocks we’d need a more complex upload path where moved chunks are processed somehow. This can be as simple as sending a file with moved-block to:from mappings and modifying the assembly code to handle them appropriately.

I recommend we don’t try to do this now, but after the current simple approach is well tested.
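
The upload behaviour described here (appends are cheap, shifted data is resent) can be illustrated with a deliberately simplified Python model that compares blocks only at identical offsets. This is not the zsync algorithm or the client code; `blocks_to_upload` and the tiny block size are assumptions made for the example.

```python
def blocks_to_upload(old: bytes, new: bytes, block_size: int = 4):
    """Return (offset, data) for each block of `new` that differs from
    the block of `old` at the same offset. Moved blocks are not detected."""
    changed = []
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != block:
            changed.append((offset, block))
    return changed


old = b"AAAABBBBCCCC"
appended = old + b"DDDD"             # data added at the end
shifted = b"AAAA" + b"X" + old[4:]   # one byte inserted in the middle

append_plan = blocks_to_upload(old, appended)
shift_plan = blocks_to_upload(old, shifted)
```

In this model the append produces a single block at offset 12, while the one-byte insertion changes every block from offset 4 onwards, which is the case moved-block support would improve.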

@ahmedammar

ahmedammar Feb 24, 2018

Just wanted to clarify something: the above only applies to the uploader. The downloader will not redownload moved chunks.

@gpothier

gpothier Apr 8, 2018

@PVince81 Regarding the file types that would benefit from delta sync: we are a publishing house and we work with rather big (up to 500MB) Adobe InDesign files. Small changes to these files are very good candidates for delta sync (Dropbox syncs small changes to these files almost instantly).

@ckamm ckamm modified the milestones: 2.5.0, 2.6.0 Apr 25, 2018

@ckamm

ckamm Apr 25, 2018

Member

The feature is stabilizing in the delta-sync branch and test builds will become available around the time 2.5.0 is released. If things go smoothly it'll be in 2.6.0.

@KevinLeigh

KevinLeigh Jun 10, 2018

Is someone currently working on this issue? I would like to start on it, but not if someone else already is.

@ahmedammar

ahmedammar Jun 10, 2018

Yeah would be nice if this gets closed so I can claim the bounty?
