Please add gzip/bzip compression for delta files in rdiff #8

Open
pavel-odintsov opened this issue Jul 31, 2014 · 10 comments

@pavel-odintsov pavel-odintsov commented Jul 31, 2014

Hello!

I tried to use the --gzip/--bzip flags with rdiff but got an error:

rdiff: ERROR: (rdiff_options) sorry, compression is not really implemented yet

For my data (VPS disks), compression works really well on delta files:

source size: 4.6 GB, delta size: 2093.0 MB, compressed size: 223.0
source size: 14.8 GB, delta size: 2205.0 MB, compressed size: 998.7 MB

Thank you!

Author

@pavel-odintsov pavel-odintsov commented Jul 31, 2014

I tried to compress signatures, but it's really useless:

du -sh /root/rdiff_signatures_25_june/
20M /root/rdiff_signatures_25_june/

tar -cpzf /root/rdiff_signatures_25_june.tar.gz /root/rdiff_signatures_25_june/
ls -alh /root/rdiff_signatures_25_june.tar.gz
-rw-r--r-- 1 root root 19M Aug  1 00:41 /root/rdiff_signatures_25_june.tar.gz

tar -cpjf /root/rdiff_signatures_25_june.tar.bz2 /root/rdiff_signatures_25_june/
ls -alh /root/rdiff_signatures_25_june.tar.bz2
-rw-r--r-- 1 root root 19M Aug  1 00:41 /root/rdiff_signatures_25_june.tar.bz2

But compression for deltas is really useful, please add it :)

Contributor

@sourcefrog sourcefrog commented Aug 1, 2014

You can just pipe it into gzip.

Author

@pavel-odintsov pavel-odintsov commented Aug 1, 2014

Hello!

Thank you for the answer!

Yes, I use rdiff delta this way:

rdiff delta signature.dat data.dat - | pigz > signature.gz 

But out-of-the-box support for compressed deltas would be a nice feature.

Member

@dbaarda dbaarda commented Oct 10, 2014

Note that rsync uses a modified zlib for delta compression: it feeds matching data that is not included in the delta through the compressor to "prime" its data tables, then throws away the compressed output for that matching data. In general this gives slightly better compression than just gzipping the resulting delta. For an example of how this can be done with an unmodified zlib, look at pysync: http://minkirri.apana.org.au/~abo/projects/pysync.
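
A minimal sketch of the priming idea, using Python's standard zlib module. Note that this swaps in a simpler technique than the one described above: it uses zlib's preset-dictionary feature (`zdict`) instead of feeding matching data through the compressor and discarding the output, and it is not how rsync or pysync are actually implemented. The function names and the sample data are made up for illustration.

```python
import random
import zlib


def compress_literal(literal: bytes, context: bytes = b"") -> bytes:
    """Deflate `literal`, optionally priming the compressor with `context`."""
    if context:
        comp = zlib.compressobj(level=9, zdict=context)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(literal) + comp.flush()


def decompress_literal(blob: bytes, context: bytes = b"") -> bytes:
    """Inverse of compress_literal; needs exactly the same `context` bytes."""
    if context:
        decomp = zlib.decompressobj(zdict=context)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical data: `matched` stands in for old-file bytes that both sides
# already have, and `literal` is new data that happens to resemble them.
rng = random.Random(0)
matched = bytes(rng.getrandbits(8) for _ in range(8192))
literal = matched[2000:6000] + b"some new bytes"

plain = compress_literal(literal)
primed = compress_literal(literal, context=matched)
print("literal:", len(literal), "plain:", len(plain), "primed:", len(primed))
```

The receiver can only decode the primed output if it passes the same context bytes, which is fine here because matching data is by definition already present on both sides.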

Member

@dbaarda dbaarda commented Oct 17, 2017

I'm considering tackling this next. Either that or Rabin-Karp rollsums... whichever people prefer.

Note that signature files, being collections of hash values, probably don't compress at all well unless they have long runs of identical blocks. I'm planning to add compression only to the deltas, with optional "context compression" support (which compresses hits as well as misses, to prime the compressor with context from matching blocks).
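
To illustrate why hash-heavy signature files compress poorly, here is a rough sketch. It builds a toy "signature" (one 8-byte BLAKE2 hash per block, which is an assumption for illustration and not the librsync signature format) and compares how well it deflates for distinct blocks versus long runs of identical blocks. The block size and helper names are arbitrary.

```python
import hashlib
import zlib

BLOCK_LEN = 2048  # illustrative block size, not a librsync default


def toy_signature(data: bytes) -> bytes:
    """Concatenate an 8-byte hash per block (NOT the librsync format)."""
    return b"".join(
        hashlib.blake2b(data[i:i + BLOCK_LEN], digest_size=8).digest()
        for i in range(0, len(data), BLOCK_LEN)
    )


def ratio(blob: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(blob, 9)) / len(blob)


# 1024 distinct blocks versus 1024 identical (all-zero) blocks.
distinct = b"".join(i.to_bytes(4, "big") * (BLOCK_LEN // 4) for i in range(1024))
zeros = bytes(BLOCK_LEN * 1024)

print(f"distinct blocks:  {ratio(toy_signature(distinct)):.2f}")
print(f"identical blocks: {ratio(toy_signature(zeros)):.2f}")
```

Hashes of distinct blocks look like random bytes, so the compressor finds nothing to exploit; only repeated hashes from identical blocks shrink.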

@yxj1992 yxj1992 commented Nov 28, 2017

I configured the build with cmake -D ENABLE_COMPRESSION=ON ., but it doesn't work. I still get 'ERROR: (rdiff_options) sorry, compression is not really implemented yet'. Can anyone tell me why?

Member

@dbaarda dbaarda commented Feb 11, 2018

yxj1992: because that feature hasn't been implemented yet.

Member

@dbaarda dbaarda commented Aug 23, 2019

FTR, I found a good comparison between gzip, bz2, and xz here:

https://www.rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison/

Looking at this, xz is the clear winner on compression ratio, but its gain over gzip is small compared with gzip's gain over no compression at all, so it's diminishing returns. The clear winner on speed (i.e. CPU) is gzip, particularly for decompression. bz2 beats gzip on compression but is not as good as xz, and it pays a nasty speed/CPU price.

For librsync's application, I feel gzip is the winner, and there's not much point in implementing support for xz or particularly bzip2.
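
For anyone who wants to repeat this kind of comparison on their own deltas, a small sketch using only Python's standard library follows. It uses zlib's deflate as a stand-in for gzip (same algorithm, different container), and the sample payload is made up; real delta files will give different numbers, so no results are claimed here.

```python
import bz2
import lzma
import time
import zlib

# Codec name -> (compress, decompress), all from the standard library.
CODECS = {
    "deflate (zlib)": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz (lzma)": (lzma.compress, lzma.decompress),
}


def compare(data: bytes) -> dict:
    """Return {codec: (compressed_size, compress_seconds, decompress_seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        assert restored == data  # sanity check: lossless round trip
        results[name] = (len(packed), mid - start, end - mid)
    return results


if __name__ == "__main__":
    # Placeholder payload; substitute the bytes of a real delta file.
    sample = b"example delta payload with some repetition " * 20000
    for name, (size, c_time, d_time) in compare(sample).items():
        print(f"{name:15s} size={size:8d} "
              f"compress={c_time:.4f}s decompress={d_time:.4f}s")
```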

Contributor

@sourcefrog sourcefrog commented Aug 23, 2019

Member

@dbaarda dbaarda commented Jun 4, 2020

I looked at zstd. I agree it looks like the best/only solution needed for compression. There is a Debian libzstd-dev package we can just build/link against.

On how to best implement this, I think this is blocked on implementing a hit/miss callback api for deltas as described in #197. That API would make it much easier to implement different kinds of delta output formats including things like compression and whole-file-checksums, as different callback implementations.
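
To make the callback idea concrete, here is a rough Python sketch. The details of #197 are not quoted in this thread, so everything below (the `on_match`/`on_miss`/`finish` names, the event tuples, the op tuples) is hypothetical and only shows the shape of the design: the delta generator reports hits and misses, and the output format, compressed or not, lives entirely in the callback implementation. It uses zlib rather than zstd only because zlib ships with Python.

```python
import zlib


class PlainSink:
    """Hypothetical sink: records copy/literal operations uncompressed."""

    def __init__(self):
        self.ops = []

    def on_match(self, offset, length):
        self.ops.append(("copy", offset, length))

    def on_miss(self, data):
        self.ops.append(("literal", data))

    def finish(self):
        pass


class CompressingSink:
    """Same interface, but all literal data goes through one deflate stream."""

    def __init__(self):
        self.ops = []
        self._comp = zlib.compressobj()

    def on_match(self, offset, length):
        self.ops.append(("copy", offset, length))

    def on_miss(self, data):
        # Sync-flush so each literal's compressed bytes are emitted in order.
        chunk = self._comp.compress(data) + self._comp.flush(zlib.Z_SYNC_FLUSH)
        self.ops.append(("zliteral", chunk))

    def finish(self):
        tail = self._comp.flush()  # end-of-stream marker and checksum
        if tail:
            self.ops.append(("zliteral", tail))


def emit_delta(events, sink):
    """Drive a sink from ("match", offset, length) / ("miss", data) events."""
    for event in events:
        if event[0] == "match":
            sink.on_match(event[1], event[2])
        else:
            sink.on_miss(event[1])
    sink.finish()
```

With this shape, adding another output format (for example a whole-file checksum) would mean writing one more sink class, leaving the delta generator unchanged.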
