Long and very long range redundancy removal #3062
Comments
As a stopgap, you could start using `--long=31`. Beyond that, we know that the current implementation is limited (though not the format; it's purely an implementation issue). Going beyond that limitation is substantial work though.
@giovariot Have you tried the `--long=31` option? What are the results compared to default `--long` and lrzip?
The compression is a bit better than with plain `--long`, but not as good as lrzip's. I can try again and time the whole thing, but I don't have much free time lately, so I'm not sure how long it will be until I have some news.
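For reference, a minimal sketch of how the `--long=31` stopgap can be invoked on a raw disk image (the `disk.img` path is a placeholder; `--long=31` needs a 64-bit build, and decompression must pass the same `--long=31`, or a raised `--memory` limit, to accept the 2 GiB window):

```sh
# Compress with long-distance matching over a 2^31-byte window.
# -T0 uses all cores; -19 is a high (slow) compression level.
zstd -T0 -19 --long=31 disk.img -o disk.img.zst

# Decompression must also allow the larger window, either by
# repeating --long=31 or by raising the memory limit:
zstd -d --long=31 disk.img.zst
# or: zstd -d --memory=2048MB disk.img.zst
```

Without the matching flag at decompression time, zstd rejects frames whose window exceeds its default limit, which is why both directions are shown.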
I'm seeing similar results, along with some rough measurements for that data:
The problem is that these days, with huge hard disks being so cheap and so common, it is not unusual to forget about files you already saved and copy them again into another folder, or to try to reorganize a lifetime of files in different ways.
As an example, I have a 1 TB SSD that I backed up for a friend of mine who inadvertently formatted it. It had two partitions: one where they kept files well organized in clearly named folders and files, and a second partition used as a "messy" storage area, full of the same data from old backups and used as a source while organizing the data on the first partition.
I've tried using ckolivas/lrzip, which implements this kind of unlimited range redundancy removal through its `-U` option: there's a noticeable difference when compressing big disk image backups with `zstd` versus `lrzip`. Of course `zstd` is faster, but removing redundancies through ckolivas/lrzip usually produces a smaller file in the end. For example, on a 127 GB disk image with 3 different partitions containing 3 different Windows versions, and with free space zeroed out:

- `lrzip -U -l -p1` resulted in a 28.94 GB file
- `zstd -T0 --long -19` resulted in a 33.66 GB file

I then tried lrzip's `-U -n` option (redundancy removal only, compression disabled) on the 1000.2 GB disk image I mentioned as an example: it resulted in a 732.4 GB file. Just using the simple LZO compression algorithm via lrzip's `-U -l` options resulted in a 695 GB file. Using zstd on the original image took many hours (~40) and only produced an 867.6 GB file.

I know I could simply use `lrzip -U -n` to remove the redundancies and then compress the output file with `zstd`, but I think this could be a very useful feature for zstd itself: I'm pretty sure the combination of fast speed and high compression ratio makes zstd a very common choice for compressing disk images. So I'd find it very useful to implement some sort of long range redundancy removal process in zstd.
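A minimal sketch of that two-step workaround, assuming both lrzip and zstd are installed; `disk.img` stands in for the real image path, and the `.lrz` filenames are illustrative:

```sh
# Step 1: let lrzip's rzip pre-processing stage remove long-range
# redundancy without compressing (-U = unlimited window, -n = no
# compression). This typically writes disk.img.lrz next to the input.
lrzip -U -n disk.img

# Step 2: compress the de-duplicated stream with zstd.
zstd -T0 -19 --long disk.img.lrz -o disk.img.lrz.zst

# To restore, reverse the steps:
zstd -d disk.img.lrz.zst -o disk.img.lrz
lrzip -d disk.img.lrz
```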
I don't really know whether anything like this is actually feasible inside the current zstd framework, but I really think some sort of long range redundancy removal should be part of it, considering its possible uses in commercial environments as well (virtual machine image compression, for example, is an area where this feature would surely be beneficial).
Thanks in advance, thanks for the great software, and sorry for my pretty bad English; it's not my mother tongue.