CRC-32 used as undocumented default #57
One of the main differences between zpaqfranz and zpaq is the presence of whole-file checksums (in fact there are even bigger differences for the new tar-like format, still to be completed). Sadly, without them, any SHA-1 collisions (there are famous PDF files in this respect) are NOT intercepted by zpaq. CRC-32 has one major difference from 'cryptographic' hashes (including MD5): it can be computed over out-of-order portions (aka fragments) and the partial results combined. The net result is that SHA-1 collisions are detected by zpaqfranz (not corrected, but detected).
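The "computable in disordered and combined portions" property is what lets a whole-file CRC-32 be assembled from per-fragment values, whereas a SHA-1 or MD5 digest of the whole file cannot be assembled from independent per-fragment digests. Below is a minimal Python sketch, assuming fragments that tile the file without gaps and are identified by byte offset (the tuple layout and the function names are hypothetical, not zpaqfranz's internals). Each fragment is hashed independently, then the results are folded in offset order using crc(A||B) = shift(crc(A), len(B)) XOR crc(B). zlib's C library provides crc32_combine for this in logarithmic time; Python's zlib module does not expose it, so this sketch shifts by feeding len(B) zero bytes, which is linear in len(B) but never touches A's data again.

```python
import zlib

MASK = 0xFFFFFFFF

def crc32_shift(crc: int, n_zero_bytes: int) -> int:
    """Raw CRC-32 register update of `crc` over n zero bytes (no pre/post inversion)."""
    # zlib.crc32(data, v) inverts v on entry and the result on exit; undo both.
    return zlib.crc32(b"\x00" * n_zero_bytes, crc ^ MASK) ^ MASK

def crc32_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """CRC-32 of A followed by B, using only crc(A), crc(B) and len(B)."""
    return crc32_shift(crc_a, len_b) ^ crc_b

def whole_file_crc(fragments) -> int:
    """fragments: (offset, crc32, length) tuples, hashed independently, in any order."""
    total = 0  # CRC-32 of the empty string
    for _offset, crc, length in sorted(fragments):
        total = crc32_combine(total, crc, length)
    return total

if __name__ == "__main__":
    import random
    data = bytes(range(256)) * 40
    cuts = [0, 100, 1000, 1001, 7000, len(data)]
    frags = [(a, zlib.crc32(data[a:b]), b - a) for a, b in zip(cuts, cuts[1:])]
    random.shuffle(frags)  # fragments may arrive (and be hashed) out of order
    assert whole_file_crc(frags) == zlib.crc32(data)
    print("combined CRC-32 matches:", f"{zlib.crc32(data):08X}")
```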
7.15 stage time 21.38 no error detected (RAM ~514.07 MB), try CRC-32 (if any)
In this case everything is OK.
Computing CRC-32 as well slows down the archiving stage and makes the archive bigger. Since data reliability is more important to me, I use it as the default.
PS: -pakka changes only the output; it is an interface for the Windows GUI. Essentially, it writes less information.
PS: it is now 01:35; later I will fix this and explain better. Time to... bed :)
You can find it here, as the very first difference: https://github.com/fcorbelli/zpaqfranz/wiki/Diff-against-7.15:-add
BTW, I found that Apple Silicon (notably the M2 processor) seems to be hardware-optimized for SHA-256. When I ran the zpaqfranz benchmarks, even against a terabyte or two, SHA-256 performed in your benchmark at about the same speed as XXHASH3. It might make sense to check for hardware acceleration and use SHA-256 as the default instead of XXHASH3 when the performance is going to be roughly the same, since SHA-256 is cryptographically strong while the various XXHASH algorithms don't have any cryptographic properties at all. Since I don't know how the benchmarks are done, this may not actually be representative of real-world speeds. Still, it's at least worth thinking about, since a number of other platforms now also include some form of AES hardware to speed up AES cipher operations.
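One way to act on this suggestion without relying on CPU feature flags is to measure: time SHA-256 and the fast hash on the same in-memory buffer and prefer SHA-256 when it comes close enough. The sketch below is a hypothetical illustration in Python, not how zpaqfranz picks its defaults; the 0.8 tolerance is an arbitrary assumption, and zlib.crc32 only stands in for XXH3, which is not in Python's standard library. Like the benchmark discussed in this thread, it measures RAM-to-CPU speed only, not real-world I/O.

```python
import hashlib
import time
import zlib
from typing import Callable

def throughput_mb_s(fn: Callable[[bytes], object], payload: bytes, rounds: int = 5) -> float:
    """Best-of-N in-memory throughput of fn(payload), in MB/s (very rough)."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        fn(payload)
        best = min(best, time.perf_counter() - start)
    return len(payload) / 1e6 / max(best, 1e-9)

def pick_default(fast_mb_s: float, sha256_mb_s: float, tolerance: float = 0.8) -> str:
    """Prefer SHA-256 if it reaches `tolerance` times the fast hash's speed (assumed policy)."""
    return "sha256" if sha256_mb_s >= tolerance * fast_mb_s else "fast"

if __name__ == "__main__":
    buf = bytes(16 * 1024 * 1024)  # 16 MiB of zeros held in RAM: no disk involved
    sha = throughput_mb_s(lambda b: hashlib.sha256(b).digest(), buf)
    fast = throughput_mb_s(zlib.crc32, buf)  # stand-in: XXH3 is not in the stdlib
    print(f"SHA-256 {sha:.0f} MB/s, stand-in {fast:.0f} MB/s ->", pick_default(fast, sha))
```

A real XXH3 binding, if one is installed, could be passed to throughput_mb_s in place of the stand-in.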
You can see "under the hood" with a
You need 3 "OK"s to "automagically" get HW acceleration. The benchmark is very, very crude, just a quick check to get some info on VPS CPUs.
PS: this is a "real world" example of an Intel-based server with a Proxmox+FreeBSD VM, running on HDD storage
As you can see, the "real" bandwidth of the drive is about 128 MB/s, so even a 10 GB/s hasher would gain nothing.
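The point can be made with a simple two-stage model (an assumption, not a measurement): if reading and hashing overlap, throughput is the minimum of the two stages; if they run strictly one after the other, the per-megabyte times add. Either way, with a 128 MB/s drive, speeding the hasher up from 1 GB/s to 10 GB/s barely moves the total. The function names are hypothetical.

```python
def effective_mb_s(disk_mb_s: float, hash_mb_s: float, overlapped: bool = True) -> float:
    """Throughput of a read-then-hash pipeline under a simple two-stage model."""
    if overlapped:
        # reading and hashing run concurrently: the slower stage sets the pace
        return min(disk_mb_s, hash_mb_s)
    # strictly sequential: the time per MB of each stage adds up
    return 1.0 / (1.0 / disk_mb_s + 1.0 / hash_mb_s)

def hours_for(size_gb: float, mb_s: float) -> float:
    """Hours to process size_gb (decimal GB) at mb_s (decimal MB/s)."""
    return size_gb * 1000 / mb_s / 3600

if __name__ == "__main__":
    disk = 128.0  # MB/s, the drive from the example above
    for hasher in (1_000.0, 10_000.0):
        for overlapped in (True, False):
            rate = effective_mb_s(disk, hasher, overlapped)
            print(f"hasher {hasher:>6.0f} MB/s, overlapped={overlapped}: "
                  f"{rate:6.1f} MB/s, 1 TB in {hours_for(1000, rate):.2f} h")
```

In the overlapped model, 1 TB at 128 MB/s takes about 2.2 hours regardless of the hasher; in the sequential model, a 1 GB/s hasher brings the effective rate down to about 113 MB/s, and a 10 GB/s one to about 126 MB/s.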
Hello, I have a question about the "t" command. Is there a bug, or should I worry about my data? My use case: on Windows Server 2019 I have a DB2 database. I make a dump daily and store it in a zpaq file using just the "a" command without any switches. On that machine I use version "zpaqfranz v57.4h-JIT-L (HW BLAKE3,SHA1),SFX64 v55.1, (12 Mar 2023)". When I test on that server, all is OK.
Then I transfer the archive to another computer (I use the FileZilla resume option to download only the new data). The other computer runs Windows 10 with zpaqfranz version "zpaqfranz v58.4s-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-06-23)", and the test results are:
But when I extract one file, for example "C:/KOPIE/DATABASE.0.DB2.DBPART000.20230709223032.001", and compute its CRC-32 manually with zpaqfranz, I get the correct stored checksum.
The extracted file's hash, AC3369F0, is equal to the stored hash from the "t" command. I also calculated the SHA-256 checksum of the extracted dump file and of the original file on the server, and they are the same. So can I trust that the file stored in the zpaq archive is good?
PS1: While writing this comment I also downloaded the zpaqfranz exe version from the server, and its test is good:
So maybe there is a bug in the "t" command in newer versions? Or an incompatibility in the archive format?
PS2: Also, thank you for the fantastic job of continuing to develop zpaq. I used the original zpaq for years, and it was a nice find that someone continued the work ^_^
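The manual cross-check described in this comment (comparing a CRC-32 and a SHA-256 of the extracted file against the original) can be done in a single streaming pass, so even a multi-gigabyte dump is read only once and never loaded whole into memory. A minimal Python sketch; it assumes zpaqfranz's CRC-32 is the common zlib/IEEE variant printed as 8 uppercase hex digits, like AC3369F0, and the function name is hypothetical. If the CRC digits do not match zpaqfranz's output for the same file, that assumption is wrong.

```python
import hashlib
import zlib

def file_checksums(path: str, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Stream a file once, returning (CRC-32 as 8 uppercase hex digits, SHA-256 hex)."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC-32 (zlib/IEEE polynomial)
            sha.update(chunk)
    return f"{crc:08X}", sha.hexdigest()
```

For example, file_checksums("extracted.001") == file_checksums("original.001") (hypothetical file names) confirms that both copies agree on both checksums.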
It is a known bug, affecting file sizes longer than 10 characters (in decimal).
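"Longer than 10 characters in decimal" means sizes of at least 10^10 bytes (10 GB, about 9.31 GiB), which a daily database dump can easily reach. The thread does not describe the bug's internals, so the second function below is a purely hypothetical illustration of how a reader limited to a 10-character field (FIELD_WIDTH is an assumed name) would misread such a size; it is not zpaqfranz's code.

```python
FIELD_WIDTH = 10  # hypothetical: the "10 chars" limit mentioned in the reply

def is_affected(size_bytes: int) -> bool:
    """True if the decimal representation of the size exceeds FIELD_WIDTH digits."""
    return len(str(size_bytes)) > FIELD_WIDTH

def truncated_read(size_bytes: int) -> int:
    """Hypothetical failure mode: a parser that keeps only the first FIELD_WIDTH digits."""
    return int(str(size_bytes)[:FIELD_WIDTH])

if __name__ == "__main__":
    print(10**10, "bytes =", round(10**10 / 2**30, 2), "GiB")  # smallest affected size
    for size in (9_999_999_999, 10**10, 12_345_678_901):
        print(size, is_affected(size), truncated_read(size))
```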
Thank you for the quick response and... for already releasing a fix ^_^
I have a job running using the following syntax:
The running job is reporting:
The use of CRC-32 isn't specified on the command line, and it seems to occur whether or not I specify a particular xxhash or chunked format. For example, leaving off "-xxh3 -pakka" just results in the output line changing to:
instead, which is not what the documentation seems to define as the default either. While I can see why "-xxhash" would default to xxhash64 on a 64-bit system, I'm not sure why CRC-32 is being calculated or why it is a default, especially on a 64-bit system, where 32 bits would seem to invite collisions. If you just want a fast default to add, why not use MD5, which (while cryptographically weak) is at least 128 bits? This seems like either an error in the documentation, an error in the defaults, or a suboptimal choice for a fast and well-supported checksum.
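The "32 bits invite collisions" concern can be put in numbers, but two different questions are involved. The birthday bound gives the chance that any two of n distinct files share a k-bit value, which matters when a checksum identifies content (as in deduplication). Checking a file against its own stored value, as when verifying an archive, is fooled by a random corruption with probability about 2^-k. The sketch below computes both; the file count is a made-up example, and corruption is modeled as random data.

```python
import math

def birthday_collision_prob(n_items: int, bits: int) -> float:
    """Approximate P(at least one collision) among n uniformly random k-bit values."""
    # p ~= 1 - exp(-n(n-1) / 2^(k+1)); expm1 keeps precision when p is tiny
    return -math.expm1(-n_items * (n_items - 1) / 2.0 ** (bits + 1))

def undetected_corruption_prob(bits: int) -> float:
    """Chance that a random corruption still matches one stored k-bit checksum."""
    return 2.0 ** -bits

if __name__ == "__main__":
    n = 100_000  # hypothetical number of distinct files
    for bits in (32, 64, 128):
        print(f"{bits:>3}-bit: any pair collides ~{birthday_collision_prob(n, bits):.3g}, "
              f"one check fooled ~{undetected_corruption_prob(bits):.3g}")
```

For 100,000 files, the 32-bit birthday probability comes out around 0.69, while a single 32-bit check is fooled by a random corruption only about once in 4.3 billion.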