
Voodoo collisions

Franco Corbelli edited this page Sep 2, 2023 · 1 revision

Why zpaqfranz?

ZPAQ uses the SHA-1 hash function for internal deduplication.
Collisions are possible, i.e. two different files with the same SHA-1 hash.
They are very rare and, it is believed, do not occur in "real world" files, only in specially prepared ones.
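
To make the consequence concrete, here is a minimal Python sketch of hash-keyed deduplication. It is not ZPAQ's code: `weak_hash` (SHA-1 truncated to one byte), `find_collision`, `add_fragment` and `store` are hypothetical names used only for illustration, and the hash is deliberately weakened so that a collision can be found by brute force. With real SHA-1, a collision requires specially crafted input such as the two shattered PDFs.

```python
import hashlib


def weak_hash(data: bytes) -> bytes:
    # Assumption: stand-in for SHA-1 that keeps only the first digest byte,
    # so that two colliding inputs are trivial to find.
    return hashlib.sha1(data).digest()[:1]


def find_collision() -> tuple[bytes, bytes]:
    # Brute-force two different inputs that share the same weak hash.
    seen: dict[bytes, bytes] = {}
    i = 0
    while True:
        data = f"fragment-{i}".encode()
        key = weak_hash(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
        i += 1


store: dict[bytes, bytes] = {}  # hash -> fragment contents


def add_fragment(data: bytes) -> bytes:
    # Deduplication: a fragment whose hash is already known is not stored again.
    key = weak_hash(data)
    store.setdefault(key, data)
    return key


first, second = find_collision()
key_first = add_fragment(first)
key_second = add_fragment(second)   # silently treated as a duplicate of `first`
restored_second = store[key_second]  # what extraction would give back
```

In this toy model, `restored_second` holds the contents of `first`: the second fragment was never stored, and nothing in the hash-only bookkeeping can notice the substitution.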

ZPAQ v7.15 simply does not (and cannot) detect these unusual situations.

C:\zpaqfranz>c:\nz\zpaq64 x z:\3.zpaq -test
zpaq v7.15 journaling archiver, compiled Aug 17 2016
z:/3.zpaq: 1 versions, 3 files, 10 fragments, 0.383906 MB
Extracting 0.844870 MB in 2 files -threads 32
[1..10] -> 422483
> c:/dropbox/Dropbox/sha1/shattered-1.pdf
> c:/dropbox/Dropbox/sha1/shattered-2.pdf
0.063 seconds (all OK)

zpaqfranz, on the other hand, can DETECT them (DETECT, NOT FIX!)

C:\zpaqfranz>zpaqfranz t z:\3.zpaq
zpaqfranz v52.3-experimental snapshot archiver, compiled Jul 15 2021
z:/3.zpaq:
1 versions, 3 files, 10 fragments, 383.906 bytes (374.91 KB)
Check 844.870 in 2 files -threads 32
No error detected in first stage (standard 7.15), now try CRC-32 (if present)


Checking  2 blocks with CRC32 (844.870)
ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/Dropbox/sha1/shattered-1.pdf
ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/Dropbox/sha1/shattered-2.pdf

Verify time 0.047000 s
Blocks             844.870 (           2)
Zeros                    0 (           0) 0.000000 s
Total              844.870 speed 10.694.556/sec
ERRORS    : 00000002 (ERROR:  something WRONG)
WITH ERRORS

0.140 seconds (with errors)

Is this perfect? Is it impossible to have undetected collisions with zpaqfranz?
Of course not.
This is an improvement, but in general collisions are ALWAYS possible. What matters is how rare they are, and that backward compatibility is maintained (i.e. the same files can be used with both zpaq and zpaqfranz).

For performance reasons, the check during the add phase (add) is enabled only by a specific switch (-verify). Otherwise collisions are NOT detected until you run a t (test).

To recap: collisions can be detected

  • never, if -nochecksum is used in add(). This is how zpaq works (fastest)
  • early, when adding, if -verify is specified in add()
  • later, when running a t (test) on the archive
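
The three cases can be mimicked with a small Python model. This is only a sketch under stated assumptions, not zpaqfranz's implementation: `ToyArchive`, its `checksum` and `verify` parameters, `weak_hash` and `colliding_pair` are hypothetical names, each file is treated as a single fragment, and the hash is SHA-1 truncated to one byte so that a collision can be found by brute force. The second check uses CRC-32, as in the test output shown earlier.

```python
import hashlib
import zlib


def weak_hash(data: bytes) -> bytes:
    # Assumption: 1-byte stand-in for SHA-1, so that a collision is easy to find.
    return hashlib.sha1(data).digest()[:1]


def colliding_pair() -> tuple[bytes, bytes]:
    # Two different inputs with the same weak hash but different CRC-32.
    seen: dict[bytes, bytes] = {}
    i = 0
    while True:
        data = f"file-{i}".encode()
        other = seen.setdefault(weak_hash(data), data)
        if other != data and zlib.crc32(other) != zlib.crc32(data):
            return other, data
        i += 1


class ToyArchive:
    def __init__(self) -> None:
        self.fragments: dict[bytes, bytes] = {}  # hash -> stored contents
        self.files: dict[str, tuple] = {}        # name -> (hash, CRC-32 or None)

    def add(self, name: str, data: bytes, checksum: bool = True,
            verify: bool = False) -> bool:
        key = weak_hash(data)
        self.fragments.setdefault(key, data)     # deduplicate by hash only
        self.files[name] = (key, zlib.crc32(data) if checksum else None)
        # Hypothetical analogue of -verify: re-check right after adding.
        return self.check(name) if verify else True

    def check(self, name: str) -> bool:
        key, crc = self.files[name]
        if crc is None:
            return True                          # no stored CRC-32 to compare
        return zlib.crc32(self.fragments[key]) == crc

    def test(self) -> list[str]:
        # Names of files whose stored CRC-32 does not match the stored data.
        return [name for name in self.files if not self.check(name)]


a, b = colliding_pair()

fast = ToyArchive()                              # like add with -nochecksum
fast.add("a.bin", a, checksum=False)
fast.add("b.bin", b, checksum=False)
never_detected = fast.test()

early = ToyArchive()                             # like add with -verify
early.add("a.bin", a, verify=True)
early_ok = early.add("b.bin", b, verify=True)

late = ToyArchive()                              # plain add, then test
late.add("a.bin", a)
late_ok = late.add("b.bin", b)
late_errors = late.test()
```

In this model `never_detected` is empty because there is nothing to compare against, `early_ok` is false because the mismatch is caught while adding, and `late_errors` lists the second file only once `test()` runs. The extra CRC-32 pass at add time is the cost that the real switch makes optional.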