"Zip archive inconsistent" error, although supported by other unpackers #235

Open
pwuertz opened this issue Mar 17, 2021 · 7 comments
Labels
enhancement Request a new feature.

Comments

@pwuertz

pwuertz commented Mar 17, 2021

Describe the Bug
Trying to open a specific zip file with zip_open_from_source results in error "Zip archive inconsistent".
I'm using ZIP_RDONLY as the only flag, i.e. ZIP_CHECKCONS is not set.
The Test.zip is attached to this bug report.

Opening this archive works with all the usual desktop tools, such as 7zip, the Windows built-in unzip, and GNOME File Roller.

The unzip -t command on Linux does complain about this file, though:

EF block length (12374 bytes) exceeds remaining EF data (16 bytes)

I'm not an expert on the Zip file format and don't know if it is really corrupt, but given that most prominent tools are able to handle this archive, I'm wondering if there is a way to read it gracefully with libzip too.

Personal dilemma: This file is hosted by a hardware device, so repacking or fixing the file is not in my scope :/.

libzip Version
libzip 1.7.3 from conan package manager

Operating System
Windows 10, Ubuntu 20.10

Test Files
Offending test file Test.zip

pwuertz added the "bug" label (libzip doesn't behave as expected.) Mar 17, 2021
@0-wiz-0
Member

0-wiz-0 commented Mar 17, 2021

As libzip and unzip say, this is a bug in the zip archive.
One extra field for the first entry claims that it contains 12374 bytes, but the whole area reserved for extra fields for this entry is 20 bytes.
You should fix the file.
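
To make the mismatch concrete, here is a minimal Python sketch (not libzip code) that walks an extra-field area the way the ZIP format lays it out: a 2-byte ID, a 2-byte little-endian length, then the payload. The function name walk_extra_fields is made up for illustration, and the 20 input bytes are copied from the hex dump of Test.zip posted further down in this thread.

```python
import struct

def walk_extra_fields(extra: bytes):
    """Strictly walk a ZIP extra-field area (hypothetical helper).

    Returns a list of (header_id, payload) tuples and raises ValueError
    as soon as a block declares more bytes than are left in the area.
    Fewer than 4 trailing bytes are silently ignored in this sketch.
    """
    fields = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        remaining = len(extra) - (pos + 4)
        if size > remaining:
            raise ValueError(
                f"extra field 0x{header_id:04X} declares {size} bytes, "
                f"only {remaining} remain"
            )
        fields.append((header_id, extra[pos + 4 : pos + 4 + size]))
        pos += 4 + size
    return fields

# The 20-byte extra area of the first entry: ID 47 43, length 56 30, 16 payload bytes.
EXTRA = bytes.fromhex("47435630" "0100010000000000" "0100000000000000")

if __name__ == "__main__":
    try:
        walk_extra_fields(EXTRA)
    except ValueError as err:
        print(err)  # extra field 0x4347 declares 12374 bytes, only 16 remain
```

The length bytes 56 30 read as 0x3056 = 12374, while only 20 - 4 = 16 bytes follow the 4-byte block header, which is the same pair of numbers that unzip -t reports.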

If this is not an option, but you can change the source code reading the zip archive, you could load the file into a memory buffer and overwrite the wrong extra field data there. Example code for that is in https://github.com/nih-at/libzip/blob/master/examples/in-memory.c
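
As a rough illustration of that workaround, the sketch below patches a ZIP image held in memory before it is handed to any reader. It is written in Python for brevity and is not taken from in-memory.c. The names clamp_extra and patch_zip are hypothetical, and the repair policy (clamp the first overflowing length to the bytes that remain) is an assumption, not something libzip or the ZIP specification prescribes. It also assumes a single-disk archive without zip64 records and an archive comment that does not contain the end-of-central-directory signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CEN_SIG = b"PK\x01\x02"   # central directory file header
LOC_SIG = b"PK\x03\x04"   # local file header

def clamp_extra(buf: bytearray, start: int, length: int) -> int:
    """Clamp the first overflowing extra-field length in buf[start:start+length].

    Returns the number of length fields rewritten (0 or 1).
    """
    pos, end = start, start + length
    while pos + 4 <= end:
        _header_id, size = struct.unpack_from("<HH", buf, pos)
        remaining = end - (pos + 4)
        if size > remaining:
            # Assumed policy: let this block swallow the rest of the area.
            struct.pack_into("<H", buf, pos + 2, remaining)
            return 1
        pos += 4 + size
    return 0

def patch_zip(data: bytes) -> bytes:
    """Return a copy of `data` with overflowing extra-field lengths clamped."""
    buf = bytearray(data)
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # Total entries, central directory size, central directory offset.
    total, _cd_size, cd_offset = struct.unpack_from("<HII", buf, eocd + 10)
    pos = cd_offset
    for _ in range(total):
        if buf[pos : pos + 4] != CEN_SIG:
            raise ValueError("bad central directory header")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        (local_offset,) = struct.unpack_from("<I", buf, pos + 42)
        clamp_extra(buf, pos + 46 + name_len, extra_len)
        # The local header carries its own copy of the extra area.
        if buf[local_offset : local_offset + 4] != LOC_SIG:
            raise ValueError("bad local file header")
        l_name_len, l_extra_len = struct.unpack_from("<HH", buf, local_offset + 26)
        clamp_extra(buf, local_offset + 30 + l_name_len, l_extra_len)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)
```

The patched bytes could then be passed to whatever in-memory open path the application uses. Because only the 2-byte length fields are overwritten, no offsets or sizes elsewhere in the archive shift.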

0-wiz-0 closed this as completed Mar 17, 2021
@pwuertz
Author

pwuertz commented Mar 18, 2021

OK, I think I have a better understanding of this now. I guess my real question was: there seem to be zip files out in the wild with malformed extra fields (I found similar issues in other projects too). Other tools seem to be able to recover from this, presumably by truncating or ignoring the extra field data. Couldn't the same be done with libzip? Perhaps with an opt-in zip_open flag for fault-tolerant parsing? Or a flag that just skips EF data altogether if you're not interested in it?
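
For illustration only, a fault-tolerant reading of the extra area could look like the Python sketch below. It keeps every well-formed block up to the first malformed one and hands back the rest as an opaque tail, so fields that appear before the bad block stay usable. This is a guess at what lenient tools might do, not a description of any existing libzip flag or of how 7zip or others actually behave; parse_extra_lenient is a hypothetical name.

```python
import struct

def parse_extra_lenient(extra: bytes):
    """Leniently parse a ZIP extra-field area (hypothetical helper).

    Returns (fields, leftover): `fields` is a list of (header_id, payload)
    for every well-formed block before the first malformed one, and
    `leftover` is the unparsed tail (empty if the whole area is valid).
    """
    fields = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = pos + 4
        if size > len(extra) - body:
            # Declared length overruns the area: stop here instead of failing.
            return fields, extra[pos:]
        fields.append((header_id, extra[body : body + size]))
        pos = body + size
    return fields, extra[pos:]

if __name__ == "__main__":
    # A well-formed 4-byte block with a made-up ID, followed by the malformed
    # "GC" block from Test.zip (length 0x3056, but only 16 payload bytes).
    good_block = struct.pack("<HH", 0x9999, 4) + b"\x01\x02\x03\x04"
    bad_block = bytes.fromhex("47435630") + bytes(16)
    fields, leftover = parse_extra_lenient(good_block + bad_block)
    print([hex(fid) for fid, _ in fields], len(leftover))
```

A design consequence of stopping at the first bad block is that anything after it is lost, because the parser can no longer tell where the next block starts.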

Fixing such files in memory would be possible, but to do this you'd have to implement a zip structure parser from scratch, right?

@0-wiz-0
Member

0-wiz-0 commented Mar 18, 2021

I suggested fixing the zip archive only as a workaround for your immediate problem; I don't think it's a good solution in general. You would need a ZIP parser for that, yes.

libzip uses some extra fields for basic features like zip64 or UTF-8 support. Ignoring extra fields completely would break those features.

I'm not convinced that we should support incorrectly created ZIP archives.

0-wiz-0 reopened this Mar 18, 2021
0-wiz-0 added the "enhancement" label (Request a new feature.) and removed the "bug" label (libzip doesn't behave as expected.) Mar 18, 2021
@fdegros

fdegros commented Sep 16, 2021

Regarding how frequent this ZIP_ER_INCONS error could be, I have some data collected over the last few months, with the number of data points on the order of millions. Error percentages are fairly stable over time.

ZIP_ER_INCONS occurs for 0.5% of all the ZIP files. This is a significant proportion, but it is less than ZIP_ER_NOZIP, which occurs for 2.2% of all the ZIP files.

[Image: LibZip Errors]

@pmqs

pmqs commented Oct 31, 2022

@pwuertz I know this issue is well over a year old, but do you remember where the Test.zip file originated or anything about what it is used for? I'm just trying to understand more about the misuse of the extra field.

The non-standard extra field looks deliberate because I see the identical invalid extra fields in both the local & central header records in the zip file. That seems too much of a coincidence to mark down as corruption.

See the two ERROR lines below:

0000 0004 50 4B 03 04 LOCAL HEADER #1       04034B50
0004 0001 14          Extract Zip Spec      14 '2.0'
0005 0001 00          Extract OS            00 'MS-DOS'
0006 0002 00 00       General Purpose Flag  0000
                      [Bits 1-2]            0 'Normal Compression'
0008 0002 08 00       Compression Method    0008 'Deflated'
000A 0004 65 99 FA 4E Last Mod Time         4EFA9965 'Fri Jul 26 19:11:10 2019'
000E 0004 24 41 BA 99 CRC                   99BA4124
0012 0004 3A CF 00 00 Compressed Length     0000CF3A
0016 0004 CC 1F 0A 00 Uncompressed Length   000A1FCC
001A 0002 27 00       Filename Length       0027
001C 0002 14 00       Extra Length          0014
001E 0027 42 61 73 6C Filename              'Basler_Ace_USB_99ba4124_Version_1_0.
          65 72 5F 41                       xml'
          63 65 5F 55
          53 42 5F 39
          39 62 61 34
          31 32 34 5F
          56 65 72 73
          69 6F 6E 5F
          31 5F 30 2E
          78 6D 6C
0045 0002 47 43       Extra ID #0001        4347
0047 0002 56 30         Length              3056
# ERROR: 'Length' field @ 0x47 in 'Extra ID' 0x4347 () invalid: value 0x3056 > 0x10 bytes remaining

0049 0010 01 00 01 00   Extra Payload       ................
          00 00 00 00
          01 00 00 00
          00 00 00 00
0059 CF3A ...         PAYLOAD

CF93 0004 50 4B 01 02 CENTRAL HEADER #1     02014B50
CF97 0001 14          Created Zip Spec      14 '2.0'
CF98 0001 03          Created OS            03 'Unix'
CF99 0001 14          Extract Zip Spec      14 '2.0'
CF9A 0001 00          Extract OS            00 'MS-DOS'
CF9B 0002 00 00       General Purpose Flag  0000
                      [Bits 1-2]            0 'Normal Compression'
CF9D 0002 08 00       Compression Method    0008 'Deflated'
CF9F 0004 65 99 FA 4E Last Mod Time         4EFA9965 'Fri Jul 26 19:11:10 2019'
CFA3 0004 24 41 BA 99 CRC                   99BA4124
CFA7 0004 3A CF 00 00 Compressed Length     0000CF3A
CFAB 0004 CC 1F 0A 00 Uncompressed Length   000A1FCC
CFAF 0002 27 00       Filename Length       0027
CFB1 0002 14 00       Extra Length          0014
CFB3 0002 00 00       Comment Length        0000
CFB5 0002 00 00       Disk Start            0000
CFB7 0002 00 00       Int File Attributes   0000
                      [Bit 0]               0 'Binary Data'
CFB9 0004 00 00 00 00 Ext File Attributes   00000000
CFBD 0004 00 00 00 00 Local Header Offset   00000000
CFC1 0027 42 61 73 6C Filename              'Basler_Ace_USB_99ba4124_Version_1_0.
          65 72 5F 41                       xml'
          63 65 5F 55
          53 42 5F 39
          39 62 61 34
          31 32 34 5F
          56 65 72 73
          69 6F 6E 5F
          31 5F 30 2E
          78 6D 6C
CFE8 0002 47 43       Extra ID #0001        4347
CFEA 0002 56 30         Length              3056
# ERROR: 'Length' field @ 0xCFEA in 'Extra ID' 0x4347 () invalid: value 0x3056 > 0x10 bytes remaining

CFEC 0010 01 00 01 00   Extra Payload       ................
          00 00 00 00
          01 00 00 00
          00 00 00 00

CFFC 0004 50 4B 05 06 END CENTRAL HEADER    06054B50
D000 0002 00 00       Number of this disk   0000
D002 0002 00 00       Central Dir Disk no   0000
D004 0002 01 00       Entries in this disk  0001
D006 0002 01 00       Total Entries         0001
D008 0004 69 00 00 00 Size of Central Dir   00000069
D00C 0004 93 CF 00 00 Offset to Central Dir 0000CF93
D010 0002 0E 00       Comment Length        000E
D012 000E 00 00 00 00 Comment               '              '
          00 00 00 00
          00 00 00 00
          00 00

Error Count: 2
Done

@pwuertz
Author

pwuertz commented Nov 1, 2022

@pmqs

.. do you remember where the Test.zip file originated or anything about what it is used for?

Yea sure. See that Basler_Ace_USB_99ba4124_Version_1_0.xml file in there? It's a GenICam device descriptor from a Basler ace series industrial camera.

looks deliberate because I see the identical invalid extra fields in both the local & central header records in the zip file. That seems too much of a coincidence to mark down as corruption.

True, not "corruption" in the sense of random errors affecting the transfer or storage of data. But most probably a fault in the program that was used to create the zip file, i.e. an algorithm that deterministically creates invalid or "corrupt" archives.

@pmqs

pmqs commented Nov 1, 2022

Thanks @pwuertz

Interesting to note that the two-byte extra ID just happens to be ASCII "GC", which matches well with "GenICam".

It may be a deliberate non-standard use of the extra field that breaks the zip spec, or just vestigial data that ended up getting released.
