Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corruption of members using bzip compression #26

Closed
AGWA opened this issue Mar 14, 2016 · 17 comments

Comments

@AGWA
Copy link
Contributor

commented Mar 14, 2016

I know that Archive::Zip doesn't support bzip compression, but it would be nice if Archive::Zip could pass through bzip members unmodified.

A simple test case:

#!/usr/bin/perl

use Archive::Zip;

my $zip = Archive::Zip->new();
$zip->read("bzip.zip");
$zip->writeToFileNamed("bzip-out.zip");

The output (bzip-out.zip) differs from the input (bzip.zip) and can't be uncompressed with unzip:

$ diff bzip.zip bzip-out.zip
Binary files bzip-out.zip and bzip.zip differ
$ unzip bzip-out.zip
Archive:  bzip-out.zip
 bunzipping: hello.txt               
  error:  invalid compressed data to bunzip

This used to work, but was broken by ecf3d5e, specifically this change to lib/Archive/Zip/Member.pm:

@@ -704,7 +704,7 @@ sub _writeLocalFileHeader {
     $self->_print($fh, $signatureData)
       or return _ioError("writing local header signature");

-    my $header = $self->head(0);
+    my $header = $self->head(1);

     $self->_print($fh, $header) or return _ioError("writing local header");

Changing the 1 back to a 0 makes my test case work again, but breaks other test cases. Unfortunately I have no idea what the right solution is here.

@lamby

This comment has been minimized.

Copy link

commented Mar 21, 2019

Ping on this? It would be great to fix this bug in Debian and the Reproducible Builds project. :)

@lamby

This comment has been minimized.

Copy link

commented Aug 14, 2019

Another gentle ping? :)

@redhotpenguin

This comment has been minimized.

Copy link
Owner

commented Aug 14, 2019

@lamby

This comment has been minimized.

Copy link

commented Aug 14, 2019

Unfortunately the fix is not certain (see @AGWA's original comment) and thus I'm not sure we would be ready to merge this.

@redhotpenguin

This comment has been minimized.

Copy link
Owner

commented Aug 20, 2019

Ok. I'm happy to merge and release a fix that has some level of confidence, with that reflected in the changelog.

@lamby

This comment has been minimized.

Copy link

commented Aug 20, 2019

Getcha, but alas, do note "Unfortunately I have no idea what the right solution is here." :)

@pmqs

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

I took a quick look at the two zip files (bzip.zip and bzip-out.zip) that @AGWA included at the start of this thread.

Quick reminder - the zip files contain a single member, hello.txt

$ unzip -l bzip-out.zip
Archive:  bzip-out.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       65  2016-03-13 23:19   hello.txt
---------                     -------
       65                     1 file

The only difference between bzip.zip and bzip-out.zip is one field in the local header.

Below is a dump of the local header from bzip.zip. The field that differs with bzip-out.zip is the Compressed Length, where it has the value 0, rather than the expected value 0000003E.

0000 LOCAL HEADER #1       04034B50
0004 Extract Zip Spec      2E '4.6'
0005 Extract OS            00 'MS-DOS'
0006 General Purpose Flag  0000
0008 Compression Method    000C 'BZIP2 '
000A Last Mod Time         486D826B 'Sun Mar 13 16:19:22 2016'
000E CRC                   62E3A2E8
0012 Compressed Length     0000003E
0016 Uncompressed Length   00000041
001A Filename Length       0009
001C Extra Length          001C
001E Filename              'hello.txt'
0027 Extra ID #0001        5455 'UT: Extended Timestamp'
0029   Length              0009
002B   Flags               '03 mod access'
002C   Mod Time            56E5F57A 'Sun Mar 13 23:19:22 2016'
0030   Access Time         56E5F57A 'Sun Mar 13 23:19:22 2016'
0034 Extra ID #0002        7875 'ux: Unix Extra Type 3'
0036   Length              000B
0038   Version             01
0039   UID Size            04
003A   UID                 000001F5
003E   GID Size            04
003F   GID                 000001F5
0043 PAYLOAD               BZh61AY&SY...).......`..@....
                           .1.0.Q...H.JH.KKJ..&'...N.$=+..@
@lamby

This comment has been minimized.

Copy link

commented Aug 22, 2019

What's the goal here, to detect bzip and skip them? I'm afraid I'm entirely lost here. :)

@AGWA

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

@lamby strip-nondeterminism just needs the ability to read a bzip'd member from the archive, normalize the metadata, and then write the member back out with the same bzip'd data, the same compressed length, and the new metadata. It doesn't need to look at the uncompressed data. As @pmqs's helpful analysis shows, this is working except for the compressed length, which gets changed to 0.

It makes sense that changing the argument to head from 1 to 0 fixes this, because it ($mode) is used like this:

$mode
? $self->_writeOffset() # compressed size
: $self->compressedSize(), # may need to be re-written later

Maybe _writeLocalFileHeader should check whether the member uses an unsupported compression method ($self->desiredCompressionMethod() != COMPRESSION_DEFLATED I think?) and pass 0 to head if so?

@redhotpenguin, does this sound like a plausible fix?

@pmqs

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

I did a very quick walk through with a Perl debugger, using the bzip.zip file, and the change from head(1) to head(0) does seem to be correct for the bzip2 use-case.

Only problem is that it breaks these tests. These failing tests are all concerned with the zero length zip member edge-conditions.

Test Summary Report
-------------------
t/18_bug_92205.t            (Wstat: 3072 Tests: 32 Failed: 12)
  Failed tests:  1-4, 9-12, 17-20
  Non-zero exit status: 12
t/22_deflated_dir.t         (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  4
  Non-zero exit status: 1
Files=26, Tests=346,  5 wallclock secs ( 0.09 usr  0.06 sys +  3.69 cusr  0.68 csys =  4.52 CPU)
Result: FAIL
Failed 2/26 test programs. 13/346 subtests failed.

@AGWA - your suggestion about getting _writeLocalFileHeader to check for not COMPRESSION_DEFLATED would be better done as not COMPRESSION_DEFLATED or COMPRESSION_STORED. That's because the failing tests impact both deflated & stored content.

That appears to work. I'll send a pull request shortly.

@pmqs

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

PR #46 created to address the issue.

I'm relatively happy with the change, but would appreciate another set of eyes on the change.

@redhotpenguin

This comment has been minimized.

Copy link
Owner

commented Aug 23, 2019

I've merged #46. Will release soonish.

@pmqs

This comment has been minimized.

Copy link
Contributor

commented Sep 7, 2019

With #46 merged, can this issue be closed?

@lamby

This comment has been minimized.

Copy link

commented Sep 7, 2019

Yep, I think so. @redhotpenguin, fancy making a release? :)

@redhotpenguin

This comment has been minimized.

Copy link
Owner

commented Sep 8, 2019

Ugh, looks like we are missing a test file:

Warning: the following files are missing in your kit:
	t/data/bzip2.zip
@AGWA

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

It's a typo in the manifest. It should be t/data/bzip.zip, which exists and is used by the test case.

@redhotpenguin

This comment has been minimized.

Copy link
Owner

commented Sep 8, 2019

Ok, 1.65 is away. Happy to make additional releases if needed, but my review tuits are a bit limited with little kids here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.