IO::Compress::Zip::zip( '/some/where/file.txt' => '/some/where/else.zip' ) creates an archive of the input file #38

Closed
jackdeguest opened this issue Apr 3, 2022 · 5 comments
Labels: documentation (Documentation issue)


@jackdeguest

The documentation for IO::Compress::Zip states: "If the $input_filename_or_reference parameter is a simple scalar, it is assumed to be a filename. This file will be opened for reading and the input data will be read from it."

However, this is not true. If one calls IO::Compress::Zip::zip( '/some/where/file.txt' => '/some/where/output.zip' ), it actually creates an archive of /some/where/file.txt, rather than just reading data from it as the documentation advertises.

Running the command-line utility unzip on the resulting output.zip produces the directory hierarchy some/where/file.txt, which shows that the zip file keeps a record of the original file path.

However, if one uses IO::Uncompress::Unzip::unzip to decompress output.zip, we do get back the original content.

For example:

use strict;
use warnings;
use IO::Compress::Zip;
use IO::Uncompress::Unzip;

# Create a small input file (autoflush enabled on $fh)
open( my $fh, ">", "/tmp/file.txt" ) || die( "$!" );
select((select( $fh ), $| = 1)[0]);
print( $fh "Hello world\n" ) || die( $! );
close( $fh );

# Compress the file into a zip archive
my $rv = IO::Compress::Zip::zip( '/tmp/file.txt' => '/tmp/output.zip' );
die( "Error: $IO::Compress::Zip::ZipError\n" ) if( !defined( $rv ) );
print( "ok, source file compressed -> /tmp/output.zip\n" );

# Decompress the archive (not the original /tmp/file.txt)
IO::Uncompress::Unzip::unzip( '/tmp/output.zip' => '/tmp/decompressed.txt' ) ||
    die( "Error: $IO::Uncompress::Unzip::UnzipError\n" );
open( my $in, "<", "/tmp/decompressed.txt" ) || die( $! );
my $txt = '';
read( $in, $txt, 1024 );
close( $in );
print( "Decompressed result is: '$txt'\n" ); # "Hello world" <-- ok

Looking at /tmp/output.zip with perl -MDevel::Hexdump=xd -lE 'say xd <>' /tmp/output.zip confirms that the path /tmp/file.txt is stored in the archive:

[0000]   50 4B 03 04  14 00 08 00  08 00 CA A5  83 54 00 00   PK.. .... .... .T..
[0010]   00 00 00 00  00 00 00 00  00 00 0D 00  1C 00 2F 74   .... .... .... ../t
[0020]   6D 70 2F 66  69 6C 65 2E  74 78 74 55  54 09 00 03   mp/f ile. txtU T...
[0030]   0C 89 49 62  7D 88 49 62  75 78 0B 00  01 04 E8 03   ..Ib }.Ib ux.. ....
[0040]   00 00 04 E8  03 00 00 F3  48 CD C9 C9  57 28 CF 2F   .... .... H... W(./
[0050]   CA 49 E1 02  00 50 4B 07  08 D5 E0 39  B7 0E 00 00   .I.. .PK. ...9 ....
[0060]   00 0C 00 00  00 50 4B 01  02 14 03 14  00 08 00 08   .... .PK. .... ....
[0070]   00 CA A5 83  54 D5 E0 39  B7 0E 00 00  00 0C 00 00   .... T..9 .... ....
[0080]   00 0D 00 18  00 00 00 00  00 01 00 00  00 B4 81 00   .... .... .... ....
[0090]   00 00 00 2F  74 6D 70 2F  66 69 6C 65  2E 74 78 74   .../ tmp/ file .txt
[00a0]   55 54 05 00  01 0C 89 49  62 75 78 0B  00 01 04 E8   UT.. ...I bux. ....
[00b0]   03 00 00 04  E8 03 00 00  50 4B 05 06  00 00 00 00   .... .... PK.. ....
[00c0]   01 00 01 00  53 00 00 00  65 00 00 00  00 00         .... S... e... ..

The expected behaviour would have been for only the content of file.txt to be zipped, without the file itself (its path) also being recorded.
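The stored name can also be inspected without a hex dump. Below is a minimal Perl sketch, assuming IO::Compress and IO::Uncompress::Unzip are installed and /tmp is writable. It rebuilds the archive from the report, then walks each member with IO::Uncompress::Unzip and prints the name recorded in the member header (the field that appears as /tmp/file.txt in the dump above) together with the decompressed content. The loop follows the member-walking pattern (getHeaderInfo, read, nextStream) described in the IO::Uncompress::Unzip documentation.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw( zip $ZipError );
use IO::Uncompress::Unzip qw( $UnzipError );

# Recreate the archive from the report.
open( my $fh, '>', '/tmp/file.txt' ) or die( "$!\n" );
print( $fh "Hello world\n" );
close( $fh );
zip( '/tmp/file.txt' => '/tmp/output.zip' )
    or die( "zip failed: $ZipError\n" );

# Walk every member, printing the stored name and its content.
my $u = IO::Uncompress::Unzip->new( '/tmp/output.zip' )
    or die( "Cannot open archive: $UnzipError\n" );
my $status;
for ( $status = 1; $status > 0; $status = $u->nextStream() )
{
    my $name = $u->getHeaderInfo()->{Name};
    my $data = '';
    my $buffer;
    # read() overwrites $buffer on each call, so accumulate into $data.
    while ( ( $status = $u->read( $buffer ) ) > 0 )
    {
        $data .= $buffer;
    }
    last if $status < 0;
    print( "member '$name' contains: $data" );
}
die( "Error processing archive: $UnzipError\n" ) if $status < 0;
```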

@pmqs
Owner

pmqs commented Apr 3, 2022

Hey Jacques,

thanks for the feedback.

As your script proves, my module will indeed read the contents of /tmp/file.txt and store it in the zip file /tmp/output.zip. It also (by design) stores the name of the file in the zip archive.

Apart from storing the compressed payload data, each entry in a zip archive needs a name associated with it. When reading a file from the filesystem, the most natural choice is to use that filename as the entry name in the zip file. So when I run unzip -l against the file created by your code, I get this:

$ unzip -l !$
unzip -l /tmp/output.zip
Archive:  /tmp/output.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       12  2022-04-03 18:26   /tmp/file.txt
---------                     -------
       12                     1 file

and if I dump the uncompressed contents of the archive, I get this:

$ unzip -p /tmp/output.zip
Hello world

What behaviour were you expecting for the naming of the entry in the zip file?
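To illustrate that every member carries its own name, here is a sketch using the OO interface, assuming /tmp/oo.zip and the member names hello.txt and bye.txt are hypothetical and free to use. Name is passed to the constructor for the first member and to newStream() for each following member; on the reading side, unzip() with its own Name option extracts one member by name into a scalar.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw( $ZipError );
use IO::Uncompress::Unzip qw( unzip $UnzipError );

# First member: its name comes from the constructor options.
my $z = IO::Compress::Zip->new( '/tmp/oo.zip', Name => 'hello.txt' )
    or die( "zip failed: $ZipError\n" );
$z->print( "Hello world\n" );

# Second member: newStream() closes the first one and accepts new options.
$z->newStream( Name => 'bye.txt' );
$z->print( "Goodbye\n" );
$z->close();

# Extract a single member by name into memory.
my $greeting;
unzip( '/tmp/oo.zip' => \$greeting, Name => 'bye.txt' )
    or die( "unzip failed: $UnzipError\n" );
print( "bye.txt contains: $greeting" );
```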

@jackdeguest
Author

jackdeguest commented Apr 3, 2022

What behaviour were you expecting for the naming of the entry in the zip file?

I already mentioned in my initial comment the expected behaviour, which is the one advertised in the documentation, i.e. "If the $input_filename_or_reference parameter is a simple scalar, it is assumed to be a filename. This file will be opened for reading and the input data will be read from it." (emphasis added)

I think that, unless you want to change the behaviour of your code, you need to amend the documentation accordingly.
The workaround I found, and that works, is to open the file myself and pass the filehandle (a glob) instead of a file path.

@pmqs
Owner

pmqs commented Apr 3, 2022

The documented behaviour for naming the members in a zip file is described in the File Naming Options section:

"... By default when adding a filename to the zip archive, the archive member name will match the filename."

@jackdeguest
Author

jackdeguest commented Apr 3, 2022

The documented behaviour for naming the members in a zip file is described in the File Naming Options section

I did not see that because it is buried in the description of the Name option. Also, the section you quote is in the OO interface part of the documentation, while I am referring to the functional interface section, whose list of options does not include Name.

You decide, but I suggest you make it clearer.
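For the functional interface, here is a sketch of the two naming options most relevant to this thread, assuming a module version that supports FilterName; the output archive names are hypothetical. The functional zip() call accepts the OO constructor options, so Name can set the member name outright, while FilterName lets a callback rewrite the default name (here, stripping the directory part) by modifying $_ in place.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw( zip $ZipError );
use IO::Uncompress::Unzip qw( $UnzipError );

open( my $fh, '>', '/tmp/file.txt' ) or die( "$!\n" );
print( $fh "Hello world\n" );
close( $fh );

# Option 1: set the member name explicitly.
zip( '/tmp/file.txt' => '/tmp/named.zip', Name => 'file.txt' )
    or die( "zip failed: $ZipError\n" );

# Option 2: keep the default (filename-based) name, but strip the
# directory part; the callback edits $_ in place.
zip( '/tmp/file.txt' => '/tmp/filtered.zip', FilterName => sub { s{^.*/}{} } )
    or die( "zip failed: $ZipError\n" );

# Show the member name stored in each archive.
for my $archive ( '/tmp/named.zip', '/tmp/filtered.zip' )
{
    my $u = IO::Uncompress::Unzip->new( $archive )
        or die( "Cannot open $archive: $UnzipError\n" );
    print( "$archive: member name is '", $u->getHeaderInfo()->{Name}, "'\n" );
    $u->close();
}
```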

@pmqs pmqs self-assigned this Apr 12, 2022
@pmqs pmqs added the documentation Documentation issue label Jun 23, 2022
pmqs added a commit that referenced this issue Jun 25, 2022
@pmqs
Owner

pmqs commented Jun 25, 2022

Fixed in 2.201

@pmqs pmqs closed this as completed Jun 25, 2022
atoomic added a commit to atoomic/perl5 that referenced this issue Jul 20, 2022
From ChangeLog

  2.201 25 June 2022

      * Disable zib header tests
        Sat Jun 25 09:10:59 2022 +0100
        63eb5d37291b40dbf07d191a09b7876168008cd4

      * Version 2.201
        Sat Jun 25 09:00:42 2022 +0100
        af51310f68bb225d94eaa29b7f3d2bece1935dfd

      * doc update pmqs/IO-Compress#38
        Thu Jun 23 23:00:31 2022 +0100
        2002d4fd3b3a6f5de6c6c3dc5989cf42581c1758

      * Changes for zlib-ng
        Thu Jun 23 22:43:50 2022 +0100
        2bd52d2918823cc567c3e92dd3d15f87cb4ee8f8

      * Add perl 5.36
        Sun Jun 5 13:34:18 2022 +0100
        ede55370ed4c7eb3c66abc71bc25c7e4019b4c44

      * force streaming zip file when writing to stdout
      * pmqs/IO-Compress#42
        Sun Apr 24 19:43:19 2022 +0100
        b57a3f83f404f5a24242680de5b406cfcf5c03ac

      * read zip timestamp in localtime
        Sun Apr 24 13:11:58 2022 +0100
        0c838f43dc46f292714c82145c9add9932196b01

      * streamzip: tighten up version tests for failing windows tests
      * pmqs/IO-Compress#41
        Sun Apr 24 12:49:57 2022 +0100
        3497645228235ea12c4d559d6dedd4cef47fc94a

      * streamzip: update year
        Sun Apr 24 12:11:35 2022 +0100
        0ac0d1ef603d8854ffc35976196735b663764992

      * Use Time::Local instead of  POSIX::mktime
        Tue Apr 19 11:31:43 2022 +0100
        64a106f1119cbc7dec8db52dca016bb8baacf2d4
atoomic added a commit to Perl/perl5 that referenced this issue Jul 20, 2022
scottchiefbaker pushed a commit to scottchiefbaker/perl5 that referenced this issue Nov 3, 2022