IO::Uncompress::Unzip::getHeaderInfo() should provide (more) information in transparent mode #35

tlhackque · 2021-10-29T15:13:03Z

While I'm reporting issues with IO::Uncompress::Unzip:

When processing files that may turn out not to be compressed, I try to have a common code path.

getHeaderInfo() simply returns undef. I use the following code, which really belongs in getHeaderInfo.

     my $member = $in->getHeaderInfo();
     if( !defined $member ) {
        my $size = ( stat $filename )[7] || 0;
        $size = U64->new( int( $size / U64::HI_1 ), int( $size % U64::HI_1 ) );
        $member = { Name               => $filename,
                  Time               => ( ( stat _ )[9] ) || 0,
                  Type               => $type,
                  UncompressedLength => $size,
                  MethodName         => 'uncompressed',
                };
    }

Providing this pseudo-header metadata allows the following code to use @member->{*} to report filename, size, time, etc. (If this looks odd, in real life it's a subroutine...and $type is 'text', or if binmode has been called, data)

Note that the code in Uncompress::Base sets Name to undef and Type to 'plain', expecting a caller of getHeaderInfo to receive that - even though getHeaderInfo in fact returns undef.

I'm not especially happy about using the U64 method, but that's what getHeaderInfo returns for compressed files.

Note that what's returned for compressed files is not well documented - it should be.

This method returns either a hash reference (in scalar context) or a
list or hash references (in array context) that contains information
about each of the header fields in the compressed data stream(s).

requires a dive into the internals to figure out how to turn "information about" into specific usable data.

And it would be nice if u64::new did the math to split a 64-bit int (or on 32-bit systems, double) into the required upper and lower 1/2s.

The text was updated successfully, but these errors were encountered:

pmqs · 2021-10-30T14:42:49Z

Tidying getHeaderInfo has been on my TODO list for a while. Not been any demand to address it, until now.

Apart from the uncompressed use-case you have highlighted, it is mostly a case of documenting all the fields populated.

Agree about the use of U64 -- I really don't like it myself, but when the module was written the 32-bit OS was still predominant and I needed a way to guarantee support for 64-bit values.

I'll have a think about how best to change getHeaderInfo so that it returns native Perl inst instead/as well as the U64 objects.

tlhackque · 2021-10-30T15:29:05Z

I understand why you have U64 - it's a quite reasonable solution. My comment was more that since it's undocumented, it's awkard for user code to peek inside to use it. The reaching inside is what I wasn't happy about.

As for alternatives:

IEEE double gives 53 significand bits - not quite 64. quad gives 113. IIRC, native Perl switches to double when it feels like it.
(And then there are non-IEEE platforms - mostly older, like VAX.) So I'm not sure you can rely on native scalars giving precise sizes with 64-bit file pointers.

There's Math::Int64 that provides 64-bit int objects no matter how Perl is compiled. And if you have XS code, it his a C API.

But a possibly better choice might be Math::BigInt - it's a bit of overkill for this application, but it is a core module (since 5.6.1 according to corelist) with both PP and XS - so avoids adding a visible dependency to your modules.

Since U64 has been undocumented, I wouldn't object to deprecating or removing it if you switch to something like that. Don't know how many other people might have done the archaeology, but switching to either of those would take minimal effort - and since both overload the usual operators, would simplify any code that touched them.

The main reason I use getHeaderInfo is for logging - it's helpful to know what was processed when looking at application performance... With large input files , it can also be helpful as the basis for sending updates to a progress meter.

pmqs self-assigned this Oct 30, 2021

pmqs added enhancement New feature or request documentation Documentation issue labels Oct 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IO::Uncompress::Unzip::getHeaderInfo() should provide (more) information in transparent mode #35

IO::Uncompress::Unzip::getHeaderInfo() should provide (more) information in transparent mode #35

tlhackque commented Oct 29, 2021

pmqs commented Oct 30, 2021

tlhackque commented Oct 30, 2021

IO::Uncompress::Unzip::getHeaderInfo() should provide (more) information in transparent mode #35

IO::Uncompress::Unzip::getHeaderInfo() should provide (more) information in transparent mode #35

Comments

tlhackque commented Oct 29, 2021

pmqs commented Oct 30, 2021

tlhackque commented Oct 30, 2021