
TGZ stream can't be read #3

Closed
grahamegrieve opened this issue Jan 5, 2024 · 8 comments

Comments

@grahamegrieve

This tgz file can't be read by the library; it gives a Data Error.

But the other libraries and tools I have can all read it.

package.tgz

@grahamegrieve (Author)

Code to produce the error:

program project1;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}
  cmem, cthreads,
  {$ENDIF}
  Classes, SysUtils, zflate;

function ungzip(bytes : TBytes) : TBytes;
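// decompress gzip-compressed bytes with zflate; raise if zflate reports an error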
begin
  result := zflate.zdecompress(bytes);
  if zlastError <> 0 then
    raise Exception.create('Failed to read compressed content: '+zflatetranslatecode(zlasterror));
end;

var
  b : TBytes;
  f : TFileStream;
begin
  try
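    // open the .tgz, read it fully into memory, and report the decompressed size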
    f := TFileStream.create('/Users/grahamegrieve/temp/package.tgz', fmOpenRead);
    try
      setLength(b, f.Size);
      f.Read(b[0], f.Size);
      writeln('Uncompressed is '+inttostr(length(ungzip(b)))+' bytes in size');
    finally
      f.free;
    end;
  except
    on e : Exception do
      writeln('Error: '+e.message);
  end;
end.

@grahamegrieve (Author)

Reproduced on: Lazarus 3.1 (rev lazarus_3_0-15-g9bef988478) FPC 3.3.1 aarch64-darwin-cocoa, and Lazarus 2.2.6 (rev 0df75f4) FPC 3.2.2 x86_64-win64-win32/win64.

@grahamegrieve (Author)

Because z.state^.mode becomes BAD somewhere in the middle of the file.

@fibodevy (Owner) commented Jan 6, 2024

Fixed by increasing the buffer size (zbuffersize) from 4 MB to 16 MB. If you can confirm, you can close this issue.
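
A caller-side sketch of the same workaround, assuming zbuffersize is a writable global in the zflate unit (the comments below confirm it is a var) and that it is expressed in bytes:

// sketch: enlarge zflate's output buffer before decompressing a large archive
// assumption: zbuffersize is a byte count exposed by the zflate unit
zbuffersize := 64 * 1024 * 1024;  // 64 MB instead of the old 4 MB default
b := ungzip(b);                   // then decompress as in the repro program above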

costateixeira added a commit to costateixeira/fhirserver that referenced this issue Jan 6, 2024
@grahamegrieve (Author)

That does fix it, thanks. But it raises multiple questions for me: is 16 MB enough for everything? What is enough? Shouldn't the underlying zlib library return (say) Z_MEM_ERROR rather than Z_DATA_ERROR if there's not enough memory?

@fibodevy (Owner) commented Jan 6, 2024

Actually I don't know why it returned Z_DATA_ERROR and not Z_BUF_ERROR; I didn't dig that deep into the Z* units.

zchunkmaxsize and zbuffersize are vars and not consts on purpose, so they can be adjusted. I wanted to create a routine that would double the buffer size in case it's too small; maybe in the future.

The thing is, you can compress a very large string to a very small output. Say you have 20 MB of "x", just "x" repeated 20 million times; that compresses to just 20405 bytes (with GZIP level 9). Now, with a buffer of 16 MB and a chunk size of 128 KB, the whole 20405 bytes arrive in one chunk and have to expand to 20 MB with only a 16 MB buffer, which won't work. You need either a smaller chunk size or a bigger buffer size.
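
To put numbers on that, a rough sketch of the two knobs named above (again assuming both are writable globals expressed in bytes): the output buffer has to hold the expansion of a single chunk, so either the chunk shrinks or the buffer grows.

// sketch: two ways to keep a highly compressible chunk from overflowing the buffer
// with a roughly 1000:1 expansion ratio, an 8 KB chunk inflates to about 8 MB,
// which fits the 16 MB buffer
zchunkmaxsize := 8 * 1024;
// alternatively, make the buffer larger than the worst expected expanded size
zbuffersize := 32 * 1024 * 1024;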

@grahamegrieve (Author)

That's kind of unfortunate for code that's decompressing tgzs from unknown software; I don't know how big the buffer has to be. I mean, we can assume 16 MB, I guess, and bump it up if there's ever a problem, but it feels like hanging technical risk that I'd rather just not have. Can't the paszlib code allocate a bigger buffer on the fly?

@fibodevy (Owner) commented Jan 6, 2024

I increased it to 64 MB, because why not.

For GZIP there is original-data-size information that can be used to prepare the buffer. Raw deflate streams should be transmitted along with the original data size.

I wouldn't worry; there is validation (CRC32 + original data size), so if you find a valid GZ file that zflate has problems with, let me know.
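
For reference, that size field is the gzip ISIZE trailer: the last four bytes of a gzip stream hold the original length modulo 2^32, little-endian. A minimal sketch of a hypothetical helper (not part of zflate; assumes SysUtils for Exception, as in the repro program) that reads it to size a buffer:

// hypothetical helper: read the gzip ISIZE trailer (uncompressed size mod 2^32)
function GzipOriginalSize(const gz: TBytes): Cardinal;
begin
  // 10-byte header + 8-byte trailer is the smallest possible gzip stream
  if Length(gz) < 18 then
    raise Exception.Create('not a gzip stream');
  // last four bytes, little-endian
  result := Cardinal(gz[Length(gz)-4])
         or (Cardinal(gz[Length(gz)-3]) shl 8)
         or (Cardinal(gz[Length(gz)-2]) shl 16)
         or (Cardinal(gz[Length(gz)-1]) shl 24);
end;

Note that ISIZE is only the size modulo 2^32, so it under-reports for archives larger than 4 GB.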

fibodevy closed this as completed Jan 6, 2024