Reading multiple concatenated zlib streams using poco::InflatingInputStream #1507

Closed
ztlpn opened this issue Dec 2, 2016 · 3 comments

ztlpn commented Dec 2, 2016

Suppose I have several concatenated gzipped files (the number is not known in advance) and I want to inflate them. To do that, I repeatedly read from InflatingInputStream, and when the eofbit is set, I call reset() and try to read some more. Sample code:

Poco::InflatingInputStream inflating_stream(input_stream);
std::vector<char> buf(1024);
while (true)
{
	inflating_stream.read(buf.data(), buf.size());
	size_t gcount = inflating_stream.gcount();

	if (!gcount && inflating_stream.eof())
	{
		inflating_stream.reset();
		inflating_stream.read(buf.data(), buf.size());
		gcount = inflating_stream.gcount();
	}

	if (gcount)
	{
		// Good, do something to data stored in buf.
	}
	else
	{
		// It is impossible to distinguish here if all data from the input_stream
		// has been read or there is some leftover data which zlib couldn't process.
		break;
	}
}

The problem is that it is impossible to distinguish between the case when all data from input_stream has been read and the case when there is still some invalid data left at the end of input_stream. input_stream.eof() doesn't help here because input data is buffered in InflatingStreamBuf::_buffer and there is no way to tell if this buffer is empty or not.

Possible workarounds:

  • Add a method to InflatingStreamBuf indicating that there is still input data to be read.
  • Don't reset eofbit in InflatingStreamBuf::reset() if all input data has been read.
@ztlpn
Author

ztlpn commented Dec 12, 2016

@obiltschnig Any thoughts?

@obiltschnig
Member

If zlib fails to process any leftover data, InflatingStreamBuf::readFromDevice() will throw an exception which should cause the stream state to go bad. So maybe checking for inflating_stream.bad() may work. Haven't tested this scenario, though.

@ztlpn
Author

ztlpn commented Dec 15, 2016

Thanks for the reply!

> maybe checking for inflating_stream.bad() may work

I think this won't work. Suppose input_stream contains just one zlib stream. After we finish decoding it and unconditionally call reset(), inflating_stream.eof() will be reset to false. The next attempt to read from inflating_stream will then set badbit even though input_stream is already exhausted. So badbit will be set in both cases: bad leftover data, or a legitimate end of input_stream.

Not resetting inflating_stream.eof() when all input has been consumed would help (this is my second suggestion for a workaround). I am thinking of the following patch to InflatingStreamBuf::reset() (I can create a PR with it):

{
        int rc = inflateReset(&_zstr);
        if (rc == Z_OK)
-               _eof = false;
+       {
+               if (_zstr.avail_in > 0 || _pIstr->good())
+               {
+                       _eof = false;
+               }
+       }
        else
                throw IOException(zError(rc));
 }
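
For illustration, here is a rough sketch of how the calling loop could look if reset() behaved as in the patch above, i.e. eof stays set once no input remains. This is not tested against Poco. FakeInflatingStream, InflateResult and inflateAll are hypothetical names made up for this sketch; the fake class only mimics the eof/reset behaviour the patch is aiming for, so that the loop can be exercised without zlib.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a *patched* InflatingInputStream (NOT Poco code).
// Each string in `members` plays the role of one already-decoded zlib stream.
class FakeInflatingStream
{
public:
	FakeInflatingStream(std::vector<std::string> members, bool trailingGarbage)
		: _members(std::move(members)), _trailingGarbage(trailingGarbage) {}

	void read(char* dst, std::size_t n)
	{
		_gcount = 0;
		if (_eof || _bad) return;
		if (_index == _members.size())
		{
			// No valid member left, so whatever remains cannot be decoded.
			_bad = true;
			return;
		}
		const std::string& cur = _members[_index];
		_gcount = std::min(n, cur.size() - _pos);
		std::copy_n(cur.data() + _pos, _gcount, dst);
		_pos += _gcount;
		if (_gcount < n) _eof = true; // short read: end of the current member
	}

	void reset()
	{
		++_index;
		_pos = 0;
		// Assumed patched behaviour: clear eof only if some input is left.
		if (_index < _members.size() || _trailingGarbage) _eof = false;
	}

	std::size_t gcount() const { return _gcount; }
	bool eof() const { return _eof; }
	bool bad() const { return _bad; }

private:
	std::vector<std::string> _members;
	bool _trailingGarbage;
	std::size_t _index = 0;
	std::size_t _pos = 0;
	std::size_t _gcount = 0;
	bool _eof = false;
	bool _bad = false;
};

enum class InflateResult { AllInputConsumed, TrailingGarbage };

// Reads every concatenated member and passes the decoded bytes to `sink`.
template <typename Stream, typename Sink>
InflateResult inflateAll(Stream& in, Sink sink)
{
	std::vector<char> buf(1024);
	for (;;)
	{
		in.read(buf.data(), buf.size());
		const auto n = in.gcount();
		if (n > 0)
		{
			sink(buf.data(), static_cast<std::size_t>(n));
			continue;
		}
		// No data and no eof: the stream failed on undecodable input.
		if (!in.eof()) return InflateResult::TrailingGarbage;
		in.reset();
		// With the patched reset(), eof still set means the input is exhausted.
		if (in.eof()) return InflateResult::AllInputConsumed;
	}
}
```

The point is the last check: after reset(), a still-set eof means the input really ended, while a cleared eof followed by a failed read means there was leftover data that could not be decoded. With the current unconditional reset() those two cases look the same to the caller.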
