Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chunked encoding + GZIP prevents keep-alive #367

Closed
eddavisson opened this issue Sep 1, 2017 · 5 comments · Fixed by #840
Closed

chunked encoding + GZIP prevents keep-alive #367

eddavisson opened this issue Sep 1, 2017 · 5 comments · Fixed by #840
Assignees

Comments

@eddavisson
Copy link

@eddavisson eddavisson commented Sep 1, 2017

GZIPInputStream.read() starts returning -1 once it finishes reading the gzip trailer. Depending on the underlying InputStream, this may occur before it has finished consuming the underlying InputStream.

In particular, if the underlying InputStream is a ChunkedInputStream, the ChunkedInputStream never reads the final chunk header (which indicates a chunk of length 0).

When ChunkedInputStream.close() is called, it is in a non-STATE_DONE state, and does not call HttpClient.finished(), which is required for the HttpClient object to get into the keep alive cache.

One possible fix is here:
https://github.com/google/google-http-java-client/blob/acdaeacc3c43402d795266d2ebcb584abe918961/google-http-client/src/main/java/com/google/api/client/http/HttpResponse.java#L363

Rather than returning a GZIPInputStream, return a proxy input stream that retains references to both the gzip stream and the underlying stream. The proxy's close() method first consumes the underlying stream by calling read() and then calls close() on the gzip stream.

@eddavisson

This comment has been minimized.

Copy link
Author

@eddavisson eddavisson commented Sep 7, 2017

It's possible this only applies for some HttpTransport types. For example, Apache's ChunkedInputStream.close() method indicates that it will consume the rest the stream automatically:
https://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/ChunkedInputStream.html#close()

@wsh

This comment has been minimized.

Copy link

@wsh wsh commented Nov 15, 2017

We've worked around this in our code for now, but it's definitely worth fixing in the underlying library. 😄

@elharo

This comment has been minimized.

Copy link
Collaborator

@elharo elharo commented Jul 29, 2019

This is really weird. Is the issue that there's a GZIP file/blob in a much larger stream that contains extra data after the end of the gzipped part?

@eddavisson

This comment has been minimized.

Copy link
Author

@eddavisson eddavisson commented Aug 12, 2019

IIRC, the issue occurs when the GZIPed content is chunked, so there is a GZIPInputStream wrapping a ChunkedInputStream.

The GZIPInputStream reports that is has been exhausted as soon as it observes the GZIP trailer. However, the ChunkedInputStream is not exhausted until it observes the final chunk header (with length 0).

@eddavisson

This comment has been minimized.

Copy link
Author

@eddavisson eddavisson commented Oct 7, 2019

Note that the hack we put in place for google-cloud-datastore is going to become increasingly inconvenient in future JDK releases.

codyoss added a commit to codyoss/google-http-java-client that referenced this issue Oct 14, 2019
If a connection is closed and there are some bytes that have not
been read that connection can't be reused. Now GZIPInputStream
will have all of its bytes read on close automatically to promote
connection reuse.

Checkpicked from: googleapis#749
Fixes: googleapis#367
codyoss added a commit to codyoss/google-http-java-client that referenced this issue Oct 14, 2019
If a connection is closed and there are some bytes that have not
been read that connection can't be reused. Now GZIPInputStream
will have all of its bytes read on close automatically to promote
connection reuse.

Cherry-picked: googleapis#749
Fixes: googleapis#367
@codyoss codyoss closed this in #840 Oct 18, 2019
codyoss added a commit that referenced this issue Oct 18, 2019
If a connection is closed and there are some bytes that have not
been read that connection can't be reused. Now GZIPInputStream
will have all of its bytes read on close automatically to promote
connection reuse.

Cherry-picked: #749
Fixes: #367
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.