Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #8926 - HttpClient GZIPContentDecoder should remove Content-Len… #10326

Merged
merged 6 commits into from
Aug 21, 2023

Conversation

sbordet
Copy link
Contributor

@sbordet sbordet commented Aug 16, 2023

…gth and Content-Encoding: gzip

Now Content-Length and Content-Encoding are removed/modified by the decoder. In this way, applications have a correct sets of headers to decide whether to decode the content themselves.

…gth and Content-Encoding: gzip

Now Content-Length and Content-Encoding are removed/modified by the decoder.
In this way, applications have a correct sets of headers to decide whether to decode the content themselves.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@joakime
Copy link
Contributor

joakime commented Aug 16, 2023

This broke the org.eclipse.jetty.http2.client.http.ContentLengthTest.testGzippedContentLengthAddedByServer() tests

@sbordet sbordet requested a review from gregw August 16, 2023 21:33
{
// Pick the last content encoding from the server.
String contentEncoding = contentEncodings.get(contentEncodings.size() - 1);
decoder = getHttpDestination().getHttpClient().getContentDecoderFactories().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @gregw would say: don't use streams on the fast path.

@@ -87,6 +87,11 @@ public HttpFields getHeaders()
return headers.asImmutable();
}

protected HttpFields.Mutable headers()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already the public HttpResponse headers(Consumer<HttpFields.Mutable> consumer) API that allows mutating the headers. IMHO one of these two methods should go.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet sbordet requested review from gregw and lorban August 17, 2023 10:25
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Comment on lines +302 to +312
String contentEncoding = responseHeaders.get(HttpHeader.CONTENT_ENCODING);
if (contentEncoding != null)
{
int comma = contentEncoding.indexOf(",");
if (comma > 0)
{
QuotedCSV parser = new QuotedCSV(false);
parser.addValue(contentEncoding);
List<String> values = parser.getValues();
contentEncoding = values.get(values.size() - 1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a header can have multiple CONTENT_ENCODING fields which mean the same as a single comma separated field. But it is 1am and it could be space separate fields I'm thinking of?
So probably best to use HttpFields.getCSV

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to be really nasty, we should strip "identity" encodings off the end.... but the server doesn't do that (anymore)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the times Content-Encoding is single-valued.

As you seem concerned about hot paths, I chose not to use getCSV() because in the single-value case allocates a List and a QuotedCSV for nothing, while the code above only does so for multi-values.

Doing a first get() and then getCSV() iterates over the headers twice, although should be a small list.

What do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a vexed question and now I'm concerned the server doesn't well handle the case of multiple HttpFields.

If we are struggling to come up with something that is both simple, clear and efficient, then probably the HttpFields API is not good enough.
Perhaps we need a method to ask HttpFields if it has a value as the last entry in a list of values, with the option of removing that entry. This could be implemented efficiently without having to tokenize the list twice. I'd be happy with an implementation that worked well for single valued lists, but was more expensive for multi valued lists.

Something like:

    String contentEncoding = responseHeaders.getLastCSV(HttpHeader.CONTENT_ENCODING);

and on mutable:

    String contentEncoding = responseHeaders.takeLastCSV(HttpHeader.CONTENT_ENCODING);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbordet I've created #10340 that might be useful hear as an efficient way to find the last value. I've not done anything about removing the last value until I get feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbordet I've further update #10340 with a listIterator that make operating on the last item simpler. So now in the server GzipRequest, I just copy the HttpFields to a mutable, then iterate in reverse over it correcting the headers found.
This will have merge hell with #10339 and #10308

gregw
gregw previously approved these changes Aug 18, 2023
@gregw
Copy link
Contributor

gregw commented Aug 18, 2023

I'll fix up the efficiency and correctness if content-encoding handling in #10340 to be merged separately.

lorban
lorban previously approved these changes Aug 18, 2023
Copy link
Contributor

@lorban lorban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny nit, otherwise LGTM.


ByteArrayOutputStream output = new ByteArrayOutputStream();
try (InputStream input = listener.getInputStream())
{
IO.copy(input, output);
}
assertArrayEquals(content, output.toByteArray());
// After the content has been coded, the length is known again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the content has been encoded?

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet sbordet dismissed stale reviews from lorban and gregw via d9f9de7 August 21, 2023 06:57
@sbordet sbordet merged commit e91a689 into jetty-10.0.x Aug 21, 2023
2 of 3 checks passed
@sbordet sbordet deleted the fix/jetty-10-8926-httpclient-content-encoding branch August 21, 2023 06:58
sbordet added a commit that referenced this pull request Aug 21, 2023
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpClient GZIPContentDecoder should remove Content-Length and Content-Encoding: gzip
4 participants