Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opens gzip compressed content #513

Merged
merged 8 commits into from Jan 2, 2024
Merged

Opens gzip compressed content #513

merged 8 commits into from Jan 2, 2024

Conversation

dr0i
Copy link
Member

@dr0i dr0i commented Dec 18, 2023

  • follows redirects
  • fixes misconception of "Content-Encoding"

Resolves #511.

- follows redirects
- fixes misconception of "Content-Encoding"
Copy link
Member

@blackwinter blackwinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request does a lot, and in a single commit to boot.

  1. Handling compressed content: Generally 👍, but maybe a generic "decompressor" module would be more versatile?
  2. Following redirects: HttpUrlConnection does this by default, why replace it with a custom implementation?
  3. Fixing "encoding" vs. "charset": Good catch, 👍

I've tried to address most of the issues I have with this pull request in individual commits. Please review.

I'm approving the amended pull request for now, despite my hesitation towards the custom follow logic.

Please note that there are backwards incompatible changes here, so it would probably necessitate a major release.

private static final Pattern HEADER_FIELD_SEPARATOR_PATTERN = Pattern.compile(HEADER_FIELD_SEPARATOR);
private static final Pattern HEADER_VALUE_SEPARATOR_PATTERN = Pattern.compile(HEADER_VALUE_SEPARATOR);

private static final int ALLOWED_REDIRECTIONS = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this value be configurable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say: let's wait and implement if need arises. Would you be ok with this?

private static final Pattern HEADER_FIELD_SEPARATOR_PATTERN = Pattern.compile(HEADER_FIELD_SEPARATOR);
private static final Pattern HEADER_VALUE_SEPARATOR_PATTERN = Pattern.compile(HEADER_VALUE_SEPARATOR);

private static final int ALLOWED_REDIRECTIONS = 3;
private static final int CONNECTION_TIMEOUT = 11000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you arrive at this value? 11 seconds seems kind of arbitrary.

Should this value be configurable?

Copy link
Member Author

@dr0i dr0i Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of arbitrary. Setting these values at least easily prevent possible infinite loops.
Re configurable: I would say: let's wait and implement if need arises. Would you be ok with this?

@blackwinter blackwinter assigned dr0i and unassigned blackwinter Dec 20, 2023
@blackwinter blackwinter mentioned this pull request Dec 21, 2023
@dr0i dr0i mentioned this pull request Jan 2, 2024
12 tasks
@dr0i
Copy link
Member Author

dr0i commented Jan 2, 2024

Unfortunately HttpURLConnection only follows redirects if they use the same protocol. As this was the initial issue (unable to open http://schema.org/) a custom implementation (or other library) was needed.

Thx for the many improvements/rewritings. Also agreed on multiple commits vs one big commit.
re "major release" : opened #515.

@dr0i dr0i merged commit db1a32b into master Jan 2, 2024
1 check passed
@dr0i dr0i deleted the 511-enableGzipInHttpOpener branch January 2, 2024 12:22
@blackwinter
Copy link
Member

As this was the initial issue (unable to open http://schema.org/) a custom implementation (or other library) was needed.

Unfortunately, this motivation wasn't made explicit. Both this pull request and the referenced issue are primarily concerned with compression. Wouldn't it have been possible to adjust the URL being opened client-side?

Unfortunately HttpURLConnection only follows redirects if they use the same protocol.

Is this custom implementation affected by the same concerns then? ("serious security consequences")

And what about the objections against URLDecoder.decode in the provided workaround? (which seems to have been the basis for your implementation)

@dr0i
Copy link
Member Author

dr0i commented Jan 11, 2024

Sorry to not have been explicit in the issue. I will remember to do better.

"serious security consequences": java network developers decided to treat protocol switches as a possible threat. We don't.

Re objections: you mean https://stackoverflow.com/questions/1884230/httpurlconnection-doesnt-follow-redirect-from-http-to-https/26046079#comment105077439_26046079 ? Is the location header really needed? There is no test for it, so maybe not. If it brings trouble we also could wait for a report here and then fix it.

@blackwinter
Copy link
Member

java network developers decided to treat protocol switches as a possible threat. We don't.

Was this discussed anywhere?

Is the location header really needed?

Of course it's needed, that's where the redirect URL comes from. The objection was against decoding that URL.

There is no test for it, so maybe not.

That's because you only added a test for the compression handling, not for the redirect logic. I missed it in my review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpOpener shall be able to handle GZIP encoding
2 participants