8255244: HttpClient: Response headers contain incorrectly encoded Unicode characters #1169
…code characters
The HTTP/1.1 Header Parser is updated to support ISO-8859-1 encoding for backward compatibility, in conformance with RFC 7230.
👋 Welcome back dfuchs! A progress list of the required criteria for merging this PR into …
@dfuch This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 37 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the …
LGTM
/integrate
@dfuch Since your change was applied there have been 37 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 1c47244. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
…code characters
Reviewed-by: chegar, michaelm
The HTTP/1.1 Header Parser of the new HttpClient currently assumes that all headers (names and values) are US-ASCII and, as a result, mis-decodes any byte whose value is greater than 127. For instance, 0x80 (128) gets decoded as U+FF80 (65408) instead of being either rejected or decoded as U+0080.
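A minimal Java sketch of how this kind of mis-decoding can arise, assuming a byte is converted with a plain (char) cast (the PR does not show the original parser code, and the class and method names below are hypothetical): Java sign-extends the byte to an int before narrowing it to char, so 0x80 (-128 as a signed byte) becomes U+FF80. Masking with 0xFF first yields the ISO-8859-1 code point U+0080.

```java
public class SignExtensionDemo {

    // Hypothetical buggy conversion: the signed byte is widened to int
    // (sign-extended) and then narrowed to char, so 0x80 becomes U+FF80.
    static char naiveDecode(byte b) {
        return (char) b;
    }

    // Hypothetical Latin-1 conversion: mask to the unsigned range 0..255
    // first, so 0x80 becomes U+0080.
    static char latin1Decode(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.printf("naive:  U+%04X%n", (int) naiveDecode(b));  // naive:  U+FF80
        System.out.printf("latin1: U+%04X%n", (int) latin1Decode(b)); // latin1: U+0080
    }
}
```

Because ISO-8859-1 maps the bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, the masked cast is a complete Latin-1 decoder for a single byte.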
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset. The ISO-8859-1 charset is also supported by HttpURLConnection.
We could decide to reject responses whose headers contain non-US-ASCII characters out of hand, but for compatibility reasons, it seems preferable to interpret and accept any byte > 127 in header values as an ISO-8859-1 (Latin 1) character.
For backward compatibility, this change proposes to update the HTTP/1.1 Header Parser to support ISO-8859-1 encoding.
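As a rough illustration of the proposed behavior (not the code from this PR), the hypothetical helper below decodes a raw header line byte by byte as ISO-8859-1 and compares the result with the JDK's built-in StandardCharsets.ISO_8859_1 decoder. It assumes the line has already been split off at CRLF.

```java
import java.nio.charset.StandardCharsets;

public class Latin1HeaderDecoding {

    // Hypothetical helper: decode one raw header line (CRLF already removed)
    // by mapping every byte to the code point with the same value (ISO-8859-1).
    static String decodeHeaderLine(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF)); // 0x00..0xFF -> U+0000..U+00FF
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "X-Name: Caf" followed by 0xE9, which is 'é' in ISO-8859-1.
        byte[] raw = {'X', '-', 'N', 'a', 'm', 'e', ':', ' ', 'C', 'a', 'f', (byte) 0xE9};
        String line = decodeHeaderLine(raw);
        // The JDK's ISO-8859-1 charset decoder produces the same string.
        String viaCharset = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(line.equals(viaCharset)); // true
        System.out.println(line);
    }
}
```

In practice, new String(bytes, StandardCharsets.ISO_8859_1) is the simplest way to get this mapping; the explicit loop only makes the byte-to-char rule visible.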
The HTTP/1.1 Header Parser will now apply the same validation that is already applied by the HTTP/2 stack.
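The PR does not reproduce the HTTP/2 validation code, so the sketch below is only an approximation built from the RFC 7230 grammar, and the JDK's actual rules may differ: a field name must be a non-empty token of tchar characters, and a field value may contain SP, HTAB, VCHAR (0x21-0x7E) and obs-text (0x80-0xFF). The class and method names are hypothetical, and the value check is character-level only (it ignores obs-fold and does not trim optional whitespace).

```java
public class HeaderValidation {

    // Hypothetical field-name check based on the RFC 7230 "token" rule:
    // one or more tchar (ALPHA, DIGIT, or one of !#$%&'*+-.^_`|~).
    static boolean isValidName(String name) {
        if (name.isEmpty()) return false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean tchar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || "!#$%&'*+-.^_`|~".indexOf(c) >= 0;
            if (!tchar) return false;
        }
        return true;
    }

    // Hypothetical field-value check based on RFC 7230: SP, HTAB,
    // VCHAR (0x21-0x7E) and obs-text (0x80-0xFF) are allowed; other
    // control characters, DEL (0x7F) and anything above U+00FF are rejected.
    static boolean isValidValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean ok = c == ' ' || c == '\t'
                    || (c >= 0x21 && c <= 0x7E)
                    || (c >= 0x80 && c <= 0xFF);
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("Content-Type")); // true
        System.out.println(isValidName("Bad Name"));     // false (space)
        System.out.println(isValidValue("Caf\u00E9"));   // true (obs-text)
        System.out.println(isValidValue("a\0b"));        // false (NUL)
        System.out.println(isValidValue("\uFF80"));      // false (above U+00FF)
    }
}
```

Under these rules a value containing U+00E9 is accepted, while U+FF80 is rejected because it lies outside the ISO-8859-1 range.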
Download
$ git fetch https://git.openjdk.java.net/jdk pull/1169/head:pull/1169
$ git checkout pull/1169