net/http: Request.Form field data may include HTML entities #45479

Closed
adonovan opened this issue Apr 9, 2021 · 3 comments

Comments

@adonovan adonovan commented Apr 9, 2021

I'm no HTTP expert, so this could be working as intended, but I notice that when an HTML form is posted, the document's encoding is also used to encode the form data. So, if the document is Latin-1, and a text field contains "😃", which cannot be represented as Latin-1, then the client (Chrome, in my case) produces a URL ?f=%26%23128515%3B, which is the URL-encoding of "&#128515;", which is the HTML entity reference for U+1F603, a smiley face.
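To make the observation concrete, here is a minimal sketch of what the Go side sees for that query string. It uses `url.ParseQuery` from `net/url` to decode the query; the helper name `parseField` is made up for this illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseField is a hypothetical helper: it returns the first value of the
// named field in a raw (still percent-encoded) query string.
func parseField(rawQuery, name string) (string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	return values.Get(name), nil
}

func main() {
	// The query string Chrome produced in the report above.
	v, err := parseField("f=%26%23128515%3B", "f")
	if err != nil {
		panic(err)
	}
	// Percent-decoding is undone, but the entity reference is left as text.
	fmt.Printf("%q\n", v) // "&#128515;"
}
```

The percent-encoding layer is removed, and the server is left holding the nine ASCII characters of the entity reference rather than the emoji.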

Is Chrome's behavior, of using an HTML entity reference &#...; when the form encoding cannot express the form data,
(a) normal for a client, or
(b) Chrome attempting to fail gracefully when asked to do the impossible?

If (a), are servers expected to handle HTML entity references in form data? I could see no mention of this expectation in the net/http package code or docs. I also can't see how a server could distinguish a Chrome-introduced HTML entity reference from a form that literally contained that sequence, which makes me think (a) is not the answer.
If (b), is the usual advice "don't let that happen", in other words, make sure the HTML document's encoding is explicitly set to something like UTF-8?
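The ambiguity raised under (a) can be sketched in Go. The example assumes a server that naively applies `html.UnescapeString` to the field; `wireValue` and `naiveDecode` are hypothetical helpers written for this illustration, not part of net/http.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
)

// wireValue returns the query string a client would send for field f
// when the submitted text is plain ASCII and needs no charset fallback.
func wireValue(text string) string {
	return "f=" + url.QueryEscape(text)
}

// naiveDecode parses the query, then unescapes HTML entities in field f.
func naiveDecode(rawQuery string) (string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	return html.UnescapeString(values.Get("f")), nil
}

func main() {
	// Sent by Chrome for an emoji typed into a Latin-1 form (from the report).
	fromEmoji := "f=%26%23128515%3B"
	// Sent when the user literally types the characters "&#128515;".
	fromLiteral := wireValue("&#128515;")
	fmt.Println(fromEmoji == fromLiteral) // true

	// Unescaping turns the literal text into an emoji the user never typed.
	decoded, err := naiveDecode(fromLiteral)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded == "\U0001F603") // true
}
```

Both inputs are byte-for-byte identical on the wire, so any server-side unescaping would also rewrite text that a user typed literally.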

A word of documentation in net/http might help. StackOverflow and the usual sources were surprisingly unhelpful.

@dmitshur dmitshur added this to the Backlog milestone Apr 9, 2021
Contributor

@dmitshur dmitshur commented Apr 9, 2021

Contributor

@empijei empijei commented Apr 20, 2021

Thanks for reporting this!

HTTP's form-field transport is completely HTML-agnostic. As far as I know, the HTTP spec never mentions such a feature.

Here Go is just parsing the HTTP request and relaying the information to user code, which then has to decode this bit itself if it wants to support it (I would advise against that for security reasons).

My suggestion would be to not let it happen: always make sure you're using UTF-8. Note that the stdlib always tries to do this when it has the power to choose. Moreover, I'd suggest not trying to decode anything from the client that might have been weirdly encoded. Either have some JS on the client (or the browser itself) apply a well-defined encoding, or just reject anything you don't recognize.

On the documentation side of things: Go has an implementation of the HTTP spec, which doesn't meddle with HTML entity encoding in any way, so I'm not sure about adding a doc line about this. I fear it might create more confusion than it resolves.

Author

@adonovan adonovan commented Apr 20, 2021

OK, thanks for explaining. In that case I'll close this as: net/http working as intended, Chrome working under impossible constraints.

@adonovan adonovan closed this Apr 20, 2021