Remove “Content-Type: application/x-www-form-urlencoded; charset” advice #69762
Comments
As I understand it, using a “charset” parameter with “Content-Type: application/x-www-form-urlencoded” is not standardized. Since bpo-11082, the documentation has advised using it, but I propose removing this advice. HTML 5 mentions setting a _charset_ parameter, and mentions decoding with a default of UTF-8 (not Latin-1!), but does not mention any Content-Type parameters. There seems to be confusion about what encoding the parameter actually represents. According to <https://bugzilla.mozilla.org/show_bug.cgi?id=7533>, Mozilla briefly set this “charset” parameter a long time ago, but it would have corresponded to the urlencode(encoding=...) argument. The Python documentation currently suggests calling data.encode("utf-8"), which is misleading, because the urlencode() output is already guaranteed to be ASCII text. Any non-ASCII characters and bytes will already have been character-encoded and percent-encoded by urlencode(). So I also propose changing the examples to data.encode("ascii").
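To illustrate the point about urlencode() output, here is a minimal sketch. The field names, values, and the example.com URL are made up for the demonstration and do not come from the documentation under discussion.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields containing non-ASCII characters.
params = {"name": "Zoë", "city": "São Paulo"}

# urlencode() character-encodes (UTF-8 by default) and then percent-encodes,
# so the resulting str contains only ASCII characters.
query = urlencode(params)
print(query)

# The encoding= argument is what actually selects the character encoding.
latin1_query = urlencode({"name": "Zoë"}, encoding="latin-1")
print(latin1_query)

# Because the text is pure ASCII, encode("ascii") is sufficient and would
# raise UnicodeEncodeError if that assumption were ever violated.
data = query.encode("ascii")

# Placeholder URL; the request is only constructed here, never sent.
req = Request("http://example.com/submit", data=data)
```

For ASCII-only text, encode("ascii") and encode("utf-8") produce identical bytes, so the change only affects what the example communicates to the reader, not what goes over the wire.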
Although I didn't read through the whole thing, the Mozilla bug discussion indicates this is the correct way to specify the charset; it's just that there was a lot of buggy software that didn't handle setting it to latin-1. Is the same true for setting it to utf-8? Agreed about the encode call.
I think the server bugs referenced by the Mozilla bug are mainly about servers that do not recognize the content type at all, due to the presence of any charset parameter. They probably do something like “if headers['Content-Type'] == 'application/x-www-form-urlencoded'” without checking for parameters first. So it wouldn’t matter whether it was charset=latin-1 or charset=utf-8. A couple of comments in the Mozilla bug say that including “charset” is specified by an HTTP standard, but I suspect this may be a mistake. Perhaps this is the best evidence for my argument, from <http://www.w3.org/TR/html/forms.html#url-encoded-form-data>: '''
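The suspected server-side behaviour can be sketched as follows. Both function names are hypothetical and exist only for this illustration; they are not taken from any real server.

```python
FORM_TYPE = "application/x-www-form-urlencoded"

def naive_is_form(content_type: str) -> bool:
    # The kind of exact comparison suspected above: any parameter,
    # whatever its value, makes the header fail to match.
    return content_type == FORM_TYPE

def tolerant_is_form(content_type: str) -> bool:
    # Drop everything after the first ";" and compare only the media type,
    # ignoring case and surrounding whitespace.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == FORM_TYPE

for header in (
    "application/x-www-form-urlencoded",
    "application/x-www-form-urlencoded; charset=utf-8",
    "application/x-www-form-urlencoded; charset=latin-1",
):
    print(header, naive_is_form(header), tolerant_is_form(header))
```

Under the naive check, both charset variants are rejected in the same way, which is consistent with the charset value itself being irrelevant to those failures.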
OK, I'll accept that as authoritative :) One very minor comment in the review, otherwise looks good to me.
The second version of the patch changes some more examples in the how-to to data.encode("ascii"). I’ll leave this open for a bit in case Senthil is around and wants to comment (seeing as he added the text I am removing).
New changeset 16fec577fd8b by Martin Panter in branch '3.4':
New changeset 95ae5262d27c by Martin Panter in branch '3.5':
New changeset d52521d13a64 by Martin Panter in branch 'default':
New changeset 671429cc1d96 by Martin Panter in branch 'default':
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.