Remove “Content-Type: application/x-www-form-urlencoded; charset” advice #69762

vadmium · 2015-11-07T08:43:41Z

BPO	25576
Nosy	@orsenthil, @bitdancer, @vadmium
Files	urlencoded-charset.patch urlencoded-charset.2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-11-24.23:38:26.470>
created_at = <Date 2015-11-07.08:43:40.644>
labels = ['docs']
title = 'Remove “Content-Type: application/x-www-form-urlencoded; charset” advice'
updated_at = <Date 2015-11-24.23:38:26.469>
user = 'https://github.com/vadmium'

bugs.python.org fields:

activity = <Date 2015-11-24.23:38:26.469>
actor = 'martin.panter'
assignee = 'docs@python'
closed = True
closed_date = <Date 2015-11-24.23:38:26.470>
closer = 'martin.panter'
components = ['Documentation']
creation = <Date 2015-11-07.08:43:40.644>
creator = 'martin.panter'
dependencies = []
files = ['40970', '40983']
hgrepos = []
issue_num = 25576
keywords = ['patch']
message_count = 6.0
messages = ['254263', '254316', '254332', '254347', '254361', '255302']
nosy_count = 5.0
nosy_names = ['orsenthil', 'r.david.murray', 'docs@python', 'python-dev', 'martin.panter']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue25576'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

vadmium · 2015-11-07T08:43:39Z

As I understand it, using a “charset” parameter with “Content-Type: application/x-www-form-urlencoded” is not standardized. Since bpo-11082, the documentation has advised using it, but I propose to remove this advice.

HTML 5 mentions setting a hidden _charset_ form field, and says to decode with a default of UTF-8 (not Latin-1!), but does not mention any Content-Type parameters.
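A minimal sketch of receiver-side decoding along those lines, assuming the server ignores any Content-Type parameters and simply uses UTF-8 as the default. `decode_form` is a hypothetical helper for illustration; only `urllib.parse.parse_qs` is a real library function.

```python
from urllib.parse import parse_qs

# A body as a client might send it: percent-encoded, so the bytes are ASCII
body = b"name=J%C3%BCrgen&city=M%C3%BCnchen"

def decode_form(body: bytes, encoding: str = "utf-8") -> dict:
    """Hypothetical helper: decode a urlencoded body, ignoring any charset
    parameter and using UTF-8 as the default character encoding."""
    # The body itself is plain ASCII; the percent-escapes carry the UTF-8 bytes
    text = body.decode("ascii")
    # parse_qs() unquotes each escape and decodes the bytes with `encoding`
    return parse_qs(text, encoding=encoding)

print(decode_form(body))  # {'name': ['Jürgen'], 'city': ['München']}
```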

There seems to be confusion about what encoding it actually represents. According to <https://bugzilla.mozilla.org/show_bug.cgi?id=7533>, Mozilla briefly set this “charset” parameter a long time ago, but it would have corresponded to the urlencode(encoding=...) argument. The Python documentation currently suggests calling data.encode("utf-8"), which is misleading, because the urlencode() output is already guaranteed to be ASCII text: any non-ASCII characters and bytes will already have been character-encoded and percent-encoded by urlencode(). So I also propose to change the examples to data.encode("ascii").
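To illustrate the point about urlencode() output, here is a short sketch using only `urllib.parse.urlencode`; the field names and values are arbitrary examples.

```python
from urllib.parse import urlencode

fields = {"name": "Jürgen", "note": "a&b c"}

# urlencode() applies the character encoding (UTF-8 by default) and then
# percent-encodes the result, so the returned str is pure ASCII.
data = urlencode(fields)
print(data)  # name=J%C3%BCrgen&note=a%26b+c

# Encoding it as ASCII therefore cannot fail...
body = data.encode("ascii")
# ...and produces exactly the same bytes as .encode("utf-8") would.
assert body == data.encode("utf-8")

# The character set only matters at the urlencode(encoding=...) step:
latin1 = urlencode(fields, encoding="latin-1")
print(latin1)  # name=J%FCrgen&note=a%26b+c
```

So a charset parameter, if it meant anything, would describe the `encoding=` argument, not the final `.encode()` call.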

bitdancer · 2015-11-08T01:16:54Z

Although I didn't read through the whole thing, the Mozilla bug discussion indicates this is the correct way to specify the charset; it's just that there was lots of buggy software that didn't handle it being set to latin-1. Is the same true for setting it to utf-8?

Agreed about the encode call.

vadmium · 2015-11-08T10:56:36Z

I think the server bugs referenced by the Mozilla bug are mainly about servers that do not recognize the content type at all, due to the presence of any charset parameter. They probably do something like “if headers['Content-Type'] == 'application/x-www-form-urlencoded' ” without checking for parameters first. So it wouldn’t matter whether it was charset=latin-1 or charset=utf-8.
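A sketch of the kind of server check described above, next to a parameter-tolerant version. Both `naive_is_form` and `is_form` are hypothetical helpers written for illustration; the second compares only the media type before the first “;”, which is enough for this check but is not a full RFC-compliant header parser.

```python
FORM_TYPE = "application/x-www-form-urlencoded"

def naive_is_form(content_type: str) -> bool:
    # The fragile exact comparison: any parameter makes it fail
    return content_type == FORM_TYPE

def is_form(content_type: str) -> bool:
    # Keep only the media type: drop ";param=value" parts, strip
    # whitespace, and ignore case (media types are case-insensitive)
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == FORM_TYPE

header = "application/x-www-form-urlencoded; charset=utf-8"
print(naive_is_form(header))  # False
print(is_form(header))        # True
```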

A couple of comments in the Mozilla bug say that including “charset” is specified by an HTTP standard, but I suspect this may be a mistake. Perhaps this is the best evidence for my argument, from <http://www.w3.org/TR/html/forms.html#url-encoded-form-data>:

'''
Parameters on the “application/x-www-form-urlencoded” MIME type are ignored. In particular, this MIME type does not support the “charset” parameter.
'''

bitdancer · 2015-11-08T17:25:28Z

OK, I'll accept that as authoritative :)

One very minor comment in the review; otherwise it looks good to me.

vadmium · 2015-11-08T23:25:05Z

The second version of the patch changes some more examples in the how-to to data.encode("ascii"). I’ll leave this open for a bit in case Senthil is around and wants to comment (seeing as he added the text I am removing).
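A sketch of the pattern the revised examples follow: ASCII-encode the urlencode() output and send a plain Content-Type with no charset parameter. The URL and form values are placeholders, not taken from the how-to, and nothing is sent over the network because urlopen() is never called.

```python
from urllib.parse import urlencode
from urllib.request import Request

values = {"name": "Ann Example", "topic": "docs"}
# urlencode() output is ASCII, so "ascii" is the accurate codec here
data = urlencode(values).encode("ascii")

# Placeholder URL; building a Request does not open a connection
req = Request("http://www.example.com/cgi-bin/register.cgi", data=data)
# Plain media type, with no charset parameter
req.add_header("Content-Type", "application/x-www-form-urlencoded")

print(req.get_method())                # POST (because data is not None)
print(req.get_header("Content-type"))  # application/x-www-form-urlencoded
```

Note that add_header() stores the name via str.capitalize(), which is why the lookup uses "Content-type".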

python-dev · 2015-11-24T23:07:24Z

New changeset 16fec577fd8b by Martin Panter in branch '3.4':
Issue bpo-25576: Remove application/x-www-form-urlencoded charset advice
https://hg.python.org/cpython/rev/16fec577fd8b

New changeset 95ae5262d27c by Martin Panter in branch '3.5':
Issue bpo-25576: Merge www-form-urlencoded doc from 3.4 into 3.5
https://hg.python.org/cpython/rev/95ae5262d27c

New changeset d52521d13a64 by Martin Panter in branch 'default':
Issue bpo-25576: Merge www-form-urlencoded doc from 3.5
https://hg.python.org/cpython/rev/d52521d13a64

New changeset 671429cc1d96 by Martin Panter in branch 'default':
Issue bpo-25576: Apply fix to new urlopen() doc string
https://hg.python.org/cpython/rev/671429cc1d96

vadmium assigned docs@python Nov 7, 2015

vadmium added the docs label (Documentation in the Doc dir) Nov 7, 2015

vadmium closed this as completed Nov 24, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
