Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include charset in Content-Type header for all applicable standard types #1439

Merged
merged 2 commits into from Jan 20, 2019

Conversation

Projects
None yet
2 participants
@alexhenrie
Copy link
Contributor

commented Jan 18, 2019

The charset parameter is valid for 6 standard MIME types plus any type ending in +xml (whether or not it is an application type). Inclusion of this parameter is necessary to look at non-ASCII text files in a web browser.

See https://www.iana.org/assignments/media-types/media-types.xhtml for more information.

@davidism

This comment has been minimized.

Copy link
Member

commented Jan 18, 2019

I'm having trouble finding the rules about this in the link you provided. Can you provide a more specific reference?

@alexhenrie

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2019

I came up with the list of application/ecmascript, application/javascript, application/news-checkgroups, application/news-groupinfo, application/sql, and application/xml through a 2-step process:

  1. Searched https://cgit.freedesktop.org/xdg/shared-mime-info/tree/freedesktop.org.xml.in for sub-class-of type="text/plain"

  2. Searched the templates and RFCs of each of those types on https://www.iana.org/assignments/media-types/media-types.xhtml for the word "charset"

Originally I was just going to add application/javascript because that's the kind of file that I was having trouble with, but then I realized that several other types would have the same problem.

@davidism davidism added this to the 0.15 milestone Jan 20, 2019

@davidism

This comment has been minimized.

Copy link
Member

commented Jan 20, 2019

I tried to verify your list, and while it looks like all the ones you listed have a charset parameter, you also missed "application/xml-dtd" and "application/xml-external-parsed-entity", where the subclass was "application/xml", since that's a subclass of "text/plain".

Then I discovered that the types listed in your first and second links don't match up very well. The XDG list has 786 types, the IANA list has 1746 detail pages, and they only share 190, 108 of which start with "application/" and don't end with "+xml", and 6 of which specify a charset parameter.

But if you only look at the IANA list, ignoring the fact that the types aren't in the XDG list (like you did with "application/news-checkgroups"), there are 40 "application" sub-types that have a charset parameter:

iotp
fhir+json
ecmascript
javascript
news-checkgroups
news-groupinfo
nss
prs.alvestrand.titrax-sheet
sgml-open-catalog
smil
sql
vnd.3gpp.mcdata-payload
vnd.3gpp.mcdata-signalling
vnd.adobe.xfdf
vnd.ah-barcode
vnd.dpgraph
vnd.globalplatform.card-content-mgt-response
vnd.msign
vnd.picsel
vnd.syncml.dm+wbxml
vnd.uplanet.alert
vnd.uplanet.alert-wbxml
vnd.uplanet.bearer-choice-wbxml
vnd.uplanet.bearer-choice
vnd.uplanet.cacheop
vnd.uplanet.cacheop-wbxml
vnd.uplanet.channel
vnd.uplanet.channel-wbxml
vnd.uplanet.list
vnd.uplanet.list-wbxml
vnd.uplanet.listcmd
vnd.uplanet.listcmd-wbxml
vnd.wap.sic
vnd.wap.slc
vnd.wap.wbxml
vnd.wap.wmlc
vnd.wap.wmlscriptc
xml
xml-dtd
xml-external-parsed-entity

So you missed a few 😉 (how did you discover "application/news-groupinfo" but not the others?), but this list is clearly not useful for us to maintain. So I'm proposing we just use the XDG list, 6 items. If anyone needs special handling for one of these other types, they can add it to the set for their app.

add changelog for GH-1439
move charset list closer to get_content_type
add comment about source of charset list

@davidism davidism changed the base branch from 0.14-maintenance to master Jan 20, 2019

@davidism

This comment has been minimized.

Copy link
Member

commented Jan 20, 2019

Rebased to master, added changelog and comments.

@davidism davidism merged commit 5464ec5 into pallets:master Jan 20, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@alexhenrie

This comment has been minimized.

Copy link
Contributor Author

commented Jan 20, 2019

Oh wow, I missed a lot. Thanks for catching application/xml-dtd and application/xml-external-parsed-entity: Both are subclasses of text/plain according to Freedesktop, but I completely missed them. I think the list that you came up with is reasonable, especially since application/smil is obsolete in favor of application/smil+xml, charset is redundant for application/fhir+json where the encoding must be UTF-8, and the rest are all quite rare.

Freedesktop may add some of these to their list in the future though (application/vnd.adobe.xfdf looks likely). In that case we'd probably want to add them to Werkzeug too.

Thanks for working on this!

nrejack added a commit to roncanepa/idb-backend that referenced this pull request May 21, 2019

Modify assertion in tests/idb/test_data_api_media.py so test correctl…
…y passes. Test previously passing using werkzeug 0.14.1. werkzeug >=0.15.0 appends charset to the Content-Type header for any xml type. See pallets/werkzeug#1439.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.