use all uppercase UTF-8 as encoding value in accordance with XML specification #2

@tsgit

use all uppercase UTF-8 as encoding value in accordance with XML specification

The normative reference for XML is the W3C specification, which says that UTF-8 and UTF-16 are the encodings that all XML processors must accept:

http://www.w3.org/TR/2008/REC-xml-20081126/#charsets
http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding

The XML spec does not define this value as case insensitive, so I think
that means that "utf-8" is different (and undefined). The spec says
that IANA-approved values should be used, as defined at:

http://www.iana.org/assignments/character-sets

which defines "UTF-8" yet makes no mention of "utf-8". "UTF-8" is
defined by RFC3629:

http://www.ietf.org/rfc/rfc3629.txt

but that RFC defers to the Unicode spec for ultimate authority:

http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf

which discusses UTF-8 in sections 2.5, 2.6, 2.7 and always refers to
UTF-8 with no mention of case insensitivity or of "utf-8".

My conclusion is that "UTF-8" and NOT "utf-8" should be used in XML
documents.
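
A minimal sketch of what that normalization could look like, written in Python purely for illustration. It is not part of this commit, and the helper name `uppercase_declared_encoding` is hypothetical. It assumes the XML declaration, if present, sits at the very start of the text, and it upper-cases only the name in the encoding declaration:

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration,
# e.g. <?xml version="1.0" encoding="utf-8"?>
_DECL_ENCODING = re.compile(
    r'''^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])([A-Za-z][A-Za-z0-9._-]*)\2'''
)


def uppercase_declared_encoding(xml_text: str) -> str:
    """Return xml_text with the declared encoding name upper-cased.

    Text without a leading XML declaration is returned unchanged.
    """
    def _fix(match: "re.Match[str]") -> str:
        prefix, quote, name = match.groups()
        return f"{prefix}{quote}{name.upper()}{quote}"

    return _DECL_ENCODING.sub(_fix, xml_text, count=1)


if __name__ == "__main__":
    print(uppercase_declared_encoding('<?xml version="1.0" encoding="utf-8"?><feed/>'))
```

Only the declaration is touched; the document body and the actual byte encoding are left alone.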

In practice this does not appear to make much difference with commonly used tools. However, I found that Firefox 5 does not properly display a feed containing high code points unless the declaration is "UTF-8" rather than "utf-8". Perl does appear to handle "UTF-8" and "utf-8" synonymously; see e.g. "UTF-8 vs. utf8 vs. UTF8" in the Encode man page (perldoc Encode).
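
As a rough illustration of that kind of leniency in a different toolchain, the sketch below uses Python's standard library (an assumption for illustration only; it says nothing about Firefox or Perl). The helper name `parsed_text` is hypothetical. It parses the same one-element document with each spelling of the encoding name and compares the results:

```python
import codecs
import xml.etree.ElementTree as ET


def parsed_text(encoding_name: str) -> str:
    """Parse a tiny UTF-8 document that declares the given encoding name."""
    doc = (
        f'<?xml version="1.0" encoding="{encoding_name}"?><t>\u00e9</t>'
    ).encode("utf-8")
    return ET.fromstring(doc).text


if __name__ == "__main__":
    # Both spellings resolve to the same codec object name.
    print(codecs.lookup("UTF-8").name == codecs.lookup("utf-8").name)
    # Both declarations yield the same parsed text.
    print(parsed_text("UTF-8") == parsed_text("utf-8"))
```

A lenient parser like this hides the difference, which is why a stricter consumer can still be surprised by the lowercase form.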

I think it is desirable to follow the spec closely and consistently use "UTF-8".

Cheers
Thorsten

@miyagawa merged commit 92089d3 into miyagawa:master on Jul 13, 2011
@gray added a commit that referenced this pull request on Aug 20, 2011: fix test cases from merge request #2 (0140f62)