$ go fmt x.go #attached
You see no error.
$ go build x.go
You get an error:
./x.go:4: Unicode (UTF-8) BOM in middle of file
The error is correct: this is an illegal Go source file. I suspect the parser isn't
rejecting BOMs properly. They are allowed only as the first code point in a source file.
It's a minor point but consistency among tools would be good.
The Go Programming Language Specification
Version of September 4, 2012
Source code representation
Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single
accented code point is distinct from the same character constructed from combining an
accent and a letter; those are treated as two code points. For simplicity, this document
will use the unqualified term character to refer to a Unicode code point in the source text.
Each code point is distinct; for instance, upper and lower case letters are different characters.
Implementation restriction: For compatibility with other tools, a compiler may disallow
the NUL character (U+0000) in the source text.
The Unicode Standard, Version 6.2
Chapter 3 Conformance
When represented in UTF-8, the byte order mark [U+FEFF] turns into the byte sequence
<EF BB BF>.
In a Unicode encoding form: A Unicode string is said to be in a particular Unicode
encoding form if and only if it consists of a well-formed Unicode code unit sequence of
that Unicode encoding form.
• A Unicode string consisting of a well-formed UTF-8 code unit sequence is said to be
in UTF-8. Such a Unicode string is referred to as a valid UTF-8 string, or a UTF-8
string for short.
• Any UTF-8 byte sequence that does not match the patterns listed in Table 3-7 is ill-formed.
Table 3-7 lists all of the byte sequences that are well-formed in UTF-8.
Table 3-7. Well-Formed UTF-8 Byte Sequences [in pertinent part]
Code Points First Byte Second Byte Third Byte Fourth Byte
U+E000..U+FFFF EE..EF 80..BF 80..BF
The Unicode specification defines UTF-8. It looks to me as if the UTF-8 byte sequence
<EF BB BF>, for the BOM U+FEFF code point, is defined by Unicode as a well-formed
sequence of UTF-8 bytes. Therefore, I'm surprised that Go does not accept it. Are there
any other well-formed sequences of UTF-8 bytes that Go does not accept, apart from the NUL
character?
Does this break the Go 1 guarantee that "Source code is Unicode text encoded in UTF-8.",
except that "a compiler may disallow the NUL character (U+0000)"?