Description
What version of Go are you using (go version)?
go version go1.9rc2_cl165246139 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
linux/amd64
What did you do?
html.UnescapeString treats HTML character references that are missing a final ; as valid character references and unescapes them. For example, a numeric reference for a colon written without the semicolon, such as `&#58`, is unescaped to `:`.
https://play.golang.org/p/oyPAjmj0s_
The HTML5 specification states that all valid character references must be terminated by a ; character.
https://www.w3.org/TR/html5/syntax.html#character-references
Therefore, character references that are missing this semicolon, such as `&#58`, should not be unescaped.
Note: the authors of this function probably intended to accept unterminated character references (see this test case), most likely to handle an edge case mentioned in the HTML4 spec (https://www.w3.org/TR/html4/charset.html#entities):
> In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.