html/template: investigate using x/net/html for parsing #63392
Labels
NeedsInvestigation
Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone
In a similar vein as #63371.
There have been a number of issues with html/template related to mis-parsing HTML (e.g. http://go.dev/cl/526157, https://go-review.googlesource.com/c/go/+/526156, etc). In part this is because the package implements a rather naive, lightweight parser, which does not implement all of the rules of html/template.
There are two particular problems this creates: (1) html/template does not catch syntactic errors in templates, meaning a user can create a template which cannot be reasonably parsed by downstream consumers (i.e. browsers), and (2) because html/template does not implement HTML normalization rules, a user can create a template which will be interpreted differently by html/template and a downstream consumer (i.e. because nodes may be moved around, or tags normalized, etc). I haven't found a particular case of this yet (mostly because I haven't looked too closely), but it is plausible that (2) could cause contextual escaping to operate incorrectly, since the context that html/template derives an action is in may be different from the context the action actually ends up in after HTML normalization rules are applied.
One option to fix this would be to use the golang.org/x/net/html parser in html/template instead of the lightweight parser we currently use. x/net/html contains a (mostly) fully specification complaint parser that applies all of the expected normalization rules when constructing the DOM tree. Using this parser we could normalize the template before doing contextual escaping, resulting in outputted HTML which will match how browsers parse the document (assuming the user isn't directly inserting additional HTML into the document).
The argument against using x/net/html is precisely because it does normalization, which could result in output which does not strictly match the template as written (i.e. because missing tags could be inserted, or nodes reordered according to the specification). I think this is arguably acceptable, since it should be clear to the user how the HTML they are producing will be interpreted (i.e. normalizing/catching syntax errors before consumption by a browser gives the user a chance to catch that something is broken, and possibly fix it). That said this arguably would be a breaking change, so if we do consider doing this it may only be viable as a v2 API of the existing package .
The main upside of this change would be no longer needing to maintain two parsers (of wildly varying quality), and providing significantly more "correct" HTML output for users.
The text was updated successfully, but these errors were encountered: