Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html/template: investigate using x/net/html for parsing #63392

Open
rolandshoemaker opened this issue Oct 5, 2023 · 0 comments
Open

html/template: investigate using x/net/html for parsing #63392

rolandshoemaker opened this issue Oct 5, 2023 · 0 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@rolandshoemaker
Copy link
Member

In a similar vein as #63371.

There have been a number of issues with html/template related to mis-parsing HTML (e.g. http://go.dev/cl/526157, https://go-review.git.corp.google.com/c/go/+/526156, etc). In part this is because the package implements a rather naive, lightweight parser, which does not implement all of the rules of html/template.

There are two particular problems this creates: (1) html/template does not catch syntactic errors in templates, meaning a user can create a template which cannot be reasonably parsed by downstream consumers (i.e. browsers), and (2) because html/template does not implement HTML normalization rules, a user can create a template which will be interpreted differently by html/template and a downstream consumer (i.e. because nodes may be moved around, or tags normalized, etc). I haven't found a particular case of this yet (mostly because I haven't looked too closely), but it is plausible that (2) could cause contextual escaping to operate incorrectly, since the context that html/template derives an action is in may be different from the context the action actually ends up in after HTML normalization rules are applied.

One option to fix this would be to use the golang.org/x/net/html parser in html/template instead of the lightweight parser we currently use. x/net/html contains a (mostly) fully specification complaint parser that applies all of the expected normalization rules when constructing the DOM tree. Using this parser we could normalize the template before doing contextual escaping, resulting in outputted HTML which will match how browsers parse the document (assuming the user isn't directly inserting additional HTML into the document).

The argument against using x/net/html is precisely because it does normalization, which could result in output which does not strictly match the template as written (i.e. because missing tags could be inserted, or nodes reordered according to the specification). I think this is arguably acceptable, since it should be clear to the user how the HTML they are producing will be interpreted (i.e. normalizing/catching syntax errors before consumption by a browser gives the user a chance to catch that something is broken, and possibly fix it). That said this arguably would be a breaking change, so if we do consider doing this it may only be viable as a v2 API of the existing package .

The main upside of this change would be no longer needing to maintain two parsers (of wildly varying quality), and providing significantly more "correct" HTML output for users.

@rolandshoemaker rolandshoemaker added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

1 participant