Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: html/template: escape unquoted attributes by first quoting them #43224

Open
empijei opened this issue Dec 16, 2020 · 8 comments
Open

proposal: html/template: escape unquoted attributes by first quoting them #43224

empijei opened this issue Dec 16, 2020 · 8 comments
Labels
Projects
Milestone

Comments

@empijei
Copy link
Contributor

@empijei empijei commented Dec 16, 2020

Current behavior

We allow actions to appear in "Unquoted Attribute" contexts.

This means this template is valid:

<img src={{.}}>

To support this when we escape we have to take some extra care to escape all whitespace (which would break the token) and we rely on some niche behaviors of the spec.

For example if we render the template above with "da ta/" we get

<img src=da&#32;ta/>

Which looks an awful lot like an ambiguous syntax. (Playground link)

It turns out that if you follow the spec this should parse as

<img src="data/">

But I find this behavior very brittle, especially considering that the escaping character set that we use currently was retrieved by using a bit of the spec and a custom script we never re-run:

// The set of runes escaped is the union of the HTML specials and
// those determined by running the JS below in browsers:
// <div id=d></div>
// <script>(function () {
// var a = [], d = document.getElementById("d"), i, c, s;
// for (i = 0; i < 0x10000; ++i) {
// c = String.fromCharCode(i);
// d.innerHTML = "<span title=" + c + "lt" + c + "></span>"
// s = d.getElementsByTagName("SPAN")[0];
// if (!s || s.title !== c + "lt" + c) { a.push(i.toString(16)); }
// }
// document.write(a.join(", "));
// })()</script>

Proposal

Change this escaping to run two passes:

  • First, escape the attribute as if it was double-quoted
  • Then emit the value inside a pair of quotes

This would remove the need for htmlNospace(Norm)ReplacementTable and would make the logic more robust and simple.

Notes

I found this while looking into #42506

/cc @katiehockman @FiloSottile @rolandshoemaker @rsc

@gopherbot gopherbot added this to the Proposal milestone Dec 16, 2020
@ianlancetaylor ianlancetaylor changed the title proposal: html/template: Escape unquoted attributes by first quoting them. proposal: html/template: escape unquoted attributes by first quoting them Dec 16, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Dec 16, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 16, 2020

There doesn't seem to be an API change here, so I don't think this has to go through the proposal process. Unless I'm missing something.

@antichris
Copy link

@antichris antichris commented Dec 17, 2020

How about detecting first, whether quoting and/or escaping is needed at all, and foregoing it otherwise?

One of the first pieces of Go code I ever wrote was for this exact purpose:

  1. Replace all ampersands with &amp; entities.

  2. Determine, whether the attribute value needs to be quoted, i.e. whether it contains quotable (any of " ' ` = < or >) and/or whitespace characters.

    If not, we're done here, return the string as it is (after having escaped &amp;s).

  3. Pick the quote (single or double) that is less prevalent in data, escape that.

    This avoids turning foo with value "b""a""r" into foo="&#34;b&#34;&#34;a&#34;&#34;r&#34;" when it can be foo='"b""a""r"'.

  4. Wrap the escaped value in the opposite quote and return.

It does take at most two passes over the string (one to detect quotables and to pick a quote, other to emit escaped quote), but IMO the benefit balances the cost.

@empijei
Copy link
Contributor Author

@empijei empijei commented Dec 21, 2020

@ianlancetaylor

There doesn't seem to be an API change here, so I don't think this has to go through the proposal process. Unless I'm missing something.

This will definitely break some tests so I am proposing this change to weigh with you all if this looks like it is worth the cost. If you think it would make it clearer that this is not an API change I can rename this to be a normal feature request rather than a proposal.

@antichris Thanks for the suggestion. One of the reasons I am proposing this is to simplify the logic in this package and make it more consistent. While your proposal would indeed make escaped strings shorter it would definitely make the logic more complex and the rendered HTML less consistent. Unless you can find a common use case that foresee a big amount of quotes in an HTML attribute value I'd rather keep it simpler and always use the same quotes. Does this make sense to you?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 21, 2020

@empijei OK, thanks.

@FiloSottile
Copy link
Contributor

@FiloSottile FiloSottile commented Dec 21, 2020

I support this. html/template is meant to be secure and easy to use, not to allow ultimate control over how the generated HTML looks like (as opposed to how it behaves). If we can generate output that is equivalent but is safer and simpler, sounds great.

@antichris
Copy link

@antichris antichris commented Dec 22, 2020

@empijei

Unless you can find a common use case that foresee a big amount of quotes in an HTML attribute value, I'd rather keep it simpler and always use the same quotes.

I got two right off the top of my head: data:image/svg+xml URIs and JSON data. For an example of the latter, you can view the source of this very page and look for the optimizely-datafile meta tag: nearly 40% of its content are &quot; entities. I'd say that's significant. The SVG URIs also suffer under current implementation escaping < and > (which don't mean anything special when quoted, therefore require no escaping).

And then there's also inline style and scripts (on<event> and such), of course, but people are supposed to be using single quotes in CSS and JS. Also, people are supposed to avoid using inline stuff in general, but it's not like many care to.

@Dynom
Copy link

@Dynom Dynom commented Jan 2, 2021

[..]
And then there's also inline style and scripts (on<event> and such), of course, but people are supposed to be using single quotes in CSS and JS. Also, people are supposed to avoid using inline stuff in general, but it's not like many care to.

You're, of course, right. However inlining resources is unavoidable for certain common cases, e.g.: e-mails.

Does it make sense to introduce a new function to the API to be more opinionated in these specific cases? Providing more context by exposing a slightly richer API might: keep the logic simpler, allow to stay backwards compatible yet provide to the proposal.

The reason I'm pitching in is because of #42506 which might be solved by hitchhiking on this issue

@FiloSottile
Copy link
Contributor

@FiloSottile FiloSottile commented Jan 5, 2021

To make sure I understand, is the concern purely a matter of readability and size of the generated HTML, or are there cases where this would cause semantic changes or breakage?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants