Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html/template: escapes + to + #42506

Dynom opened this issue Nov 11, 2020 · 9 comments

html/template: escapes + to + #42506

Dynom opened this issue Nov 11, 2020 · 9 comments


Copy link

@Dynom Dynom commented Nov 11, 2020

As far as I know, the + character has no special meaning when used as: <p> This is a plus "+" sign</p>. However html/template's escaping analysis thinks otherwise. What strikes me as odd is that template.HTMLEscaper("+") doesn't provide the same behaviour, even when skipping the allocation optimisation check.

What version of Go are you using (go version)?

$ go version
go1.15.4 darwin/amd64

Does this issue reproduce with the latest release?

It reproduces with 1.15.4, haven't tried tip.

What did you do?

package main

import (

func main() {
	var buf strings.Builder

	tpl := template.Must(template.New("foo").Parse(`{{ "+" }}`))
	tpl.Execute(&buf, nil)
	fmt.Printf("HTML Escaper's result %q\n", template.HTMLEscaper("+"))
	fmt.Printf("html/template         %q\n", buf.String())

Playground with the problem I'm having:

(I've also tried variations of html elements to create a different escaping context (e.g.: surround the action with e.g. <li> tags), which I thought would be the problem initially. That produced the same result however.)

What did you expect to see?

The same output as if the value ran through template.HTMLEscaper()

HTML Escaper's result "+"
html/template         "+"

What did you see instead?

HTML Escaper's result "+"
html/template         "&#43;"
@Dynom Dynom changed the title html/template escapes + to &#43; html/template: escapes + to &#43; Nov 11, 2020
@cagedmantis cagedmantis added the NeedsInvestigation label Nov 30, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Nov 30, 2020
Copy link

@cagedmantis cagedmantis commented Nov 30, 2020

/cc @empijei

Copy link

@empijei empijei commented Dec 16, 2020

Hi, do you mind expanding on what is the issue? The browser should unescape the "+" in all HTML contexts so it everything should still work, right?

Is this an issue on the efficiency of the escaper?

Copy link

@empijei empijei commented Dec 16, 2020

The reason that gets escaped is that it is in the following map and the two defined below it:

// htmlReplacementTable contains the runes that need to be escaped
// inside a quoted attribute value or in a text node.
var htmlReplacementTable = []string{
// U+0000 NULL Parse error. Append a U+FFFD REPLACEMENT
// CHARACTER character to the current attribute's value.
// "
// and similarly
0: "\uFFFD",
'"': "&#34;",
'&': "&amp;",
'\'': "&#39;",
'+': "&#43;",
'<': "&lt;",
'>': "&gt;",

According to the doc the escape set should be a composition of special chars in these states:

None of these includes a "+" sign.

I don't honestly know why it was there in the first place and, to my knowledge, it should not be.

But before we stop escaping it and we introduce XSS and break tests in the community I would like to do some additional checks on this.

Copy link

@tmthrgd tmthrgd commented Dec 16, 2020

Digging through git blame shows it was first introduced 9 years ago in CL 4968058. This is the justification given at the time:

// The set of runes escaped is the union of the HTML specials and
// those determined by running the JS below in browsers:
// <div id=d></div>
// <script>(function () {
// var a = [], d = document.getElementById("d"), i, c, s;
// for (i = 0; i < 0x10000; ++i) {
// c = String.fromCharCode(i);
// d.innerHTML = "<span title=" + c + "lt" + c + "></span>"
// s = d.getElementsByTagName("SPAN")[0];
// if (!s || s.title !== c + "lt" + c) { a.push(i.toString(16)); }
// }
// document.write(a.join(", "));
// })()</script>

Copy link

@empijei empijei commented Dec 16, 2020

This looks like it is an extra-cautious escaping to take into account URLs and such in html attributes and other contexts.

I would be inclined to leave it as it is unless you think this is causing some misbehavior.

Copy link

@Dynom Dynom commented Dec 16, 2020

Thanks for your time!

Given a <script> context, it makes sense. I would expect the JS escape analysis to do something with it. But with risk of stating the obvious, JS escaping probably shouldn't apply to HTML.

I ran into an issue when generating HTML with timestamps and where this character got escaped.

  • html/template.HTMLEscaper() does escape < to &lt;, but not +
  • html/template.JSEscaper() does escape < to \u003C, but not +

I'm not entirely sure about the behaviour similarities between the public and private variants, but it feels like these should at least be aligned. I understand that risking up XSS is undesirable. My work-around and expectations were that, if my test data would be wrapped by HTMLEscaper() that it would result in exactly the same result, so that I can rely on that in my tests, e.g.:

patternsToMatch := []string{
for _, p := range patternsToMatch {
    if !strings.Contains(tplResult, template.HTMLEscaper(p)) {
        t.Errorf("expected %q to occur in the template, but it didn't", p)

The work-around now is to

if !strings.Contains(tplResult, func(p string) string {
	return strings.ReplaceAll(template.HTMLEscaper(p), "+", "&#43;")
}(p)) {
	t.Errorf("expected %q to occur in the template, but it didn't", p)

So I suppose it's one of:

  1. Bring HTMLEscaper() in line with htmlEscaper()
  2. Improve the escape analysis so that + is only escaped within a JS context (e.g.: <script />), although I suspect this is nearly impossible to do perfectly if obfuscation is considered as well.
  3. Leave as is

@empijei empijei added NeedsDecision and removed NeedsInvestigation labels Dec 16, 2020
Copy link

@empijei empijei commented Dec 16, 2020

I agree and I am leaning towards 1 and 3.

Asking @kele for a second opinion.

Copy link

@kele kele commented Dec 21, 2020

I'm leaning towards option 3 (leave as it is), because I think that the benefit of doing 1 (aligning HTMLEscaper with htmlEscape) is rather small, whereas making the change might break folks' tests.

If we had it documented somewhere that HTMLEscaper will always behave exactly the same as template.Template, I'd lean stronger towards 1.

Copy link

@prattmic prattmic commented Mar 2, 2021

More generally than just html/template's HTMLEscaper vs template execution escaping, there are really a bunch of different HTML escapers in the (extended) standard library.

Grouped by the same underlying implementation, I've found:

html/template template execution:;l=42;drc=2cd2ff6f564dce5be0c4fb7f06338ff7af3fc9a9




Every single one of these implementations has a slightly different set of escaped characters. As a user, particularly one unfamiliar with HTML safety, it is not at all clear which of these is "correct" or "safest" to use, or why they are different in the first place.

cc @katiehockman @empijei

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

6 participants