Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html/template: add support for template strings (backticks) #9200

Open
gopherbot opened this issue Dec 3, 2014 · 31 comments
Open

html/template: add support for template strings (backticks) #9200

gopherbot opened this issue Dec 3, 2014 · 31 comments

Comments

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 3, 2014

by opennota:

ES6 specifies a new language feature called "Template Strings" (often also
referred to as "Quasi Literals" alongside multi-line strings and others). This
allows to execute arbitrary JavaScript code without using parenthesis but
back-ticks instead. Inside back-tick delimited strings, placeholders such as
${} can wrap executable code.

http://play.golang.org/p/nBEneuxHNj

If you open the output of the above program in a modern browser (e.g., recently released
Mozilla Firefox 34 supports template strings), it will happily execute alert(1).

References:

https://people.mozilla.org/~jorendorff/es6-draft.html
https://html5sec.org/#140
https://html5sec.org/#141
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 3, 2014

Comment 1:

Labels changed: added repo-main, release-go1.5.

@robpike
Copy link
Contributor

@robpike robpike commented Dec 4, 2014

Comment 2:

Just what JavaScript needed, a new way to execute arbitrary code on the client machine.

Owner changed to mikesamuel.

Status changed to Accepted.

@gopherbot
Copy link
Author

@gopherbot gopherbot commented Dec 4, 2014

Comment 3 by mikesamuel:

I really should have seen that coming since I wrote the spec strawman:
https://code.google.com/p/js-quasis-libraries-and-repl/
This doesn't actually introduce a new execution vector for client-side code since the
backticks are significant inside a block of JavaScript code, not inside HTML.  Automatic
interpretation of '{{' inside <template> elements does that.  (sigh)
Will fix.

@bradfitz bradfitz added this to the Go1.5 milestone Dec 16, 2014
@bradfitz bradfitz added this to the Go1.5 milestone Dec 16, 2014
@rsc rsc removed accepted labels Apr 14, 2015
@rsc rsc added this to the Go1.6Early milestone Jul 21, 2015
@rsc rsc removed this from the Go1.5 milestone Jul 21, 2015
@ianlancetaylor ianlancetaylor added this to the Go1.6 milestone Dec 11, 2015
@ianlancetaylor ianlancetaylor removed this from the Go1.6Early milestone Dec 11, 2015
@rsc rsc added this to the Unplanned milestone Dec 17, 2015
@rsc rsc removed this from the Go1.6 milestone Dec 17, 2015
@mcfedr
Copy link

@mcfedr mcfedr commented Feb 9, 2017

Might also be an issue inside of JS attributes, as in this demo: https://play.golang.org/p/iahSK9oqLc - pops up alert in modern browsers

@mvdan mvdan changed the title html/template: doesn't properly escape content inside backticks html/template: add support for template strings (backticks) Dec 25, 2018
@mvdan
Copy link
Member

@mvdan mvdan commented Dec 25, 2018

#29406 was about malformed template output since backticks aren't parsed properly. I've re-titled this issue to be about the overall support of this ES6 feature, and closed the newer issue as a duplicate.

I'm marking it as NeedsDecision, just in case anyone disagrees with adding this feature. It's part of the JS standard, so as long as html/template follows that, it seems to me like this issue should be fixed.

@hundt
Copy link

@hundt hundt commented Mar 19, 2019

I'm not sure if anyone else is working on this, but I have given it some thought and I thought I'd share in case anyone else has some insight.

First, some notes about template literals:

  • They are enclosed by backticks.
  • They can contain "placeholders", which are marked by "${" on the left side and "}" on the right side, and which can contain any valid JS expression.
  • "Valid JS expressions" include template literals and things containing them, so they can be nested: `${`${`${1}`}`}` is valid. And this is a common use case, not an obscure edge case.
  • The non-placebolder parts of the template literals are treated as string literals.

I am not super-familiar with this codebase but to me there are two main issues that need to be solved:

  1. Detecting when we are in a template literal or placeholder expression in a Go template, so that we can escape properly, and also so that we don't get confused and parse the rest of the Go template incorrectly (as in #29406)
  2. Detecting when the data passed to Execute will be interpreted as a placeholder, allowing arbitrary code execution (as in the example given when this issue was opened).

I think if we can solve problem 1 then problem 2 is straightforward: simply replace "$" with "\x24" when inserting into a template-literal context. In the regular javascript context (including the placeholder context) any data with a backtick would already have quotes wrapped around it so it would not be considered a template literal.

Solving problem 1 is a harder given the approach of the existing code, which steps through the characters of the JS code but doesn't really parse the JS. In the code, the state of being in regular Javascript code is called stateJS. Let's suppose we add a new state stateTL for being in a template literal. Then "`" marks a transition stateJS -> stateTL or stateTL -> stateJS. And "${" also marks a transition stateTL -> stateJS. But "}" may mark a transition from stateJS -> stateTL (if it is marking the end of a placeholder) or it might just be the end of a code block. So we will need to know whether our current stateJS state is part of a placeholder inside of a template literal or not. And since template literals can nest we'll need to know how many levels deep we are.

I implemented the above as my initial solution, but then I realized it still isn't quite right, because the placeholder expression could contain "}"s. For example, `${function(){return 1}()}` is a valid template literal. So when inside a placeholder expression we also need to keep track of how many blocks deep we are, so that we know when the expression finally ends. And, since we could be multiple template literals deep, I suppose we need a stack of block depths representing how deep we are inside each placeholder expression along the way. Otherwise, given the expression `${function(){if (1>0) {return `${1}`} else {return 2}}()}`, when we exit out of the innermost template literal ("`${1}`") into the enclosing placeholder expression, we won't know if the following "}" closes that placeholder expression or not.

At this point it seems to me that I am thinking of a level of complexity beyond how the JS-handing code currently works. So I'm wondering if I'm missing a simpler solution, or if these template literals complicate the situation enough that it is simply necessary to have a more complex Javascript "parser" for html/template to work with them.

@FiloSottile
Copy link
Contributor

@FiloSottile FiloSottile commented Mar 19, 2019

/cc @empijei

@empijei
Copy link
Contributor

@empijei empijei commented Mar 20, 2019

Thanks @hundt for your analysis.

It seems to me that fixing this issue would require this package to implement a JS parser to keep track of the current context. If not a fully-fledged JS parser it would at least need to correctly understand the context-switch between Template Literals and JavaScript context.

While this is possible, it would complicate the current logic more than I'm comfortable with, and seems very tricky to get right. I played with this a little and stuff like the following line came up:

var tmpl = `"${a="`${"}}`;

My opinion would be to not support this and disallow JS template literals. This would need the documentation to be updated accordingly.

I have some ideas on how to do this:

  1. Accept backtick-delimited strings just to have multiline strings or easier quotes like Go does, and encode ${ to prevent execution;
  2. Bail out and warn users that backticks are not supported at parse time;
  3. Do not sanitize, nor change change anything that is in a template literal, but this is trickier and more dangerous and still needs decision on what to do when a variable is interpolated in such context;

I would honestly already update the docs to say that we currently do not support template literals.

We also need to decide what to do for cases like

alert`1`

Which are valid JS function calls.

@mikesamuel do you have a different opinion?

@hundt and @eternal-flame-AD what is the use case for JS template literals inside html templates? (I can guess hypothetical ones but I'm interested in your concrete one if you can publish it)

@hundt
Copy link

@hundt hundt commented Mar 20, 2019

A few notes:

I played with this a little and stuff like the following line came up:

var tmpl = `"${a="`${"}}`;

I think this kind of thing is pretty well handled by the internal state machine once it understands the backticks, because it already knows that characters inside quotes lose the special meaning they have outside quotes. I tried this input with my initial implementation that I mentioned above, with some logging to indicate state transitions, and it did the right thing:

Parse "var tmpl = `": {stateJSTLStr delimNone urlPartNone jsCtxRegexp jsTLDepth=1 attrNone elementScript <nil>}
Parse ""${": {stateJS delimNone urlPartNone jsCtxDivOp jsTLDepth=1 attrNone elementScript <nil>}
Parse "a="": {stateJSDqStr delimNone urlPartNone jsCtxRegexp jsTLDepth=1 attrNone elementScript <nil>}
Parse "`${"": {stateJS delimNone urlPartNone jsCtxDivOp jsTLDepth=1 attrNone elementScript <nil>}
Parse "}": {stateJSTLStr delimNone urlPartNone jsCtxRegexp jsTLDepth=1 attrNone elementScript <nil>}
Parse "}`": {stateJS delimNone urlPartNone jsCtxDivOp jsTLDepth=0 attrNone elementScript <nil>}

Here stateJSTLStr is the new state I added for being in a template literal string, and jsTLDepth indicates how many template literals deep you are.

We also need to decide what to do for cases like

alert`1`

Which are valid JS function calls.

I'm not sure it would be necessary to explicitly handle this. As long as backticks are implemented, I believe the current code will correctly understand that the "alert" part is in the normal Javascript context and we leave that context with the backtick.

@hundt and @eternal-flame-AD what is the use case for JS template literals inside html templates? (I can guess hypothetical ones but I'm interested in your concrete one if you can publish it)

Honestly in my case it was just a case of being lazy and using an embedded script with rendering logic inside of it on a page that also used html/template to inject in some server-side variables. My experience with this issue and my later reading of #27926 has convinced me that this was a bad approach, and I refactored my code to put substantially all of the Javascript into separately loaded modules which no longer pass through any server-side templating, which is what I probably should have done in the first place.

Given that it is probably not good practice to do what I had been doing, in my opinion it would be reasonable not to support HTML templates with JS template literals (and fail when it encounters them), or to support it only with template literals formatted in the subset of possible ways that html/template can understand (provided it can reliably detect and error out when it encounters an unsupported situation). To me the biggest problem is that the API currently promises "safe" HTML but does not deliver it. Better to simply fail in that situation.

In case it is of interest: I gathered that Google's closure-templates library is well-regarded, so I checked what they do. Just like html/template, they do a character-by-character "context" analysis of Javascript that falls short of fully parsing it. Their approach to template literals appears to be identical to the initial implementation I described above, including the failure to deal with braces inside ${. However, their template parser fails at the parsing stage of such a template, either due to an additional bug or by design (I'm not sure which).

@empijei
Copy link
Contributor

@empijei empijei commented Mar 20, 2019

my initial implementation that I mentioned above

Do you want to share a link to the patched code?

To me the biggest problem is that the API currently promises "safe" HTML but does not deliver it. Better to simply fail in that situation.

I agree. Plus I see little to no reason to nest Go and JS templates.

The solution I'm more inclined to implement and document is to just have a stateJSTLStr and encode $`\ and other potentially dangerous characters (this would probably require some thoughts on what to encode/filter and how).

@hundt if we can get to an agreement on this would you like to work on a CL to address it?

@hundt
Copy link

@hundt hundt commented Mar 20, 2019

my initial implementation that I mentioned above

Do you want to share a link to the patched code?

Ah, sorry, wasn't sure the right way to share it without requesting a PR. Here's a patch, does that work for you? https://gist.github.com/hundt/fc27e649a01722a8466987b0f102fe5d

The solution I'm more inclined to implement and document is to just have a stateJSTLStr and encode $`\ and other potentially dangerous characters (this would probably require some thoughts on what to encode/filter and how).

I think if we are going to support template literals then we would need to support ${} and nested template literals (in the HTML template, not in the input variables), since my impression is that many (most?) practical uses of template literals involve nesting (because that is how you do conditionals and loops). I believe we can do that without making my implementation much more complicated if we just fail when we encounter { (without the $) inside ${}, to avoid the problem of not knowing when the ${} ends. (I'm assuming there's a way to error out and abort when we see certain characters instead of doing a state transition but I haven't spent enough time looking at the code to be sure.)

@hundt if we can get to an agreement on this would you like to work on a CL to address it?

I'm interested but I will likely not be available in the short term. After another day or so I will likely have little time to work on this for at least two weeks. But if you don't mind waiting I can probably pick it up then.

@hundt
Copy link

@hundt hundt commented Mar 20, 2019

In case it helps illustrate some of the issues, here are some test cases indicating potential failure modes that allow code injection:

https://play.golang.org/p/61h5vn53A4H

The current code fails the first, second, and fourth tests because it does not understand template literals.

My patch fails the fifth test because it understands template literals but not braces inside ${}

@empijei
Copy link
Contributor

@empijei empijei commented Mar 21, 2019

Thanks for this.

To be honest I'd be more in favor of not supporting template literals at all instead of just partially supporting them (as much as you patch cover a good share of cases, aborting on [^$]{ seems like a hack and counts as partial support).

I was proposing to drop all ${} expressions to have some backtick-delimited-multiline-strings with easier quoting but not templating features (after all we are in a template package).

If this is not a common use case and wouldn't be useful I would remove TL completely. The approach would then be to just filter them or bail out when we see a backtick outside of strings.

@hundt
Copy link

@hundt hundt commented Mar 21, 2019

Ah okay, I misunderstood your proposal. I'm not familiar enough with how browsers parse things to know how feasible it is to "drop all ${} expressions" in a reliable enough way that nothing could sneak in, but assuming it can be done that seems okay to me. I would not use such a feature though; I think there is not much point in using template literals if they can't be templates.

I think it may be better not to support template literals at all (i.e. fail when we encounter them). #27926 has kind of convinced me that anything that discourages people from passing JS of any complexity through html/template might be a good idea. To quote from that proposal:

JS and CSS parsing is error-prone. JS and CSS are rapidly evolving languages that our parser might not always handle correctly; layering parsers for these two languages on top of our mixed HTML-template language parser introduces more complexity and points of failure to the package.

This is an example of an issue that did not exist when html/template was written, arose years later, and then continued on unfixed for more than four additional years, potentially leaving users vulnerable during that time. So I think it is good evidence in favor of the argument quoted, and against encouraging people to rely on html/template to understand their Javascript. So for that reason I am tentatively in favor of just not adding support for new Javascript features like this one. (Edit: but I do think we should add code to detect them, or anything that looks like them, and fail the template compilation.)

@empijei
Copy link
Contributor

@empijei empijei commented Apr 16, 2019

You have a good point.

+1 on bailing out if we see a template literal.

If @mikesamuel agrees we can start working on it.

@mikesamuel
Copy link
Contributor

@mikesamuel mikesamuel commented Apr 18, 2019

I'm not familiar enough with how browsers parse things to know how feasible it is to "drop all ${} expressions"

It can be done purely based on lexical analysis.
There are a few oddities because of JS is irregular around '/'.

let myRegExp = / `${2}`.length /   // No interpolation here.
2;
let myNum = 10 / `${2}`.length /   // An interpolation here
2;

i ++
   / `${2}`.length /  // Interpolation here
['index']

i =
++ / `${2}`.length /  // No interpolation here
['index']

${2} is not useful in regular expression literals; it means zero-width match of end-of-input twice.

So matching ${ outside comments and single- or double-quoted strings that are not preceded by an odd-number of \s will reliably catch all interpolations.

#27926 has kind of convinced me that anything that discourages people from passing JS of any complexity through html/template might be a good idea.

I agree. I included it because (JS was very stable at that time) and I wanted to enable

<script>intiialize({{ .InitialClientSideState }})</script>

If @mikesamuel agrees we can start working on it.

Fwiw, +1

@hundt
Copy link

@hundt hundt commented Apr 18, 2019

#27926 has kind of convinced me that anything that discourages people from passing JS of any complexity through html/template might be a good idea.

I agree. I included it because (JS was very stable at that time) and I wanted to enable

<script>intiialize({{ .InitialClientSideState }})</script>

Yeah, that seems very useful! My issue as a user who was not familiar with the codebase was that it was not clear to me that I should stick to very simple things like that and not try to pass any non-trivial scripts through html/template.

If @mikesamuel agrees we can start working on it.

Fwiw, +1

Just to be clear, you're talking about failing for any JS with template literals, right? And the earlier part of your comment (about interpolation) was for my edification (thanks!) but not relevant to the solution that would be implemented?

@ijt
Copy link
Contributor

@ijt ijt commented May 22, 2020

Hi all, I found that Go's template package seems to be treating // within JavaScript template strings as comments, so for example it turns

<script>let u = `https://foo.com`;</script>

into

<script>let u = `https:</script>

Here's a playground link for this: https://play.golang.org/p/dvW0mCby1ED. Is it intentional?

@eternal-flame-AD
Copy link

@eternal-flame-AD eternal-flame-AD commented May 22, 2020

@ijt That certainly shouldn't be the ideal behavior but that looks like what it does right now. The exact problem was discussed in #29406. Go does not know what backtick means and guessed that the // means comment and just discarded the rest.

@empijei
Copy link
Contributor

@empijei empijei commented Jun 3, 2020

Providing a backward-compatible fix to this issue has demonstrated to be harder than initially thought.

There is a change that is being experimented but it has breaking potential and I'm trying to evaluate the benefits and the issues with it.

Thanks for adding more issues that this is causing, it helps building a case to address this, even if it may break some existing code.

@karelbilek
Copy link

@karelbilek karelbilek commented Oct 29, 2020

@hundt breaking javascript randomly if URL is inserted into backticks, as @ijt posted (and we were hit now), is really "discouraging".

I am sure that is not what you meant though :)

@karelbilek
Copy link

@karelbilek karelbilek commented Oct 29, 2020

As a consumer of html/template - I am fine for disallowing literals and that go would just return error on executing the template when it sees a backtick string literal. However it's not fine to produce an invalid javascript with no indication of error anywhere (other than broken JS in user's browser).

@hundt
Copy link

@hundt hundt commented Oct 29, 2020

@karelbilek Yes, I agree with you that having Parse return an error when it sees javascript template literals is better than the current behavior from an API perspective, and that's what I've been suggesting in this discussion.

I think this would be a relatively simple fix and I would be willing to help with it, but @empijei has mentioned the issue of backwards compatibility, so I assume that is why progress has been slow. Certainly the approach of "fail upon encountering a template literal" would cause some existing, correct code to stop working, and unfortunately there would be no way to tell at compile time whether a given program is affected.

@karelbilek
Copy link

@karelbilek karelbilek commented Oct 29, 2020

Oh. I can see how go backwards compat promise can bite it sometimes.

@empijei
Copy link
Contributor

@empijei empijei commented Oct 29, 2020

I think this is worth mentioning: this issue has been fixed in safehtml/template. That package is a fork of html/template and since that doesn't have to respect the back-comp guarantee it was much easier to implement a fix there.

If you are interested in seeing the fix implementation you can take a look here.

The overall fix was 45 lines (excluding tests, but still not huge) and addresses the concerns: it ensures all ES6 template strings that appear in a Go safehtml/template are balanced and do not contain template actions.
This actually prevents inserting SafeScript snippets which contain template expressions (in an escaped context).

Should we consider to do the same here and accept to break all existing users that rely on this flaw? @FiloSottile I know that there is an escape hatch for the Go 1 guarantee for security, do you think we have a strong enough case here to apply it?

/cc @katiehockman @rolandshoemaker

@hundt
Copy link

@hundt hundt commented Oct 29, 2020

@empijei That implementation is broken in the same way as my implementation from #9200 (comment) and the one in closure-templates: it fails to distinguish between a closing brace that ends an interpolation and one that ends a code block. For example:

`${(function(){return 1;})()`

is considered not to be an unclosed template literal by that code.

@empijei
Copy link
Contributor

@empijei empijei commented Oct 29, 2020

Right. I didn't notice that, good catch (I reported it to the maintainers of that package).

@empijei
Copy link
Contributor

@empijei empijei commented Oct 29, 2020

A recap for this issue so far.

We have several possibilities in order of backward-compatibility breakage and security improvement:

  1. Do nothing and clearly document this issue.
  2. Implement a best-effort fix that tries to detect unbalanced templates (but will get it wrong in a small set of cases). By doing so we allows some JS templating, but we would still forbid Go interpolation in JS templates. Some XSS might still slip through but it would be a very remote corner case.
  3. Implement a fully-baked fix, which will require us to implement a somewhat complex parser for this syntax(note that this package parser is already quite complicated). This would also allow some JS templating but we would still forbid Go interpolation in JS templates. This solution might look like a good one but it requires a significant amount of work (and maintenance) and still has some breaking potential.
  4. Implement something simple that just matches opening and closing backticks, but that detects mixed templates and just errors if a ${ is encountered after an opening `. This was described here and would still allow backticks to be used to delimit strings (no templating functionality).
  5. Straight out ban template literals and bail out as soon as we see a backtick.

All of the last four would address the "url breaks if in backticks" case.

Did I miss anything?

@hundt
Copy link

@hundt hundt commented Oct 29, 2020

To be a little more precise, your solution (4) would ban interpolation/placeholders inside template literals but would allow simple template literals (like `string`). I had proposed a solution (5) which was to ban ALL template literals, i.e. bail as soon as you see the `. That would cause further breakage but would also remove the risk of the parser failing to detect an interpolation or being incorrect about where a template literal ends. I don't know enough about JS to know whether that is a real risk (or may become a real risk with future changes to the language). @mikesamuel did not seem very concerned.

@empijei
Copy link
Contributor

@empijei empijei commented Oct 30, 2020

Updated, thanks.

@a1gemmel
Copy link

@a1gemmel a1gemmel commented Aug 2, 2021

I ran into this today, took me a while to figure it out since one template with javascript backticks was impacted and another was not. It would be great if the html/template documentation mentioned that javascript template strings aren't currently supported. I can open a PR for that if it's wanted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet