Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: html/template: add a hardened version of it to the standard library #27926

Closed
stjj89 opened this issue Sep 28, 2018 · 11 comments

Comments

@stjj89
Copy link
Contributor

@stjj89 stjj89 commented Sep 28, 2018

Overview

Add a hardened version of html/template to the Go standard library. This new template package will incorporate security engineering best practices employed within Google to guarantee–with high degree of confidence–that the HTML rendered by the template system is safe against code injection.

Background

Package html/template implements data-driven templates for generating HTML output that is safe against code injection. html/template is a "contextually auto-escaping template engine": it treats template data as untrusted plain text and escapes them so that they can be safely embedded in its HTML output. The kind of escaping applied to the data depends on the context that the data appears in (e.g. HTML, JS, CSS, URI).

Issues with html/template

While html/template is significantly better than text/template, string-formatting functions (e.g. fmt.Sprintf), and ad-hoc string concatenation for generating HTML safe against code injection, it has several shortcomings which I describe in the following sections.

Typed strings

html/template provides developers a set of typed strings (e.g. template.HTML, template.JS) to flag known-safe template data that are intended to be used without escaping or validation. This mechanism is necessary to accommodate use cases in real-world applications with complex dataflows, where developers want HTML markup, trusted data (e.g. programmer-controlled strings), or already-sanitized content that are generated in one part of their application to be preserved when it is rendered in HTML templates in remote parts of the application.

Unfortunately, these typed strings are easily misused. The most obvious way to misuse these types is create them from arbitrary, dynamic strings. This essentially disables contextual auto-escaping, thereby negating the benefit of using html/template in the first place. For example:

func RenderHTML(head, url string, b *bytes.Buffer) {
  t := template.Must(template.New("").Parse(`
<head>{{.Head}}</head>
<body><a href="{{.URL}}">Link</a></body>`))
  type data struct {
    Head template.HTML
    URL  template.URL
  }
  // This is dangerous if either head or url are not properly validated or sanitized.
  t.Execute(b, data{template.HTML(head), template.URL(url)})
}

See here and here for a real-world examples of code that explicitly opt out of auto-escaping.

In other cases, developers make more of an effort to sanitize strings before converting them into typed strings. Unfortunately, this is an error-prone process that often incorrectly duplicates the work that html/template already does under the hood. For example:

func createButton(msg string) template.HTML {
  jsEscapedMsg := template.JSEscapeString(msg)
  return template.HTML(
    fmt.Sprintf(`<button onclick="alert('%s')">Click me!</button>`, jsEscapedMsg))
}

While the generated button element might appear to be safe since msg is JavaScript-escaped, it is vulnerable to XSS due to the lack of HTML-escaping. When the browser evaluates this markup, it first HTML-unescapes the value of the onclick attribute before evaluating the JavaScript expression. Therefore, a value of msg like &#39;);attackScript();// will be HTML-unescaped and
evaluated as alert('');attackScript();//'), which results in the execution of the attacker's script.1

See here for a real-world example of code that does not validate or escape untrusted URLs before embedding them into a template.HTML value.

The lack of constraints on producing html/template typed string values seems to encourage developers to move more HTML-generation logic outside of templates into error-prone, hand-written routines. See real-world examples here and here.

Each conversion into a html/template typed string represents a potential vulnerability, and therefore must be carefully reviewed by a reviewer knowledgeable about the subtleties of HTML-injection bugs. The reviewer must ensure that the string being converted is in fact safe to use in the type's
corresponding context for all possible values of that string. Asserting this property is difficult when the data flow into the typed string conversion is sufficiently complex. Moreover, these typed string conversions make it possible for future changes in one (upstream) part of the application to cause security bugs in another (downstream, remote) part of the application. Therefore, the more often typed strings are used in a Go program, the more difficult it is to guarantee that it produces HTML that is free of code-injection vulnerabilities.

Single URL context

html/template does not differentiate between URLs that load code and those that do not. This has significant security implications. For example, the template:

<a href="{{.URL}}">Click me!</a>
<script src="{{.URL}}"></script>

will produce the following HTML output when URL is "http://www.untrustedsite.com/script.js":

<a href="http://www.untrustedsite.com/script.js">Click me!</a>
<script src="http://www.untrustedsite.com/script.js"></script>

html/template did not filter out the URL value because it contains the benign http scheme. While http://www.untrustedsite.com/script.js is safe to navigate to as a link (i.e. since the navigation will not cause untrusted, same-origin script execution in the browser), it is not safe to load an executable script from (i.e. since it will be loaded over HTTP and the contents of the script are not trusted).

JavaScript and CSS parsing

html/template allows template data to be interpolated into JavaScript (JS) and Cascading Style Sheet (CSS) contexts. It parses the JS and CSS surrounding the template data in order to contextually escape the data. For example, the following template:

<style>
{{".my_class"}} { font: {{"Arial; height: 200px"}}; background: url('{{"/search?img=foo&size=icon"}}'); }
</style>
<script>var x = 1 + {{42}}; alert({{");</script>"}});</script>

is rendered by html/template as:

<style>
.my_class { font: ZgotmplZ; background: url('/search?img=foo&size=icon'); }
</style>
<script> var x = 1 +  42 ; alert(");\u003c/script\u003e");</script>

This functionality is problematic for two reasons. The first is that JS and CSS parsing is error-prone. JS and CSS are rapidly evolving languages that our parser might not always handle correctly; layering parsers for these two languages on top of our mixed HTML-template language parser introduces more
complexity and points of failure to the package.

The second issue is that this feature encourages the security anti-pattern of using inline scripts and stylesheets. This prevents the adoption of strict Content Security Policy (CSP), where all scripts loaded by the browser undergo explicit validation before being executed. See here for more details on why inline scripts are dangerous.

Blacklist-based sanitization

html/template only understands the semantics of a certain subset of HTML elements and attributes. Elements and attributes outside of this set are assumed to have no special semantics.2

This sanitization policy is too permissive. Elements or attributes that are not understood by the html/template escaper may have semantics that are security-sensitive, particularly
custom elements and those introduced in future revisions of the HTML standard. Properly sanitizing these elements and attributes in the future will require backward-incompatible changes to html/template.

Dynamic template sources

html/template allows templates to be parsed from arbitrary strings and filenames. This makes the templates themselves susceptible to injection attacks. For example,

t1 := template.Must(template.New("").Parse(`<html>`+bodyTmpl+`</html>`))
t2 := template.Must(template.New("").ParseFiles(filename))

If an attacker can fully or partially control the values of bodyTmpl or filename, then the attacker can control the templates being loaded and hence the HTML output. Such an attack completely undermines the assumption that template authors are trustworthy.

Proposed solution

Add a new safehtml/template library that addresses the above issues. In particular, this package will:

  1. Replace html/template typed strings with a set of types with a richer, but constrained API. These types will live in a separate package safehtml, which will provide a safe-by-design API for constructing values of these types. The following sketch illustrates a subset of this API:

    package safehtml
    
    // An HTML is an immutable string-like type that is safe to use in HTML
    // contexts in DOM APIs and HTML documents.
    type HTML struct { str string }
    // HTMLEscaped returns an HTML whose value is text, with the characters [&<>"'] escaped.
    func HTMLEscaped(text string) HTML { ... }
    // HTMLConcat returns an HTML which contains, in order, the string representations
    // of the given htmls.
    func HTMLConcat(htmls ...HTML) HTML { ... }
    
    // A URL is an immutable string-like type that is safe to use in URL contexts in
    // DOM APIs and HTML documents.
    type URL struct { str string }
    // URLSanitized returns a URL whose value is url, validating that the input string matches
    // a pattern of commonly used safe URLs. If url fails validation, this method returns a
    // URL containing InnocuousURL.
    func URLSanitized(url string) URL { ... }
    
    // A TrustedResourceURL is an immutable string-like type referencing the
    // application’s own, trusted resources. It can be used to safely load scripts,
    // CSS and other sensitive resources without the risk of untrusted code execution.
    type TrustedResourceURL struct { str string }
    // TrustedResourceURLFromConstant constructs a TrustedResourceURL with its underlying
    // URL set to the given url, which must be an untyped string constant.
    func TrustedResourceURLFromConstant(url stringConstant) TrustedResourceURL { ... }
    
    // stringConstant is an unexported string type. Users of this package cannot
    // create values of this type except by passing an untyped string constant to
    // functions which expect a stringConstant. This type must be used only in
    // function and method parameters.
    type stringConstant string

    Package safehtml should provide constructors that satisfy most common use cases for constructing known-safe values outside of a template sytem. Values of these types carry strong security guarantees about strings they encapsulate; when passed around a Go program, they enable developers to depend on these properties without having to reason about whole-program dataflows. Not surprisingly, safehtml/template will also produce safehtml.HTML. This allows values from separate-evaluated HTML templates to be composed, while maintaining strong security guarantees.

    // ExecuteToHTML applies a parsed template to the specified data object,
    // returning the output as a safehtml.HTML value.
    // A template may be executed safely in parallel.
    func (t *Template) ExecuteToHTML(data interface{}) (safehtml.HTML, error) {

    For the minority of use cases that the safehtml API does not accommodate, we will provide a separate safehtml/uncheckedconversions package that converts plain strings into safe types:

    // Package uncheckedconversions contains functions that convert arbitrary strings
    // into values of package safehtml types.
    //
    // Using this package may undermine the security guarantees provided by package safehtml.
    // Users of this package are responsible for ensuring that the strings being
    // converted comply with the respective safehtml type contracts.
    package uncheckedconversions
    
    func HTMLFromStringKnownToSatisfyTypeContract(s string) safehtml.HTML { ... }
    func URLFromStringKnownToSatisfyTypeContract(s string) safehtml.URL { ... }
    func TrustedResourceURLFromStringKnownToSatisfyTypeContract(s string) safehtml.TrustedResourceURL { ... }

    These unsafe constructors will live their own separate package, much like how memory-unsafe operations all live in package unsafe, and crypto APIs prone to misuse live in package cypto/subtle. This makes code easier to security-review (i.e. "if the program doesn't import uncheckedconversions, the HTML produced by safehtml/template is definitely safe"), makes it easier to restrict the use of these functions (e.g. build systems like Bazel allow package-level visibility restrictions), and will hopefully discourage developers from unnecessarily reaching for these conversions (i.e. by requiring them to import the package and call a dangerous-sounding functions).

  2. Add different sanitization contexts for URLs that load code and those that do not. These contexts map to the safehtml.TrustedResourceURL and safehtml.URL types described above. The former type of URL will be validated more strictly than the latter.

  3. Disallow template data from being interpolated into JS and CSS contexts. The template parser will no longer attempt to parse JS or CSS. We will allow safehtml.Script and safehtml.StyleSheet values to be used in these contexts, but the constructor API around these types will be deliberately constrained. We might potentially add a switch that causes safehtml/template to disallow inline scripts and stylesheets completely, even if they appear in the programmer-controlled template text. This switch will help developers ensure that all the markup by their template is CSP-compatible.

  4. Use whitelist-based sanitization. Elements or attributes not explicitly understood by the sanitizer will be disallowed by default. Developers must explicitly whitelist these attributes using an API along the lines of:

    // Sanitize sanitizes the given element body or attribute value so that
    // it is safe to include in the template output. It returns an error only
    // if template execution must halt.
    type Sanitize func(val interface{}) (string, error)
    
    // The following Allow* methods apply to all templates associated with t.
    
    // AllowElement allows actions to appear in the body of the named element.
    // The given function will be called to sanitize the output of these actions.
    func (t *Template) AllowElement(elem string, s Sanitize) error { ... }
    // AllowAttribute allows actions to appear in value of the named attribute.
    // The given function will be called to sanitize the output of these actions.
    func (t *Template) AllowAttribute(attr string, s Sanitize) error { ... }
    // AllowAttributeInElement allows actions to appear in value of the named
    // attribute when that attribute appears in the named element.
    // The given function will be called to sanitize the output of these actions.
    func (t *Template) AllowAttributeInElement(attr, elem string, s Sanitize) error { ... }
  5. Provide a safe-by-design API for loading template text. This API will only allow templates to be loaded from programmer-controlled strings (i.e. untyped string constants) or resources under application control (e.g. environment variables, command-line flags). The following is a snippet of this new template-loading API:

    func (t *Template) ParseFiles(filenames ...stringConstant) (*Template, error) { ... }
    func (t *Template) ParseFilesFromTrustedSources(filenames ...TrustedSource) (*Template, error) { ... }
    
    // A TrustedSource is an immutable string-like type referencing trusted template files under application control.
    type TrustedSource struct { str string }
    func TrustedSourceFromConstant(src stringConstant) TrustedSource { return TrustedSource{string(src)} }
    func TrustedSourceFromFlag(value flag.Value) TrustedSource { return TrustedSource{fmt.Sprint(value.Get())} }
    func TrustedSourceFromEnvVar(key stringConstant) TrustedSource { return TrustedSource{os.Getenv(string(key))} }
    func TrustedSourceJoin(elem ...TrustedSource) TrustedSource { return TrustedSource{filepath.Join(trustedSourcesToStrings(elem)...)} }

The APIs described above are explained in great detail in Christoph Kern's Securing the Tangled Web (see "Strictly Contextually Auto-Escaping Template Engines", "Security Type Contracts", and "Unchecked Conversions"). Within Google, the security team has deployed strict contextual-autoescaping template systems (e.g. Closure templates) and safe HTML type implementations (e.g. Java and JavaScript types) across several different languages and frameworks. These packages have significantly decreased the incidence of XSS bugs without significantly impacting developer workflow.

Internal implementations of package safehtml and safehtml/template were deployed within Google a year ago. Currently, approximately 691 and 1153 Go packages use safehtml/template and safehtml respectively. Only about 15 of these packages use uncheckedconversions, all of which have been manually reviewed by our security team. Consequently, we have developed a high degree of confidence that the current API is usable and meets most developers' use cases.

Since safehtml/template is stricter than html/template by design, the features of the former cannot be easily integrated into the latter without making backward-incompatible changes.

Open Questions

  • Should safehtml/template even be in the standard library? We already have html/template and text/template. Perhaps adding yet another template library will confuse users and bloat the standard library. safehtml/template could potentially live in golang.org/x/tools or a separate GitHub repo altogether. Yet another option is to add all of this as optional functionality in html/template that can be enabled by a flag.
  • What does a robust API for creating values of safehtml types look like? The current internal implementation of safehtml is customized for the ways that Google Go programmers generate HTML and HTML-related values, which might not translate well to external use cases. Since adding new constructors to package safehtml is backwards-compatible, we can start by releasing the library with its current, field-tested API and respond to feature requests as external users adopt the package.
  • Will uncheckedconversions be misused? The functions we propose in package uncheckedconversions are essentially equivalent to the easily-misused typed strings in html/template, except with spookier-sounding names. Within Google, we use our build system to restrict the use of uncheckedconversions; any developers attempting to import the package must receive code-review approval from a security team member. Without these restrictions, will most external Go developers follow the path of least resistance and misuse uncheckedconversions, or will they adopt the principled approach and carefully review each use of those functions, preferring package safehtml constructors wherever possible?
  • Is the work of migrating from html/template to safehtml/template automatable? safehtml/template is inherently stricter than html/template, so I expect that many migrations will require manual refactoring and reasoning about program dataflows. However, we might be able to write a go fix that performs all automatable migrations, and lists the remaining areas that require manual attention.

  1. See also "A Subtle XSS Bug" in Securing the Tangled Web for a similar example where HTML- and JavaScript-escaping in the wrong order causes the same XSS vulnerability.
  2. html/template actually uses heuristics to infer the semantics of non-blacklisted attributes. If this inference fails, the attribute is assumed to have no special semantics.
@gopherbot gopherbot added this to the Proposal milestone Sep 28, 2018
@gopherbot gopherbot added the Proposal label Sep 28, 2018
@balasanjay

This comment has been minimized.

Copy link
Contributor

@balasanjay balasanjay commented Sep 29, 2018

This looks great!

One other feature I miss from Soy is automatic CSP nonce-injection for <script> tags. Any chance safehtml/template could support this? Or at the very least, take the opportunity to ensure that we don't rule it out in the API.

@stjj89

This comment has been minimized.

Copy link
Contributor Author

@stjj89 stjj89 commented Sep 29, 2018

One other feature I miss from Soy is automatic CSP nonce-injection for <script> tags. Any chance safehtml/template could support this? Or at the very least, take the opportunity to ensure that we don't rule it out in the API.

We could certainly consider doing this for safehtml/template. In fact, @mikesamuel has already proposed something like this for html/template, and @empijei is interested in making sure that package http coordinates with the template package and automatically adds the appropriate CSP headers with the right nonce values.

@empijei

This comment has been minimized.

Copy link
Contributor

@empijei empijei commented Sep 29, 2018

As @stjj89 pointed out we are also working on the auto-noncing step and on a package that ensures a strict CSP policy is always set. Since nonces are per-request contextual and all the template packages don't currently have an API that uses context we are discussing how would this best fit in the go ecosystem.

CC @FiloSottile that is reviewing the design for that.

@agnivade agnivade changed the title proposal: add a hardened version of html/template to the Go standard library proposal: html/template: add a hardened version of it to the standard library Sep 29, 2018
@bradfitz

This comment has been minimized.

Copy link
Member

@bradfitz bradfitz commented Oct 3, 2018

I feel like there's a new Best Way to generate HTML every year or so. Why does this need to live in the standard library where we can't change anything?

It's probably best if it just lives on GitHub somewhere and is properly versioned so people can upgrade to new best practices over time and you could make breaking changes.

@stjj89

This comment has been minimized.

Copy link
Contributor Author

@stjj89 stjj89 commented Oct 5, 2018

I can't say with certainty how the best practices for preventing HTML code-injection attacks will change a few years from now, but historically, large paradigm changes like this haven't happened that often.

@mikesamuel wrote html/template in 2011 when it was clear that relying on the template system, rather than the template author, to properly escape untrusted template data was a good idea. It was only years later that security engineering folks realized that a solution was required to address the issues mentioned in my proposal. The "safe types" and "strict autoescaping template system" idea was born out of that realization, and took years of field-testing within Google to prove out. Now, 7 years since html/template was written, we have accumulated enough experience with this new approach to strongly recommend it to all developers. (@xtofian and @mikesamuel , please keep my historical narrative honest.)

I don't foresee this paradigm (i.e. safe-by-design string wrappers recognized by an autoescaping template system) changing drastically any time soon. We might expand the safehtml API to fit new use cases, but these additions will be backwards-compatible. Any breaking changes to the existing safehtml constructors, safehtml/template sanitization logic, and so on will only be made for security reasons, which the Go compatibility contract allows for.

That all being said, I don't think it will be a terrible idea to place safehtml and safehtml/template in a separate GitHub repo or even in golang.org/x/tools. However, html/template is already in the standard library, and having it there allows and encourages developers to adopt security best-practices when rendering HTML. It seems to make sense to update the best practices prescribed by the standard library by adding safehtml and safehtml/template. Perhaps these two libraries can completely replace html/template some day (in Go 2?).

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Oct 10, 2018

Even if safehtml/template is intended for the standard library eventually, it should start somewhere else. That could be in the Go project or not, depending on who is going to do the core development. If this was a security team project it could just be github.com/google/safehtml. Or we could put it in a Go repo somewhere (x/net/html/...?). Probably the latter is better.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Oct 10, 2018

Marking this declined but really it just needs to happen somewhere else first.

@rsc rsc closed this Oct 10, 2018
@stjj89

This comment has been minimized.

Copy link
Contributor Author

@stjj89 stjj89 commented Oct 10, 2018

Ok, that makes sense. I'll talk to the Go team internally to decide if this is a good fit for a Go repo.

@mikesamuel

This comment has been minimized.

Copy link
Contributor

@mikesamuel mikesamuel commented Oct 13, 2018

Sorry for being late to this party. Fwiw, I think safehtml is great and would recommend people use it however it is provided.

I don't know what @bradfitz has in mind re "best new ways", but we in security engineering have mostly managed to avoid large-scale breaking changes, and Samuel has put a lot of thought into making safehtml/template a good migration target while updating to the realities of the modern web.

Any code that aims to preserve broad security properties has to change as the threat environment changes.

Since html/template shipped, there have been changes to the threat environment:

  • it has become apparent that Content-Security-Policy has largely failed,
  • relatively immature client-side frameworks have become the rule more than the exception and almost all interpret expression strings in dangerous ways,
  • a steady stream of new URL schemes provide new vectors for abuse,
  • JS changes much more rapidly,
  • HTML has a small number of new syntactic corner cases like <template> and <iframe srcdoc>,
  • CSS has remained a vector for leaks but is no longer a vector for arbitrary code execution,
  • browsers are often embedded by mobile apps

There's a tendency to try to separate code into core library parts and user-configurable policy parts. I did that with the safe strings in html/template which turned out to be a source of footguns.

How might we manage change when

  • there are core library parts that won't change, and
  • policy parts that need to change as the threat environment changes but where the set of sensible policies is small
    ?

Might it be easier to avoid breaking changes if DSLs like templates were go-fixable?
Samuels change does encourage patterns that make it easier, in theory, to statically find template files.

@lweichselbaum

This comment has been minimized.

Copy link

@lweichselbaum lweichselbaum commented Oct 29, 2018

I agree that the model of white-list based CSPs has failed. Nonce-based CSPs on the other hand can provide a strong mitigation and can be rolled out with very little effort if the templating system supports auto-noncing.

@mikesamuel

This comment has been minimized.

Copy link
Contributor

@mikesamuel mikesamuel commented Nov 1, 2018

@lweichselbaum I agree that nonce-based CSP are a useful mitigation.
That's why I implemented auto-noncing for Pug last week.

When I call it a failure, I'm talking about compared to the expectations years ago when many including I thought it would end XSS as the most common web security vulnerability.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
8 participants
You can’t perform that action at this time.