exp/template/html: pre-sanitized content
Not all content is plain text.  Sometimes content comes from a trusted
source, such as another template invocation, an HTML tag whitelister,
etc.

Template authors can deal with over-escaping in two ways.

1) They can encapsulate known-safe content via
   type HTML, type CSS, type URL, and friends in content.go.
2) If they know that the output of a particular action never needs
   escaping, then they can add |noescape to the pipeline.
   {{.KnownSafeContent | noescape}}
   which will prevent any escaping directives from being added.

This CL defines string type aliases: HTML, CSS, JS, URL, ...
It then modifies stringify to unpack the content type.
Finally it modifies the escaping functions to use the content type and
decline to escape content that does not require it.

There are minor changes to escapeAction and its helpers so that explicit
escaping directives such as "html" and "urlquery" are treated as
equivalent to the escaping directives defined in the contextual
autoescape module, and so that the special "noescape" directive is
recognized.

The html escaping functions are rearranged.  Instead of having one
escaping function used in each {{.}} in

    {{.}} : <textarea title="{{.}}">{{.}}</textarea>

a slightly different escaping function is used for each.
When {{.}} binds to a pre-sanitized string of HTML

    `one < <i>two</i> &amp; two < "3"`

we produce something like

     one < <i>two</i> &amp; two < "3" :
     <textarea title="one &lt; two &amp; two &lt; &#34;3&#34;">
       one &lt; &lt;i&gt;two&lt;/i&gt; &amp; two &lt; "3"
     </textarea>

Although escaping is not normally required in <textarea>, injecting the
substring </textarea> would close the element early, so we normalize
special characters in RCDATA and do the same in attribute values to
preserve attribute boundaries.  We also strip tags in attribute values,
since developers never intend typed HTML injected into an attribute to
contain escaped tags, but do occasionally confuse pre-escaped HTML with
HTML from a tag-whitelister.

R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4962067
mikesamuel committed Sep 15, 2011
1 parent f41ab6c commit ce008f8
Showing 11 changed files with 639 additions and 144 deletions.
1 change: 1 addition & 0 deletions src/pkg/exp/template/html/Makefile
@@ -7,6 +7,7 @@ include ../../../../Make.inc
TARG=exp/template/html
GOFILES=\
clone.go\
+content.go\
context.go\
css.go\
doc.go\
83 changes: 83 additions & 0 deletions src/pkg/exp/template/html/content.go
@@ -0,0 +1,83 @@
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package html

import (
"fmt"
)

// Strings of content from a trusted source.
type (
// CSS encapsulates known safe content that matches any of:
// (1) The CSS3 stylesheet production, such as `p { color: purple }`.
// (2) The CSS3 rule production, such as `a[href=~"https:"].foo#bar`.
// (3) CSS3 declaration productions, such as `color: red; margin: 2px`.
// (4) The CSS3 value production, such as `rgba(0, 0, 255, 127)`.
// See http://www.w3.org/TR/css3-syntax/#style
CSS string

// HTML encapsulates a known safe HTML document fragment.
// Should not be used for HTML from a third-party, or HTML with
// unclosed tags or comments. The outputs of a sound HTML sanitizer
// and a template escaped by this package are fine for use with HTML.
HTML string

// JS encapsulates a known safe EcmaScript5 Expression, for example,
// `(x + y * z())`.
// Template authors are responsible for ensuring that typed expressions
// do not break the intended precedence and that there is no
// statement/expression ambiguity as when passing an expression like
// "{ foo: bar() }\n['foo']()", which is both a valid Expression and a
// valid Program with a very different meaning.
JS string

// JSStr encapsulates a sequence of characters meant to be embedded
// between quotes in a JavaScript expression.
// The string must match a series of StringCharacters:
// StringCharacter :: SourceCharacter but not `\` or LineTerminator
// | EscapeSequence
// Note that LineContinuations are not allowed.
// JSStr("foo\\nbar") is fine, but JSStr("foo\\\nbar") is not.
JSStr string

// URL encapsulates a known safe URL as defined in RFC 3986.
// A URL like `javascript:checkThatFormNotEditedBeforeLeavingPage()`
// from a trusted source should go in the page, but by default dynamic
// `javascript:` URLs are filtered out since they are a frequently
// exploited injection vector.
URL string
)

type contentType uint8

const (
contentTypePlain contentType = iota
contentTypeCSS
contentTypeHTML
contentTypeJS
contentTypeJSStr
contentTypeURL
)

// stringify converts its arguments to a string and the type of the content.
func stringify(args ...interface{}) (string, contentType) {
if len(args) == 1 {
switch s := args[0].(type) {
case string:
return s, contentTypePlain
case CSS:
return string(s), contentTypeCSS
case HTML:
return string(s), contentTypeHTML
case JS:
return string(s), contentTypeJS
case JSStr:
return string(s), contentTypeJSStr
case URL:
return string(s), contentTypeURL
}
}
return fmt.Sprint(args...), contentTypePlain
}
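
A trimmed, standalone copy of stringify can show how the content type is
recovered from a single typed argument and lost otherwise. Only the HTML
case is kept here; the main function and its inputs are illustrative
assumptions and do not appear in this CL.

```go
package main

import "fmt"

// HTML and contentType mirror the declarations in content.go,
// reduced to the two cases needed for the illustration.
type HTML string

type contentType uint8

const (
	contentTypePlain contentType = iota
	contentTypeHTML
)

// stringify is a cut-down copy of the function above.
func stringify(args ...interface{}) (string, contentType) {
	if len(args) == 1 {
		switch s := args[0].(type) {
		case string:
			return s, contentTypePlain
		case HTML:
			return string(s), contentTypeHTML
		}
	}
	return fmt.Sprint(args...), contentTypePlain
}

func main() {
	// A single typed argument keeps its content type.
	s, t := stringify(HTML("<b>x</b>"))
	fmt.Println(s, t == contentTypeHTML)

	// Anything else, including several arguments, falls back to
	// fmt.Sprint and is treated as plain text.
	s, t = stringify(HTML("<b>x</b>"), "y")
	fmt.Println(s, t == contentTypePlain)
}
```

The fallback means a pipeline that passes more than one value to an escaper
is always escaped as plain text, even if one of the values is typed.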
196 changes: 196 additions & 0 deletions src/pkg/exp/template/html/content_test.go
@@ -0,0 +1,196 @@
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package html

import (
"bytes"
"strings"
"template"
"testing"
)

func TestTypedContent(t *testing.T) {
data := []interface{}{
`<b> "foo%" O'Reilly &bar;`,
CSS(`a[href =~ "//example.com"]#foo`),
HTML(`Hello, <b>World</b> &amp;tc!`),
JS(`c && alert("Hello, World!");`),
JSStr(`Hello, World & O'Reilly\x21`),
URL(`greeting=H%69&addressee=(World)`),
}

// For each content sensitive escaper, see how it does on
// each of the typed strings above.
tests := []struct {
// A template containing a single {{.}}.
input string
want []string
}{
{
`<style>{{.}} { color: blue }</style>`,
[]string{
`ZgotmplZ`,
// Allowed but not escaped.
`a[href =~ "//example.com"]#foo`,
`ZgotmplZ`,
`ZgotmplZ`,
`ZgotmplZ`,
`ZgotmplZ`,
},
},
{
`<div style="{{.}}">`,
[]string{
`ZgotmplZ`,
// Allowed and HTML escaped.
`a[href =~ &#34;//example.com&#34;]#foo`,
`ZgotmplZ`,
`ZgotmplZ`,
`ZgotmplZ`,
`ZgotmplZ`,
},
},
{
`{{.}}`,
[]string{
`&lt;b&gt; &#34;foo%&#34; O&#39;Reilly &amp;bar;`,
`a[href =~ &#34;//example.com&#34;]#foo`,
// Not escaped.
`Hello, <b>World</b> &amp;tc!`,
`c &amp;&amp; alert(&#34;Hello, World!&#34;);`,
`Hello, World &amp; O&#39;Reilly\x21`,
`greeting=H%69&amp;addressee=(World)`,
},
},
{
`<a title={{.}}>`,
[]string{
`&lt;b&gt;&#32;&#34;foo%&#34;&#32;O&#39;Reilly&#32;&amp;bar;`,
`a[href&#32;&#61;~&#32;&#34;//example.com&#34;]#foo`,
// Tags stripped, spaces escaped, entity not re-escaped.
`Hello,&#32;World&#32;&amp;tc!`,
`c&#32;&amp;&amp;&#32;alert(&#34;Hello,&#32;World!&#34;);`,
`Hello,&#32;World&#32;&amp;&#32;O&#39;Reilly\x21`,
`greeting&#61;H%69&amp;addressee&#61;(World)`,
},
},
{
`<a title='{{.}}'>`,
[]string{
`&lt;b&gt; &#34;foo%&#34; O&#39;Reilly &amp;bar;`,
`a[href =~ &#34;//example.com&#34;]#foo`,
// Tags stripped, entity not re-escaped.
`Hello, World &amp;tc!`,
`c &amp;&amp; alert(&#34;Hello, World!&#34;);`,
`Hello, World &amp; O&#39;Reilly\x21`,
`greeting=H%69&amp;addressee=(World)`,
},
},
{
`<textarea>{{.}}</textarea>`,
[]string{
`&lt;b&gt; &#34;foo%&#34; O&#39;Reilly &amp;bar;`,
`a[href =~ &#34;//example.com&#34;]#foo`,
// Angle brackets escaped to prevent injection of close tags, entity not re-escaped.
`Hello, &lt;b&gt;World&lt;/b&gt; &amp;tc!`,
`c &amp;&amp; alert(&#34;Hello, World!&#34;);`,
`Hello, World &amp; O&#39;Reilly\x21`,
`greeting=H%69&amp;addressee=(World)`,
},
},
{
`<script>alert({{.}})</script>`,
[]string{
`"\u003cb\u003e \"foo%\" O'Reilly &bar;"`,
`"a[href =~ \"//example.com\"]#foo"`,
`"Hello, \u003cb\u003eWorld\u003c/b\u003e &amp;tc!"`,
// Not escaped.
`c && alert("Hello, World!");`,
// Escape sequence not over-escaped.
`"Hello, World & O'Reilly\x21"`,
`"greeting=H%69&addressee=(World)"`,
},
},
{
`<button onclick="alert({{.}})">`,
[]string{
`&#34;\u003cb\u003e \&#34;foo%\&#34; O&#39;Reilly &amp;bar;&#34;`,
`&#34;a[href =~ \&#34;//example.com\&#34;]#foo&#34;`,
`&#34;Hello, \u003cb\u003eWorld\u003c/b\u003e &amp;amp;tc!&#34;`,
// Not JS escaped but HTML escaped.
`c &amp;&amp; alert(&#34;Hello, World!&#34;);`,
// Escape sequence not over-escaped.
`&#34;Hello, World &amp; O&#39;Reilly\x21&#34;`,
`&#34;greeting=H%69&amp;addressee=(World)&#34;`,
},
},
{
`<script>alert("{{.}}")</script>`,
[]string{
`\x3cb\x3e \x22foo%\x22 O\x27Reilly \x26bar;`,
`a[href =~ \x22\/\/example.com\x22]#foo`,
`Hello, \x3cb\x3eWorld\x3c\/b\x3e \x26amp;tc!`,
`c \x26\x26 alert(\x22Hello, World!\x22);`,
// Escape sequence not over-escaped.
`Hello, World \x26 O\x27Reilly\x21`,
`greeting=H%69\x26addressee=(World)`,
},
},
{
`<button onclick='alert("{{.}}")'>`,
[]string{
`\x3cb\x3e \x22foo%\x22 O\x27Reilly \x26bar;`,
`a[href =~ \x22\/\/example.com\x22]#foo`,
`Hello, \x3cb\x3eWorld\x3c\/b\x3e \x26amp;tc!`,
`c \x26\x26 alert(\x22Hello, World!\x22);`,
// Escape sequence not over-escaped.
`Hello, World \x26 O\x27Reilly\x21`,
`greeting=H%69\x26addressee=(World)`,
},
},
{
`<a href="?q={{.}}">`,
[]string{
`%3cb%3e%20%22foo%25%22%20O%27Reilly%20%26bar%3b`,
`a%5bhref%20%3d~%20%22%2f%2fexample.com%22%5d%23foo`,
`Hello%2c%20%3cb%3eWorld%3c%2fb%3e%20%26amp%3btc%21`,
`c%20%26%26%20alert%28%22Hello%2c%20World%21%22%29%3b`,
`Hello%2c%20World%20%26%20O%27Reilly%5cx21`,
// Quotes and parens are escaped but %69 is not over-escaped. HTML escaping is done.
`greeting=H%69&amp;addressee=%28World%29`,
},
},
{
`<style>body { background: url('?img={{.}}') }</style>`,
[]string{
`%3cb%3e%20%22foo%25%22%20O%27Reilly%20%26bar%3b`,
`a%5bhref%20%3d~%20%22%2f%2fexample.com%22%5d%23foo`,
`Hello%2c%20%3cb%3eWorld%3c%2fb%3e%20%26amp%3btc%21`,
`c%20%26%26%20alert%28%22Hello%2c%20World%21%22%29%3b`,
`Hello%2c%20World%20%26%20O%27Reilly%5cx21`,
// Quotes and parens are escaped but %69 is not over-escaped. HTML escaping is not done.
`greeting=H%69&addressee=%28World%29`,
},
},
}

for _, test := range tests {
tmpl := template.Must(Escape(template.Must(template.New("x").Parse(test.input))))
pre := strings.Index(test.input, "{{.}}")
post := len(test.input) - (pre + 5)
var b bytes.Buffer
for i, x := range data {
b.Reset()
if err := tmpl.Execute(&b, x); err != nil {
t.Errorf("%q with %v: %s", test.input, x, err)
continue
}
if want, got := test.want[i], b.String()[pre:b.Len()-post]; want != got {
t.Errorf("%q with %v:\nwant\n\t%q,\ngot\n\t%q\n", test.input, x, want, got)
continue
}
}
}
}
14 changes: 9 additions & 5 deletions src/pkg/exp/template/html/css.go
@@ -146,7 +146,7 @@ func skipCSSSpace(c []byte) []byte {

// cssEscaper escapes HTML and CSS special characters using \<hex>+ escapes.
func cssEscaper(args ...interface{}) string {
-s := stringify(args...)
+s, _ := stringify(args...)
var b bytes.Buffer
written := 0
for i, r := range s {
@@ -218,7 +218,11 @@ var mozBindingBytes = []byte("mozbinding")
// It filters out unsafe values, such as those that affect token boundaries,
// and anything that might execute scripts.
func cssValueFilter(args ...interface{}) string {
-s, id := decodeCSS([]byte(stringify(args...))), make([]byte, 0, 64)
+s, t := stringify(args...)
+if t == contentTypeCSS {
+return s
+}
+b, id := decodeCSS([]byte(s)), make([]byte, 0, 64)

// CSS3 error handling is specified as honoring string boundaries per
// http://www.w3.org/TR/css3-syntax/#error-handling :
@@ -231,14 +235,14 @@ func cssValueFilter(args ...interface{}) string {
// So we need to make sure that values do not have mismatched bracket
// or quote characters to prevent the browser from restarting parsing
// inside a string that might embed JavaScript source.
-for i, c := range s {
+for i, c := range b {
switch c {
case 0, '"', '\'', '(', ')', '/', ';', '@', '[', '\\', ']', '`', '{', '}':
return filterFailsafe
case '-':
// Disallow <!-- or -->.
// -- should not appear in valid identifiers.
-if i != 0 && '-' == s[i-1] {
+if i != 0 && '-' == b[i-1] {
return filterFailsafe
}
default:
@@ -251,5 +255,5 @@ func cssValueFilter(args ...interface{}) string {
if bytes.Index(id, expressionBytes) != -1 || bytes.Index(id, mozBindingBytes) != -1 {
return filterFailsafe
}
-return string(s)
+return string(b)
}
39 changes: 4 additions & 35 deletions src/pkg/exp/template/html/doc.go
@@ -313,19 +313,16 @@ plain text string in the appropriate context.
When a data value is not plain text, you can make sure it is not over-escaped
by marking it with its type.
-A value that implements interface TypedStringer can carry known-safe content.
-type safeHTML struct{}
-func (s safeHTML) String() string { return `<b>World</b>` }
-func (s safeHTML) ContentType() ContentType { return ContentTypeHTML }
+Types HTML, JS, URL, and others from content.go can carry safe content that is
+exempted from escaping.
The template
Hello, {{.}}!
can be invoked with
-tmpl.Execute(out, safeHTML{})
+tmpl.Execute(out, HTML(`<b>World</b>`))
to produce
@@ -335,35 +332,7 @@ instead of the
Hello, &lt;b&gt;World&lt;b&gt;!
-which would have been produced if {{.}} did not implement TypedStringer.
-ContentTypeHTML attaches to a well-formed HTML DocumentFragment.
-Do not use it for HTML from a third-party, or HTML with unclosed tags or
-comments. The outputs of a sound HTML sanitizer and a template escaped by
-this package are examples of ContentTypeHTML.
-ContentTypeCSS attaches to a well-formed safe content that matches:
-(1) The CSS3 stylesheet production, for example `p { color: purple }`
-(2) The CSS3 rule production, for example `a[href=~"https:"].foo#bar`
-(3) CSS3 declaration productions, for example `color: red; margin: 2px`
-(4) The CSS3 value production, for example `rgba(0, 0, 255, 127)`
-ContentTypeJS attaches to a well-formed JavaScript (EcmaScript5) Expression
-production, for example `(x + y * z())`. Template authors are responsible
-for ensuring that typed expressions do not break the intended precedence and
-that there is no statement/expression ambiguity as when passing an expression
-like "{ foo: bar() }\n['foo']()" which is both a valid Expression and a valid
-Program with a very different meaning.
-ContentTypeJSStr attaches to a snippet of \-escaped characters that could be
-quoted to form a JavaScript string literal. For example, foo\nbar with quotes
-around it makes a valid JavaScript string literal.
-ContentTypeURL attaches to a URL fragment from a trusted source.
-A URL like `javascript:checkThatFormNotEditedBeforeLeavingPage()`
-from a trusted source should go in the page, but by default dynamic
-`javascript:` URLs are filtered out since they are a frequently
-successfully exploited injection vector.
+that would have been produced if {{.}} was a regular string.
Security Model
