Allow all body/head/title and only do xss removal #29

mstaack · 2016-06-21T11:53:12Z

Whats the easiest way to allow all default/basic html tags
especially body/head/title is always removed, even if i allow them.

    p.AllowElements("html", "head", "title")

looking for a quick&dirty xss remover

The text was updated successfully, but these errors were encountered:

buro9 · 2016-06-21T12:52:10Z

Use the default policy p := bluemonday.UGCPolicy() and after you have sanitised the code, the structure and form of the HTML should be consistent.

If I was being quick and dirty, that would be where I would consider reading the document as a string and checking the prefix and suffix of the string for the outer HTML tags and then replacing them with nothing.

However, that's not recommended as ideally you shouldn't do any processing after the sanitisation has been performed, and in my own code I essentially pre-process input which leads to a consistent structure before sanitisation, and at that point I strip that out and sanitise just the fragment that is the body:

https://github.com/microcosm-cc/microcosm/blob/master/models/markdown.go#L80-L88

const htmlCruft = `<html><head></head><body>`

    // The treewalking leaves behind a stub root node
    if bytes.HasPrefix(src, []byte(htmlCruft)) {
        src = src[len([]byte(htmlCruft)):]
    }

    // Scrub the generated HTML of anything nasty
    // NOTE: This *MUST* always be the last thing to avoid introducing a
    // security vulnerability
    src = SanitiseHTML(src)

Where SanitiseHTML is just my wrapper for bluemonday:

https://github.com/microcosm-cc/microcosm/blob/master/models/sanitise.go#L34-L45

var (
    textPolicy     = bluemonday.StripTagsPolicy()
    htmlPolicy     = bluemonday.UGCPolicy()
    initHTMLPolicy bool
)

// SanitiseHTML sanitizes HTML
// Leaving a safe set of HTML intact that is not going to pose an XSS risk
func SanitiseHTML(b []byte) []byte {
    if !initHTMLPolicy {
        htmlPolicy.RequireNoFollowOnLinks(false)
        htmlPolicy.RequireNoFollowOnFullyQualifiedLinks(true)
        htmlPolicy.AddTargetBlankToFullyQualifiedLinks(true)
        initHTMLPolicy = true
    }

    return htmlPolicy.SanitizeBytes(b)
}

Oh, and if you're wondering what my pre-processing was, I linkify all @ and + mentions of other users, which required building and modifying a HTML document and that has a side effect of both balancing the HTML tree as well as to produce a consistent output: https://github.com/microcosm-cc/microcosm/blob/master/models/mentions.go#L55

Answer: with pre-processing and a consistent structure it's very safe and easy to just string process it out before you sanitise, but it is also possible with post-processing though that isn't recommended.

mstaack · 2016-06-23T13:50:33Z

@buro9 thanks for your help!
btw: how would you turn this code into a single standalone executable where i could pass the allowed attributes / tags via args?

buro9 · 2016-06-23T14:19:13Z

You could encapsulate everything that is a policy as a JSON file and treat that as a configuration to be loaded by a flag, and then construct the policy and execute it against either stdin or a file input.

mstaack · 2016-06-23T16:33:04Z

can you help me a little? especially how would you map/assign the json vars to a policy

right now i am passing two comma separated lists as args.
This is where i am right now (using https://github.com/alecthomas/kingpin):

var (
    tags = kingpin.Arg("tags", "Comma separated tags/elements to allow").String()
    attributes = kingpin.Arg("attributes", "Comma separated tag-attributes to allow").String()
)

func main() {

    kingpin.Parse()

    tags := strings.Split(*tags, ",")
    attributes := strings.Split(*attributes, ",")

    p := bluemonday.UGCPolicy()

    p.AllowAttrs(attributes...).Globally()
    p.AllowElements(tags...)

buro9 · 2016-06-23T16:44:19Z

That is not what I would do. The args would be massive.

I'd have a single arg, that was the path to a JSON file.

The JSON file should be structured such that you can loop through and construct the policy.

That's it.

mstaack · 2016-06-23T18:49:38Z

okay sounds good, will follow that approach.

given this html:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>Demystifying Email Design</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
</head>
</html>

how would you allow head/meta ?

right now i am trying this, but meta is always removed:

    tags := []string{"html", "head", "meta", "title", "body"}
    attributes := []string{"xmlns" }

    p := bluemonday.NewPolicy()
    p.AllowDocType(true)
    p.AllowNoAttrs()

    p.AllowAttrs(attributes...).Globally()
    p.AllowElements(tags...)

and result:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

    Demystifying Email Design

</head>
</html>

buro9 closed this as completed Jun 21, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow all body/head/title and only do xss removal #29

Allow all body/head/title and only do xss removal #29

mstaack commented Jun 21, 2016

buro9 commented Jun 21, 2016

mstaack commented Jun 23, 2016

buro9 commented Jun 23, 2016

mstaack commented Jun 23, 2016 •

edited

Loading

buro9 commented Jun 23, 2016

mstaack commented Jun 23, 2016

Allow all body/head/title and only do xss removal #29

Allow all body/head/title and only do xss removal #29

Comments

mstaack commented Jun 21, 2016

buro9 commented Jun 21, 2016

mstaack commented Jun 23, 2016

buro9 commented Jun 23, 2016

mstaack commented Jun 23, 2016 • edited Loading

buro9 commented Jun 23, 2016

mstaack commented Jun 23, 2016

mstaack commented Jun 23, 2016 •

edited

Loading