Skip to content

Commit

Permalink
Added two new link methods, one restricts the use of rel='nofollow' t…
Browse files Browse the repository at this point in the history
…o fully qualified links only, and the other allows you to define a policy in which fully qualified links are forced into target='blank'
  • Loading branch information
buro9 committed Aug 11, 2014
1 parent 0dee5a7 commit 63440de
Show file tree
Hide file tree
Showing 4 changed files with 174 additions and 20 deletions.
5 changes: 2 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -138,11 +138,10 @@ func main() {
}
```

We ship three default policies:
We ship two default policies:

1. `bluemonday.StrictPolicy()` which can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on it's whitelist. An example usage scenario would be blog post titles where HTML tags are not expected at all and if they are then the elements *and* the content of the elements should be stripped. This is a *very* strict policy.
2. `bluemonday.StripTagsPolicy()` which will strip all HTML elements and their attributes the text, but will preserve the textual content between the start and end tag of an element in the vast majority of cases. This also has a usage scenario similar to blog titles, but now expects HTML in the input and acts to preserve as much of the displayable text as possible whilst discarding the HTML.
3. `bluemonday.UGCPolicy()` which allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does *not* whitelist iframes, object, embed, styles, script, etc. An example usage scenario would be blog post bodies where a variety of formatting is expected along with the potential for TABLEs and IMGs.
2. `bluemonday.UGCPolicy()` which allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does *not* whitelist iframes, object, embed, styles, script, etc. An example usage scenario would be blog post bodies where a variety of formatting is expected along with the potential for TABLEs and IMGs.

## Policy Building

Expand Down
36 changes: 36 additions & 0 deletions policy.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,16 @@ type Policy struct {
// When true, add rel="nofollow" to HTML anchors
requireNoFollow bool

// When true, add rel="nofollow" to HTML anchors
// Will add for href="http://foo"
// Will skip for href="/foo" or href="foo"
requireNoFollowFullyQualifiedLinks bool

// When true add target="_blank" to fully qualified links
// Will add for href="http://foo"
// Will skip for href="/foo" or href="foo"
addTargetBlankToFullyQualifiedLinks bool

// When true, URLs must be parseable by "net/url" url.Parse()
requireParseableURLs bool

Expand Down Expand Up @@ -216,6 +226,32 @@ func (p *Policy) RequireNoFollowOnLinks(require bool) *Policy {
return p
}

// RequireNoFollowOnFullyQualifiedLinks will result in all <a> tags that point
// to a non-local destination (i.e. starts with a protocol and has a host)
// having a rel="nofollow" added to them if one does not already exist
//
// Note: This requires p.RequireParseableURLs(true) and will enable it.
func (p *Policy) RequireNoFollowOnFullyQualifiedLinks(require bool) *Policy {

p.requireNoFollowFullyQualifiedLinks = require
p.requireParseableURLs = true

return p
}

// AddTargetBlankToFullyQualifiedLinks will result in all <a> tags that point
// to a non-local destination (i.e. starts with a protocol and has a host)
// having a target="_blank" added to them if one does not already exist
//
// Note: This requires p.RequireParseableURLs(true) and will enable it.
func (p *Policy) AddTargetBlankToFullyQualifiedLinks(require bool) *Policy {

p.addTargetBlankToFullyQualifiedLinks = require
p.requireParseableURLs = true

return p
}

// RequireParseableURLs will result in all URLs requiring that they be parseable
// by "net/url" url.Parse()
// This applies to:
Expand Down
76 changes: 59 additions & 17 deletions sanitize.go
Original file line number Diff line number Diff line change
Expand Up @@ -271,49 +271,91 @@ func (p *Policy) sanitizeAttrs(
cleanAttrs = tmpAttrs
}

if linkable(elementName) && p.requireNoFollow && len(cleanAttrs) > 0 {
if linkable(elementName) &&
(p.requireNoFollow ||
p.requireNoFollowFullyQualifiedLinks ||
p.addTargetBlankToFullyQualifiedLinks) &&
len(cleanAttrs) > 0 {

// Add rel="nofollow" if a "href" exists
switch elementName {
case "a", "area", "link":
var hrefFound bool
var externalLink bool
for _, htmlAttr := range cleanAttrs {
if htmlAttr.Key == "href" {
hrefFound = true

u, err := url.Parse(htmlAttr.Val)
if err != nil {
continue
}
if u.Host != "" {
externalLink = true
}

continue
}
}

if hrefFound {
tmpAttrs := []html.Attribute{}
var relFound bool
var noFollowFound bool
var targetFound bool

addNoFollow := (p.requireNoFollow ||
externalLink && p.requireNoFollowFullyQualifiedLinks)
addTargetBlank := (externalLink &&
p.addTargetBlankToFullyQualifiedLinks)

for _, htmlAttr := range cleanAttrs {
if htmlAttr.Key == "rel" {
relFound = true

if htmlAttr.Key == "rel" &&
addNoFollow {

if strings.Contains(htmlAttr.Val, "nofollow") {
noFollowFound = true
continue
} else {
htmlAttr.Val += " nofollow"
noFollowFound = true
tmpAttrs = append(tmpAttrs, htmlAttr)
}

htmlAttr.Val += " nofollow"
tmpAttrs = append(tmpAttrs, htmlAttr)
} else {
tmpAttrs = append(tmpAttrs, htmlAttr)
}

if elementName == "a" &&
htmlAttr.Key == "target" &&
addTargetBlank {

if strings.Contains(htmlAttr.Val, "_blank") {
targetFound = true
} else {
htmlAttr.Val = "_blank"
targetFound = true
tmpAttrs = append(tmpAttrs, htmlAttr)
}
} else {
tmpAttrs = append(tmpAttrs, htmlAttr)
}
}
if noFollowFound {
break
}
if relFound {
if noFollowFound || targetFound {
cleanAttrs = tmpAttrs
break
}

rel := html.Attribute{}
rel.Key = "rel"
rel.Val = "nofollow"
cleanAttrs = append(cleanAttrs, rel)
if addNoFollow && !noFollowFound {
rel := html.Attribute{}
rel.Key = "rel"
rel.Val = "nofollow"
cleanAttrs = append(cleanAttrs, rel)
}

if element == "a" && addTargetBlank && !targetFound {
rel := html.Attribute{}
rel.Key = "target"
rel.Val = "_blank"
cleanAttrs = append(cleanAttrs, rel)
}
}
default:
}
Expand Down
77 changes: 77 additions & 0 deletions sanitize_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,83 @@ func TestLinks(t *testing.T) {
wg.Wait()
}

func TestLinkTargets(t *testing.T) {

tests := []test{
test{
in: `<a href="http://www.google.com">`,
expected: `<a href="http://www.google.com" rel="nofollow" target="_blank">`,
},
test{
in: `<a href="//www.google.com">`,
expected: `<a href="//www.google.com" rel="nofollow" target="_blank">`,
},
test{
in: `<a href="/www.google.com">`,
expected: `<a href="/www.google.com">`,
},
test{
in: `<a href="www.google.com">`,
expected: `<a href="www.google.com">`,
},
test{
in: `<a href="javascript:alert(1)">`,
expected: ``,
},
test{
in: `<a href="#">`,
expected: ``,
},
test{
in: `<a href="#top">`,
expected: `<a href="#top">`,
},
test{
in: `<a href="?">`,
expected: ``,
},
test{
in: `<a href="?q=1">`,
expected: `<a href="?q=1">`,
},
test{
in: `<img src="" alt="Red dot" />`,
expected: `<img alt="Red dot"/>`,
},
test{
in: `<img src="giraffe.gif" />`,
expected: `<img src="giraffe.gif"/>`,
},
}

p := UGCPolicy()
p.RequireParseableURLs(true)
p.RequireNoFollowOnLinks(false)
p.RequireNoFollowOnFullyQualifiedLinks(true)
p.AddTargetBlankToFullyQualifiedLinks(true)

// These tests are run concurrently to enable the race detector to pick up
// potential issues
wg := sync.WaitGroup{}
wg.Add(len(tests))
for ii, tt := range tests {
go func(ii int, tt test) {
out := p.Sanitize(tt.in)
if out != tt.expected {
t.Errorf(
"test %d failed;\ninput : %s\noutput : %s\nexpected: %s",
ii,
tt.in,
out,
tt.expected,
)
}
wg.Done()
}(ii, tt)
}
wg.Wait()
}

func TestStyling(t *testing.T) {

tests := []test{
Expand Down

6 comments on commit 63440de

@dmitshur
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit seems to be over-aggressive at adding things, even if they already exist.

The tests of github.com/shurcooL/go/github_flavored_markdown pass with 0dee5a7, but with this commit I get:

--- FAIL: ExampleHeader (607.06us)
got:
<h2><a name="git-diff" name="git-diff" class="anchor" class="anchor" href="#git-diff" href="#git-diff" rel="nofollow" aria-hidden="true" aria-hidden="true"><span class="octicon octicon-link"></span></a>git diff</h2>
want:
<h2><a name="git-diff" class="anchor" href="#git-diff" rel="nofollow" aria-hidden="true"><span class="octicon octicon-link"></span></a>git diff</h2>
FAIL
exit status 1

Note that it duplicates name, class, href, and aria-hidden.

@dmitshur
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts? Or do you not consider this to be a problem?

@buro9
Copy link
Member Author

@buro9 buro9 commented on 63440de Aug 30, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a regression, not the feature but the implementation.

I shall resolve it, but am not presently near any of my dev machines (I'm at a conference), so this will be done as soon as I'm back.

@dmitshur
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, that plan works great for me. Thank you.

@dmitshur
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created #9 to track this and let you have the satisfaction of closing an issue when you do fix it.

@buro9
Copy link
Member Author

@buro9 buro9 commented on 63440de Aug 30, 2014 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.