Allow strong filtering of linked URLs #156

Closed
wants to merge 3 commits into from

3 participants

@iamcal

Currently, apps that use this library to turn user-entered Markdown into HTML are vulnerable to XSS attacks, because javascript: (and other dangerous protocols) is accepted as the protocol for links. For example:

[CLICK ME](javascript:alert(document.cookie))

This PR passes all link URLs through the filterUrl() method. By default, this function allows URLs whose protocol is on a whitelist, as well as local/relative URLs that have no protocol. There is some heavy decoding involved to make sure URLs like javascript%3Aalert(document.cookie) don't slip through.
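
To illustrate why the decoding step matters, here is a minimal sketch that is not part of the patch. The helper name hides_colon() is hypothetical, and it leans on PHP's built-in html_entity_decode() and rawurldecode() rather than the hand-written callbacks in the diff, so it only approximates what filterUrl() does:

```php
<?php
// Hypothetical helper, not part of the PR: reports whether a URL that
// lacks an allowed scheme still hides a colon once it has been decoded.
function hides_colon(string $url): bool {
    // Undo HTML entities (&#58;, &#x3a;, &colon;) first ...
    $decoded = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // ... then undo percent-escapes such as %3A.
    $decoded = rawurldecode($decoded);
    return strpos($decoded, ':') !== false;
}

var_dump(hides_colon('javascript%3Aalert(document.cookie)')); // bool(true)
var_dump(hides_colon('javascript&#58;alert(1)'));             // bool(true)
var_dump(hides_colon('docs/page.html'));                      // bool(false)
```

A plain check for a leading "javascript:" would miss the first two inputs, which is why the patch decodes before it looks for a colon.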

This change breaks two tests in the current suite ("Links, inline style" & "Quotes in attributes"), since both put broken URLs inside links. Changing the tests so the URLs are prefixed with http:// makes them pass again.

@michelf
Owner

By default, PHP Markdown lets HTML snippets pass through unchanged. This is part of the design of Markdown. Markdown is not designed to deal with unfiltered user input, nor to be a substitute for an XSS filter.

Assuming we add one, I don't think a URL filtering system should be active by default, because it'd break many current use cases (starting with my own website) where the input is trusted and contains relative links as well as HTML snippets.

But the question is: is this enough to prevent XSS? Assuming I wanted the feature, I wouldn't accept anything less than a complete solution, otherwise it'll lead to people thinking they're safe when they're not. Before we can pretend we can filter XSS properly, we should make sure we really do, ideally with some auditing. A glaring omission I see immediately in your patch is the filtering of image URLs. I wouldn't be surprised if there are other ways to leak javascript that I haven't thought about.

My recommendation is to run an external XSS filter on Markdown's output. This way you know that whatever weird bug the parser has you're still safe. Bolting XSS filtering in a parser that does all those other unrelated complicated string manipulations is a recipe for security holes. I believe it's better to keep XSS filtering as a separate step.

On the subject:
http://michelf.ca/blog/2010/markdown-and-xss/

@iamcal

All good points!

I'm using Markdown with markup and entities disabled, so regular HTML is not preserved. I can certainly understand not wanting to get into having to deal with all the possible transform exploits as security vulnerabilities rather than just plain old bugs.

Worth noting that the patch won't break your current usage - it just breaks some non-real-world tests in the suite:

url://with spaces
/"style="color:red
/'style='color:red

@iamcal closed this

@barryvdh referenced this pull request in LaravelIO/laravel.io
Closed

XSS vulnerability #120

@markseu

I want to add my interest.

An external XSS filter like HTML Purifier is big and slow. I wonder what would happen if you switched off HTML code and applied a strong filter to all Markdown-generated links (anywhere JavaScript can leak through). Like Michel said, it must be a complete solution, not just pretending to do the job.

Is this something url_filter_func from issue #85 can be used for?

@michelf
Owner

@markseu The new url_filter_func should allow transforming suspicious URLs into innocuous ones. Feel free to experiment.
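
As a rough idea of what such an experiment could look like, here is an unaudited sketch of a callback. The function name neutralize_url(), the allowlist, and the choice of '#' as the innocuous replacement are all assumptions made for illustration. The wiring at the bottom assumes the url_filter_func property accepts any PHP callable and is left as comments because it needs the library loaded:

```php
<?php
// Hypothetical callback; the name and the allowlist are assumptions.
function neutralize_url(string $url): string {
    $allowed = array('http', 'https', 'ftp', 'mailto');

    // Decode entities and percent-escapes before looking for a scheme,
    // and drop whitespace/control characters used to disguise one.
    $probe = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $probe = rawurldecode($probe);
    $probe = preg_replace('/[\x00-\x20]+/', '', $probe);

    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $probe, $m)) {
        // Explicit scheme: keep the URL only if the scheme is allowed.
        return in_array(strtolower($m[1]), $allowed, true) ? $url : '#';
    }

    // No recognizable scheme: treat as relative, but reject stray colons.
    return strpos($probe, ':') === false ? $url : '#';
}

// Assumed wiring, with PHP Markdown loaded:
// $parser = new \Michelf\Markdown;
// $parser->url_filter_func = 'neutralize_url';
// echo $parser->transform($text);
```

This only covers URLs the parser emits for links and images. It does nothing about raw HTML in the input, so it is not the complete solution discussed above.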

Showing with 106 additions and 5 deletions.
  1. +106 −5 Michelf/Markdown.php
111 Michelf/Markdown.php
@@ -59,6 +59,9 @@ public static function defaultTransform($text) {
public $predef_urls = array();
public $predef_titles = array();
+ # Protocols allowed for link targets
+ public $links_protocols = array('http', 'https', 'ftp', 'mailto');
+ public $links_relative = true;
### Parser Implementation ###
@@ -593,6 +596,7 @@ protected function _doAnchors_reference_callback($matches) {
if (isset($this->urls[$link_id])) {
$url = $this->urls[$link_id];
+ $url = $this->filterUrl($url);
$url = $this->encodeAttribute($url);
$result = "<a href=\"$url\"";
@@ -617,6 +621,7 @@ protected function _doAnchors_inline_callback($matches) {
$url = $matches[3] == '' ? $matches[4] : $matches[3];
$title =& $matches[7];
+ $url = $this->filterUrl($url);
$url = $this->encodeAttribute($url);
$result = "<a href=\"$url\"";
@@ -698,7 +703,8 @@ protected function _doImages_reference_callback($matches) {
$alt_text = $this->encodeAttribute($alt_text);
if (isset($this->urls[$link_id])) {
- $url = $this->encodeAttribute($this->urls[$link_id]);
+ $url = $this->filterUrl($this->urls[$link_id]);
+ $url = $this->encodeAttribute($url);
$result = "<img src=\"$url\" alt=\"$alt_text\"";
if (isset($this->titles[$link_id])) {
$title = $this->titles[$link_id];
@@ -722,6 +728,7 @@ protected function _doImages_inline_callback($matches) {
$title =& $matches[7];
$alt_text = $this->encodeAttribute($alt_text);
+ $url = $this->filterUrl($url);
$url = $this->encodeAttribute($url);
$result = "<img src=\"$url\" alt=\"$alt_text\"";
if (isset($title)) {
@@ -1307,13 +1314,15 @@ protected function doAutoLinks($text) {
return $text;
}
protected function _doAutoLinks_tel_callback($matches) {
- $url = $this->encodeAttribute($matches[1]);
+ $url = $this->filterUrl($matches[1]);
+ $url = $this->encodeAttribute($url);
$tel = $this->encodeAttribute($matches[2]);
$link = "<a href=\"$url\">$tel</a>";
return $this->hashPart($link);
}
protected function _doAutoLinks_url_callback($matches) {
- $url = $this->encodeAttribute($matches[1]);
+ $url = $this->filterUrl($matches[1]);
+ $url = $this->encodeAttribute($url);
$link = "<a href=\"$url\">$url</a>";
return $this->hashPart($link);
}
@@ -1516,6 +1525,95 @@ protected function _unhash_callback($matches) {
return $this->html_hashes[$matches[0]];
}
+
+ protected function filterUrl($url){
+ #
+ # There is lots of trickery that can be done to disguise 'javascript:'. We only
+ # need to worry about that if we're allowing relative URLs, otherwise we can just
+ # ensure it starts with a protocol we explicitly allow.
+ #
+ $allowed = implode('|', $this->links_protocols);
+ if (preg_match("!^({$allowed}):!i", $url)) return $url;
+
+ if (!$this->links_relative) return '#'.$url;
+
+ $test = $url;
+ $test = preg_replace_callback('!(&)#(\d+);?!', array($this, '_filterUrl_dec_entity'), $test);
+ $test = preg_replace_callback('!(&)#x([0-9a-f]+);?!i', array($this, '_filterUrl_hex_entity'), $test);
+ $test = preg_replace_callback('!(%)([0-9a-f]{2});?!i', array($this, '_filterUrl_hex_entity'), $test);
+ $test = preg_replace_callback('!&([^&;]*)(?=(;|&|$))!', array($this, '_filterUrl_named_entity'), $test);
+
+ if (strpos($test, ':') !== false) return '#'.$url;
+
+ return $url;
+ }
+
+ function _filterUrl_hex_entity($m){
+
+ return $this->_filterUrl_num_entity($m[1], hexdec($m[2]));
+ }
+
+ function _filterUrl_dec_entity($m){
+
+ return $this->_filterUrl_num_entity($m[1], intval($m[2]));
+ }
+
+ function _filterUrl_num_entity($orig_type, $d){
+
+ if ($d < 32){ $d = 32; } # treat control characters as spaces
+
+ #
+ # don't mess with high characters - what to replace them with is
+ # character-set dependent, so we leave them as entities. besides,
+ # you can't use them to pass 'javascript:' etc (at present)
+ #
+
+ if ($d > 127){
+ if ($orig_type == '%'){ return '%'.dechex($d); }
+ if ($orig_type == '&'){ return "&#$d;"; }
+ }
+
+
+ #
+ # we want to convert this escape sequence into a real character.
+ # we call HtmlSpecialChars() in case it's one of [<>"&]
+ #
+
+ return HtmlSpecialChars(chr($d));
+ }
+
+ function _filterUrl_named_entity($m){
+
+ $preamble = $m[1];
+ $term = $m[2];
+
+ #
+ # if the terminating character is not a semi-colon, treat
+ # this as a non-entity
+ #
+
+ if ($term != ';'){
+
+ return '&amp;'.$preamble;
+ }
+
+
+ #
+ # if it's an allowed entity, go for it
+ #
+
+ if (in_array(StrToLower($preamble), array('lt','gt','quot','amp'))){
+
+ return '&'.$preamble;
+ }
+
+
+ #
+ # not an allowed entity, so escape the ampersand
+ #
+
+ return '&amp;'.$preamble;
+ }
}
@@ -2290,6 +2388,7 @@ protected function _doAnchors_reference_callback($matches) {
if (isset($this->urls[$link_id])) {
$url = $this->urls[$link_id];
+ $url = $this->filterUrl($url);
$url = $this->encodeAttribute($url);
$result = "<a href=\"$url\"";
@@ -2317,7 +2416,7 @@ protected function _doAnchors_inline_callback($matches) {
$title =& $matches[7];
$attr = $this->doExtraAttributes("a", $dummy =& $matches[8]);
-
+ $url = $this->filterUrl($url);
$url = $this->encodeAttribute($url);
$result = "<a href=\"$url\"";
@@ -2401,7 +2500,8 @@ protected function _doImages_reference_callback($matches) {
$alt_text = $this->encodeAttribute($alt_text);
if (isset($this->urls[$link_id])) {
- $url = $this->encodeAttribute($this->urls[$link_id]);
+ $url = $this->filterUrl($this->urls[$link_id]);
+ $url = $this->encodeAttribute($url);
$result = "<img src=\"$url\" alt=\"$alt_text\"";
if (isset($this->titles[$link_id])) {
$title = $this->titles[$link_id];
@@ -2428,6 +2528,7 @@ protected function _doImages_inline_callback($matches) {
$attr = $this->doExtraAttributes("img", $dummy =& $matches[8]);
$alt_text = $this->encodeAttribute($alt_text);
+ $url = $this->filterUrl($url);
$url = $this->encodeAttribute($url);
$result = "<img src=\"$url\" alt=\"$alt_text\"";
if (isset($title)) {