-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent various XSS attacks [rebase and update of #276] #495
Conversation
131ba75 addresses a small case study. I noticed that someone had written an extension for Parsedown extra here https://github.com/tovic/parsedown-extra-plugin One of the things it does it expand parsedown extra's attribute block feature (i.e. The extension allows arbitrary attributes to be set, as well as being vulnerable itself to xss via attribute encoding. So running this extension on current master, we see that it's vulnerable because it does it's own element parsing. Applying everything upto af04ac9 gets us this far: Which is great IMO! This feature is doing what it's meant to – sanitising all attributes (even ones Parsedown base didn't create). But because this extension allows arbitrary attributes to be set, we can still achieve something like this: Which is XSS without needing to break out of the quote. So 131ba75 does a little better and drops onevent style attributes before outputting the element. We can't solve every problem an extension will encounter in base, but I think this extra check is definitely a worthwhile since we can do it 😃 ! |
Btw, incase anyone is a little dubious about just checking for the https://www.w3.org/TR/html5/webappapis.html#event-handler-attributes
|
…ity, or very unlikely
27a5481
to
ba22e5b
Compare
Code-blocks like will remain untouched, right? |
Parsedown.php
Outdated
|
||
protected $safeLinksEnabled = true; | ||
|
||
protected $safeLinksWhitelist = array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to be an instance member? Seems like it can be moved to filterUnsafeUrlInAttribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it would let an extension define additional protocols without having to worry about implementation of how to whitelist them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. you could fix a bug in the filter implementation that wouldn't have to be reflected in the extension if they're just adding data.
On a high level yes, they would go untouched. As-in the same markdown will be converted to the same HTML. For this reason it would technically be non backwards compatible on the Something @erusev might want to consider when deciding the version level to increment. The above assuming that the question was about code blocks and not HTML blocks. HTML blocks will be unaffected by this change, unless This just to emphasise again that |
* add gif and jpg as allowed data images * ensure that user controlled content fall only in the "data section" of the data URI (and does not intersect content-type definition in any way (best to be safe than sorry ;-))) "data section" as defined in: https://tools.ietf.org/html/rfc2397#section-3
Parsedown.php
Outdated
'data:image/png;', | ||
'data:image/png;base64,', | ||
'data:image/gif;base64,', | ||
'data:image/jpg;base64,', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually image/jpeg
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers ;-)
Is jpg allowed as a variant? (i.e. should I add both?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, ty! 😃
@erusev I've included the closes keyword at the bottom of the initial PR description on a few issues and PRs that should be made obsolete on merge. Might want to double check they close automatically though, since they were added by editing the message (not sure of the mechanics of doing that). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all: Looks great @aidantwoods 👍
Parsedown.php
Outdated
return $this; | ||
} | ||
|
||
protected $safeLinksEnabled = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this shouldn't be enabled by default, it breaks BC. Enabling this without setMarkupEscaped(true)
makes no sense anyway - it prevents absolutely nothing without escaped markup. Never confuse people by letting them do pointless things.
How about something like this (disallow enabling safeLinksEnabled
without markupEscaped
and always enable/disable safeLinksEnabled
and markupEscaped
at the same time; however, disabling safeLinksEnabled
when markupEscaped
is enabled is still possible - escaping markup doesn't necessarily imply a "safe mode", but in contrast, a "safe mode" isn't possible without escaping markup)?
function setMarkupEscaped($markupEscaped)
{
$this->markupEscaped = (bool) $markupEscaped;
$this->safeLinksEnabled = (bool) $markupEscaped;
return $this;
}
function setSafeLinksEnabled($safeLinksEnabled)
{
// or throw a exception instead
$this->safeLinksEnabled = $this->markupEscaped && $safeLinksEnabled;
return $this;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so some of this is spoken to by decisions made in the original PR.
I think the default was @erusev's suggestion as far as I can tell though, see #276 (comment)
If it was up to me I'd make markupEscaped
default to on too 😉
I'm happy to go with the flow here though, so I'll await some more feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was up to me I'd make markupEscaped default to on too 😉
My 👎 for this, BC is IMO very important 😝
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, I mean as an initial decision (of course too late now! ;p)
Parsedown.php
Outdated
|
||
foreach ($this->safeLinksWhitelist as $scheme) | ||
{ | ||
if (stripos($Element['attributes'][$attribute], $scheme) === 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test this, but I think it treats relative links like [some link](relative/path/to/page)
as unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore: Don't use stripos()
here, it forces PHP to walk through the whole string even when the first character already didn't match. Use substr()
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it'll treat that as unsafe. Problem with allowing relative/path/to/page
will be that you'll additionally have to check that no colons are present (also note that :
is equivalent to a colon in a HTML attribute, and contains no colons).
Also see https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#Embedded_newline_to_break_up_XSS for some examples of getting XSS in a HTML attribute via some obscenely permissive browser parsing :(
e.g. javas\0cript:
and jav\tascript:
are valid :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an alternate method for what's done here, one idea I've had is selectively partially url encoding attributes that can't be guaranteed safe by the whitelist.
I've written an implementation here. One thing I wasn't sure about was how permissive to be with which characters to allow though. I think holding back on encoding/#?&=%
(so URLs will still function) (but definitely encoding ;
, :
and all other non-alphanumerics) should work. I'll have to have a think about it a bit more though as to whether you could bypass that and specify a protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevertheless are relative paths an important use case. It might be a good idea to look into the specs to find out how URI schemes must look like to be valid. Just guessing here, but I think the :
as scheme "indicator" works only until a /
is encountered. As an alternative: If safe url parsing isn't enabled by default, it might be sufficient to document this behavior and to tell people that they should use ./
for relative paths (what afaik isn't whitelisted yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not entirely convinced that this is best solution in every detail, but you've convinced me that the approach is right. There might still be some open detail questions, I unfortunately don't have time to look through everything in detail at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we used your approach https://github.com/erusev/parsedown/pull/495/files/aee3963e6b97186b1e5526c118bf5d2d872cd8ee#r114438464 to make safeLinks
be off by default, and strongly coupled safeLinks
and setMarkupEscaped
to be either both on, or both off?
That way custom protocols are supported for authors (who trust their own content), but will be off for user generated content (that they do not trust).
Perhaps we could ease the complication of having to enable two options in sync by introducing a new option that does both behind the scenes: preventXss
or setContentUntrusted
– that way we'd be giving a more explicit representation to the user about what this option actually does, and would afford us the assumption that while in this mode we should treat content as untrusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one has nothing to do with the other, never ever introduce BC breaking changes without incrementing the first digit of the version (i.e. when this feature is enabled by default, it must be released as Parsedown 2.0). You'll break all dependent applications otherwise - and Parsedown is no end user project, it's a library, thus you'll break hundreds of applications with thousands of users. See http://semver.org. The only exception is behavior that never was expected (like some weird edge cases when parsing markup), neither officially, nor how it was practically used by others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never ever introduce BC breaking changes without incrementing the first digit of the version (i.e. when this feature is enabled by default, it must be released as Parsedown 2.0)
I'd tend to agree with that. In-fact even if we change no public default behaviour, IMO it would be valuable to consider how much of a version jump changes to the protected API would warrant (see: #495 (comment))
What do you think about introducing a new option though – in theory if we introduced something like preventXss
or setContentUntrusted
then there would be no BC breaking changes to the public API (we then wouldn't need to strongly couple safeLinks
and setMarkupEscaped
into a both or none scenario – perhaps we wouldn't even need to expose safeLinks
publically, but just the new option).
Having this off by default would probably ease concerns about custom protocols (and as you said, having safeLinks
enabled without setMarkupEscaped
or vice versa doesn't make much sense anyway). So IMO either both should be off by default, or both should be on – both off seems most friendly to expected behaviour (even though my preference is that things are secure by default 😉)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is disabled by default - sure, why not.
However, as I've said before (#495 (comment)), please keep in mind that people might want to escape markup without having security/user-generated content in mind. Thus there might still be users who do want to escape markup, but don't want to have safeLinks
enabled (i.e. what Parsedown currently does when markupEscaped
is enabled). Just the other way round (safeLinks
enabled without markupEscaped
) makes no sense.
Nevertheless, this is no crucial question for me. Important is that this isn't enabled by default, as this would break BC in an absolute catastrophic way.
Parsedown.php
Outdated
unset($Element['attributes'][$att]); | ||
} | ||
# filter out badly parsed attribute | ||
elseif (strpbrk($att, $badAttributeChars) !== false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this? Letting badly parsed attributes produce gibberish is IMO better than silently ignoring it, it's more explicit and gibberish actually helps people to understand that something went wrong. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'd rather throw an exception here TBH, but there isn't really precedent so for in Parsedown for throwing them (so I'll defer that decision to @erusev)
In my experience coming across this stuff in the wild, there's a pretty high chance that people will not test their applications with bad data (especially if they don't know it could even be valid).
Just for reference, the only chars in that list that would be "valid"-(ish) in the attribute name are "'\0
, the rest are whitespace (so impossible).
All the whitespace characters are invalid in an HTML tag's attribute name, and would certainly create two attributes (so very likely there is an attack).
By filtering out attributes containing these characters we only need check for a prefix at the start of the string since we guarantee that there can not be multiple attributes (as interpreted by the browser) contained in the single attribute we are considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw this isn't technically exhaustive if it happens to be the first attribute, see https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#Non-alpha-non-digit_XSS
<SCRIPT/XSS SRC="http://xss.rocks/xss.js"></SCRIPT>
is interpreted as <script xss="" src="http://xss.rocks/xss.js"></script>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh I guess we'll also have to ensure that the attribute name starts with an alphanumeric.
Perhaps it's best to switch over to a whitelist.
I'd think [a-zA-Z0-9]
for starting char and [a-zA-Z0-9-]
for the rest is probably reasonable for normal use.
E.g. Chrome parses <a /onmouseover="alert(1)">xss</a>
as <a onmouseover="alert(1)">xss</a>
-
appears to correctly attributed to the beginning of the attribute name though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So 4bae1c9 implements this via a whitelist (which is safer), though @erusev should decide whether this is worth the performance hit given Parsedown base doesn't allow user generated attributes vs. allowing extensions todo basically no checking on what attributes they allow, but Parsedown handling them safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Parsedown.php
Outdated
# clear out nulls | ||
if ($val === null) | ||
{ | ||
unset($Element['attributes'][$att]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was needed when I used array_flip
in place of where array_keys
is now, can remove that now though :)
Parsedown.php
Outdated
} | ||
} | ||
|
||
$onEventAttributeKeys = preg_grep('/^on/i', array_keys($Element['attributes'])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expensive and unnecessary approach. You're iterating through all attributes anyway, why don't you simply test for attributes starting with "on" using substr()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup you're right about that, needs to be case insensitive though. Is strlen
+ substr
+ strtolower
going to be faster than stripos
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitly! preg_match
(as in 4bae1c9) is very slow either. Starting the PCRE engine is expensive!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are we proposing?
# dump onevent attribute
elseif (strlen($att) > 1 and strtolower(substr($att, 0, 2)) === 'on')
{
unset($Element['attributes'][$att]);
}
instead of
# dump onevent attribute
elseif (stripos($att, 'on') === 0)
{
unset($Element['attributes'][$att]);
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's this 2e4afde ?
no chars that could form a delimiter allowed
this should break urls that attempt to include a protocol, or port (these are absolute URLs and should have a whitelisted protocol for use) but URLs that are relative, or relative from the site root should be preserved (though characters non essential for the URL structure may be urlencoded) this approach has significant advantages over attempting to locate something like `javascript:alert(1)` or `javascript:alert(1)` (which are both valid) because browsers have been known to ignore ridiculous characters when encountered (meaning something like `jav\ta\0\0script:alert(1)` would be xss :( ). Instead of trying to chase down a way to interpret a URL to decide whether there is a protocol, this approach ensures that two essential characters needed to achieve a colon are encoded `:` (obviously) and `;` (from `:`). If these characters appear in a relative URL then they are equivalent to their URL encoded form and so this change will be non breaking for that case.
Great work man! |
Parsedown.php
Outdated
|
||
if ( ! $safe) | ||
{ | ||
$Element['attributes'][$attribute] = preg_replace_callback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we find a simpler alternative to preg_replace_callback
? Something that doesn't require passing a function definition as a parameter - perhaps preg_match
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current way it's set out is to have a whitelist of characters we don't want to encode (and encode everything else).
I could achieve the same result with strpbrk
and encoding done on ranges of the string, not sure that would be faster though. Unless you have another idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, didn't see your edit.
Could do it with preg_match_all with match positions returned and applying encoding to those ranges, again, not sure it'll be faster?
P.S. the regex will match as many chars as it can in one go before making the function call at present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erusev this would be an alternative using preg_match_all
if (
preg_match_all(
'/[^\/#?&=%]++/',
$Element['attributes'][$attribute],
$matches,
PREG_OFFSET_CAPTURE
)
) {
$offset = 0;
foreach ($matches[0] as $match)
{
$len = strlen($match[0]);
$encoded = urlencode($match[0]);
$Element['attributes'][$attribute] = substr_replace(
$Element['attributes'][$attribute],
$encoded,
$offset + $match[1],
$len
);
$offset += strlen($encoded) - $len;
}
}
More likely to introduce bugs with this though IMO, and I'm not sure it'll save that much on performance just looking at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure preg_replace_callback()
is the easiest and fastest solution here 😉
054ba3c is borrowed from one of my projects, which isn't something that's got a release version yet – so we should add some tests to verify that this code does indeed have non breaking behaviour. |
@h4ckninja my bad. i think i owe a beer for @aidantwoods :) |
Not sure why you thought I was raising an argument? I was simply asking how complex is this implemention and does it pass known XSS problems... I certainly don't want to stamp on your hard work here as you've put a lot of effort into the commit. I just wanted to highlight that XSS is a complex problem and people should be aware of all the exploits and that this PR might not solve every possible one (your more qualified to answer that question though) 😄 |
All good. :) This is why I think this discussion was useful - to make sure we all understand each other and what's going on. |
I apologise for the rushed messaged, I meant to tack the bit about arguing on at the end as as a more general point (but not directed at you) and realised it looked otherwise after posting. I sort of rephrased my comment in an edit afterward, which I hope it is a little friendlier. Again, sorry for the apparent hostility – my bad :) @h4ckninja is probably right though, as long as we're making progress then there's value in having the discussion :) |
As far as I am aware this patch should solve all XSS issues in Parsedown (when safe mode is enabled), but I invite you (and others) to prove me wrong, and in that case I'll try my best fix whatever issues you find. The idea behind safe mode is so that it is clear to the user when output is safe from untrusted user input (as opposed to introducing lots of confusing options like: remember to enable markup escaping, and link destination validation, plus all the new features we didn't realise we need to make the output safe that we're going to add in bug fixes etc..., or by just leaving the user to figure it out themselves and go wrong). I prefer it be simple: If you can achieve XSS with safe mode enabled, it is a bug in safe mode. It should be fixed (and is also thus fixable without breaking backwards compatibility). As a defence in depth measure, as a user you can also run something like HTMLPurifier on Parsedown's output, and deploy a CSP. But these should be backup measures, safe mode should do the heavy lifting. |
This issue is now causing a SensioLabs Composer Vuinerability Check to fail at our organization.
Is there any potential ETA on merging this and tagging a new release so this can be cleared? Thank you! |
@erusev click merge. tag release. 👏 |
@aidantwoods are these fixes merged on your fork? is your fork available on Packagist? Do you plan to maintain it? |
Thank you @erusev for merging this and your hard work on the utility as a whole! |
Done. I apologize for being so slow to react to this. I'll do my best to react more quickly in future and I'll also consider adding more maintainers, @aidantwoods, you've been so helpful, are you interested? |
Thanks @erusev! |
I just got notification and I was asking myself - why? Then I notice that I was asking about the progress around 2015 :-) Good job! Still I moved away from this project long ago. Parsedown is not safe and should not be used on production. |
@erusev you are a hero 👍 |
Thank you for merging this update 👍 Also should we not add to the readme about using safe mode? |
Also update erusev/parsedown to ^1.7 as ^1.6 had a XSS vulnerability (see erusev/parsedown#495). This allows dependencies to be installed successfully on CircleCI.
Also update erusev/parsedown to ^1.7 as ^1.6 had a XSS vulnerability (see erusev/parsedown#495). This allows dependencies to be installed successfully on CircleCI.
* Run phpunit on circleci * Update phpro/grumphp to ^0.14 Also update erusev/parsedown to ^1.7 as ^1.6 had a XSS vulnerability (see erusev/parsedown#495). This allows dependencies to be installed successfully on CircleCI. * Disable all library hooks besides curl for tests
I've picked up on the work started over at #276 and rebased on
erusev/master
.Since this is rebased on master, I can't point at PR at
naNuke/master
without running into the merge conflicts that I've already resolved manually.I've implemented what I suggested earlier so that all attributes are properly encoded (and not just the specific ones we remember).
I've also added some tests, so @erusev's concern here should hopefully now be resolved, albeit a year later 😉
#276 (comment)
One thing to note is that all this can be circumvented if you forget to turn on$Parsedown->setMarkupEscaped(true);
(which is off by default) as you could just write a script tag manually for xss (even though the attributes and link destinations will be safe). So let's all remember to enable this setting 😉
Attributes are now always escaped properly (this speaks to just outputting things correctly), but link based XSS or XSS from writing plain old script tags will only be prevented only if the new
setSafeMode
is enabled.Closes #161
Closes #497
Closes #276
Closes #403
Closes #530
The following CVE has been assigned to the vulnerability specific to bypassing
->setMarkupEscaped(true)
: CVE-2018-1000162.