Possible XSS Vulnerability in Image and Hyperlinks (Markdown -> HTML) #1037

Closed
Preole opened this issue Oct 27, 2013 · 3 comments

@Preole

Preole commented Oct 27, 2013

I have noticed that Pandoc allows the javascript: and data: URI schemes in its Markdown dialect: either scheme can be used in place of an ordinary URL in hyperlink and image elements. As a consequence, the HTML back-end (in both strict and non-strict mode) can produce output capable of XSS attacks.

Below is my input fed through Babelmark 2 @ http://johnmacfarlane.net/babelmark2/

[JSLink](javascript:alert("XSS");)

![JSImage](javascript:alert("XSS");)

[DataLink](data:text/html;base64,PHNjcmlwdD5hbGVydCgiSGVsbG8iKTs8L3NjcmlwdD4=)

![DataImage](data:text/html;base64,PHNjcmlwdD5hbGVydCgiSGVsbG8iKTs8L3NjcmlwdD4=)

Output:

<p><a href="javascript:alert(&quot;XSS&quot;);">JSLink</a>
</p>
<p>
    <img src="javascript:alert(&quot;XSS&quot;);" alt="JSImage" />
</p>
<p><a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgiSGVsbG8iKTs8L3NjcmlwdD4=">DataLink</a>
</p>
<p>
    <img src="data:text/html;base64,PHNjcmlwdD5hbGVydCgiSGVsbG8iKTs8L3NjcmlwdD4="
    alt="DataImage" />
</p>

When I clicked either hyperlink carrying a JavaScript payload (the data: form and the javascript: form), an alert box popped up immediately, which means the embedded payload was executed (on Firefox 24).

The image element appears to be safe from this kind of XSS attack, at least in modern web browsers, which do not execute javascript: URIs in an image source.

If a malicious writer distributes an HTML file with a payload encoded using the above technique, the HTML file may be used for a phishing attack against the recipient.

I personally recommend disabling these two URI schemes altogether, but at the same time, some authors would like to embed images in Markdown using data: URIs, which is a perfectly legitimate use of that scheme.
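
One way to picture the trade-off described above is a scheme allowlist that treats links and images differently. The following is only a minimal sketch of that idea, not anything pandoc implements: the function name `is_allowed_url`, the set of permitted schemes, and the choice to accept only raster `data:image/...` URIs in images are all assumptions made for illustration.

```python
import re

# Assumed policy, not pandoc behaviour: schemes permitted in any link or image.
SAFE_SCHEMES = {"http", "https", "mailto"}
# Assumed policy: raster image data: URIs are permitted, and only in images.
DATA_IMAGE_RE = re.compile(r"^data:image/(png|jpeg|gif)[;,]", re.IGNORECASE)

SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")
LEADING_JUNK = "".join(map(chr, range(0x21)))  # control characters and space


def is_allowed_url(url, for_image=False):
    """Return True if an already entity-decoded URL passes the allowlist."""
    # Browsers ignore tabs and newlines inside a URL and skip leading
    # whitespace, so normalise before looking for a scheme.
    cleaned = re.sub(r"[\t\n\r]", "", url).lstrip(LEADING_JUNK)
    match = SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme at all: treat it as a relative URL
    scheme = match.group(1).lower()
    if scheme in SAFE_SCHEMES:
        return True
    if scheme == "data" and for_image:
        return DATA_IMAGE_RE.match(cleaned) is not None
    return False
```

Under these assumptions, the four payloads from the report are all rejected, while an ordinary `http:` link and an embedded PNG used as an image source are accepted. The check expects a URL whose HTML entities have already been decoded; applied to raw, undecoded text it could be bypassed.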

@dashed

dashed commented Oct 27, 2013

I believe pandoc creates self-contained HTML documents using this technique. So, rather than disallowing the URI schemes altogether, it would be better to convert every character that has an HTML character entity into that entity.

From babelmark2, it seems some markdown flavours do this by default.

So, pandoc may implement an option to escape characters into HTML entities: --escape-URLs, --escape-images.

The type of escaping I recommend is something like encodeURI() in JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI

I think pandoc should leave all URLs alone by default unless an option like --escape-URLs is specified.
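
To see what encodeURI()-style escaping would do to the payload from the report, here is a rough Python approximation. It is a sketch only: `encode_uri_like` is a hypothetical helper built on `urllib.parse.quote`, with the safe set chosen to mirror the characters encodeURI() leaves unescaped.

```python
from urllib.parse import quote

# Characters that JavaScript's encodeURI() leaves unescaped,
# in addition to ASCII letters and digits.
ENCODE_URI_SAFE = ";,/?:@&=+$-_.!~*'()#"


def encode_uri_like(url):
    """Rough stand-in for encodeURI(): percent-encode everything outside the safe set."""
    return quote(url, safe=ENCODE_URI_SAFE)


escaped = encode_uri_like('javascript:alert("XSS");')
print(escaped)  # javascript:alert(%22XSS%22);
```

The quotes are percent-encoded, but the `javascript:` scheme itself is left intact, so escaping of this kind would still need to be paired with a check on the scheme.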

@jgm
Owner

jgm commented Oct 27, 2013

Yes, I'm aware of this. At one point I had a --sanitize option in
pandoc, that stripped these things (and many others) out. But then I was
convinced that the most reliable way to sanitize untrusted markdown
input is to run the HTML output of pandoc through a sanitizer. There
are many battle-tested sanitizers out there, which you can use.
(In Haskell, there is xss-sanitize, which started out as the former
sanitization code from pandoc.)

Bottom line: You should always sanitize the output of markdown
conversions before displaying them on a website.

But this isn't a bug in pandoc.
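
As an illustration of what "run the HTML output through a sanitizer" means for the URL case in this issue, here is a minimal post-processing sketch using only Python's standard library. It is not a sanitizer and not a substitute for a maintained one such as xss-sanitize: it only drops `href`/`src` attributes with a disallowed scheme, and it passes other dangerous markup (script elements, event-handler attributes) through. The names `UrlAttributeFilter` and `strip_unsafe_urls` and the allowed-scheme set are assumptions made for the example.

```python
import html
import re
from html.parser import HTMLParser

SAFE_SCHEMES = {"http", "https", "mailto"}  # assumed policy for this sketch
URL_ATTRS = {"href", "src"}
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")
LEADING_JUNK = "".join(map(chr, range(0x21)))  # control characters and space


def has_safe_scheme(url):
    """True for relative URLs and for URLs whose scheme is in SAFE_SCHEMES."""
    cleaned = re.sub(r"[\t\n\r]", "", url).lstrip(LEADING_JUNK)
    match = SCHEME_RE.match(cleaned)
    return match is None or match.group(1).lower() in SAFE_SCHEMES


class UrlAttributeFilter(HTMLParser):
    """Re-emit HTML, dropping href/src attributes with a disallowed scheme.

    Comments and doctypes are not re-emitted; everything else passes through.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _emit(self, tag, attrs, self_closing=False):
        kept = []
        for name, value in attrs:
            if name in URL_ATTRS and value is not None and not has_safe_scheme(value):
                continue  # drop the unsafe attribute, keep the element
            # The parser hands over decoded values, so re-escape on output.
            kept.append(name if value is None else '%s="%s"' % (name, html.escape(value)))
        self.parts.append("<" + " ".join([tag] + kept) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(html.escape(data, quote=False))


def strip_unsafe_urls(fragment):
    parser = UrlAttributeFilter()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Fed the output from the report, `strip_unsafe_urls('<p><a href="javascript:alert(&quot;XSS&quot;);">JSLink</a></p>')` keeps the link text but removes the `href`. A real deployment would hand the whole document to a battle-tested sanitizer instead of maintaining a filter like this.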

@Preole
Author

Preole commented Oct 29, 2013

> Yes, I'm aware of this. At one point I had a --sanitize option in pandoc, that stripped these things (and many others) out. But then I was convinced that the most reliable way to sanitize untrusted markdown input is to run the HTML output of pandoc through a sanitizer. There are many battle-tested sanitizers out there, which you can use. (In Haskell, there is xss-sanitize, which started out as the former sanitization code from pandoc.)

I understand, then. The verdict is to run an external library over the HTML output rather than have the parser itself try to produce safe HTML. I suppose it's safe to close this since it's not a big deal.
