Fix: make the cleaner also remove javascript URLs that use escaping.
scoder committed Sep 9, 2018
1 parent 1f534e2 commit 6be1d08
Showing 2 changed files with 6 additions and 5 deletions.
5 changes: 3 additions & 2 deletions src/lxml/html/clean.py
@@ -8,9 +8,10 @@
 import copy
 try:
     from urlparse import urlsplit
+    from urllib import unquote_plus
 except ImportError:
     # Python 3
-    from urllib.parse import urlsplit
+    from urllib.parse import urlsplit, unquote_plus
 from lxml import etree
 from lxml.html import defs
 from lxml.html import fromstring, XHTML_NAMESPACE
@@ -482,7 +483,7 @@ def _kill_elements(self, doc, condition, iterate=None):
 
     def _remove_javascript_link(self, link):
         # links like "j a v a s c r i p t:" might be interpreted in IE
-        new = _substitute_whitespace('', link)
+        new = _substitute_whitespace('', unquote_plus(link))
         if _is_javascript_scheme(new):
             # FIXME: should this be None to delete?
             return ''
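To illustrate why decoding before the whitespace check matters, here is a minimal, self-contained sketch that mimics the one-line change in `_remove_javascript_link`. The two regexes are hypothetical approximations of lxml's private `_substitute_whitespace` and `_is_javascript_scheme` helpers (the real patterns in `clean.py` may differ, e.g. in which schemes they list), and `is_js_link_before_fix` / `is_js_link_after_fix` are illustrative names, not lxml API. It assumes Python 3 (`urllib.parse`).

```python
import re
from urllib.parse import unquote_plus

# HYPOTHETICAL approximations of lxml's private helpers in
# src/lxml/html/clean.py; the real patterns may differ.
# Strip whitespace and most C0 control characters.
_substitute_whitespace = re.compile(r'[\s\x00-\x08\x0B\x0C\x0E-\x19]+').sub
# Look for a script-capable scheme followed by a colon (simplified list).
_is_javascript_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript):', re.I).search


def is_js_link_before_fix(link):
    """Old check: remove whitespace/control chars, then test the scheme."""
    new = _substitute_whitespace('', link)
    return bool(_is_javascript_scheme(new))


def is_js_link_after_fix(link):
    """New check: URL-decode first (%20 -> ' ', '+' -> ' '), then strip."""
    new = _substitute_whitespace('', unquote_plus(link))
    return bool(_is_javascript_scheme(new))


if __name__ == "__main__":
    escaped = "javascrip%20t%20:evil_function()"
    print(is_js_link_before_fix(escaped))  # False: "%20" hides the scheme
    print(is_js_link_after_fix(escaped))   # True
```

With an escaped link like the one in the updated test, only the decoded check flags it: the literal `%20` is not whitespace, so the stripper leaves it in place and the scheme never matches until `unquote_plus` turns it into a space. Because `unquote_plus` also maps `+` to a space, a link such as `java+script:` is caught by the same path.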
6 changes: 3 additions & 3 deletions src/lxml/html/tests/test_clean.txt
@@ -18,7 +18,7 @@
 ... <body onload="evil_function()">
 ... <!-- I am interpreted for EVIL! -->
 ... <a href="javascript:evil_function()">a link</a>
-... <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t:evil_function()">a control char link</a>
+... <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ep t%20:evil_function()">a control char link</a>
 ... <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
 ... <a href="#" onclick="evil_function()">another link</a>
 ... <p onclick="evil_function()">a paragraph</p>
@@ -51,7 +51,7 @@
 <body onload="evil_function()">
 <!-- I am interpreted for EVIL! -->
 <a href="javascript:evil_function()">a link</a>
-<a href="javascrip t:evil_function()">a control char link</a>
+<a href="javascrip t%20:evil_function()">a control char link</a>
 <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
 <a href="#" onclick="evil_function()">another link</a>
 <p onclick="evil_function()">a paragraph</p>
@@ -84,7 +84,7 @@
 <body onload="evil_function()">
 <!-- I am interpreted for EVIL! -->
 <a href="javascript:evil_function()">a link</a>
-<a href="javascrip%20t:evil_function()">a control char link</a>
+<a href="javascrip%20t%20:evil_function()">a control char link</a>
 <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
 <a href="#" onclick="evil_function()">another link</a>
 <p onclick="evil_function()">a paragraph</p>

2 comments on commit 6be1d08

carnil commented on 6be1d08 Dec 2, 2018

The issue fixed by this commit was assigned CVE-2018-19787

scoder (Member, Author) commented on 6be1d08 Dec 2, 2018

Thanks for the update. I added the CVE number to the changelog.
