
Commit

Updated the Cleaner to support custom allowed protocols such as "cid:" and "data:".

Fixes #127
jhy committed Aug 28, 2011
1 parent d041822 commit c98349a
Showing 3 changed files with 15 additions and 1 deletion.
3 changes: 3 additions & 0 deletions CHANGES
@@ -9,6 +9,9 @@ jsoup changelog
 * Updated the Cleaner and whitelists to optionally preserve related links in elements, instead of converting them
   to absolute links.
 
+* Updated the Cleaner to support custom allowed protocols such as "cid:" and "data:".
+  <https://github.com/jhy/jsoup/issues/127>
+
 * Fixed handling of null characters within comments.
   <https://github.com/jhy/jsoup/issues/121>

4 changes: 3 additions & 1 deletion src/main/java/org/jsoup/safety/Whitelist.java
@@ -336,9 +336,11 @@ boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
     }
 
     private boolean testValidProtocol(Element el, Attribute attr, Set<Protocol> protocols) {
-        // resolve relative urls to abs, and optionally update the attribute so output html has abs.
+        // try to resolve relative urls to abs, and optionally update the attribute so output html has abs.
         // rels without a baseuri get removed
         String value = el.absUrl(attr.getKey());
+        if (value.length() == 0)
+            value = attr.getValue(); // if it could not be made abs, run as-is to allow custom unknown protocols
         if (!preserveRelativeLinks)
             attr.setValue(value);

9 changes: 9 additions & 0 deletions src/test/java/org/jsoup/safety/CleanerTest.java
@@ -113,4 +113,13 @@ public class CleanerTest {
         String clean = Jsoup.clean(html, Whitelist.basic());
         assertEquals("<a rel=\"nofollow\">Link</a>", clean);
     }
+
+    @Test public void handlesCustomProtocols() {
+        String html = "<img src='cid:12345' /> <img src='data:gzzt' />";
+        String dropped = Jsoup.clean(html, Whitelist.basicWithImages());
+        assertEquals("<img /> \n<img />", dropped);
+
+        String preserved = Jsoup.clean(html, Whitelist.basicWithImages().addProtocols("img", "src", "cid", "data"));
+        assertEquals("<img src=\"cid:12345\" /> \n<img src=\"data:gzzt\" />", preserved);
+    }
 }
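
Note on the Whitelist.java change: the fallback matters because an attribute value with a scheme the URL resolver does not know (such as "cid:" or "data:") cannot be made absolute, so the resolved value comes back empty and the protocol check never sees the original scheme. The following is a minimal, self-contained sketch of that idea, not jsoup's actual code. The class `ProtocolFallbackSketch` and its two methods are hypothetical names, and `java.net.URL` is used here only as an assumed stand-in for `Element.absUrl`.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; jsoup's real logic lives in Whitelist.testValidProtocol.
class ProtocolFallbackSketch {

    // Stand-in for Element.absUrl: returns "" when the value cannot be resolved
    // to an absolute URL (no base URI, or a scheme java.net.URL has no handler for).
    static String absUrl(String baseUri, String value) {
        try {
            URL base = baseUri.isEmpty() ? null : new URL(baseUri);
            return new URL(base, value).toExternalForm();
        } catch (MalformedURLException e) {
            return "";
        }
    }

    // Mirrors the shape of the patched check: try the absolute form first,
    // then fall back to the raw attribute value so custom schemes can match.
    static boolean isValidProtocol(String baseUri, String value, Set<String> protocols) {
        String candidate = absUrl(baseUri, value);
        if (candidate.isEmpty())
            candidate = value; // could not be made absolute, so test the value as-is
        String lower = candidate.toLowerCase(Locale.ROOT);
        for (String protocol : protocols) {
            if (lower.startsWith(protocol + ":"))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> basic = new HashSet<>(Arrays.asList("http", "https"));
        Set<String> custom = new HashSet<>(Arrays.asList("http", "https", "cid", "data"));

        System.out.println(isValidProtocol("", "cid:12345", basic));
        System.out.println(isValidProtocol("", "cid:12345", custom));
        System.out.println(isValidProtocol("http://example.com/", "/img.png", basic));
    }
}
```

Without the fallback line, the "cid:" and "data:" values would be tested as empty strings and rejected even when those protocols had been added to the whitelist, which is the behaviour reported in #127.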
