Cleaning html containing the cid identifier breaks images #127

Closed
mikebell90 opened this issue Aug 18, 2011 · 4 comments


@mikebell90

OK, so in HTML email it is common for an img tag's src to be a cid: reference.

The item after CID: can be almost anything (US-ASCII, I think) and of any length. It corresponds to an image attached elsewhere in the MIME message, say like this:

--mimeboundary
Content-ID:
Content-Type: image/jpeg.....
(snip)
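
As a rough illustration of how a cid: reference lines up with a Content-ID header, here is a minimal sketch following the conversion described in RFC 2392: drop the `cid:` prefix, decode percent-escapes, and wrap the result in angle brackets. The class and method names are hypothetical, and the decoder assumes the escaped octets are US-ASCII.

```java
final class CidToContentId {
    /** Maps a cid: URL to the Content-ID header value of the MIME part it points at. */
    static String toContentId(String cidUrl) {
        // The scheme name is matched case-insensitively, so "CID:" and "cid:" both pass.
        if (!cidUrl.regionMatches(true, 0, "cid:", 0, 4)) {
            throw new IllegalArgumentException("not a cid: URL: " + cidUrl);
        }
        String encoded = cidUrl.substring(4);
        StringBuilder decoded = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            boolean hasTwoMore = i + 2 < encoded.length();
            int hi = hasTwoMore ? Character.digit(encoded.charAt(i + 1), 16) : -1;
            int lo = hasTwoMore ? Character.digit(encoded.charAt(i + 2), 16) : -1;
            if (c == '%' && hi >= 0 && lo >= 0) {
                // Assumption: the escaped octet is US-ASCII, so one octet is one char.
                decoded.append((char) (hi * 16 + lo));
                i += 2;
            } else {
                decoded.append(c);
            }
        }
        return "<" + decoded + ">";
    }

    public static void main(String[] args) {
        System.out.println("Content-ID: " + toContentId("cid:foo4%25foo1@bar.net"));
    }
}
```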

So, to make a long story somewhat shorter, I use Jsoup's sanitizer extensively. However, I need these CID references to be preserved after sanitization. addProtocols does not work because the items are not valid URLs. As a result, the cid: reference is stripped during cleaning, which for my purposes is not good :)

@mikebell90
Author

See http://xml.resource.org/public/rfc/html/rfc2392.html for a description of the cid URL scheme, which is an internet standard. I suspect mailto links cannot survive sanitization either, but I haven't tried it.

@mikebell90
Author

Related to this, data URIs are also stripped by the sanitizer and cannot be whitelisted.

@mikebell90
Author

So I've worked around this in a lame fashion:

  1. use jsoup to parse
  2. find all cid: refs in img tags
  3. replace them with http://contentid.com/
  4. sanitize
  5. find all http://contentid.com/ refs
  6. replace with data uris
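
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain regex substitution stands in for jsoup's DOM traversal so the example has no dependencies, the sanitizer is passed in as a function, and the class name, method names, and the map from content ID to data URI are all hypothetical.

```java
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CidWorkaround {
    // Placeholder prefix from the steps above; any http URL the sanitizer accepts would do.
    static final String PLACEHOLDER = "http://contentid.com/";

    // Matches the start of an img src attribute whose value begins with cid:
    private static final Pattern CID_SRC =
            Pattern.compile("(?i)(<img\\b[^>]*?\\bsrc\\s*=\\s*[\"'])cid:([^\"']*)");
    // Matches a placeholder URL and captures the content ID that follows it.
    private static final Pattern PLACEHOLDER_REF =
            Pattern.compile(Pattern.quote(PLACEHOLDER) + "([^\"'\\s>]*)");

    /** Steps 2-3: swap each cid: reference in an img tag for a placeholder http URL. */
    static String hideCids(String html) {
        return CID_SRC.matcher(html)
                .replaceAll("$1" + Matcher.quoteReplacement(PLACEHOLDER) + "$2");
    }

    /** Steps 5-6: swap each placeholder URL for the data URI of the matching content ID. */
    static String restore(String html, Map<String, String> dataUrisById) {
        Matcher m = PLACEHOLDER_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String dataUri = dataUrisById.get(m.group(1));
            // Unknown IDs are left as placeholders rather than guessed at.
            String replacement = dataUri != null ? dataUri : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Step 4 sits in the middle: the sanitizer only ever sees http URLs. */
    static String clean(String html, UnaryOperator<String> sanitizer,
                        Map<String, String> dataUrisById) {
        return restore(sanitizer.apply(hideCids(html)), dataUrisById);
    }
}
```

With jsoup on the classpath, the sanitizer argument could be something like `html -> Jsoup.clean(html, Whitelist.basicWithImages())` (the class is named `Safelist` in later releases). Note that a real sanitizer may re-encode characters in the placeholder URL, in which case the lookup key in `restore` would need the same normalization.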

But the issue with mailto, cid, and data really should be addressable somehow. I grant you they are abusable schemes, and ones that should not be whitelisted lightly.

@jhy jhy closed this as completed in c98349a Aug 28, 2011
@jhy
Owner

jhy commented Aug 28, 2011

Thanks for suggesting this!
