Made protocols configurable. #149

AndreasMalecki · 2015-03-27T14:13:44Z

For one of my projects, I required configurable protocols for the href attribute of anchor tags when using bleach.clean. This pull request exposes the allowed protocols in bleach.clean.

EmilStenstrom · 2015-11-04T14:57:54Z

This is something we need too. Would appreciate a merge of this to master.

willkg · 2015-11-04T15:05:05Z

What are the specifics of the use case here? What protocols do you want to allow through?

EmilStenstrom · 2015-11-04T15:28:55Z

In our case we parse HTML from e-mail messages. There the "cid" protocol is used to link to inline images.

DavidMuller · 2015-11-04T20:58:20Z

+1 for this feature as well

willkg · 2015-11-04T21:23:37Z

@AndreasMalecki @DavidMuller What's your use case?

nickburlett · 2015-11-04T21:25:28Z

I'm also looking at using bleach to clean HTML email messages. I haven't gotten to trying the cid-protocol yet, but I'll need it eventually.

willkg · 2015-11-04T21:28:09Z

@nickburlett What're you cleaning HTML messages for? Is it to display in a browser? Is it for storage? Something else?

DavidMuller · 2015-11-04T22:07:40Z

@willkg our use case involves allowing certain ios "url schemes". For example, to direct a user to the sms app on their iphone, we would like to be able to preserve "sms:" after running through bleach:

# current behavior
In [8]: sms_string = '<a href="sms:">Launch Messages App</a>'

In [9]: bleach.clean(sms_string)
Out[9]: u'<a>Launch Messages App</a>'
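
The behaviour being asked for can be sketched without bleach itself. The snippet below is only an illustration of a per-call protocol allow-list, not bleach's implementation: `href_allowed` and `DEFAULT_PROTOCOLS` are hypothetical names, and a real sanitizer has to handle obfuscated schemes that this check ignores.

```python
from urllib.parse import urlsplit

# Hypothetical default allow-list, used only for this sketch.
DEFAULT_PROTOCOLS = ["http", "https", "mailto"]


def href_allowed(href, protocols=DEFAULT_PROTOCOLS):
    """Return True for scheme-less (relative) links or links with an allowed scheme."""
    scheme = urlsplit(href).scheme.lower()
    return scheme == "" or scheme in protocols


# With the defaults, "sms:" is rejected, which matches the stripped href above.
print(href_allowed("sms:"))
# Passing an extended list for this one call lets it through.
print(href_allowed("sms:", protocols=DEFAULT_PROTOCOLS + ["sms"]))
```

The point of the per-call argument is that the caller opts in to "sms" for one call without changing what any other caller sees.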

nickburlett · 2015-11-04T22:52:11Z

@willkg my plan is for display in a browser.

EmilStenstrom · 2015-11-05T19:42:15Z

As a general workaround, this works for now, though it's incredibly ugly: bleach.sanitizer.BleachSanitizerMixin.allowed_protocols += ['cid']

DavidMuller · 2015-11-10T17:39:26Z

@willkg is this a feature you guys are considering merging?

willkg · 2015-11-10T17:47:54Z

It's still open, so it's still in progress.

Generally with bleach I want to add as little as humanly possible. For now, every code change and every new feature needs to be very compelling and have a well defined and documented reason to exist. I'm still wrapping my head around the underlying problem. It'd help to have an issue in the issue tracker that walks through the problem, the impact of the problem, what kinds of things the problem prevents, the work-arounds and then possible solutions.

I haven't looked at this since last week. Generally, I'm wondering the following:

Are there other viable solutions?
Do the changes here make it more likely someone makes a mistake when using bleach in a security-related situation?
Is it well tested with the various compelling use cases?
Is it well documented with examples including for the compelling use cases?

That's where I'm at.

AndreasMalecki · 2015-11-11T08:29:52Z

We required some additional acceptable protocols like "smb" and had to use a monkey patch.

EmilStenstrom · 2015-11-11T08:42:00Z

> Are there other viable solutions?

Yes, monkey-patching works. bleach.sanitizer.BleachSanitizerMixin.allowed_protocols += ['cid'] is what we are using right now. Problem is that this is a global list (we can't have some Sanitizers including that protocol and some not), and while we can work around that with even more monkey-patching, this quickly gets ugly.

> Do the changes here make it more likely someone makes a mistake when using bleach in a security-related situation?

I see the question as: Is it more likely that people make mistakes by monkey-patching than when using a supported way of doing it? The example I give above about monkey-patching affecting all instances of Sanitizer and not only one is an argument in favor of this patch imho.

> Is it well tested with the various compelling use cases?

There are two concrete use-cases in this thread: adding the smb and adding the cid protocols. We have been running with this in production for a while with no issues.

> Is it well documented with examples including for the compelling use cases?

Does this point mean you want documentation of this feature before merging it? In that case we need to add that to this patch before merging.
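
To make the "global list" concern concrete, here is a minimal sketch. `SanitizerSketch` is a hypothetical stand-in, not bleach's sanitizer class; it only shows how mutating a class-level list reaches every instance, while a constructor argument stays scoped to one.

```python
class SanitizerSketch:
    # Class-level list: shared by every instance, like the monkey-patched attribute.
    allowed_protocols = ["http", "https", "mailto"]

    def __init__(self, protocols=None):
        # Per-instance override, set only when the caller passes one.
        if protocols is not None:
            self.allowed_protocols = list(protocols)


email_cleaner = SanitizerSketch()
comment_cleaner = SanitizerSketch()

# Monkey-patching the class attribute leaks into every instance.
SanitizerSketch.allowed_protocols += ["cid"]
print("cid" in comment_cleaner.allowed_protocols)  # True, although only email needed it

# A supported per-instance option keeps the extra protocol scoped.
smb_cleaner = SanitizerSketch(protocols=["http", "https", "smb"])
print("smb" in email_cleaner.allowed_protocols)  # False
```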

dstufft · 2015-11-11T12:35:47Z

I think the main question (in my mind) is whether the developer needs configurable protocols or whether there are just additional protocols that bleach should accept by default. If there are use cases where you need arbitrary protocols (I think mobile phones might work by having each app register a unique protocol used to open that app?) then I think it needs to be configurable, since there is no way to enumerate all possible protocols that a user of bleach may want to use. If the use cases are simply "here are some additional safe protocols that should be allowed" then I think the way forward would be to just add some more protocols to the list of allowed protocols.

nickburlett · 2015-11-11T16:48:15Z

@dstufft: I believe the developer needs configurable protocols. The set of safe protocols varies by use case. Not everyone will want the cid: protocol, but my use case needs it. However, I don't need or want smb: in my use case.

cooncesean · 2015-11-16T18:01:23Z

+1 for this feature as well. We're trying to support custom protocols (in our case, the protocol is actually more proprietary than the protocols specified in this conversation) in one of our projects and would benefit from this PR.

willkg · 2015-11-19T01:48:11Z

I'm on board for adding this feature. I see the compelling use case and I don't see other viable solutions with the current architecture. When I get a chance, I'll work through the PR and we can move forward. I'll try to do it by the end of Friday.

snide · 2015-11-19T16:52:26Z

Thanks @willkg! Looking forward to this one. Great lib and thanks for your work.

willkg · 2015-11-19T20:09:34Z

bleach/__init__.py

@@ -43,6 +44,8 @@

 ALLOWED_STYLES = []

+ALLOWED_PROTOCOLS = copy.copy(HTMLSanitizer.acceptable_protocols)


Given that we're now "owning this" and we want to document it explicitly, I think it's prudent that instead of copying it with copy.copy, that we make it explicitly defined here:

ALLOWED_PROTOCOLS = [ u'ed2k', u'ftp', u'http', u'https', u'irc', u'mailto', u'news', u'gopher', u'nntp', u'telnet', u'webcal', u'xmpp', u'callto', u'feed', u'urn', u'aim', u'rsync', u'tag', u'ssh', u'sftp', u'rtsp', u'afs', u'data' ]

That way our list won't change between versions of html5lib and bleach can explicitly declare what we think is appropriate regardless of what html5lib says.

Related to this, I'm not sure I'd consider that list a list of "safe protocols". I think I'd want to trim it down to a much smaller subset. Maybe this much more conservative set:

ALLOWED_PROTOCOLS = [u'http', u'https', u'mailto']

Anyone have thoughts on that?

EmilStenstrom · 2015-12-02

If you decide to change the default protocols to a smaller list I would suggest a big BACKWARDS INCOMPATIBLE warning on the next release, as suddenly all sorts of links would stop working in people's code unless they also went through the protocols one by one and checked if anyone used them or not. We have people using at least five of the ones removed.

willkg · 2015-12-02

@EmilStenstrom Which protocols? Where would you need this "BACKWARDS INCOMPATIBLE" warning? Is a note in the CHANGES file enough?

EmilStenstrom · 2015-12-02

I'm guessing based on seeing a lot of user generated posts on our platform but: ftp, rtsp, nntp, webcal, feed.

We tend to look in the version history on PyPI so that would be ideal for us, but a CHANGES file is ok too, just a bit harder to stumble over when you're doing a big "let's update some libraries" drive.
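
A rough way to see the size of the backwards-incompatible change is to diff the two lists proposed in this review thread. The variable names below are made up for the sketch; the list contents are copied from the comments above.

```python
# The full list quoted in the review comment.
FULL_LIST = [
    "ed2k", "ftp", "http", "https", "irc", "mailto", "news", "gopher",
    "nntp", "telnet", "webcal", "xmpp", "callto", "feed", "urn", "aim",
    "rsync", "tag", "ssh", "sftp", "rtsp", "afs", "data",
]
# The conservative list proposed as the new default.
CONSERVATIVE = ["http", "https", "mailto"]

# Protocols that would stop passing through with the smaller default.
dropped = [p for p in FULL_LIST if p not in CONSERVATIVE]

# Protocols reported as being in use in this thread.
in_use = ["ftp", "rtsp", "nntp", "webcal", "feed"]
affected = [p for p in in_use if p in dropped]

print(len(dropped))  # 20
print(affected)      # ['ftp', 'rtsp', 'nntp', 'webcal', 'feed']
```

All five protocols reported in use fall in the dropped set, which is why the change needs to be flagged clearly in the release notes.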

willkg · 2015-11-19T20:11:30Z

I think the copy.copy thing is the only issue I have.

This needs documentation, too. At a minimum, we should add a new section to docs/clean.rst. Maybe call it Protocol whitelist and put it towards the end with the other whitelist sections.

willkg · 2015-11-19T20:13:35Z

As a side note, I only recently took up maintenance of bleach with @jezdez. That was a couple of weeks ago. Prior to that, it was @jsocol's hard work.

willkg · 2015-12-02T20:08:03Z

@AndreasMalecki ^^^ Are these changes you want to work on? If not, I can take what you started and finish it up some time this month.

tell-k · 2015-12-04T16:03:29Z

+1, I also want this feature, because XSS is possible in cases such as the following.

>>> import bleach
>>> bleach.ALLOWED_TAGS.append('iframe')
>>> bleach.ALLOWED_ATTRIBUTES.update({'iframe': ['src']})
>>> 
>>> bleach.clean('<iframe src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4="></iframe>')
'<iframe src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4="></iframe>'

I want to allow the iframe tag, but I cannot reject the data protocol.
Thanks.
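
To see why the example above is an XSS vector, the base64 payload in the src attribute can be decoded with the standard library alone. This is just an inspection sketch; the three-item allow-list is the conservative default discussed elsewhere in this thread, used here as an assumption.

```python
import base64
from urllib.parse import urlsplit

src = "data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4="

# The scheme is what a protocol allow-list would check.
scheme = urlsplit(src).scheme
print(scheme)  # data

# Everything after the first comma is the base64-encoded document.
_, _, payload = src.partition(",")
decoded = base64.b64decode(payload).decode("ascii")
print(decoded)  # <script>alert('XSS')</script>

# With a configurable list that omits "data", this src would be rejected.
allowed = ["http", "https", "mailto"]
print(scheme in allowed)  # False
```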

AndreasMalecki · 2015-12-07T07:14:01Z

@willkg Okay. I will do that within the week.

AndreasMalecki · 2015-12-10T15:24:13Z

@willkg So, what's the decision concerning the default protocols? Limit them to the three you mentioned or stay backwards compatible? My favorite would be compatibility but, as you suggested, to define the list separately for bleach.

willkg · 2015-12-10T15:32:39Z

@AndreasMalecki There were a couple of thoughts on severely limiting the ALLOWED_PROTOCOLS, but I think the only concern was making sure we mark this clearly as a backwards incompatible change. I wrote up another issue to address making that clear.

Given that, I think we should limit them. I think for now we should go with:

ALLOWED_PROTOCOLS = [u'http', u'https', u'mailto']

Having said that, I'm still curious about situations where this isn't great, so I'll spend some time talking with people I know who use bleach and see what they think. If anything comes out of that, I'll write up an issue and PR to fix the list.

Thank you for working on this!

AndreasMalecki · 2015-12-10T16:47:27Z

@willkg You're welcome. I implemented the requested changes. Anything else I should do?

willkg · 2015-12-10T16:53:46Z

Travis is green and this looks good to me. Thank you for doing the work! I'll merge it now.

I'll also add a note to CHANGES and there's another issue to make sure that gets surfaced in all appropriate places.

Thank you again!

DavidMuller · 2016-02-12T23:41:06Z

Really excited to see this feature in master. Are you guys planning to publish a new release to PyPI soon? It looks like version 1.4.2 does not contain the commits that power this feature.

cooncesean · 2016-03-14T22:55:19Z

Hey guys, I was wondering if there is a plan to create a formal release (including this merged PR) on PyPI?

Our team is pretty excited for these changes 👍

AndreasMalecki added 2 commits March 27, 2015 12:37

Made protocols configurable.

a451154

Fixed usage of copy.

9395ee6

willkg reviewed Nov 19, 2015

This was referenced Dec 2, 2015

Allow protocols and svg properties to be configurable. #122

Closed

Allow Data URI Schemes #123

Closed

Limited number of protocols whitelisted by default and added documentation.

8262311

willkg added a commit that referenced this pull request Dec 10, 2015

Merge pull request #149 from AndreasMalecki/master

3b5e5b1

Made protocols configurable.

willkg merged commit 3b5e5b1 into mozilla:master Dec 10, 2015

dcoleman17 pushed a commit to DataDog/bleach that referenced this pull request Feb 29, 2016

added commits from mozilla#149 . we can remove this fork when new version of bleach containing these commits is released

ee7d2a5

blag mentioned this pull request Feb 1, 2019

Allowed protocols marksweb/django-bleach#2

Merged
