Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made protocols configurable. #149

Merged
merged 3 commits into from Dec 10, 2015
Merged

Made protocols configurable. #149

merged 3 commits into from Dec 10, 2015

Conversation

AndreasMalecki
Copy link
Contributor

For one of my projects, I required configurable protocols for the href attribute of anchor tags when using bleach.clean. This pull request exposes the allowed protocols in bleach.clean.

@EmilStenstrom
Copy link

This is something we need too. Would appreciate a merge of this to master.

@willkg
Copy link
Member

willkg commented Nov 4, 2015

What's the specifics of the use case here? What protocols do you want to allow through?

@EmilStenstrom
Copy link

In our case we parse HTML from e-mail messages. There the "cid"-prototcol is used to link to inline images.

@DavidMuller
Copy link

+1 for this feature as well

@willkg
Copy link
Member

willkg commented Nov 4, 2015

@AndreasMalecki @DavidMuller What's your use case?

@nickburlett
Copy link

I'm also looking at using bleach to clean HTML email messages. I haven't gotten to trying the cid-protocol yet, but I'll need it eventually.

@willkg
Copy link
Member

willkg commented Nov 4, 2015

@nickburlett What're you cleaning HTML messages for? Is it to display in a browser? Is it for storage? Something else?

@DavidMuller
Copy link

@willkg our use case involves allowing certain ios "url schemes". For example, to direct a user to the sms app on their iphone, we would like to be able to preserve "sms:" after running through bleach:

# current behavior
In [8]: sms_string = '<a href="sms:">Launch Messages App</a>'

In [9]: bleach.clean(sms_string)
Out[9]: u'<a>Launch Messages App</a>'

@nickburlett
Copy link

@willkg my plan is for display in a browser.

@EmilStenstrom
Copy link

As a general workaround, this works for now, it's just incredibly ugly: bleach.sanitizer.BleachSanitizerMixin.allowed_protocols += ['cid']

@DavidMuller
Copy link

@willkg is this a feature you guys are considering merging?

@willkg
Copy link
Member

willkg commented Nov 10, 2015

It's still open, so it's still in progress.

Generally with bleach I want to add as little as humanly possible. For now, every code change and every new feature needs to be very compelling and have a well defined and documented reason to exist. I'm still wrapping my head around the underlying problem. It'd help to have an issue in the issue tracker that walks through the problem, the impact of the problem, what kinds of things the problem prevents, the work-arounds and then possible solutions.

I haven't looked at this since last week. Generally, I'm wondering the following:

  1. Are there other viable solutions?
  2. Do the changes here make it more likely someone makes a mistake when using bleach in a security-related situation?
  3. Is it well tested with the various compelling use cases?
  4. Is it well documented with examples including for the compelling use cases?

That's where I'm at.

@AndreasMalecki
Copy link
Contributor Author

We required some additional acceptable protocols like "smb" and had to use a monkey patch.

@EmilStenstrom
Copy link

  1. Are there other viable solutions?
    Yes, monkey-patching works. bleach.sanitizer.BleachSanitizerMixin.allowed_protocols += ['cid'] is what we are using right now. Problem is that this is a global list (we can't have some Sanitizers including that protocol and some not), and while we can work around that with even more monkey-patching, this quickly gets ugly.
  2. Do the changes here make it more likely someone makes a mistake when using bleach in a security-related situation?
    I see the question as: Is it more likely that people make mistakes by monkey-patching than when using a supported way of doing it. The example I give above about monkey-patching affecting all instances of Sanitizer and not only one, is an argument in favor of this patch imho.
  3. Is it well tested with the various compelling use cases?
    There are two concrete use-cases in this thread: adding the smb and adding the cid protocols. We have been running with this in production for a while with no issues.
  4. Is it well documented with examples including for the compelling use cases?
    Does this point mean you want documentation of this feature before merging it? In that case we need to add that to this patch before merging.

@dstufft
Copy link

dstufft commented Nov 11, 2015

I think the main question (in my mind) is if the developer needs configurable protocols or if there is just additional protocols that bleach should accept as allowed by default. If there are use cases where you need arbitrary protocols (I think mobile phones might work by having each app register a unique protocol used to open that app up?) then I think it needs to be configurable since there is no way to enumerate all possible protocols that a user of bleach may want to use. If the use cases are simply "here's some additional safe protocols that should be allowed" then I think the way forward would be to just add some more protocols to the list of allowed protocols.

@nickburlett
Copy link

@dstufft: I believe the developer needs configurable protocols. The set of safe protocols varies by use case. Not everyone will want the cid: protocol, but my use case needs it. However, I don't need or want smb: in my use case.

@cooncesean
Copy link

+1 for this feature as well. We're trying to support custom protocols (in our case, the protocol is actually more proprietary than the protocols specified in this conversation) in one of our projects and would benefit from this PR.

@willkg
Copy link
Member

willkg commented Nov 19, 2015

I'm on board for adding this feature. I see the compelling use case and I don't see other viable solutions with the current architecture. When I get a chance, I'll work through the PR and we can move forward. I'll try to do it by the end of Friday.

@snide
Copy link

snide commented Nov 19, 2015

Thanks @willkg! Looking forward to this one. Great lib and thanks for your work.

@@ -43,6 +44,8 @@

ALLOWED_STYLES = []

ALLOWED_PROTOCOLS = copy.copy(HTMLSanitizer.acceptable_protocols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're now "owning this" and we want to document it explicitly, I think it's prudent that instead of copying it with copy.copy, that we make it explicitly defined here:

ALLOWED_PROTOCOLS = [
    u'ed2k', u'ftp', u'http', u'https', u'irc', u'mailto', u'news', u'gopher', u'nntp',
    u'telnet', u'webcal', u'xmpp', u'callto', u'feed', u'urn', u'aim', u'rsync', u'tag',
    u'ssh', u'sftp', u'rtsp', u'afs', u'data'
]

That way our list won't change between versions of html5lib and bleach can explicitly declare what we think is appropriate regardless of what html5lib says.

Related to this, I'm not sure I'd consider that list a list of "safe protocols". I think I'd want to trim it down to a much smaller subset. Maybe this much more conservative set:

ALLOWED_PROTOCOLS = [u'http', u'https', u'mailto']

Anyone have thoughts on that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you decide to change the default protocols to a smaller list I would suggest a big BACKWARDS INCOMPATIBLE warning on the next release, as suddenly all sorts of links would stop working in people's code unless they also went through the protocols one by one and checked if anyone used them or not. We have people using at least five of the ones removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EmilStenstrom Which protocols? Where would you need this "BACKWARDS INCOMPATIBLE" warning? Is a note in the CHANGES file enough?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing based on seeing a lot of user generated posts on our platform but: ftp, rtsp, nntp, webcal, feed.

We tend to look in the version history on PyPI so that would be ideal for us, but a CHANGES file is ok too, just a bit harder to stumble over when you're doing a big "lets update some libraries" drive.

@willkg
Copy link
Member

willkg commented Nov 19, 2015

I think the copy.copy thing is the only issue I have.

This needs documentation, too. At a minimum, we should add a new section to docs/clean.rst. Maybe call it Protocol whitelist and put it towards the end with the other whitelist sections.

@willkg
Copy link
Member

willkg commented Nov 19, 2015

As a side note, I only recently took up maintenance of bleach with @jezdez. That was a couple of weeks ago. Prior to that, it was @jsocol's hard work.

@willkg
Copy link
Member

willkg commented Dec 2, 2015

@AndreasMalecki ^^^ Are these changes you want to work on? If not, I can take what you started and finish it up some time this month.

@tell-k
Copy link

tell-k commented Dec 4, 2015

+1 I also want this feature. Because it's possible XSS is the case, such as the following.

>>> import bleach
>>> bleach.ALLOWED_TAGS.append('iframe')
>>> bleach.ALLOWED_ATTRIBUTES.update({'iframe': ['src']})
>>> 
>>> bleach.clean('<iframe src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4="></iframe>')
'<iframe src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4="></iframe>'

I want to allow the iframe tag. But I can not reject the data protocol.
thx.

@AndreasMalecki
Copy link
Contributor Author

@willkg Okay. I will do that within the week.

@AndreasMalecki
Copy link
Contributor Author

@willkg So, what's the decision concerning the default protocols? Limit them to the three you mentioned or stay backwards compatible? My favorite would be compatibility but, as you suggested, to define the list separately for bleach.

@willkg
Copy link
Member

willkg commented Dec 10, 2015

@AndreasMalecki There were a couple of thoughts on severely limiting the ALLOWED_PROTOCOLS, but I think the only concern was making sure we mark this clearly as a backwards incompatible change. I wrote up another issue to address making that clear.

Given that, I think we should limit them. I think for now we should go with:

ALLOWED_PROTOCOLS = [u'http', u'https', u'mailto']

Having said that, I'm still curious about situations where this isn't great, so I'll spend some time talking with people I know who use bleach and see what they think. If anything comes out of that, I'll write up an issue and PR to fix the list.

Thank you for working on this!

@AndreasMalecki
Copy link
Contributor Author

@willkg You're welcome. I implemented the requested changes. Anything else I should do?

@willkg
Copy link
Member

willkg commented Dec 10, 2015

Travis is green and this looks good to me. Thank you for doing the work! I'll merge it now.

I'll also add a note to CHANGES and there's another issue to make sure that gets surfaced in all appropriate places.

Thank you again!

willkg added a commit that referenced this pull request Dec 10, 2015
@willkg willkg merged commit 3b5e5b1 into mozilla:master Dec 10, 2015
@DavidMuller
Copy link

Really excited to see this feature in master. Are you guys planning to publish a new release to pypi soon? Looks like version 1.4.2 does not contain the commits that power this feature

dcoleman17 pushed a commit to DataDog/bleach that referenced this pull request Feb 29, 2016
…sion of bleach containing these commits is released
@cooncesean
Copy link

Hey guys, was wondering if there was a plan to create a formal release (including this merged PR) on PyPi?

Our team is pretty excited for these changes 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants