Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite bleach to support html5lib >= 0.99999999 (8 9s) #229

Closed
willkg opened this issue Oct 31, 2016 · 21 comments
Closed

rewrite bleach to support html5lib >= 0.99999999 (8 9s) #229

willkg opened this issue Oct 31, 2016 · 21 comments

Comments

@willkg
Copy link
Member

willkg commented Oct 31, 2016

I thought someone was going to write up an issue for this, but apparently no one has, yet.

html5lib changed some APIs so we pinned to specific versions of html5lib with some number of 9s.

html5lib 1.0 milestone is pretty close to done. I think they're just waiting on documenting their public APIs. I don't see any indication of what their schedule looks like, but from what I did see, it seems like it's "done" except for the documentation. Given that, we could probably be safe by adding support for html5lib 0.999999999 (that's 9 9s) in addition to the ones we support now.

There's another issue in our tracker about how 0.999 causes some errors. We might want to nix support for that version.

This issue covers adding support for html5lib 0.999999999 (that's 9 9s) and 1.0 when that releases.

@willkg willkg added this to the v2.0 milestone Oct 31, 2016
@renewfrl
Copy link

renewfrl commented Dec 6, 2016

as the supplied htmllib has XSS vulnerabilities (https://www.sourceclear.com/registry/security/cross-site-scripting-xss-/python/sid-3068) would be nice to see this issue fixed

@kitterma
Copy link

kitterma commented Dec 6, 2016

Did someone request a CVE for that?

quis added a commit to quis/WeasyPrint that referenced this issue Dec 6, 2016
We need this older version of html5lib because bleach doesn’t yet support versions greater than 0.9999999 (7 9s). It will at some point in the future: mozilla/bleach#229

Weasyprint doesn’t support versions of html5lib lower than 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment)

So this commit patches Weasyprint to use the older version.
quis added a commit to quis/WeasyPrint that referenced this issue Dec 6, 2016
We need this older version of html5lib because bleach doesn’t yet support versions greater than 0.9999999 (7 9s). It will at some point in the future: mozilla/bleach#229

Weasyprint doesn’t support versions of html5lib lower than 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment)

So this commit patches Weasyprint to use the older version.
quis added a commit to quis/WeasyPrint that referenced this issue Dec 6, 2016
We need this older version of html5lib because bleach doesn’t yet support versions greater than 0.9999999 (7 9s). It will at some point in the future: mozilla/bleach#229

Weasyprint doesn’t support versions of html5lib lower than 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment)

So this commit patches Weasyprint to use the older version.
@lsaffre
Copy link

lsaffre commented Dec 7, 2016

Besides the security issues it is currently not possible to use bleach together with weasyprint because the latter already migrated to the new html5lib and does not support the old one. That's why I currently cannot use bleach!

@willkg
Copy link
Member Author

willkg commented Dec 7, 2016

This is the first time I've heard of that XSS. While it's a problem with html5lib, has anyone looked into whether it affects bleach, too? If so, can someone write up a test case and toss that in a pull request?

Regarding supporting html5lib 0.99999999 (8 9s) and up, I'm happy to look at pull requests. If anyone is interested in working on it, let me know.

@renewfrl
Copy link

renewfrl commented Dec 9, 2016

Not sure, however what they write on sourceclear:

This issue was fixed in version 0.99999999. That version is currently considered safe, we suggest that you upgrade to the fixed version.

However, that version breaks other things, right?

@willkg
Copy link
Member Author

willkg commented Dec 19, 2016

I looked into it. The security vulnerability is about characters that need to be quoted for legacy browsers when used in attributes.

I think the relevant test in the patch for html5lib is this one:

html5lib/html5lib-python@9b8d8eb#diff-42ac3b2a6b7432ce8cc3967afb76ba26L245

If you run that string through bleach right now with 0.9999999 (7 9s), it quotes it correctly. That's because bleach always quotes attributes:

serializer = HTMLSerializer(quote_attr_values=True,

Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from bleach import clean
>>> clean('<span title=foo\u000bbar>')
u'&lt;span title="foo\\u000bbar"&gt;'
>>> 

Given that, I don't think bleach is affected by the XSS in question.

Please feel free to double-check this.

Regardless, I'm working on updating bleach to use html5lib >= 0.9999999 now. I'm not sure how long it'll take.

@lsaffre
Copy link

lsaffre commented Dec 20, 2016 via email

@willkg willkg changed the title support html5lib 1.0 (when it is released) rewrite bleach to support html5lib 1.0 (when it is released) Jan 9, 2017
@willkg
Copy link
Member Author

willkg commented Jan 24, 2017

I've been working on this on and off for a couple of months when I get a chance to spend time on it. Yesterday, I hit a problem that is currently a showstopper. I wrote up an issue for it here:

html5lib/html5lib-python#312

I'm hoping I'm horribly wrong and completely misunderstand something. If that's the case, hopefully someone points it out and then I can return to work. If not, then I'll think of another way to implement things.

Anyhow, I'm blocked until that's resolved in some manner.

@willkg willkg changed the title rewrite bleach to support html5lib 1.0 (when it is released) rewrite bleach to support html5lib >= 0.99999999 (8 9s) Jan 24, 2017
@gsnedders
Copy link
Contributor

@kitterma Uh, I certainly remember meaning to request a CVE for it… I have no idea what happened to that. I may have simply screwed up and not done so.

@willkg Bleach uses quote_attr_values=True, so isn't vulnerable.

@stonebig
Copy link

hi, any timeline for a bleach release with this compatibility fix ?

@willkg
Copy link
Member Author

willkg commented Feb 13, 2017

There's no timeline for this rather large rewrite of bleach. Last I updated, I mentioned how the changes in the html5lib API cause a number of regressions, so I'm still pretty stuck on that.

These are the options I've come up with so far in rough order of the estimated amount of work required:

Option 1: Vendor html5lib 0.9999999 and continue with bleach as is

This has a number of problems the biggest one in my opinion being that we'd be responsible for security issues. Bleach is probably small enough that we probably won't even find out about security issues. Further, I think this step implicitly involves a fork of html5lib. Because of that, this option is pretty unenthusing.

Option 2: Finish up the work and whatever bleach regressions we have--that's the new normal, do a release

This has a number of problems. Currently, we regularly get complains about the output of bleach.clean and how it's not "pretty" in regards to malicious user input. I think the regressions will exacerbate this, so I'll get more of them. Thus I'd do a bunch of work to finish this option up, then update the tests showing regressions, then do a release, then get a lot of flak from users who are unhappy because of the regressions. That's unenthusing.

Option 3: Wait for something to change upstream in html5lib

I think this will take a long time to happen because it's substantial work upstream and I don't think it's very high on their priority list and I think they're pretty starved for maintenance help, too. Maybe 6 months? Maybe a year? Maybe never? Waiting for changes upstream to happen is probably not a valid option. I don't have time to work on it myself--it's probably a huge project.

Option 4: Switch to another library

I started looking into whether we could switch from html5lib to something else and didn't get very far, but maybe it's possible.

So that's where we're at. I have no idea how long this is going to take. I make no promises on timelines etc.

@willkg
Copy link
Member Author

willkg commented Feb 18, 2017

I continued investigating option 2. One of the problems here was that in order to understand the scope of the html5lib 0.99999999 (8 9s) API change as it affected Bleach, I had to finish the rewrite. But I didn't want to spend all my time on the rewrite if I was just going to throw it away and go with a different option.

Anyhow, I'm mostly done with the rewrite. There will be some regressions in .clean() and .linkify() output and the Bleach API will change. I'll have more info when I clean things up and make the branch public.

I'm hoping I can wrap this up next week.

Who can help test?

@lsaffre
Copy link

lsaffre commented Feb 18, 2017 via email

@Xof
Copy link

Xof commented Feb 21, 2017

I'm definitely happy to help test.

@willkg
Copy link
Member Author

willkg commented Feb 24, 2017

I'm reopening this for testing.

@Xof, @lsaffre and anyone else: I landed the changes just now. You can grab a .tar.gz of it with this link:

https://github.com/mozilla/bleach/archive/543984c.tar.gz

You can install it with pip:

pip install https://github.com/mozilla/bleach/archive/543984c.tar.gz#egg=bleach

Please take some time before Monday to install it, test it out and comment on your experiences here. If you run into issues, write them up as new issues. I'll work through them as I'm able.

Thanks!

@willkg willkg reopened this Feb 24, 2017
@lsaffre
Copy link

lsaffre commented Feb 25, 2017

Great! It seems that it works now! See my blog. So for me you can close this ticket. I cannot say whether it introduced any regression because I will only now start using bleach on the battle field.

@kitterma
Copy link

I've installed and tested it as far as seeing that the tests pass with python2.7 and python3.5 on Debian. Of course, these html5lib changes ripple outward. The update, not surprisingly, causes django-html_sanitizer to fail:

ui/django-html_sanitizer#13

I don't think that should stop you from releasing though. It's already broken with the current html5lib, so anyone wanting to use it as it stands will already to muck around with using old versions of things.

@lsaffre
Copy link

lsaffre commented Feb 25, 2017

I also would welcome a release because that would simplify deploying in order to test it on real data.

@Xof
Copy link

Xof commented Feb 25, 2017

I haven't gotten it to break in my testing. Thanks for the fast work!

@willkg
Copy link
Member Author

willkg commented Mar 8, 2017

Thank you for help with testing!

I made a bunch more changes and pushed out a release just now. If there are issues, I'll try to triage them quickly.

@willkg willkg closed this as completed Mar 8, 2017
@lsaffre
Copy link

lsaffre commented Mar 9, 2017

Yes, it works perfectly now for me. For me we can close the ticket. Thanks for your work, Will!

quis added a commit to alphagov/notifications-utils that referenced this issue Mar 9, 2017
Changes:
mozilla/bleach@v1.4.3...v2.0

Summary:
https://github.com/mozilla/bleach/blob/b8b8e4aa9b688fb00f41959e74d8054e461f6aaf/CHANGES

This is the change we care about:
> * Supports html5lib >= 0.99999999 (8 9s).

Means we won’t need to do this workaround in admin any more:

> We need [to use] an older version of html5lib because bleach doesn’t
> yet support versions greater than 0.9999999 (7 9s). It will at some
> point in the future: mozilla/bleach#229
>
> Weasyprint doesn’t support versions of html5lib lower than
> 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment)
>
> So this commit patches Weasyprint to use the older version.

– quis/WeasyPrint@99718c2
quis added a commit to alphagov/notifications-utils that referenced this issue Mar 9, 2017
Changes:
mozilla/bleach@v1.4.3...v2.0

Summary:
https://github.com/mozilla/bleach/blob/b8b8e4aa9b688fb00f41959e74d8054e461f6aaf/CHANGES

This is the change we care about:
> * Supports html5lib >= 0.99999999 (8 9s).

Means we won’t need to do this workaround in admin any more:

> We need [to use] an older version of html5lib because bleach doesn’t
> yet support versions greater than 0.9999999 (7 9s). It will at some
> point in the future: mozilla/bleach#229
>
> Weasyprint doesn’t support versions of html5lib lower than
> 0.999999999 (9 9s) and isn’t going to: Kozea#353 (comment)
>
> So this commit patches Weasyprint to use the older version.

– quis/WeasyPrint@99718c2
sevenmachines pushed a commit to ministryofjustice/cla_backend that referenced this issue Aug 7, 2017
This is needed to fix the breaking incompatibility with html5 lib.

mozilla/bleach#229
ibrechin pushed a commit to ministryofjustice/cla_backend that referenced this issue Jan 29, 2018
This is needed to fix the breaking incompatibility with html5 lib.

mozilla/bleach#229
ibrechin pushed a commit to ministryofjustice/cla_backend that referenced this issue Jan 31, 2018
This is needed to fix the breaking incompatibility with html5 lib.

mozilla/bleach#229
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants