A cleaning pipeline #58

originell · 2012-03-06T16:22:22Z

One thing that occurred to me while working on BleachSanitizerMixin.sanitize_token is that its code gets pretty long and kind of ugly the more functionality one needs to add.

One possible way to counter this problem might be to implement a "pipeline". I first saw this concept in django-social-auth, which I'm also using in a project. To be more specific, this part of the README explains it pretty well. Personally I really love the concept. The code of the default social auth pipeline is here

So what I am proposing is something very much in the vein of social auth's pipeline:

Instead of having to tie into the (big) if condition, we iterate over an iterable of functions:
```
PIPELINE = ('SkipAllowedElements', 'StripScripts',...)
```
While iterating, each function gets called and returns either a cleaned token, which gets passed on to the next function in the iterable, or None to skip the token. Note that it might be more expressive to use exceptions (like SkipToken) here.
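To make the idea concrete, here is a minimal sketch of such a pipeline using the exception variant. It is not Bleach code: tokens are plain dicts, the step functions (`skip_comments`, `lowercase_names`, `strip_scripts`) and `run_pipeline` are hypothetical names made up for illustration, and the pipeline holds the functions themselves rather than name strings as in the tuple above.

```python
class SkipToken(Exception):
    """Raised by a step to drop the current token (name taken from the proposal)."""


def skip_comments(token):
    # Hypothetical step: drop comment tokens entirely.
    if token.get("type") == "Comment":
        raise SkipToken
    return token


def lowercase_names(token):
    # Hypothetical step: return a cleaned copy instead of mutating the input.
    if "name" in token:
        return dict(token, name=token["name"].lower())
    return token


def strip_scripts(token):
    # Hypothetical step: drop <script> start and end tags.
    if token.get("name") == "script":
        raise SkipToken
    return token


# Order matters: names are lowercased before the script check runs.
PIPELINE = (skip_comments, lowercase_names, strip_scripts)


def run_pipeline(token, pipeline=PIPELINE):
    """Pass the token through each step; return None if any step skips it."""
    for step in pipeline:
        try:
            token = step(token)
        except SkipToken:
            return None
    return token
```

With this shape, bookkeeping that has to happen for every token could live once in `run_pipeline` instead of before every return.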

This may also clean up some of the code I introduced with the strip_script_content fork (like doing self.previous_token = token right before every return).

So, basically this is me requesting comments on refactoring things a bit :D Maybe the same kind of pipeline logic could be used for #56.


originell · 2012-03-06T16:24:08Z

Well, OK, after rereading #56 a bit, I think this is very similar to what you mean by callbacks ;-)

jsocol · 2012-03-06T21:04:34Z

#56 is specifically about linkify. I'm a little more cavalier with linkify because it's not quite so security-critical as clean.

The current BleachSanitizerMixin and sanitize_token is based very closely on html5lib's HTMLSanitizerMixin. I'm not closed to the idea of changing it completely, but I'm very hesitant, because it's so, so critical, and I like the idea of keeping it close to the known-good algorithm where possible.

willkg · 2017-02-18T02:40:32Z

In html5lib >= 0.99999999, sanitizing happens as a filter after tokenizing and you can easily add additional filters. With the rewrite for Bleach 2.0, you can trivially use the BleachSanitizerFilter with other html5lib filters (and ones you write yourself) for a cleaning pipeline.
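As a rough illustration of what chaining filters over a token stream looks like, here is a standard-library-only sketch. It imports neither html5lib nor Bleach: the `Filter` base class is a stand-in that only mimics the wrap-a-source-and-iterate shape, tokens are plain dicts, and `DropComments`, `UppercaseText` and `apply_filters` are hypothetical names.

```python
class Filter:
    """Stand-in base class: wraps a source iterable of tokens."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        return iter(self.source)


class DropComments(Filter):
    # Hypothetical filter: omit comment tokens from the stream.
    def __iter__(self):
        for token in self.source:
            if token["type"] != "Comment":
                yield token


class UppercaseText(Filter):
    # Hypothetical filter: rewrite text tokens, leaving the originals untouched.
    def __iter__(self):
        for token in self.source:
            if token["type"] == "Characters":
                token = dict(token, data=token["data"].upper())
            yield token


def apply_filters(tokens, filters):
    """Wrap the token stream in each filter class in turn, then drain it."""
    stream = tokens
    for filter_class in filters:
        stream = filter_class(stream)
    return list(stream)
```

Each filter only sees the output of the one before it, so a sanitizing filter could sit anywhere in the chain alongside custom ones.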

After I finish the rewrite, I'll verify that's true and add an item to the docs about it.

originell mentioned this issue May 15, 2012

Remove <script> tags with their content. #57

Closed

jsocol mentioned this issue Sep 24, 2012

Add the ability to restrict/rewrite URLs for href and src attributes #76

Closed

jsocol mentioned this issue Jan 31, 2014

Wishlist: Add the ability to tell if something needed to be "cleaned" or not. #109

Closed

willkg added this to the v2.0 milestone Feb 18, 2017

This was referenced Feb 24, 2017

Support linkify() callbacks to replace attributes for tags other than <a> #243

Closed

Implement ability to use Filters in cleaning #259

Merged

willkg closed this as completed in #259 Mar 3, 2017
