Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A cleaning pipeline #58

Closed
originell opened this issue Mar 6, 2012 · 3 comments
Closed

A cleaning pipeline #58

originell opened this issue Mar 6, 2012 · 3 comments
Labels
Milestone

Comments

@originell
Copy link
Contributor

One thing that occured to me while working on the BleachSanitizerMixin.sanitize_token is that it's code is getting pretty long and kind of ugly, the more functionality one needs do add.

One possible way to counter this problem might be to implement a "pipeline". I've seen this concept first in django-social-auth which I'm also using in a project. To be more specific this part of the README explains it pretty good. Personally I really love the concept. The code of the default social auth pipeline is here

So what I am proposing is something very much in the vain of social auth's pipeline:

  1. Instead of having to tie into the (big) if condition, we iterate over an iterable with functions::

    PIPELINE = ('SkipAllowedElements', 'StripScripts',...)
  2. While iterating, each function get's called and returns either a cleaned token which gets passed on to the next function in the iterable or None for skip. Note that maybe it might be more expressive to use Exceptions (like SkipToken) here.

This may also clean up a bit of the code I introduce with the strip_script_content fork (like doing self.previous_token = token right before every return)

So, basically this is me requesting for a comment on refactoring things a bit :D – Maybe the same kind of pipeline-logic could be used for #56

@originell
Copy link
Contributor Author

Well OK after rereading #56 a bit, I think this is very very similar to what you mean with callbacks ;-)

@jsocol
Copy link
Contributor

jsocol commented Mar 6, 2012

#56 is specifically about linkify. I'm a little more cavalier with linkify because it's not quite so security-critical as clean.

The current BleachSanitizerMixin and sanitize_token is based very closely on html5lib's HTMLSanitizerMixin. I'm not closed to the idea of changing it completely, but I'm very hesitant, because it's so, so critical, and I like the idea of keeping it close to the known-good algorithm where possible.

@willkg
Copy link
Member

willkg commented Feb 18, 2017

In html5lib >= 0.99999999, sanitizing happens as a filter after tokenizing and you can easily add additional filters. With the rewrite for Bleach 2.0, you can trivially use the BleachSanitizerFilter with other html5lib filters (and ones you write yourself) for a cleaning pipeline.

After I finish the rewrite, I'll verify that's true and add an item to the docs about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants