New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow method chaining without rebuilding the tree #46

Closed
jsocol opened this Issue Oct 21, 2011 · 2 comments

Comments

Projects
None yet
2 participants
@jsocol
Member

jsocol commented Oct 21, 2011

Right now, if you chain methods (e.g. linkify(clean(text))) the whole tree-building, which is expensive, has to happen twice.

I can see two ways of solving this. One is a total API rewrite, one is less complicated. I'm leaning toward the latter.

Total rewrite:

Recreate a Bleach class, but this time it is the text you want to clean, e.g.:

from bleach import Bleach
text = Bleach(text)
print text.clean()

It could store state of what's already been done (like cleaning, linkifying, etc) and either return text (as above) or return itself to allow chaining, e.g.:

# Implement __unicode__ and __str__, seems nicer than above.
print Bleach(text).clean().linkify()

Less-crazy idea:

Instead of actual unicode objects, the top-level methods return a lazy object that looks like a string, maybe even descends from unicode, but is smarter, stores the tree, and only renders itself if anything needs to treat it as a string.

Then the methods would be reworked to check for the tree before rebuilding it, and if found, they would just use the existing tree. clean() might need special handling--i.e. always go back to text first--because of the nature of what it does. But I generally say you should always call clean() first if you're chaining, anyway.

Pros and cons

The rewrite might make it easier to deal with the increasing complexity of linkify(). You could (again) subclass Bleach and override a number of things there. I don't love APIs where you have to subclass things. I'm also not sure how complex this would make the upgrade for consumers--it would be a major version change.

The less-crazy idea doesn't involve subclassing anything, ever. The API can stay the same and it can be a minor version change. I am guessing won't be too difficult to write some sort of LazyTreeStringObjectThing class. It would be a good opportunity to do some refactoring and get some stuff out of bleach/__init__.py. (I wouldn't change import paths, just where the code lives.)

@jsocol

This comment has been minimized.

Show comment
Hide comment
@jsocol

jsocol Oct 21, 2011

Member

The Bleach class would allow me to enforce calling clean() first, by throwing an error if you call clean() on something that already has a tree built. That might not be necessary. I could also just issue a warning, then render to text and rebuild the tree.

Member

jsocol commented Oct 21, 2011

The Bleach class would allow me to enforce calling clean() first, by throwing an error if you call clean() on something that already has a tree built. That might not be necessary. I could also just issue a warning, then render to text and rebuild the tree.

@willkg willkg modified the milestones: v1.6, v2.0 Oct 31, 2016

@willkg

This comment has been minimized.

Show comment
Hide comment
@willkg

willkg Mar 3, 2017

Member

I'm going to look at redoing linkify for Bleach 2.0. First thing I want to look into is converting it to an html5lib Filter. Doing that covers the underlying issues here, plus it deals with all the order of operations issues.

Member

willkg commented Mar 3, 2017

I'm going to look at redoing linkify for Bleach 2.0. First thing I want to look into is converting it to an html5lib Filter. Doing that covers the underlying issues here, plus it deals with all the order of operations issues.

@willkg willkg referenced this issue Mar 4, 2017

Merged

Linkify #260

@willkg willkg closed this in #260 Mar 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment