LinkifyFilter not working for URLs with ampersands #374

Closed
Alex3917 opened this Issue Jun 12, 2018 · 4 comments

Alex3917 commented Jun 12, 2018

Example with bleach 2.1.3:

from bleach import Cleaner
from bleach.linkifier import LinkifyFilter

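url1 = 'http://a.co?b=1&c=2'      # raw ampersand
url2 = 'http://a.co?b=1&amp;c=2'  # ampersand written as an HTML entity (assumed; the page extraction shows both URLs as identical)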

cleaner = Cleaner(filters=[LinkifyFilter])
cleaner.clean(url1)
cleaner.clean(url2)

Both result in <a href="http://a.co?b=1">http://a.co?b=1</a>&amp;c=2, so only the part of the URL before the ampersand gets linkified.

@Alex3917 Alex3917 changed the title from LinkifyFilter not working with URLs that have ampersands to LinkifyFilter not working for URLs with ampersands Jun 12, 2018

Member

willkg commented Jun 14, 2018

I don't generally spend time on the linkify stuff. If you or someone else wanted to tackle this, then that'd be great!

Alex3917 commented Jun 14, 2018

I'll look into it, can't guarantee I'll be able to figure it out though.

Also, FWIW, URLs with # characters (fragments) seem to hit the same problem. The regex doesn't look like the culprit, because the same URLs linkify correctly when calling bleach.linkify directly.

peterbe added a commit to peterbe/django-peterbecom that referenced this issue Oct 10, 2018

peterbe commented Oct 10, 2018

For the record, I fell into this trap too.

Unfortunately, I don't have time to help here upstream but I made myself a quick workaround for this issue. In that context, speed doesn't matter because these are called so rarely.

peterbe added a commit to peterbe/django-peterbecom that referenced this issue Oct 10, 2018

Member

willkg commented Oct 10, 2018

Maybe LinkifyFilter needs to join consecutive Characters tokens and then do its thing?
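
To illustrate why joining would help, here is a rough sketch, not bleach's actual code. It assumes the text reaches the filter as three separate Characters tokens split around the ampersand, and it uses a deliberately simplified URL pattern (`URL_RE`) in place of bleach's real regex. The token dicts and both helper functions are hypothetical.

```python
import re

# Simplified URL pattern, assumed for illustration only;
# bleach's real linkify regex is considerably more involved.
URL_RE = re.compile(r"https?://[^\s<>]+")

# Hypothetical token stream: one text node fractured at the ampersand.
tokens = [
    {"type": "Characters", "data": "http://a.co?b=1"},
    {"type": "Characters", "data": "&"},
    {"type": "Characters", "data": "c=2"},
]


def urls_per_token(tokens):
    """Search each Characters token in isolation."""
    found = []
    for token in tokens:
        if token["type"] == "Characters":
            found.extend(URL_RE.findall(token["data"]))
    return found


def urls_after_joining(tokens):
    """Join the data of all Characters tokens first, then search once."""
    text = "".join(t["data"] for t in tokens if t["type"] == "Characters")
    return URL_RE.findall(text)


per_token = urls_per_token(tokens)
joined = urls_after_joining(tokens)
print(per_token)  # ['http://a.co?b=1']
print(joined)     # ['http://a.co?b=1&c=2']
```

Searching token by token only ever sees the part of the URL before the ampersand, which matches the truncated link in the report above.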

willkg added a commit to willkg/bleach that referenced this issue Oct 10, 2018

Merge Characters tokens
The sanitizer causes fracturing of Characters tokens. That causes problems
with anything downstream in the filters because now the things they're
looking for can be split across token boundaries.

Instead of dealing with that, this fixes the sanitizer to merge characters
tokens as they're being yielded.

Fixes mozilla#374
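
The idea in this commit message can be sketched as a small generator. This is a minimal illustration, not the code from the commit itself; it assumes html5lib-style token dicts with "type" and "data" keys, and the names `merge_characters` and `_combine` are hypothetical.

```python
def merge_characters(token_iterator):
    """Yield tokens, combining each run of consecutive Characters tokens into one."""
    buffer = []
    for token in token_iterator:
        if token["type"] == "Characters":
            # Hold text back until we know the run has ended.
            buffer.append(token)
            continue
        if buffer:
            yield _combine(buffer)
            buffer = []
        yield token
    # Flush any text left over at the end of the stream.
    if buffer:
        yield _combine(buffer)


def _combine(buffer):
    """Build a single Characters token from a non-empty list of them."""
    merged = dict(buffer[0])
    merged["data"] = "".join(t["data"] for t in buffer)
    return merged


# Hypothetical fractured stream for the URL from the report.
stream = [
    {"type": "StartTag", "name": "p", "data": {}},
    {"type": "Characters", "data": "http://a.co?b=1"},
    {"type": "Characters", "data": "&"},
    {"type": "Characters", "data": "c=2"},
    {"type": "EndTag", "name": "p"},
]

merged_stream = list(merge_characters(stream))
for token in merged_stream:
    print(token)
```

With the text back in a single token, a downstream filter that scans one Characters token at a time gets to see the whole URL.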

@willkg willkg removed the needs-your-help label Oct 10, 2018

@willkg willkg closed this in #410 Oct 11, 2018
