-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The urlize
function does not compose parentheses correctly at the end
#827
Comments
In the utils.py, this issue is solved if you remove ')' from the 24th line:
to:
The ')' is added to the 'trail'-group instead of the 'middle'-group at line 213. |
@norela The consequence will be on inputs like the following that wrap an entire URL in parentheses (i.e., the parentheses are not part of the URL). >>> url_text = "Hello world (http://www.foo.org)."
>>> urlize(url_text) # with regex suggestion above by @norela
'Hello world (<a href="http://www.foo.org)">http://www.foo.org)</a>.' This issue is trickier than it seems. If we assume that if a URL contains parentheses, they must be balanced and cannot contain nested parentheses, we could add a quick fix after line 213 that looks something like: ...
lead, middle, trail = match.groups()
# naive check for balanced parens
if len(middle) > 0 and '(' in middle and len(trail) > 0 and trail[0] == ")":
middle = middle + trail[0]
trail = trail[1:]
if middle.startswith('www.') or (
... Unfortunately, the URL standard doesn't require parentheses be balanced and does not disallow nested parentheses, so this solution would be temporary and not robust. If we instead assume that non-URL text should contain balanced parentheses, then an alternative potential solution may be to check |
What about something like this.
returns: found a regex to extract the url from a string on stackoverflow: |
How does it perform with the following input? string = "Foo (See http://de.wikipedia.org/wiki/Agilit%C3%A4t_(Management)) Bar" |
Also, check out this Article by Jeff Atwood on this very issue |
I see, thanks.
|
I think you're on the right track. Still seeing it fail on this case: >>> string = "Foo (See http://de.wikipedia.org/wiki/Agilit%C3%A4t_(Management)) Bar"
>>> urlize(string) # includes your proposed fix
Foo (See <a href='http://de.wikipedia.org/wiki/Agilit%C3%A4t_(Management'>http://de.wikipedia.org/wiki/Agilit%C3%A4t_(Management</a>) Bar You'll notice it omitted the closing parenthesis in the URL -- i.e.,
instead of
|
Thanks for the feedback. Furthermore, I think the check should be recursive too.
|
Sorry for the delay. It seems reasonable. What do @mitsuhiko and @davidism think about this and the underlying assumptions? |
Expected Behavior
Final parentheses should be within the tag and link should be exactly as given before filtering.
Actual Behavior
Final parentheses is outside the tag and final parentheses is lost in url construction
Template Code
Your Environment
The text was updated successfully, but these errors were encountered: