Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linkify creates http link in mailto link when host contains a hyphen #300

Closed
sedrubal opened this issue Aug 20, 2017 · 10 comments · Fixed by #302
Closed

Linkify creates http link in mailto link when host contains a hyphen #300

sedrubal opened this issue Aug 20, 2017 · 10 comments · Fixed by #302

Comments

@sedrubal
Copy link
Contributor

sedrubal commented Aug 20, 2017

Linkify creates http links in mailto links if the mail host name contains a hyphen - (and parse_email=True). The http link starts with the hyphen, even if host names mustn't start with hyphens at all (https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames).

>>> linkify('foo@exa-mple.com', parse_email=True)
'<a href="mailto:foo@exa-mple.com">foo@exa<a href="http://-mple.com" rel="nofollow">-mple.com</a></a>'

This is what the link looks like: foo@exa-mple.com 😄

@willkg
Copy link
Member

willkg commented Aug 20, 2017

I don't think this is something I'm going to get to fixing any time soon. I'll happily accepts PRs, though.

@sedrubal
Copy link
Contributor Author

Do you have any idea, why this happens?

@willkg
Copy link
Member

willkg commented Aug 21, 2017

Not offhand. You'd have to go through the code and debug it. Sorry!

@sedrubal
Copy link
Contributor Author

I found out that the problem also exists if parse_email=False:

>>> linkify('foo@bar-baz.com')
'foo@bar<a href="http://-baz.com" rel="nofollow">-baz.com</a>'

Maybe this helps locating the issue...

@sedrubal
Copy link
Contributor Author

Ah I could fix the outcome of this bug like described in the first message 🔝 (but I don't know how to solve the root cause (last message 🔼 )) by applying this patch:

diff --git a/bleach/linkifier.py b/bleach/linkifier.py
index fc346c3..b8f801e 100644
--- a/bleach/linkifier.py
+++ b/bleach/linkifier.py
@@ -344,7 +344,17 @@ class LinkifyFilter(Filter):

     def handle_links(self, src_iter):
         """Handle links in character tokens"""
+        in_a = False  # happens, if parse_email=True and if a mail was found
         for token in src_iter:
+            if in_a:
+                if token['type'] == 'EndTag' and token['name'] == 'a':
+                    in_a = False
+                yield token
+                continue
+            elif token['type'] == 'StartTag' and token['name'] == 'a':
+                in_a = True
+                yield token
+                continue
             if token['type'] == 'Characters':
                 text = token['data']
                 new_tokens = []

Is this (and some test cases) enough for a pull request or do you rather want to fix the bug, that parts of mail addresses must not be interpreted as http links?

@willkg
Copy link
Member

willkg commented Aug 24, 2017

I'm wondering if the description and #300 (comment) are actually two different issues. Are the causes related? If not, then maybe the second one should be a separate issue. We can keep them in this issue, too. At this point, whatever bookkeeping is easier for you is fine with me.

Pull requests to fix things are always helpful! I think it's fine to do a PR to cover one part of this issue and another PR to cover another part later.

Thank you for looking into this!

@sedrubal
Copy link
Contributor Author

I think, if we could fix the issue, that linkify linkifies the host part from mail addresses as normal link, the issue I described first will automatically be fixed. But unfortunately I didn't understand the regular expressions.

sedrubal added a commit to sedrubal/bleach that referenced this issue Aug 27, 2017
Partly fixes mozilla#300:

```py
>>> linkify('foo@exa-mple.com', parse_email=True)
'<a href="mailto:foo@exa-mple.com">foo@exa<a href="http://-mple.com" rel="nofollow">-mple.com</a></a>'
'<a href="mailto:foo@exa-mple.com">foo@exa-mple.com</a>'
```

but it does not fix this issue:

```py
>>> linkify('foo@bar-baz.com')
'foo@bar<a href="http://-baz.com" rel="nofollow">-baz.com</a>'
```
@sedrubal
Copy link
Contributor Author

I'm ok that you closed this issue but are you aware that #302 did not fix the issue I tried to explain above?
And I also want to know, when you're planning to release the next version of bleach, as this is currently a blocker for me using bleach in another project 😉

@willkg willkg added this to the v2.1 milestone Sep 25, 2017
@willkg
Copy link
Member

willkg commented Sep 25, 2017

@sedrubal When you have "fixes #xyz" in a PR comment, then GitHub closes the issue automatically.

In #300 (comment), I suggested this is two issues and we should open up a new issue, but I would leave the decision on that to you. You didn't say anything, so I figured between you marking the PR as having fixed this issue, then you were satisfied with everything being done.

It'd help me if you opened up a new issue with a failing test of the outstanding things.

Regarding releases, you can check the milestone this is assigned to.

Thank you!

@sedrubal
Copy link
Contributor Author

Oops I'm sorry. I opened a new one (#322). Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants