Use ASCII digits in port number parsing #207

Closed
zopieux wants to merge 1 commit

zopieux (Contributor) commented Jun 18, 2016

Currently, bleach accepts (linkifies) the following “URLs”:

  • http://foo.com:𝟠𝟘𝟠𝟘/
  • http://foo.com:٣٩٩٩/

That is because, in Python Unicode regular expressions, the \d character class matches every character in the Unicode general category Nd (decimal digit), and that category contains far more than the ASCII digits 0-9.
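To make the behaviour concrete, here is a minimal sketch for Python 3 that checks the two port strings from the examples above. The helper names are hypothetical and only serve the illustration; the digits are written as escapes so the snippet does not depend on the editor's encoding.

```python
import re
import unicodedata

# The two "port numbers" from the examples above, written as escapes.
MATH_DIGITS = "\U0001D7E0\U0001D7D8\U0001D7E0\U0001D7D8"  # double-struck 8080
ARABIC_DIGITS = "\u0663\u0669\u0669\u0669"                # Arabic-Indic 3999


def categories(text):
    """Return the Unicode general category of each character."""
    return [unicodedata.category(ch) for ch in text]


def matches_backslash_d(text):
    """True if the whole string matches \\d+ (str pattern, no flags)."""
    return re.fullmatch(r"\d+", text) is not None


def matches_ascii_class(text):
    """True if the whole string consists of ASCII digits only."""
    return re.fullmatch(r"[0-9]+", text) is not None
```

With these helpers, both strings report category `Nd` for every character and pass `matches_backslash_d`, while only a plain string such as `"8080"` passes `matches_ascii_class`.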

This PR replaces \d with [0-9]+ and adds two test URLs to assert that these links are not recognized.
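The effect of the change can be sketched with two deliberately simplified URL patterns. These are hypothetical stand-ins, not bleach's actual regex, and `is_url` is an illustrative helper; the only difference between the two patterns is the port sub-pattern.

```python
import re

# Hypothetical, heavily simplified URL patterns (NOT bleach's real regex).
# They differ only in how the optional port is matched.
URL_WITH_UNICODE_PORT = re.compile(r"^https?://[^\s/:]+(?::\d+)?(?:/\S*)?$")
URL_WITH_ASCII_PORT = re.compile(r"^https?://[^\s/:]+(?::[0-9]+)?(?:/\S*)?$")

# The two test URLs from the description, with the digits as escapes.
MATH_PORT_URL = "http://foo.com:\U0001D7E0\U0001D7D8\U0001D7E0\U0001D7D8/"
ARABIC_PORT_URL = "http://foo.com:\u0663\u0669\u0669\u0669/"


def is_url(pattern, text):
    """True if the simplified pattern accepts the whole string."""
    return pattern.match(text) is not None
```

Under these assumptions, the `\d` variant accepts both test URLs, the `[0-9]` variant rejects them, and both still accept `http://foo.com:8080/`.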

Note: an alternative would be to use the re.ASCII flag but:

  • it's hard to stay compatible with Python 2, where re.ASCII does not exist
  • it may not be appropriate because we do want to match funky Unicode chars in other parts of the URL
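The second concern can be illustrated with a small Python 3 sketch. The `\w+:\d+` pattern is a hypothetical "host:port" stand-in, not anything taken from bleach; it shows that re.ASCII narrows every shorthand class in the pattern, not just \d.

```python
import re

# Hypothetical "host:port" patterns, used only for illustration.
HOST_PORT = r"\w+:\d+"            # port via \d
HOST_ASCII_PORT = r"\w+:[0-9]+"   # port via an explicit ASCII class

UNICODE_HOST = "caf\u00e9:8080"                 # non-ASCII letter in the host
ARABIC_PORT = "foo:\u0663\u0669\u0669\u0669"    # Arabic-Indic digits as port


def full(pattern, text, flags=0):
    """True if the pattern matches the entire string under the given flags."""
    return re.fullmatch(pattern, text, flags) is not None
```

Here `full(HOST_PORT, ARABIC_PORT, re.ASCII)` is false as desired, but `full(HOST_PORT, UNICODE_HOST, re.ASCII)` is false too, because \w stops matching the non-ASCII letter. The explicit class in `HOST_ASCII_PORT` rejects the Arabic-Indic port while still accepting the Unicode host without any flag.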

Note: funnily enough, GitHub-flavored Markdown also linkifies them! http://foo.com:𝟠𝟘𝟠𝟘/

willkg added this to the v1.5 milestone Sep 26, 2016

willkg (Member) commented Sep 26, 2016

Tossing this in the v1.5 milestone. I'll look through it next week.

willkg (Member) commented Oct 31, 2016

I broke this when I switched to py.test. I fixed the commit in PR #225. Closing this out in favor of that one.

willkg closed this Oct 31, 2016