Bug in the URL regex used to validate AnyUrl fields #1115

Closed
parthjoshi2007 opened this issue Dec 19, 2019 · 0 comments · Fixed by #1175
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@parthjoshi2007

Bug

Please complete:

  • OS: Ubuntu 18.04
  • Python version (import sys; print(sys.version)): 3.7
  • Pydantic version (import pydantic; print(pydantic.VERSION)): 1.1

Please read the docs and search through issues to
confirm your bug hasn't already been reported.

Where possible please include a self contained code snippet describing your bug:

import pydantic

class Item(pydantic.BaseModel):
    url: pydantic.HttpUrl = ''

# Fails on pydantic 1.1 even though the URL is valid
item = Item(url='http://twitter.com/@handle')

The above snippet raises an error because of a bug in the url_regex used by all AnyUrl subclasses. The user group, [^\s:]+, is allowed to contain /, so the @ in the path makes the regex treat twitter.com/ as the username and handle as the domain. This can be fixed trivially by adding / to the characters that cannot appear in the username (or, for that matter, the password).
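The misparse can be reproduced with the standard re module alone. Here is a minimal sketch using a trimmed-down copy of the pattern (scheme, user info, domain and path only). The name old_pattern is made up for illustration and is not part of pydantic:

```python
import re

# Trimmed-down copy of the current pattern: scheme, user info, domain, path.
# `old_pattern` is an illustrative name, not something pydantic defines.
old_pattern = re.compile(
    r'(?:(?P<scheme>[a-z0-9]+?)://)?'  # scheme
    r'(?:(?P<user>[^\s:]+)(?::(?P<password>\S*))?@)?'  # user info (allows "/")
    r'(?P<domain>[^\s/:?#]+)?'  # domain
    r'(?P<path>/[^\s?]*)?',  # path
    re.IGNORECASE,
)

groups = old_pattern.match('http://twitter.com/@handle').groupdict()

# The host ends up in `user`, and the path segment is taken as the domain.
print(groups['user'], groups['domain'], groups['path'])
```

Because the user group is greedy and only excludes whitespace and :, it runs up to the @ and takes the slash with it, leaving handle as the only candidate for the domain.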

So in pydantic/networks.py, url_regex would change from the following (only the user info line differs):

url_regex = re.compile(
    r'(?:(?P<scheme>[a-z0-9]+?)://)?'  # scheme
    r'(?:(?P<user>[^\s:]+)(?::(?P<password>\S*))?@)?'  # user info
    r'(?:'
    r'(?P<ipv4>(?:\d{1,3}\.){3}\d{1,3})|'  # ipv4
    r'(?P<ipv6>\[[A-F0-9]*:[A-F0-9:]+\])|'  # ipv6
    r'(?P<domain>[^\s/:?#]+)'  # domain, validation occurs later
    r')?'
    r'(?::(?P<port>\d+))?'  # port
    r'(?P<path>/[^\s?]*)?'  # path
    r'(?:\?(?P<query>[^\s#]+))?'  # query
    r'(?:#(?P<fragment>\S+))?',  # fragment
    re.IGNORECASE,
)

to

url_regex = re.compile(
    r'(?:(?P<scheme>[a-z0-9]+?)://)?'  # scheme
    r'(?:(?P<user>[^\s:/]+)(?::(?P<password>[^\s/]*))?@)?'  # user info
    r'(?:'
    r'(?P<ipv4>(?:\d{1,3}\.){3}\d{1,3})|'  # ipv4
    r'(?P<ipv6>\[[A-F0-9]*:[A-F0-9:]+\])|'  # ipv6
    r'(?P<domain>[^\s/:?#]+)'  # domain, validation occurs later
    r')?'
    r'(?::(?P<port>\d+))?'  # port
    r'(?P<path>/[^\s?]*)?'  # path
    r'(?:\?(?P<query>[^\s#]+))?'  # query
    r'(?:#(?P<fragment>\S+))?',  # fragment
    re.IGNORECASE,
)
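
As a quick sanity check of the proposed pattern, the sketch below runs it against the failing URL and against a URL that really does carry credentials. The names fixed_url_regex and parse are illustrative helpers, not pydantic APIs, and example.com with user:secret is an invented test input:

```python
import re

# Proposed pattern, copied from above. `fixed_url_regex` is an illustrative name.
fixed_url_regex = re.compile(
    r'(?:(?P<scheme>[a-z0-9]+?)://)?'  # scheme
    r'(?:(?P<user>[^\s:/]+)(?::(?P<password>[^\s/]*))?@)?'  # user info
    r'(?:'
    r'(?P<ipv4>(?:\d{1,3}\.){3}\d{1,3})|'  # ipv4
    r'(?P<ipv6>\[[A-F0-9]*:[A-F0-9:]+\])|'  # ipv6
    r'(?P<domain>[^\s/:?#]+)'  # domain, validation occurs later
    r')?'
    r'(?::(?P<port>\d+))?'  # port
    r'(?P<path>/[^\s?]*)?'  # path
    r'(?:\?(?P<query>[^\s#]+))?'  # query
    r'(?:#(?P<fragment>\S+))?',  # fragment
    re.IGNORECASE,
)


def parse(url):
    """Return only the named groups that actually matched (hypothetical helper)."""
    match = fixed_url_regex.match(url)
    return {name: value for name, value in match.groupdict().items() if value is not None}


plain = parse('http://twitter.com/@handle')
creds = parse('http://user:secret@example.com:8080/path?q=1#top')

print(plain)  # the "@" now stays inside the path
print(creds)  # real credentials are still picked up
```

With / excluded from both groups, the user info can no longer extend past the host, so an @ later in the path is left for the path group to match.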
parthjoshi2007 added the bug V1 Bug related to Pydantic V1.X label Dec 19, 2019
samuelcolvin added a commit that referenced this issue Jan 17, 2020
* regex for username and password in URLs, fix #1115

* fix linting
andreshndz pushed a commit to cuenca-mx/pydantic that referenced this issue Jan 17, 2020
* regex for username and password in URLs, fix pydantic#1115

* fix linting