
Support unicode and punycode when validating TLDs #1182

@jamescurtin


Feature Request

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.3
            pydantic compiled: True
                 install path: /usr/local/lib/python3.8/site-packages/pydantic
               python version: 3.8.1 (default, Jan  3 2020, 22:55:55)  [GCC 8.3.0]
                     platform: Linux-4.9.184-linuxkit-x86_64-with-glibc2.2.5
     optional deps. installed: ['typing-extensions']

The IANA has approved many internationalized TLDs that are not matched by the TLD (domain ending) regex used for HttpUrl validation. There are currently ~152 such TLDs: see the entries in the authoritative list of TLDs that begin with the ASCII Compatible Encoding (ACE) prefix xn--.
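For context, each xn-- label is the punycode (ACE) encoding of a Unicode label, and Python's standard-library idna codec (IDNA 2003, RFC 3490) converts between the two forms. A minimal sketch: to_unicode and to_ace are hypothetical helper names, and the sample TLDs are the ones used in the example further down.

```python
# Round-trip TLD labels between ACE ("xn--") and Unicode form using the
# standard-library "idna" codec (IDNA 2003 / RFC 3490).


def to_unicode(ace_label: str) -> str:
    """Decode an ACE label such as 'xn--p1ai' to its Unicode form."""
    return ace_label.encode("ascii").decode("idna")


def to_ace(unicode_label: str) -> str:
    """Encode a Unicode label back to its ASCII Compatible Encoding."""
    return unicode_label.encode("idna").decode("ascii")


ace_tlds = ["xn--p1ai", "xn--vermgensberatung-pwb", "xn--zfr164b"]
for tld in ace_tlds:
    print(f"{tld} -> {to_unicode(tld)}")
```

A regex that accepts both forms therefore needs one branch for Unicode letters and one for the ASCII xn-- form, since a URL may arrive in either.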

One approach to supporting such TLDs would be to modify the domain ending regex so that it accepts Unicode letters as well as the corresponding ACE (punycode) form of each TLD. For example:

_domain_ending = r"(?P<tld>(\.[^\W\d_]{2,63})|(\.(?:xn--)[_0-9a-z-]{2,63}))?\.?"
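The proposed pattern can be exercised on its own with the standard-library re module before touching pydantic. This is a minimal sketch: the host pattern wrapped around _domain_ending is a simplified, hypothetical stand-in for pydantic's internal URL regex, and tld_of is an illustrative helper. A None result stands for a host with no recognizable TLD, which the issue expects HttpUrl to reject.

```python
import re
from typing import Optional

# Domain-ending pattern proposed above, copied verbatim.
_domain_ending = r"(?P<tld>(\.[^\W\d_]{2,63})|(\.(?:xn--)[_0-9a-z-]{2,63}))?\.?"

# Hypothetical harness, NOT pydantic's real host regex: dot-separated labels
# followed by the proposed ending. The label repetition is lazy, so the tld
# group is tried against the final label before that label is absorbed.
_host = re.compile(r"[^\s./:?#]+(?:\.[^\s./:?#]+)*?" + _domain_ending)


def tld_of(host: str) -> Optional[str]:
    """Return the matched TLD (with its leading dot), or None if none matched."""
    match = _host.fullmatch(host)
    return match.group("tld") if match else None


hosts = [
    "example.com",        # ASCII TLD
    "example.xn--p1ai",   # ACE form of a Cyrillic TLD
    "example.рф",         # Unicode form of the same TLD
    "example.123",        # digits only: no TLD
    "example.ab34",       # letters mixed with digits: no TLD
]
for host in hosts:
    print(f"{host!r}: tld={tld_of(host)!r}")
```

Note that [^\W\d_] relies on Python's default Unicode semantics for str patterns, which is what lets Cyrillic, accented Latin and CJK letters match.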

Such a change would allow the following to run successfully:

from pydantic import BaseModel, HttpUrl, ValidationError 


class Domain(BaseModel):
    domain: HttpUrl
        

ascii_domains = ["https://example.com"]

idna_domains = [
    "https://example.xn--p1ai",
    "https://example.xn--vermgensberatung-pwb",
    "https://example.xn--zfr164b",
]

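# Decode each ACE (punycode) TLD to its Unicode form, e.g. "xn--p1ai" -> "рф"
unicode_domains = [domain.encode().decode("idna") for domain in idna_domains]

valid_domains = ascii_domains + idna_domains + unicode_domains

# Endings containing digits (outside the xn-- form) should be rejected
invalid_domains = ["https://example.123", "https://example.ab34"]

for domain in valid_domains:
    Domain(domain=domain)  # raises ValidationError if the URL is rejected

for invalid_domain in invalid_domains:
    try:
        Domain(domain=invalid_domain)
    except ValidationError:
        pass  # expected: these URLs must not validate
    else:
        raise AssertionError(f"{invalid_domain!r} should not validate")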

Would you accept a PR for this?
