new URL parsing #755

Merged
merged 19 commits into from Sep 2, 2019


samuelcolvin
Collaborator

@samuelcolvin samuelcolvin commented Aug 16, 2019

Change Summary

Remove DSN, move from UrlStr to AnyUrl, and much more.

Related issue number

fix #603, fix #541

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.rst file added describing change
    (see changes/README.md for details)

@samuelcolvin samuelcolvin changed the title from "new URL parsing," to "new URL parsing" Aug 16, 2019
@codecov

codecov bot commented Aug 16, 2019

Codecov Report

Merging #755 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #755   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          15     16    +1     
  Lines        2723   2786   +63     
  Branches      536    542    +6     
=====================================
+ Hits         2723   2786   +63

@samuelcolvin
Collaborator Author

samuelcolvin commented Aug 17, 2019

For me this is ready except for lots of docs to write.

Feedback very welcome, preferably before I write all the docs.

@samuelcolvin
Collaborator Author

samuelcolvin commented Aug 17, 2019

public classes and functions

  • AnyUrl: any scheme allowed, userinfo is optional, tld not required, max length 2^16
  • AnyHttpUrl: scheme http or https, userinfo is optional, tld not required, max length 2^16
  • HttpUrl: scheme http or https, userinfo is optional, tld required, max length 2083
  • PostgresDsn: scheme postgres or postgresql, userinfo required, tld not required, max length 2^16
  • RedisDsn: scheme redis, userinfo required, tld not required, max length 2^16
  • urlstr(strip_whitespace: bool = True, min_length: int = 1, max_length: int = 2 ** 16, tld_required: bool = True, allowed_schemes: Optional[Set[str]] = None): maybe this should be renamed to stricturl()?
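To make the constraints in the list above easier to compare, here is a minimal sketch that writes them down as data and checks a URL against them using only the standard library. `UrlRules`, `RULES` and `check` are hypothetical names for illustration and are not part of this PR; the tld check is deliberately crude and the real parsing in the PR may well differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class UrlRules:
    """Hypothetical container for the per-type constraints (not pydantic's API)."""
    allowed_schemes: Optional[FrozenSet[str]] = None  # None means any scheme
    user_required: bool = False
    tld_required: bool = False
    max_length: int = 2 ** 16


HTTP = frozenset({'http', 'https'})

# One entry per public class, copied from the list above.
RULES = {
    'AnyUrl': UrlRules(),
    'AnyHttpUrl': UrlRules(HTTP),
    'HttpUrl': UrlRules(HTTP, tld_required=True, max_length=2083),
    'PostgresDsn': UrlRules(frozenset({'postgres', 'postgresql'}), user_required=True),
    'RedisDsn': UrlRules(frozenset({'redis'}), user_required=True),
}


def check(url: str, rules: UrlRules) -> str:
    """Return the stripped URL, or raise ValueError naming the failing part."""
    url = url.strip()
    if not 1 <= len(url) <= rules.max_length:
        raise ValueError('URL length out of range')
    parts = urlsplit(url)
    if not parts.scheme:
        raise ValueError('URL scheme missing')
    if rules.allowed_schemes is not None and parts.scheme not in rules.allowed_schemes:
        raise ValueError(f'URL scheme {parts.scheme!r} not permitted')
    if rules.user_required and not parts.username:
        raise ValueError('userinfo required in URL but missing')
    host = parts.hostname or ''
    if not host:
        raise ValueError('URL host missing')
    # Simplification: "has a tld" is approximated as "host contains a dot",
    # which would wrongly accept IPv4 addresses.
    if rules.tld_required and '.' not in host.rstrip('.'):
        raise ValueError('URL host has no top level domain')
    return url


print(check('https://example.com/', RULES['HttpUrl']))
print(check('postgres://user:pass@localhost:5432/app', RULES['PostgresDsn']))
```

With these rules `http://localhost` passes as `AnyHttpUrl` but fails as `HttpUrl`, which is the only difference between the two besides the maximum length.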

Other changes

  • international domains are allowed but encoded with punycode: 'https://www.аррӏе.com/' -> 'https://www.xn--80ak6aa92e.com/', 'https://exampl£e.org' -> 'https://xn--example-gia.org'
  • AnyUrl and subclasses are a subclass of str, but have extra properties: 'scheme', 'user', 'password', 'host', 'tld', 'host_type', 'port', 'path', 'query', 'fragment' to allow easier interpretation or further validation
  • error messages for invalid URLs should be more helpful, e.g. saying which part is invalid
  • DSN is removed, use AnyUrl or PostgresDsn or similar as above.
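As a rough illustration of the first two points, the sketch below builds a `str` subclass that punycode-encodes the host and exposes the parsed parts as attributes. It assumes the stdlib `urllib.parse` module and the built-in `idna` codec (IDNA 2003); `SketchUrl` is a hypothetical name, `host_type` is left out, IPv6 hosts are not handled, and none of this is claimed to be how the PR implements it.

```python
from urllib.parse import urlsplit, urlunsplit


class SketchUrl(str):
    """Hypothetical str subclass carrying parsed URL parts (not the PR's AnyUrl)."""

    def __new__(cls, url: str) -> 'SketchUrl':
        parts = urlsplit(url)
        # urlsplit lowercases the host; the idna codec punycode-encodes
        # any non-ASCII labels and leaves ASCII labels untouched.
        host = (parts.hostname or '').encode('idna').decode('ascii')

        # Rebuild the netloc around the encoded host.
        netloc = host
        if parts.port is not None:
            netloc += f':{parts.port}'
        if parts.username is not None:
            userinfo = parts.username
            if parts.password is not None:
                userinfo += f':{parts.password}'
            netloc = f'{userinfo}@{netloc}'

        rebuilt = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
        self = super().__new__(cls, rebuilt)
        self.scheme = parts.scheme
        self.user = parts.username
        self.password = parts.password
        self.host = host
        self.tld = host.rsplit('.', 1)[-1] if '.' in host else None
        self.port = parts.port
        self.path = parts.path or None
        self.query = parts.query or None
        self.fragment = parts.fragment or None
        return self


url = SketchUrl('https://user:pw@münchen.de:8080/path?q=1#frag')
print(url)       # https://user:pw@xn--mnchen-3ya.de:8080/path?q=1#frag
print(url.host)  # xn--mnchen-3ya.de
print(url.tld)   # de
print(isinstance(url, str))  # True
```

Because the value is still a `str`, it can be passed anywhere a plain URL string is expected, while the attributes are available for further validation.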

known edge cases

Underscores are now allowed in all parts of a domain except the tld. Technically I think this might be slightly wrong: in theory the hostname cannot contain underscores but subdomains can. However, consider the following two cases:

  • exam_ple.co.uk: the hostname is exam_ple, so it should not be allowed, as there's an underscore in it
  • foo_bar.example.com: the hostname is example, so it should be allowed, since the underscore is in the subdomain

Without an exhaustive list of TLDs it would be impossible to differentiate between these two. Underscores are therefore allowed; you could do further validation in a validator if you wanted.

Also, Chrome currently accepts http://exam_ple.com as a URL, so we're in good (or at least big) company.
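To show why the two cases need a suffix list, here is a minimal sketch of the kind of further validation someone could add on top. `KNOWN_SUFFIXES`, `registrable_label` and `check_underscores` are hypothetical names, and the suffix set is deliberately tiny; a real check would need a complete public suffix list, which is exactly what this PR avoids depending on.

```python
from typing import Optional

# Hypothetical, deliberately incomplete set of suffixes for illustration only.
KNOWN_SUFFIXES = {'com', 'co.uk'}


def registrable_label(host: str) -> Optional[str]:
    """Return the label directly left of the longest known suffix, if any."""
    labels = host.lower().split('.')
    # i grows from 1, so longer suffixes ('co.uk') are tried before shorter ones ('uk').
    for i in range(1, len(labels)):
        if '.'.join(labels[i:]) in KNOWN_SUFFIXES:
            return labels[i - 1]
    return None


def check_underscores(host: str) -> str:
    """Allow underscores in subdomains but not in the registrable label."""
    label = registrable_label(host)
    if label is None:
        raise ValueError(f'cannot tell hostname from subdomain for {host!r}')
    if '_' in label:
        raise ValueError(f'underscore in hostname {label!r}')
    return host


print(registrable_label('exam_ple.co.uk'))       # exam_ple
print(registrable_label('foo_bar.example.com'))  # example
print(check_underscores('foo_bar.example.com'))  # foo_bar.example.com
```

Any host whose suffix is missing from the set cannot be classified at all, which is the reason given above for simply allowing underscores.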

@samuelcolvin samuelcolvin merged commit 7901711 into master Sep 2, 2019
11 checks passed
@samuelcolvin samuelcolvin deleted the new-url-parsing branch Sep 2, 2019
PrettyWood added a commit to ToucanToco/toucan-connectors that referenced this issue Jan 24, 2020