new URL parsing #755

Merged
samuelcolvin merged 19 commits into master from new-url-parsing on Sep 2, 2019


samuelcolvin (Owner) commented Aug 16, 2019

Change Summary

Remove DSN, rename UrlStr to AnyUrl, and much more.

Related issue number

fix #603, fix #541

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.rst file added describing change
    (see changes/README.md for details)
samuelcolvin changed the title new URL parsing, new URL parsing Aug 16, 2019
codecov bot commented Aug 16, 2019

Codecov Report

Merging #755 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #755   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          15     16    +1     
  Lines        2723   2786   +63     
  Branches      536    542    +6     
=====================================
+ Hits         2723   2786   +63
samuelcolvin added 12 commits Aug 16, 2019
samuelcolvin (Owner, Author) commented Aug 17, 2019

For me this is ready except for lots of docs to write.

Feedback very welcome, preferably before I write all the docs.

samuelcolvin force-pushed the new-url-parsing branch from 66202ea to 2e055df on Aug 17, 2019
samuelcolvin (Owner, Author) commented Aug 17, 2019

Public classes and functions

  • AnyUrl: any scheme allowed, userinfo optional, TLD not required, max length 2^16
  • AnyHttpUrl: scheme http or https, userinfo optional, TLD not required, max length 2^16
  • HttpUrl: scheme http or https, userinfo optional, TLD required, max length 2083
  • PostgresDsn: scheme postgres or postgresql, userinfo required, TLD not required, max length 2^16
  • RedisDsn: scheme redis, userinfo required, TLD not required, max length 2^16
  • urlstr(strip_whitespace: bool = True, min_length: int = 1, max_length: int = 2 ** 16, tld_required: bool = True, allowed_schemes: Optional[Set[str]] = None) (maybe this should be renamed to stricturl()?)
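To make the differences between these types concrete, here is a minimal standalone sketch that encodes the rules from the list above as data and checks them with the standard library's `urllib.parse.urlsplit`. It is not pydantic's implementation: `UrlRules`, `RULES` and `check_url` are hypothetical names, and the TLD check is a naive "host contains a dot" test rather than real TLD validation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set
from urllib.parse import urlsplit


@dataclass(frozen=True)
class UrlRules:
    """Hypothetical container for the constraints listed above (not pydantic's API)."""
    allowed_schemes: Optional[Set[str]] = None  # None means any scheme is accepted
    user_required: bool = False
    tld_required: bool = False
    max_length: int = 2 ** 16


# Rules transcribed from the list of public classes; the dict itself is illustrative.
RULES = {
    "AnyUrl": UrlRules(),
    "AnyHttpUrl": UrlRules(allowed_schemes={"http", "https"}),
    "HttpUrl": UrlRules(allowed_schemes={"http", "https"}, tld_required=True, max_length=2083),
    "PostgresDsn": UrlRules(allowed_schemes={"postgres", "postgresql"}, user_required=True),
    "RedisDsn": UrlRules(allowed_schemes={"redis"}, user_required=True),
}


def check_url(url: str, rules: UrlRules) -> List[str]:
    """Return a list of problems found in url; an empty list means it passes."""
    problems = []
    if len(url) > rules.max_length:
        problems.append(f"longer than {rules.max_length} characters")
    parts = urlsplit(url)  # urlsplit lower-cases the scheme
    if not parts.scheme:
        problems.append("missing scheme")
    elif rules.allowed_schemes is not None and parts.scheme not in rules.allowed_schemes:
        problems.append(f"scheme {parts.scheme!r} not allowed")
    if rules.user_required and not parts.username:
        problems.append("userinfo required")
    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
    elif rules.tld_required and "." not in host:
        # Naive stand-in for a TLD check: the host must contain at least one dot.
        problems.append("top-level domain required")
    return problems
```

For example, `check_url("http://localhost:8000", RULES["HttpUrl"])` reports a missing top-level domain, while the same URL passes under `RULES["AnyHttpUrl"]`, mirroring the "TLD required" versus "TLD not required" distinction above.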

Other changes

  • International domains are allowed but are encoded with punycode: 'https://www.аррӏе.com/' -> 'https://www.xn--80ak6aa92e.com/', 'https://exampl£e.org' -> 'https://xn--example-gia.org'
  • AnyUrl and its subclasses are subclasses of str, but have extra properties ('scheme', 'user', 'password', 'host', 'tld', 'host_type', 'port', 'path', 'query', 'fragment') to allow easier interpretation or further validation
  • Error messages for invalid URLs should be more helpful, e.g. saying which part is invalid
  • DSN has been removed; use AnyUrl, PostgresDsn or similar as above instead.
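The punycode conversion described above can be reproduced with Python's built-in `idna` codec (IDNA 2003), and `urllib.parse.urlsplit` exposes roughly the same components listed as AnyUrl properties (it has no `tld` or `host_type`). This is a hedged sketch, not the code from this PR: `encode_host` and `normalise_url` are hypothetical helpers, and IPv6 literal hosts are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_host(host: str) -> str:
    """Punycode-encode each non-ASCII label of host with Python's built-in IDNA 2003 codec."""
    return host.encode("idna").decode("ascii")


def normalise_url(url: str) -> str:
    """Rebuild url with its host punycode-encoded; all other components are kept as-is."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url  # nothing to encode
    host = encode_host(parts.hostname)  # hostname is already lower-cased by urlsplit
    netloc = host
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{host}"
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

With these helpers, `normalise_url("https://exampl£e.org")` gives `"https://xn--example-gia.org"`, matching the example in the list, while user, password, port, path, query and fragment pass through unchanged.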

Known edge cases

Underscores are now allowed in all parts of a domain except the TLD. Technically I think this might be slightly wrong: in theory the hostname cannot contain underscores but subdomains can. However, consider the following two cases:

  • exam_ple.co.uk: the hostname is exam_ple, so this should not be allowed since it contains an underscore
  • foo_bar.example.com: the hostname is example, so this should be allowed since the underscore is in a subdomain

Without an exhaustive list of TLDs (more precisely, public suffixes such as co.uk) it is impossible to differentiate between these two. Therefore underscores are allowed; you can do further validation in a validator if you need to.

Also, Chrome currently accepts http://exam_ple.com as a URL, so we're in good (or at least big) company.
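The ambiguity shows up in a short sketch of the kind of extra check one might put in a validator. It is hypothetical: it depends on a hard-coded suffix set (`KNOWN_SUFFIXES`) standing in for the exhaustive list the parser deliberately avoids, and `split_host` and `underscore_only_in_subdomains` are illustrative names, not part of pydantic.

```python
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny suffix set; a real check would need a complete
# public-suffix list, which is exactly the dependency discussed above.
KNOWN_SUFFIXES: Set[str] = {"com", "org", "co.uk"}


def split_host(host: str, suffixes: Set[str] = KNOWN_SUFFIXES) -> Tuple[List[str], str, str]:
    """Split host into (subdomains, hostname, suffix) using the longest matching suffix."""
    labels = host.lower().split(".")
    # Trying i = 1, 2, ... tests the longest candidate suffix first.
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in suffixes:
            return labels[: i - 1], labels[i - 1], suffix
    raise ValueError(f"no known suffix in {host!r}")


def underscore_only_in_subdomains(host: str) -> bool:
    """Example extra rule: allow underscores in subdomains but not in the hostname."""
    _, hostname, _ = split_host(host)
    return "_" not in hostname
```

With `KNOWN_SUFFIXES`, exam_ple.co.uk is rejected and foo_bar.example.com is accepted. If the suffix set lacks co.uk (e.g. `split_host("exam_ple.co.uk", {"uk", "com"})`), co is taken as the hostname and exam_ple as a subdomain, which is the misclassification described above.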

samuelcolvin added 5 commits Sep 1, 2019
samuelcolvin force-pushed the new-url-parsing branch from 7741640 to 11089af on Sep 1, 2019
samuelcolvin merged commit 7901711 into master on Sep 2, 2019
11 checks passed

  • Header rules: No header rules processed
  • Pages changed: All files already uploaded
  • Redirect rules: No redirect rules processed
  • Mixed content: No mixed content detected
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • continuous-integration/travis-ci/push: The Travis CI build passed
  • deploy/netlify: Deploy preview ready!
  • pyup.io/safety-ci: No dependencies with known security vulnerabilities.
  • samuelcolvin.pydantic: Build #20190901.13 succeeded
  • samuelcolvin.pydantic (Job Python36): Job Python36 succeeded
  • samuelcolvin.pydantic (Job Python37): Job Python37 succeeded
samuelcolvin deleted the new-url-parsing branch on Sep 2, 2019