Finesse URL properties #1285

Merged (11 commits) on Sep 21, 2020

tomchristie (Member) commented Sep 12, 2020

Working towards making sure httpx.URL() is fully consistent in its use of str vs. bytes, and of URL-escaped vs. unescaped representations.

Refs #1275

Here's where it gets us to:

```python
import httpx

url = httpx.URL("HTTPS://jo%40email.com:a%20secret@EXAMPLE.com:1234/pa%20th?search=ab#anchorlink")

assert url.scheme == "https"
assert url.username == "jo@email.com"
assert url.password == "a secret"
assert url.userinfo == b"jo%40email.com:a%20secret"
assert url.host == "example.com"
assert url.port == 1234
assert url.netloc == "example.com:1234"
assert url.path == "/pa th"
assert url.query == b"?search=ab"
assert url.raw_path == b"/pa%20th?search=ab"
assert url.fragment == "anchorlink"
```

The components of a URL are broken down like this:

```
https://jo%40email.com:a%20secret@EXAMPLE.com:1234/pa%20th?search=ab#anchorlink
[sch]   [  username  ] [password] [   host  ] [po][ path ][  query ] [fragment]
        [       userinfo        ] [    netloc    ][    raw_path    ]
```
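To make the breakdown concrete, here's a minimal sketch that derives the same set of attributes using only Python's standard-library urllib.parse. It is not how httpx implements URL, and the helper name url_components is hypothetical. Two differences from httpx are worth noting: urllib's own netloc still includes the "user:pass@" userinfo (so the sketch splits it off), and urllib lowercases the host but does not IDNA-encode it.

```python
from urllib.parse import unquote, urlsplit


def url_components(url: str) -> dict:
    """Hypothetical helper: split `url` into the attributes described above."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme for us
    # urllib's netloc still includes "user:pass@", so split the userinfo off.
    userinfo = parts.netloc.rpartition("@")[0]
    host = parts.hostname  # lowercased by urllib.parse, but NOT IDNA-encoded
    port = parts.port
    query = ("?" + parts.query) if parts.query else ""
    return {
        "scheme": parts.scheme,
        "username": unquote(parts.username or ""),  # decoded
        "password": unquote(parts.password or ""),  # decoded
        "userinfo": userinfo.encode("ascii"),  # still URL-escaped
        "host": host,
        "port": port,
        "netloc": host if port is None else f"{host}:{port}",
        "path": unquote(parts.path),  # decoded
        "query": query.encode("ascii"),  # still URL-escaped
        "raw_path": (parts.path + query).encode("ascii"),  # still URL-escaped
        "fragment": parts.fragment,
    }


components = url_components(
    "HTTPS://jo%40email.com:a%20secret@EXAMPLE.com:1234/pa%20th?search=ab#anchorlink"
)
print(components["netloc"])  # example.com:1234
```

Run against the example URL, this reproduces each of the values asserted earlier in this comment.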

Note that:

  • url.scheme is normalized to always be lowercased.
  • url.host is normalized to always be lowercased, and is IDNA encoded. For instance:
    assert httpx.URL("http://中国.icom.museum").host == "xn--fiqs8s.icom.museum"
  • url.userinfo is raw bytes, left URL-escaped (not decoded). Usually you'll want to
    work with url.username and url.password instead, which handle the URL unescaping.
  • url.raw_path is raw bytes of both the path and query, left URL-escaped. This
    portion is used as the target when constructing HTTP requests. Usually you'll
    want to work with url.path instead.
  • url.query is raw bytes, left URL-escaped. A URL query string can only be properly
    unescaped after it has been split into parameter names and values, since an
    escaped "&" or "=" must not be mistaken for a separator.
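Here's a small illustration of why the query is kept escaped, using the standard library's parse_qsl (not httpx's QueryParams). The helper name parse_query is hypothetical; it assumes the raw query bytes include the leading "?", as url.query does here.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, unquote


def parse_query(raw_query: bytes) -> List[Tuple[str, str]]:
    """Hypothetical helper: decode raw query bytes into (name, value) pairs."""
    text = raw_query.decode("ascii").lstrip("?")
    # parse_qsl splits on "&" and "=" first, and only then unescapes each
    # name and value, so an escaped "%26" stays inside its value.
    return parse_qsl(text, keep_blank_values=True)


raw = b"?q=fish%26chips&page=2"
print(parse_query(raw))  # [('q', 'fish&chips'), ('page', '2')]

# Unescaping the whole string first turns "%26" into a real separator,
# so one parameter silently becomes two.
wrong = parse_qsl(unquote(raw.decode("ascii").lstrip("?")), keep_blank_values=True)
print(wrong)  # [('q', 'fish'), ('chips', ''), ('page', '2')]
```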

The full set of attributes can also be passed to url.copy_with(...), which type-checks them.
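For illustration, here's a minimal sketch of type-checked copying on a hypothetical frozen dataclass, SimpleURL, holding a subset of the attributes. This is not httpx's copy_with; the class, the _EXPECTED_TYPES table and the error messages are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical table of the type(s) each attribute accepts.
_EXPECTED_TYPES = {
    "scheme": str,
    "host": str,
    "port": (int, type(None)),
    "path": str,
    "query": bytes,
    "fragment": str,
}


@dataclass(frozen=True)
class SimpleURL:
    scheme: str
    host: str
    port: Optional[int]
    path: str
    query: bytes
    fragment: str

    def copy_with(self, **kwargs) -> "SimpleURL":
        """Return a copy with some attributes replaced, rejecting bad names/types."""
        for name, value in kwargs.items():
            if name not in _EXPECTED_TYPES:
                raise TypeError(f"copy_with() got an unexpected argument {name!r}")
            if not isinstance(value, _EXPECTED_TYPES[name]):
                raise TypeError(
                    f"Argument {name!r} has the wrong type: {type(value).__name__}"
                )
        # dataclasses.replace builds a new instance; the original is untouched.
        return replace(self, **kwargs)


url = SimpleURL("https", "example.com", 1234, "/pa th", b"?search=ab", "anchorlink")
print(url.copy_with(port=None).port)  # None
```

Passing query="?x=1" (a str rather than bytes) raises TypeError, which catches exactly the str/bytes mix-ups this PR is trying to tidy up. Note that bool is a subclass of int in Python, so this simple isinstance check would accept port=True.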

tomchristie marked this pull request as ready for review on September 18, 2020 at 11:42.

tomchristie (Member, Author) commented Sep 18, 2020

Some additional notes:

We could have gone with .host (unicode) and .raw_host (IDNA-encoded bytes), but I haven't, because it brings little actual utility and conflicts with other places where addresses might be passed around as (str, int) tuples.
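As a sketch of why a single IDNA-encoded str host fits the (str, int) address convention, here's a hypothetical helper, address_for, built on Python's built-in "idna" codec. That codec implements IDNA 2003, and httpx may use a different IDNA implementation, so results could differ for some unusual names; for this example the two agree.

```python
from typing import Tuple


def address_for(host: str, port: int) -> Tuple[str, int]:
    """Hypothetical helper: build a (str, int) address from a possibly-unicode host."""
    # The stdlib "idna" codec punycode-encodes non-ASCII labels but returns
    # all-ASCII names unchanged, so lowercase afterwards to normalize case.
    ascii_host = host.encode("idna").decode("ascii").lower()
    return (ascii_host, port)


print(address_for("中国.icom.museum", 80))  # ('xn--fiqs8s.icom.museum', 80)
print(address_for("EXAMPLE.com", 1234))  # ('example.com', 1234)
```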

We might want to bring some extra niceties onto the URL class at some point, for example:

```python
# Access the parsed `httpx.QueryParams` data structure.
url.params

# And then, include 'params' in `copy_with`.
url = url.copy_with(params={'search': 123})

# Allow the full set of `copy_with` parameters to be used on `__init__`.
# We could handle this pretty minimally using `.copy_with` under the hood.
# We already support `URL(..., params=...)`, so together with the previous
# two items, this could subsume the existing behaviour.
url = httpx.URL(scheme=..., host=..., path=...)

# Neater query parameter manipulations...
url = url.copy_adding_params({'search': '123'})
url = url.copy_removing_params('search')
```
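The proposed copy_adding_params / copy_removing_params could behave roughly like this minimal sketch. These are hypothetical free functions operating on URL strings with the standard library, not methods on httpx.URL, and "adding" is assumed to append (so repeated keys are kept, as multi-valued query strings allow).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def copy_adding_params(url: str, params: dict) -> str:
    """Hypothetical helper: return `url` with `params` appended to its query."""
    parts = urlsplit(url)
    # Decode existing pairs first, so escaped "&" and "=" survive intact.
    items = parse_qsl(parts.query, keep_blank_values=True)
    items.extend((str(key), str(value)) for key, value in params.items())
    return urlunsplit(parts._replace(query=urlencode(items)))


def copy_removing_params(url: str, *names: str) -> str:
    """Hypothetical helper: return `url` without any parameter named in `names`."""
    parts = urlsplit(url)
    items = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in names
    ]
    return urlunsplit(parts._replace(query=urlencode(items)))


url = "https://example.com/path?page=2"
url = copy_adding_params(url, {"search": "123"})
print(url)  # https://example.com/path?page=2&search=123
url = copy_removing_params(url, "search")
print(url)  # https://example.com/path?page=2
```

One design caveat: decoding and re-encoding the query means the escaping may not round-trip byte-for-byte (urlencode writes spaces as "+", for instance), which is one more reason to keep the raw url.query bytes separate from the parsed parameters.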

tomchristie added the enhancement and user-experience labels on Sep 18, 2020.