Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug with merging base_url with relative path if colon present in path #1498

Closed
2 tasks done
gtors opened this issue Mar 4, 2021 · 3 comments · Fixed by #1532
Closed
2 tasks done

Bug with merging base_url with relative path if colon present in path #1498

gtors opened this issue Mar 4, 2021 · 3 comments · Fixed by #1532
Labels
bug Something isn't working user-experience Ensuring that users have a good experience using the library

Comments

@gtors
Copy link

gtors commented Mar 4, 2021

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

After upgrading to httpx 0.17.0 from httpx 0.16.1, the base_url merge with relative paths containing a colon was broken.

To reproduce

import httpx

client = httpx.Client(base_url="https://example.org/some/path")
client.build_request("POST", "/sub:path")  # colon in path

# <Request('POST', 'sub:path')> instead of <Request('POST', 'https://example.org/some/path/sub%3Apath')>

also, calling the get method leads to the following exception:

In [6]: client.get("/sub:path")
---------------------------------------------------------------------------
UnsupportedProtocol                       Traceback (most recent call last)
~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_exceptions.py in map_exceptions(mapping, **kwargs)
    325     try:
--> 326         yield
    327     except Exception as exc:

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in _send_single_request(self, request, timeout)
    855         with map_exceptions(HTTPCORE_EXC_MAP, request=request):
--> 856             (status_code, headers, stream, ext) = transport.request(
    857                 request.method.encode(),

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_transports/default.py in request(self, method, url, headers, stream, ext)
    101     ) -> typing.Tuple[int, Headers, httpcore.SyncByteStream, dict]:
--> 102         return self._pool.request(method, url, headers=headers, stream=stream, ext=ext)
    103

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py in request(self, method, url, headers, stream, ext)
    190             scheme = url[0].decode("latin-1")
--> 191             raise UnsupportedProtocol(f"Unsupported URL protocol {scheme!r}")
    192         if not url[1]:

UnsupportedProtocol: Unsupported URL protocol 'sub'

The above exception was the direct cause of the following exception:

UnsupportedProtocol                       Traceback (most recent call last)
<ipython-input-6-0ea8464e1217> in <module>
----> 1 client.get("/sub:path")

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in get(self, url, params, headers, cookies, auth, allow_redirects, timeout)
    900         **Parameters**: See `httpx.request`.
    901         """
--> 902         return self.request(
    903             "GET",
    904             url,

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in request(self, method, url, content, data, files, json, params, headers, cookies, auth, allow_redirects, timeout)
    726             cookies=cookies,
    727         )
--> 728         return self.send(
    729             request, auth=auth, allow_redirects=allow_redirects, timeout=timeout
    730         )

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in send(self, request, stream, auth, allow_redirects, timeout)
    760         auth = self._build_request_auth(request, auth)
    761
--> 762         response = self._send_handling_auth(
    763             request,
    764             auth=auth,

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in _send_handling_auth(self, request, auth, timeout, allow_redirects, history)
    798
    799         while True:
--> 800             response = self._send_handling_redirects(
    801                 request,
    802                 timeout=timeout,

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in _send_handling_redirects(self, request, timeout, allow_redirects, history)
    830                 )
    831
--> 832             response = self._send_single_request(request, timeout)
    833             response.history = list(history)
    834

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_client.py in _send_single_request(self, request, timeout)
    854
    855         with map_exceptions(HTTPCORE_EXC_MAP, request=request):
--> 856             (status_code, headers, stream, ext) = transport.request(
    857                 request.method.encode(),
    858                 request.url.raw,

/usr/lib/python3.9/contextlib.py in __exit__(self, type, value, traceback)
    133                 value = type()
    134             try:
--> 135                 self.gen.throw(type, value, traceback)
    136             except StopIteration as exc:
    137                 # Suppress StopIteration *unless* it's the same exception that

~/.local/share/virtualenvs/bug-IVcS609A/lib/python3.9/site-packages/httpx/_exceptions.py in map_exceptions(mapping, **kwargs)
    341
    342         message = str(exc)
--> 343         raise mapped_exc(message, **kwargs) from exc  # type: ignore
    344
    345

UnsupportedProtocol: Unsupported URL protocol 'sub'

Expected behavior

client.build_request("POST", "/sub:path") returns <Request('POST', 'https://example.org/some/path/sub%3Apath')>

client.get("/sub:path") not raises UnsupportedProtocol exception

Actual behavior

client.build_request("POST", "/sub:path") returns <Request('POST', 'sub:path')>

client.get("/sub:path") raises UnsupportedProtocol exception

Environment

  • OS: Linux
  • Python version: 3.9
  • HTTPX version: 0.17.0
  • Async environment: asyncio
  • HTTP proxy: no
  • Custom certificates: no
@florimondmanca
Copy link
Member

florimondmanca commented Mar 4, 2021

@gtors Thanks for reporting :-)

Indeed, we've got an issue with interpreting colons when merging URLs:

>>> import httpx
>>> httpx.Request("GET", "https://example.org/sub:path").url.raw
(b'https', b'example.org', None, b'/sub:path')  # ✅
>>> httpx.Client(base_url="https://example.org").build_request("GET", "/sub:path").url.raw
(b'sub', b'', None, b'path')  # ❌
>>> httpx.Client(base_url="https://example.org").build_request("GET", "sub:path").url.raw
(b'sub', b'', None, b'path')  # ✅ (Surprising, but in line with RFC 3986: `http:path` is treated as an authority-less HTTP URL)
>>> httpx.Client(base_url="https://example.org").build_request("GET", "/sub%3Apath").url.raw
(b'https', b'example.org', None, b'/sub%3Apath')  # ✅
>>> httpx.Client(base_url="https://example.org").build_request("GET", "/ok").url.raw
(b'https', b'example.org', None, b'/ok')  # ✅

In 0.16.1 we get:

>>> import httpx
>>> httpx.Request("GET", "https://example.org/sub:path").url.raw
(b'https', b'example.org', None, b'/sub:path')  # ✅
>>> httpx.Client(base_url="https://example.org").build_request("GET", "/sub:path").url.raw
(b'https', b'example.org', None, b'/sub%3Apath')  # ✅
>>> httpx.Client(base_url="https://example.org").build_request("GET", "sub:path").url.raw
(b'sub', b'', None, b'path')  # ✅ (Surprising, but in line with RFC 3986: `http:path` is treated as an authority-less HTTP URL)
>>> httpx.Client(base_url="https://example.org").build_request("GET", "/sub%3Apath").url.raw
(b'https', b'example.org', None, b'/sub%3Apath')  # ✅
>>> httpx.Client(base_url="https://example.org").build_request("GET", "/ok").url.raw
(b'https', b'example.org', None, b'/ok')  # ✅

So indeed, it looks like something like /sub:path is treated as the absolute URL sub:path in 0.17.0, when it was not the case in 0.16.1.

In test form:

client = httpx.Client(base_url="https://www.example.com")
request = client.build_request("GET", "/sub:path")
assert request.url == "https://www.example.com/sub%3Apath"

Raises:

E       AssertionError: assert URL('sub:path') == 'https://www.example.com/sub%3Apath'
E        +  where URL('sub:path') = <Request('GET', 'sub:path')>.url

Seems to come from this operation, where we strip / from /sub:path:

httpx/httpx/_client.py

Lines 328 to 331 in 0f280af

# We always ensure the base_url paths include the trailing '/',
# and always strip any leading '/' from the merge URL.
merge_url = merge_url.copy_with(raw_path=merge_url.raw_path.lstrip(b"/"))
return self.base_url.join(merge_url)

Leading to joining sub:path with the base URL using rfc3986:

httpx/httpx/_models.py

Lines 399 to 403 in 8696405

# We drop any fragment portion, because RFC 3986 strictly
# treats URLs with a fragment portion as not being absolute URLs.
base_uri = self._uri_reference.copy_with(fragment=None)
relative_url = URL(url)
return URL(relative_url._uri_reference.resolve_with(base_uri).unsplit())

Refs #1407, which is new for 0.17.0.

Not sure yet how to resolve, or if we don't just want to disallow sub:path, but at least here's some more debugging. :-)

@florimondmanca florimondmanca added the user-experience Ensuring that users have a good experience using the library label Mar 4, 2021
@florimondmanca
Copy link
Member

cc @tomchristie — in case you've got any thoughts on this.

@tomchristie
Copy link
Member

tomchristie commented Mar 23, 2021

I guess the first step here is a failing test case.

I think the following change would work based on a bit of testing in the console.

        merge_url = URL(url)
        if merge_url.is_relative_url:
            # When merging against a base URL, we append the path.
            # In order to do so we always ensure the base_url paths include the trailing '/',
            # and always strip any leading '/' from the merge URL.
            merged_raw_path = self.base_url.raw_path + merge_url.raw_path.lstrip(b"/")
            return self.base_url.copy_with(raw_path=merged_raw_path)
        return merge_url

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working user-experience Ensuring that users have a good experience using the library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants