Should we support bare hostnames in ALL_PROXY,HTTP_PROXY, HTTPS_PROXY? #1082

Closed
2 tasks done
hjlarry opened this issue Jul 24, 2020 · 9 comments · Fixed by #1120
Labels
discussion, proxies (Issues related to HTTP and SOCKS proxies), user-experience (Ensuring that users have a good experience using the library)

Comments

@hjlarry

hjlarry commented Jul 24, 2020

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

I am a new httpx user. I just installed it and tried running it in the Python REPL.
I typed httpx.get('https://www.github.com') and got the error "No scheme included in URL".
After some debugging, I found the cause: I had set export ALL_PROXY=127.0.0.1:7890 in my shell, so the proxy URL has no scheme.
This proxy setting works with curl, brew, and many other programs, so shouldn't it also work in httpx?
If httpx must have the scheme for some other reason, I think it should at least report a proxy configuration error.

Debugging material

~/projects » python3
Python 3.8.4 (default, Jul 14 2020, 02:58:48) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpx
>>> httpx.get("https://www.github.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/httpx/_api.py", line 159, in get
    return request(
  File "/usr/local/lib/python3.8/site-packages/httpx/_api.py", line 83, in request
    with Client(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 467, in __init__
    proxy_map = self.get_proxy_map(proxies, trust_env)
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 87, in get_proxy_map
    return {
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 88, in <dictcomp>
    key: Proxy(url=url)
  File "/usr/local/lib/python3.8/site-packages/httpx/_config.py", line 335, in __init__
    url = URL(url)
  File "/usr/local/lib/python3.8/site-packages/httpx/_models.py", line 90, in __init__
    raise InvalidURL("No scheme included in URL.")
httpx._exceptions.InvalidURL: No scheme included in URL.

Environment

  • OS: MacOS
  • Python version: 3.8.4
  • HTTPX version: 0.13.3
  • Async environment: no
  • HTTP proxy: yes 127.0.0.1:7890
  • Custom certificates: no
@florimondmanca florimondmanca added proxies Issues related to HTTP and SOCKS proxies user-experience Ensuring that users have a good experience using the library labels Jul 24, 2020
@florimondmanca
Member

Hi!

I assume cURL and other tools also allow proxy URLs with explicit schemes, that is http://....

If so then I believe switching your env var to such a form should be the way to go.

I don't think we want to allow no-scheme URLs in general, because that would require an arbitrary decision on "what is the default scheme?", and it's not clear which of HTTP or HTTPS would be best.

Allowing no scheme URLs for proxies alone brings more complexity, which is something we're trying to avoid as much as possible.

So a solution of hinting users that "yes, you should use an explicit scheme" is probably best.

Now, one thing we could do to improve the debugging experience is show the faulty URL in the error message.

(I think there's also room for improving the debugging material for "how proxies are set up", for instance to be able to trace whether env vars were used and how. But we can probably advise on this later.)
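
To make the "show the faulty URL in the error message" idea concrete, here is a minimal sketch in plain Python. It is not httpx code: the `InvalidProxyURL` exception and the `check_proxy_url` helper are hypothetical names invented for illustration, and the `source` argument is an assumed way of telling the user where the value came from.

```python
class InvalidProxyURL(ValueError):
    """Hypothetical exception type, used only for this sketch."""


def check_proxy_url(url: str, source: str = "proxies argument") -> str:
    """Return the URL unchanged if it has a scheme, else raise a descriptive error."""
    if "://" not in url:
        # Include the offending value and where it came from, plus a suggested fix.
        raise InvalidProxyURL(
            f"No scheme included in proxy URL {url!r} (from {source}). "
            f"Did you mean 'http://{url}'?"
        )
    return url
```

With a message like this, a user who set ALL_PROXY days ago can see both the bad value and the variable it came from without reading the traceback.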

@hjlarry
Author

hjlarry commented Jul 24, 2020

Yes, other tools allow explicit schemes, and if the user does not include a scheme in the URL, they may add a default http://.

I think this is indeed a user experience problem: the environment proxy may have been set several days ago, and the user may have forgotten about it.
One solution is to treat the environment proxy specially, because the proxy setting is shared by more than one program. Another solution is better hinting; improving the debugging experience is a good idea.

@florimondmanca florimondmanca added the good first issue Good for newcomers label Jul 27, 2020
@j178
Member

j178 commented Jul 29, 2020

FYI, after URL(allow_relative=bool) was dropped in #1073, a proxy without an explicit scheme now causes ValueError: Unknown scheme for proxy URL URL('127.0.0.1:7890') instead.

@tomchristie
Member

@j178 I wouldn't mind making sure we've got a decent UX around handling that case.
Are you seeing that when using the proxies=... style, or when using proxy environment variables, or both?
What's the traceback there?

@j178
Member

j178 commented Jul 29, 2020

@tomchristie Both.

>>> import os
>>> os.environ['all_proxy'] = '127.0.0.1:7890'
>>> import httpx
>>> httpx.get('https://github.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_api.py", line 170, in get
    trust_env=trust_env,
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_api.py", line 84, in request
    cert=cert, verify=verify, timeout=timeout, trust_env=trust_env,
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_client.py", line 463, in __init__
    proxy_map = self._get_proxy_map(proxies, trust_env)
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_client.py", line 93, in _get_proxy_map
    for key, url in get_environment_proxies().items()
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_client.py", line 93, in <dictcomp>
    for key, url in get_environment_proxies().items()
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_config.py", line 326, in __init__
    raise ValueError(f"Unknown scheme for proxy URL {url!r}")
ValueError: Unknown scheme for proxy URL URL('127.0.0.1:7890')
>>> os.environ.pop('all_proxy')
'127.0.0.1:7890'
>>> client = httpx.Client(proxies='127.0.0.1:7890')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_client.py", line 463, in __init__
    proxy_map = self._get_proxy_map(proxies, trust_env)
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_client.py", line 97, in _get_proxy_map
    proxy = Proxy(url=proxies) if isinstance(proxies, (str, URL)) else proxies
  File "/home/johnj/.pyenv/versions/general/lib/python3.7/site-packages/httpx/_config.py", line 326, in __init__
    raise ValueError(f"Unknown scheme for proxy URL {url!r}")
ValueError: Unknown scheme for proxy URL URL('127.0.0.1:7890')
>>>

@tomchristie
Member

Right, not entirely clear if we should aim to be lenient to this, or if we should just make sure the error is as clear as possible.

I'd be interested to know how requests chooses to handle this case. (Tho we might not necessarily want to do exactly the same as them here)

@tomchristie
Member

Another useful data point here would be - what does curl do when issuing a request if ALL_PROXY includes a bare hostname?

@tomchristie tomchristie changed the title No scheme included in URL Should we support bare hostnames in ALL_PROXY,HTTP_PROXY, HTTPS_PROXY environment variables? Jul 31, 2020
@tomchristie tomchristie changed the title Should we support bare hostnames in ALL_PROXY,HTTP_PROXY, HTTPS_PROXY environment variables? Should we support bare hostnames in ALL_PROXY,HTTP_PROXY, HTTPS_PROXY? Jul 31, 2020
@tomchristie tomchristie added discussion and removed good first issue Good for newcomers labels Jul 31, 2020
@j178
Member

j178 commented Aug 1, 2020

I did some tests:

curl works with a bare hostname proxy:

# without proxy, I cannot connect to google
$ curl --connect-timeout 5 google.com
curl: (28) Connection timed out after 5000 milliseconds

# bare hostname proxy works
$ all_proxy=127.0.0.1:7890 curl --connect-timeout 5 google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

requests works too:

$ all_proxy=127.0.0.1:7890 python -c "import requests; print(requests.get('http://google.com'))"
<Response [200]>

requests actually prepends an http scheme to the proxy URL automatically if needed:
https://github.com/psf/requests/blob/2d39c0db054e158767ab4a755476844fe40787e7/requests/adapters.py#L304
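
For readers who do not want to follow the link, the behaviour described here can be sketched with a few lines of plain Python. This is an illustration of the idea, not the requests implementation; the function name `prepend_scheme_if_missing` and its `default_scheme` parameter are assumptions made for this example.

```python
def prepend_scheme_if_missing(url: str, default_scheme: str = "http") -> str:
    """Return the URL with default_scheme prepended when it has no '://' separator."""
    if "://" in url:
        # An explicit scheme (http, https, socks5, ...) is left untouched.
        return url
    return f"{default_scheme}://{url}"
```

The check looks for "://" instead of parsing the string, because a bare host:port value such as localhost:7890 can be ambiguous to URL parsers.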

@tomchristie
Member

Okay, in that case we should follow the same behaviour.

We'll want something like...

if not url.scheme:
    url = url.copy_with(scheme="http")
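
As a rough illustration of how that default could apply to the three environment variables named in the issue title, here is a self-contained sketch that works on plain strings. It is not the change that was eventually merged: `proxies_from_env`, `PROXY_VARS`, and the choice to accept lowercase variable names (as used in the reproduction earlier in the thread) are assumptions for this example.

```python
from typing import Dict, Mapping

# Variable names taken from the issue title.
PROXY_VARS = ("ALL_PROXY", "HTTP_PROXY", "HTTPS_PROXY")


def proxies_from_env(
    environ: Mapping[str, str], default_scheme: str = "http"
) -> Dict[str, str]:
    """Collect proxy settings from a mapping, defaulting bare hostnames to a scheme."""
    proxies: Dict[str, str] = {}
    for name in PROXY_VARS:
        # Accept both the uppercase and the lowercase spelling of each variable.
        value = environ.get(name) or environ.get(name.lower())
        if not value:
            continue
        if "://" not in value:
            # Bare hostname such as 127.0.0.1:7890: assume the default scheme.
            value = f"{default_scheme}://{value}"
        proxies[name] = value
    return proxies
```

Passing `os.environ` as the mapping would read the real environment; taking the mapping as a parameter keeps the function easy to test.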

4 participants