urllib.request fails for proxy credentials that contain a '/' character #67517
On Python 2.7.9, if I set an https_proxy environment variable, where the password contains a '/' character, urllib2 fails. Given this test code:
import os, urllib
os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
f = urllib.urlopen('http://www.python.org')
data = f.read()
print data
I expect this error message (because my sample proxy is totally bogus):
[areitz@SOMEHOST ~]$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/usr/lib64/python2.7/httplib.py", line 997, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 850, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 812, in send
    self.connect()
  File "/usr/lib64/python2.7/httplib.py", line 793, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
IOError: [Errno socket error] [Errno 101] Network is unreachable
Instead, I receive this error:
[areitz@SOMEHOST ~]$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.7/urllib.py", line 339, in open_http
    h = httplib.HTTP(host)
  File "/usr/lib64/python2.7/httplib.py", line 1107, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'a'
Note that from the error, it seems as if urllib2 is incorrectly parsing the password from the proxy URL. When trying this with curl 7.19.7, I see the proper behavior (the correct password is parsed from the proxy URL). |
Sorry, went a bit too quickly -- here is the sample code that I meant to use:
import os, urllib2
os.environ['http_proxy'] = "http://someuser:a/b@10.11.12.13:1234"
f = urllib2.urlopen('http://www.python.org')
data = f.read()
print data
And the stack trace that I receive:
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    f = urllib2.urlopen('http://www.python.org')
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1166, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "/usr/lib64/python2.7/httplib.py", line 712, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 754, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: 'a'
It actually looks the same -- so I suppose this issue affects both urllib and urllib2. |
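The "nonnumeric port: 'a'" message is consistent with the proxy URL being cut at the first '/' before the credentials are split off. The following is a simplified model of that behavior, written for Python 3; it is not the standard library's code, and the helper names `naive_authority` and `split_host_port` are made up for illustration.

```python
def naive_authority(url):
    """Return the authority part, assuming it ends at the first '/'."""
    rest = url.split("://", 1)[1]   # drop the scheme
    return rest.split("/", 1)[0]    # cut at the FIRST slash


def split_host_port(authority):
    """Split 'host:port' at the last ':' without validating the port."""
    host, _, port = authority.rpartition(":")
    return host, port


authority = naive_authority("http://someuser:a/b@10.11.12.13:1234")
host, port = split_host_port(authority)

# The '@' and everything after it were discarded with the "path",
# so the user name is taken for the host and the password prefix
# for the port.
print(authority)   # someuser:a
print(host, port)  # someuser a
```

Under this model the real host and port never reach the connection code, which would explain why the error names 'a' rather than 1234.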
Yup, can confirm that this is a problem. As Andy recognized, there is a parsing error that fails on the '/' character in the password. The environ-based proxies are used by urllib rather than urllib2. (If the test case relies on an environ proxy, it should use urllib.urlopen().) But the failure comes from parsing done in httplib, so it affects both urllib and urllib2. |
Related: bpo-18140. The slash character is meant to be a reserved character in URLs, so why hasn’t it been encoded? Where does the environment variable come from? |
The proxy credentials are supplied by our sysadmin. My understanding is that the http_proxy env variable doesn't require URI encoding. In addition, the same credentials work fine with curl. |
The relevant code looks like it is _parse_proxy() at Lib/urllib/request.py:693. It has custom code to search for a slash (/), so it wouldn’t be hard to make it search after the last at (@) symbol. (I previously assumed it would use urlsplit() or similar, which would be harder to adjust.) Even Curl seems to require an @ symbol in the username or password to be encoded, i.e. the following doesn’t work, so you still need to encode the fields in general to work with Curl.
http_proxy=http://a@x:b@localhost curl . . .
http_proxy=http://a:b@x@localhost curl . . . |
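The idea of searching for the slash only after the last '@' can be sketched as below. This is a minimal Python 3 illustration, not the actual _parse_proxy() code or the eventual patch; the function name `parse_proxy_lenient` and its return shape are assumptions.

```python
def parse_proxy_lenient(proxy):
    """Split a proxy URL into (scheme, user, password, hostport).

    The authority is taken to end at the first '/' that follows the
    last '@', so literal slashes in the credentials are tolerated.
    """
    scheme, sep, rest = proxy.partition("://")
    if not sep:
        raise ValueError("expected scheme://... in %r" % proxy)

    at = rest.rfind("@")             # -1 when there are no credentials
    end = rest.find("/", at + 1)     # first slash after the last '@'
    authority = rest if end == -1 else rest[:end]

    userinfo, has_at, hostport = authority.rpartition("@")
    user = password = None
    if has_at:
        user, has_colon, password = userinfo.partition(":")
        if not has_colon:
            password = None
    return scheme, user, password, hostport


print(parse_proxy_lenient("http://someuser:a/b@10.11.12.13:1234"))
```

One trade-off of this approach: a proxy URL with no credentials but an '@' somewhere in its path would be misread, because the last '@' is always treated as the end of the userinfo.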
RFC3986 seems to state that a '/' character should be encoded: """... |
Sure, but the question is who should do the encoding -- the user, or python? I think it would be better for python to read the password from the environment variable, and encode it before using it. I think this is what users expect. |
To comply with the RFC on URLs, whoever is setting the environment variable _should_ do the encoding, and then Python will _decode_ it. But I suspect this case is more about how Python should handle an environment variable that hasn’t been encoded correctly. |
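For whoever sets the environment variable, the encoding step might look like the following Python 3 sketch. The helper name `build_proxy_url` is hypothetical; `quote`, `unquote`, and `urlsplit` are the standard `urllib.parse` functions.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(user, password, host, port):
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode '/' too (it is left alone by default).
    return "http://{}:{}@{}:{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port
    )


url = build_proxy_url("someuser", "a/b", "10.11.12.13", 1234)
print(url)  # http://someuser:a%2Fb@10.11.12.13:1234

# A consumer splits the URL first and decodes the fields afterwards.
parts = urlsplit(url)
print(parts.hostname, parts.port)  # 10.11.12.13 1234
print(unquote(parts.password))     # a/b
```

The resulting string is what would be placed in http_proxy; the decoding half shows what a consumer is expected to do with it.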
In the initial report, I thought it was mentioned that curl reads the same http_proxy variable properly. It would be good to have a correct curl test case to ascertain that. But in all the places where an @ character is allowed in URLs (netrc, git configs), I see that @ should be encoded. In that case, this bug report is more about detecting bad URLs and presenting a better error message. |
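If the goal is only a clearer error message, a check along these lines could flag the likely mistake before any connection is attempted. This is a hypothetical Python 3 sketch; `check_proxy_url` is not an existing function, and the heuristic (an '@' appearing only after the first '/') is an assumption about what a bad URL looks like.

```python
def check_proxy_url(proxy):
    """Return proxy unchanged, or raise ValueError with a hint."""
    scheme, sep, rest = proxy.partition("://")
    if not sep:
        raise ValueError("proxy URL {!r} has no scheme".format(proxy))

    authority, slash, tail = rest.partition("/")
    # An '@' that shows up only after the first '/' suggests that the
    # slash belongs to unencoded credentials rather than to a path.
    if slash and "@" in tail and "@" not in authority:
        raise ValueError(
            "proxy URL {!r} seems to have an unencoded '/' in its "
            "credentials; percent-encode it as %2F".format(proxy)
        )
    return proxy


try:
    check_proxy_url("http://someuser:a/b@10.11.12.13:1234")
except ValueError as exc:
    print(exc)
```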
This should demonstrate that Curl does parse literal slashes in the username and password fields:
$ http_proxy=http://user/name:pass/word@localhost:22 curl -v http://example.net/
* Trying ::1...
* Connected to localhost (::1) port 22 (#0)
* Proxy auth using Basic with user 'user/name'
> GET http://example.net/ HTTP/1.1
> Proxy-Authorization: Basic dXNlci9uYW1lOnBhc3Mvd29yZA==
> User-Agent: curl/7.40.0
> Host: example.net
> Accept: */*
> Connection: TE
> TE: gzip
> Proxy-Connection: Keep-Alive
>
SSH-2.0-OpenSSH_6.2
Protocol mismatch.
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
[Exit 56]
$ base64 -d <<< dXNlci9uYW1lOnBhc3Mvd29yZA==
user/name:pass/word$ |
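The same check of the Proxy-Authorization header can be done in Python 3 with the standard `base64` module. The helper name `decode_basic_credentials` is made up for this sketch; the header value is the one from the Curl output above.

```python
import base64


def decode_basic_credentials(header_value):
    """Decode a 'Basic <token>' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential: %r" % header_value)
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first ':' separates the user name from the password.
    user, _, password = decoded.partition(":")
    return user, password


print(decode_basic_credentials("Basic dXNlci9uYW1lOnBhc3Mvd29yZA=="))
# ('user/name', 'pass/word')
```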
#23973 will resolve this issue. The issue was localized to the _parse_proxy method in urllib2. |