Can't use session proxy in its request for HTTPS protocol #2911

Closed
FabriceSh44 opened this issue Dec 1, 2015 · 39 comments

@FabriceSh44

I've been struggling to make an HTTPS request through my company's proxy.

import requests
from requests.auth import HTTPProxyAuth

# Placeholders: substitute the real proxy credentials, host and port
user = 'user'
password = 'password'
proxy_string = 'http://user:password@url_proxy:port_proxy'

s = requests.Session()
s.proxies = {"http": proxy_string, "https": proxy_string}
s.auth = HTTPProxyAuth(user, password)

r = s.get('http://www.google.com')  # OK
print(r.text)
r = s.get('https://www.google.com', proxies={"http": proxy_string, "https": proxy_string})  # OK
print(r.text)
r = s.get('https://www.google.com')  # KO: 407 Proxy Authentication Required
print(r.text)

When it fails (KO), I get the following exception:

HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',)))

I searched online but didn't find anyone reporting this specific issue with HTTPS.

Thank you for your time.

The issue is also described on Stack Overflow:
http://stackoverflow.com/questions/34025964/python-requests-api-using-proxy-for-https-request-get-407-proxy-authentication-r

@Lukasa
Member

Lukasa commented Dec 1, 2015

Have you tried the same request without setting s.auth?

@FabriceSh44
Author

Just did. Same result: KO.

@Lukasa
Member

Lukasa commented Dec 2, 2015

Hmm. I wonder if this is connection pooling related. @Bl4ckC4t, are you comfortable with wireshark?

@FabriceSh44
Author

I would like to avoid that. I work in a corporate environment and don't know Wireshark that well, so I'm afraid of leaking company information, which would most probably get me fired. Is there any other way?
Would it be possible, for example, for the code to pass the session's proxy setup to every session.get() request by default, as I did explicitly in example 2?

@Lukasa
Member

Lukasa commented Dec 2, 2015

@Bl4ckC4t What I'm worried about is that this may be interacting with our connection re-use logic. This seems the most likely cause of the problem.

@FabriceSh44
Author

Can you tell me what you are looking for in the Wireshark capture? I'll try to get it for you without sharing the dump, if you think that's possible.

@Lukasa
Member

Lukasa commented Dec 2, 2015

I'm interested to see, in the case of the second request, whether a new TCP connection and CONNECT request are made or whether we re-use the old one. If a new one is made, I want to see if it has the Proxy-Authorization header in place.

@FabriceSh44
Author

On the second request, a new TCP connection is made (I inferred that because it uses a different source port). In the HTTP layer, the Proxy-Authorization header is correctly set. I looked at the third request and it doesn't contain the header.

@Lukasa
Member

Lukasa commented Dec 2, 2015

@Bl4ckC4t Does the third one use a new TCP connection, or the same one?

@FabriceSh44
Author

New one.

@Lukasa
Member

Lukasa commented Dec 2, 2015

Ok, so that's interesting. What version of requests are you using and where did you get it?

@FabriceSh44
Author

requests==2.8.1 on Windows.
I don't remember installing it specifically, so I guess it came either with my Python 3.4 installation or with a pip install of a package that depends on it.

@Lukasa
Member

Lukasa commented Dec 2, 2015

So, I think the issue here is that we're not correctly re-applying the proxy headers in this case.

@FabriceSh44
Author

My real problem is that I'm using a library that relies on this mechanism, and its requests fail. I can't change s.get(url) to s.get(url, proxies=...). Do you see any workaround where I could change the session object's state to force the proxy on every request made through this session? If not, I will wait for the fix.

In any case, thank you very much for the time you've spent on this issue.

@Lukasa
Member

Lukasa commented Dec 2, 2015

Wait, @Bl4ckC4t: my understanding is that you are using the Session each time and we're just not attaching the headers correctly. As a temporary workaround, you can avoid the problem by not using a Session.

I'm trying to get an exact reproduction of this problem on my own system. Right now, my connections correctly re-use the established tunnel, so I'm not yet reproducing what you see.

@FabriceSh44
Author

I'm actually using this library: https://pypi.python.org/pypi/jira
The JIRA constructor creates a session, and subsequent requests on that object only use its get method.
I'm not able (at least I don't know how) to change the implementation, either by switching from a Session to plain requests or by setting the proxy on each session request.

@Lukasa
Member

Lukasa commented Dec 2, 2015

@Bl4ckC4t Can you check whether the response to the second request sends the Connection header, and if so, to what value? I'm trying to work out why the connection is going away.

@Lukasa
Member

Lukasa commented Dec 2, 2015

So far, I'm unable to reproduce this: the new TCP connections correctly have the Proxy-Authorization header set.

@FabriceSh44
Author

Here is the overview of the dump.

"Protocol","No.","Info"
"HTTP","106","Continuation or non-HTTP traffic"
"TCP","163","55638 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","166","55638 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"HTTP","167","GET http://www.google.com/ HTTP/1.1 "
"TCP","172","55638 > http-alt [ACK] Seq=215 Ack=4081 Win=64860 Len=0"
"TCP","177","55638 > http-alt [ACK] Seq=215 Ack=8220 Win=64860 Len=0"
"TCP","178","53650 > http-alt [ACK] Seq=1 Ack=61 Win=64380 Len=0"
"TCP","506","[TCP Keep-Alive] 55636 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=1"
"TCP","607","[TCP Keep-Alive] 55636 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=1"
"TCP","624","55651 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","626","55651 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"HTTP","627","GET http://www.google.com/ HTTP/1.1 "
"TCP","634","55651 > http-alt [ACK] Seq=509 Ack=5489 Win=64860 Len=0"
"TCP","637","55651 > http-alt [ACK] Seq=509 Ack=7652 Win=64860 Len=0"
"HTTP","798","Continuation or non-HTTP traffic"
"HTTP","799","Continuation or non-HTTP traffic"
"TCP","803","53650 > http-alt [ACK] Seq=296 Ack=337 Win=64104 Len=0"
"TCP","819","[TCP Keep-Alive] 55636 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=1"
"TCP","1015","55653 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","1017","55653 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"TCP","1018","[TCP segment of a reassembled PDU]"
"HTTP","1021","CONNECT www.google.com:443 HTTP/1.0 "
"TCP","1024","55653 > http-alt [ACK] Seq=40 Ack=1125 Win=63737 Len=0"
"TCP","1372","[TCP Keep-Alive] 55636 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=1"

Breakpoint before request 2: packet number 178
Breakpoint before request 3: packet number 637

No Proxy-Authorization header on 1021
Proxy-Authorization header on 167 and 627

I filtered on my IP and the destination IP.

For which packet number do you want me to check the Connection header?

@Lukasa
Member

Lukasa commented Dec 2, 2015

Hang on, hang on.

I see two HTTP requests here, and one HTTPS, but your code above makes two HTTPS requests and one HTTP. Are you sure this behaviour is right? Did the first request get redirected? (e.g. what's the value of r.history for the first request)

@FabriceSh44
Author

Right, sorry: I messed up the test when I rebuilt it (I had lost the original by mistake). Let me make a new one for you.

@FabriceSh44
Author

So, here is the new one:

"Protocol","No.","Info"
"TCP","9496","62863 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","9498","62863 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"HTTP","9499","GET http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab?248b33e67353bb0c HTTP/1.1 "
"TCP","9502","62863 > http-alt [ACK] Seq=238 Ack=1209 Win=63653 Len=0"
"TCP","9503","62863 > http-alt [FIN, ACK] Seq=238 Ack=1209 Win=63653 Len=0"
"TCP","9504","62864 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","9507","62864 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"TCP","9508","[TCP segment of a reassembled PDU]"
"TCP","9509","[TCP segment of a reassembled PDU]"
"TCP","9512","[TCP segment of a reassembled PDU]"
"TCP","9513","[TCP segment of a reassembled PDU]"
"TCP","9514","[TCP segment of a reassembled PDU]"
"HTTP","9515","GET http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab?248b33e67353bb0c HTTP/1.1 "
"TCP","9624","[TCP ACKed lost segment] 62864 > http-alt [FIN, ACK] Seq=7643 Ack=957 Win=63905 Len=0"
"TCP","16269","62865 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","16271","62865 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"HTTP","16272","GET http://www.google.com/ HTTP/1.1 "
"TCP","16743","62865 > http-alt [ACK] Seq=215 Ack=4081 Win=64860 Len=0"
"TCP","16748","62865 > http-alt [ACK] Seq=215 Ack=8224 Win=64860 Len=0"
"TCP","52594","62866 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","52596","62866 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"TCP","52597","[TCP segment of a reassembled PDU]"
"HTTP","52606","CONNECT www.google.com:443 HTTP/1.0 "
"TCP","53070","62866 > http-alt [ACK] Seq=89 Ack=40 Win=64821 Len=0"
"TLSv1.2","54122","Client Hello"
"TCP","54128","62866 > http-alt [ACK] Seq=606 Ack=3541 Win=64860 Len=0"
"TLSv1.2","54129","Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message"
"TLSv1.2","54131","Application Data"
"TCP","54338","62866 > http-alt [ACK] Seq=1247 Ack=11763 Win=64860 Len=0"
"HTTP","66776","Continuation or non-HTTP traffic"
"TCP","82001","62868 > http-alt [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8 SACK_PERM=1"
"TCP","82003","62868 > http-alt [ACK] Seq=1 Ack=1 Win=64860 Len=0"
"TCP","82004","[TCP segment of a reassembled PDU]"
"HTTP","82312","CONNECT www.google.com:443 HTTP/1.0 "
"TCP","82365","62868 > http-alt [ACK] Seq=40 Ack=1125 Win=63737 Len=0"

Breakpoint before request 2: packet number 16748
Breakpoint before request 3: packet number 54338

@Lukasa
Member

Lukasa commented Dec 2, 2015

And, again, to clarify: the third request for google.com is the one that has no Proxy-Authorization header?

@FabriceSh44
Author

3rd request, packet 82312: no Proxy-Authorization header.
Packets 52606 and 16272 have it.

@Lukasa
Member

Lukasa commented Dec 2, 2015

Hmm. That's interesting: it doesn't look like the connection is being closed, but it's not available to the pool either.

Can you verify something for me: can you right-click on each of the new SYNs that contain requests and hit "follow TCP stream"? Just check that the Proxy-Authorization header didn't come in a later packet, because it does on my machine.

@FabriceSh44
Author

1st SYN TCP stream:
GET http://www.google.com/ HTTP/1.1
Host: www.google.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
Proxy-Authorization: [Id1]
User-Agent: python-requests/2.8.1

HTTP/1.1 200 OK
Headers data

BINARY


2nd SYN TCP stream:
CONNECT www.google.com:443 HTTP/1.0
Proxy-Authorization: [Id1]
HTTP/1.1 200 Connection established

BINARY


3rd SYN TCP stream:
CONNECT www.google.com:443 HTTP/1.0
HTTP/1.1 407 Proxy Authentication Required

Authentication headers

HTML code showing our access denied page
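To make the difference between the second and third tunnels concrete, here is a minimal sketch that serializes the two CONNECT requests seen in the streams above. The helper names are hypothetical and `user`/`password` are placeholders; in practice http.client (driven by urllib3) builds this request itself, and the exact request line and headers it sends vary between Python versions (the capture shows HTTP/1.0).

```python
import base64


def basic_proxy_auth(username, password):
    """Build the value of a Basic Proxy-Authorization header."""
    raw = "{}:{}".format(username, password).encode("latin-1")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_connect_request(host, port, headers=None):
    """Serialize a CONNECT request that asks the proxy to open a tunnel."""
    lines = ["CONNECT {}:{} HTTP/1.0".format(host, port)]
    for name, value in (headers or {}).items():
        lines.append("{}: {}".format(name, value))
    # An empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


# Like the 2nd stream: credentials present, the proxy replied 200.
with_auth = build_connect_request(
    "www.google.com", 443,
    {"Proxy-Authorization": basic_proxy_auth("user", "password")},
)
# Like the 3rd stream: no credentials, the proxy replied 407.
without_auth = build_connect_request("www.google.com", 443)

print(with_auth.decode("latin-1"))
print(without_auth.decode("latin-1"))
```

The proxy can only authenticate the tunnel from that one header line; once the CONNECT is accepted, everything after it is TLS and opaque to the proxy.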

@Lukasa
Member

Lukasa commented Dec 2, 2015

Hmm. Right now I'm at a loss as to exactly why this is happening. Does the TCP stream for the second response get terminated (RST or FIN packets)?

@FabriceSh44
Author

I don't see anything like that.

@Lukasa
Member

Lukasa commented Dec 2, 2015

So, my question is why there's a second connection at all. If the connection is still up, there's no reason for us to have thrown it away as far as I can tell. I'm extremely perplexed as to why the connection is not being re-used. With the proxy I have on my local machine (Charles Proxy), we quite happily re-use that same TCP connection, and if we don't re-use it (because it got torn down) we create a new one with the new headers.

For some insane reason, one part of requests believes that the old connection is still being used, and another part believes that it's not, and I'm not sure why yet.

I'm going to take a quick look at Python 3.4's http.client module to see if I can find anything in there that would cause this.

@Lukasa
Member

Lukasa commented Dec 2, 2015

I cannot see anything that immediately suggests that this problem would occur. Only if a second call to set_tunnel was made, somehow losing the headers, would that occur. Unfortunately, without a reproduction scenario that I can reach it's likely to be quite tricky to trace this problem.

It would be interesting if you could confirm that we never call http.client.HTTPConnection.set_tunnel twice on a connection (e.g. by adjusting your local copy to assert that self._tunnel_host is always None when it is called). That would be a start.
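The suggested check can be sketched without editing the standard library, as a hypothetical debugging subclass of `http.client.HTTPSConnection`. It relies on the private `_tunnel_host` attribute (which `HTTPConnection.__init__` sets to `None` and `set_tunnel` fills in), so treat it as a local diagnostic only. Making requests use it would additionally require patching the connection class urllib3 instantiates, which is not shown here.

```python
import http.client


class SingleTunnelHTTPSConnection(http.client.HTTPSConnection):
    """Hypothetical diagnostic: refuse a second set_tunnel call.

    _tunnel_host is private to http.client; it starts as None and
    set_tunnel stores the tunnel target in it.
    """

    def set_tunnel(self, host, port=None, headers=None):
        if self._tunnel_host is not None:
            raise AssertionError(
                "set_tunnel called twice (already tunnelling to %r)" % self._tunnel_host
            )
        super().set_tunnel(host, port, headers)


# No network traffic here: set_tunnel only records the tunnel target.
conn = SingleTunnelHTTPSConnection("url_proxy", 8080)  # placeholder proxy host/port
conn.set_tunnel("www.google.com", 443,
                {"Proxy-Authorization": "Basic dXNlcjpwYXNzd29yZA=="})

second_call_rejected = False
try:
    conn.set_tunnel("www.google.com", 443)  # a second call would drop the headers
except AssertionError as exc:
    second_call_rejected = True
    print("caught:", exc)
```

Calling `set_tunnel` again without `headers` clears the stored tunnel headers, which is why a second call is one way the Proxy-Authorization header could silently disappear.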

@FabriceSh44
Author

set_tunnel is called for requests 2 and 3 (not for request 1).

Call stack
set_tunnel in client line 771 Python
_prepare_proxy in connectionpool line 746 Python
urlopen in connectionpool line 554 Python
send in adapters line 370 Python
send in sessions line 576 Python
request in sessions line 468 Python
get in sessions line 480 Python

@FabriceSh44
Author

HTTPConnectionPool doesn't do anything in _prepare_proxy and simply keeps the proxy, whereas HTTPSConnectionPool calls set_tunnel every time.

@Lukasa
Member

Lukasa commented Dec 2, 2015

Yeah, that's all as expected.

Can you confirm that, for the second request, set_tunnel is called with the appropriate proxy headers?

@FabriceSh44
Author

request 2 : self.proxy_headers == {'Proxy-Authorization': [Id1]}
request 3 : self.proxy_headers == {}
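An empty dict for request 3 is what you would expect if the proxy URL used for that request carried no `user:password` part: as far as I know, requests' `HTTPAdapter.proxy_headers` only adds Proxy-Authorization when the proxy URL contains credentials. Below is a simplified standard-library model of that rule; `proxy_headers_for` is a hypothetical name, not requests' code, and the URLs are placeholders.

```python
import base64
from urllib.parse import unquote, urlparse


def proxy_headers_for(proxy_url):
    """Simplified model: derive proxy headers from a proxy URL.

    Only a proxy URL that embeds user:password yields a
    Proxy-Authorization header; otherwise the dict stays empty.
    """
    parsed = urlparse(proxy_url)
    if not parsed.username:
        return {}
    username = unquote(parsed.username)
    password = unquote(parsed.password or "")
    token = base64.b64encode("{}:{}".format(username, password).encode("latin-1"))
    return {"Proxy-Authorization": "Basic " + token.decode("ascii")}


print(proxy_headers_for("http://user:password@url_proxy:8080"))  # like request 2
print(proxy_headers_for("http://url_proxy:8080"))                # like request 3
```

So the question becomes where a credential-less proxy URL came from for request 3, rather than how the header was lost.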

@FabriceSh44
Author

Additional info: if I comment out
#s.proxies = {"http": proxy_string , "https": proxy_string}
then request 1 still works; maybe my proxy doesn't filter HTTP traffic to Google.

So my guess is that the proxy info is not correctly retrieved from the session proxies, whereas it works when it comes from the proxies argument of session.get().

@FabriceSh44
Author

I think I found the issue. It's in merge_setting:
def merge_setting(request_setting, session_setting, dict_class=OrderedDict)

As input I have request_setting and session_setting.
session_setting has http://user:password@url_proxy:port_proxy (OK)
request_setting has http://url_proxy:port_proxy (it can't authenticate; it was picked up from the environment)
After
merged_setting.update(to_key_val_list(request_setting))
the merged setting contains the request setting, which is wrong.
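The precedence problem described above can be reproduced with a simplified, standard-library model of how requests combines per-request and session settings: request-level keys win, and keys whose value is None are dropped. `merge_setting_model` is a hypothetical name; it is not requests' actual `merge_setting`, which also handles non-mapping settings such as `verify`. The proxy URLs are placeholders.

```python
from collections import OrderedDict


def merge_setting_model(request_setting, session_setting):
    """Simplified model: request-level values override session values key by key."""
    if session_setting is None:
        return request_setting
    if request_setting is None:
        return session_setting
    merged = OrderedDict(session_setting)
    merged.update(request_setting)
    # A key explicitly set to None at request level removes it entirely.
    return OrderedDict((k, v) for k, v in merged.items() if v is not None)


session_proxies = {"http": "http://user:password@url_proxy:8080",
                   "https": "http://user:password@url_proxy:8080"}
# With trust_env on, an HTTPS proxy found in the environment is treated as a
# per-request setting; here it has no credentials.
request_proxies = {"https": "http://url_proxy:8080"}

merged = merge_setting_model(request_proxies, session_proxies)
print(merged["https"])  # prints http://url_proxy:8080
```

Passing `proxies=` explicitly (as in example 2 of the original report) fills the request-level dict first, so the environment value never gets a chance to override the session's credentials.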

@Lukasa
Member

Lukasa commented Dec 2, 2015

In the third case there should be no request setting at all as you didn't set one. Do you have the HTTPS_PROXY environment variable set?
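A quick way to see which proxies Python finds in the environment is the standard library's `urllib.request.getproxies()`; as far as I know, requests consults it when `trust_env` is enabled. The sketch below wipes and then sets proxy variables for the current process only, using a placeholder URL, so run it in a throwaway interpreter rather than in the affected program.

```python
import os
import urllib.request

# Start clean: drop every *_proxy variable (any letter case, including
# no_proxy) so only the placeholder below is visible.
for key in list(os.environ):
    if key.lower().endswith("_proxy"):
        del os.environ[key]

# Placeholder proxy URL without credentials, like the one picked up here.
os.environ["https_proxy"] = "http://url_proxy:8080"

# When any *_proxy variables are set, getproxies() returns them in
# preference to OS-level proxy settings.
env_proxies = urllib.request.getproxies()
print(env_proxies.get("https"))
```

To inspect the real situation, skip the clearing and setting steps and just print `urllib.request.getproxies()`.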

@FabriceSh44
Author

I thought I did, but it's no longer there since I restarted my workstation. In the meantime, I disabled trust_env and it works.
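The `trust_env` workaround can be checked offline. This sketch assumes requests is installed and that `Session.merge_environment_settings(url, proxies, stream, verify, cert)`, the step `Session.request` uses to combine per-request, environment and session settings, still behaves as it did in 2.8.1. The proxy URLs are placeholders, and the environment is modified for the current process only.

```python
import os

import requests

# Placeholders: the session's proxy has credentials, the environment's does not.
SESSION_PROXY = "http://user:password@url_proxy:8080"
ENV_PROXY = "http://url_proxy:8080"

# Keep only our placeholder proxy variable in this process's environment.
for key in list(os.environ):
    if key.lower().endswith("_proxy"):
        del os.environ[key]
os.environ["https_proxy"] = ENV_PROXY

s = requests.Session()
s.proxies = {"http": SESSION_PROXY, "https": SESSION_PROXY}

# No request is sent: this only computes the settings a request would use.
url = "https://www.google.com"
trusting = s.merge_environment_settings(url, {}, None, None, None)

s.trust_env = False  # ignore environment proxies for this session
not_trusting = s.merge_environment_settings(url, {}, None, None, None)

print(trusting["proxies"]["https"])      # environment value, no credentials
print(not_trusting["proxies"]["https"])  # session value, with credentials
```

For a library such as jira that creates the session itself, the same attribute would have to be set on that library's session object; how to reach it depends on the library and is not shown. Putting the credentials into the environment variable instead makes both calls agree.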

@Lukasa
Member

Lukasa commented Dec 2, 2015

Hurrah! Here we are. For the future, you can put the auth credentials in the HTTPS_PROXY environment variable, which will save you this pain.

@Lukasa Lukasa closed this as completed Dec 2, 2015
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021