Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPS proxy support (issue #50) #139

Closed
wants to merge 58 commits into from
Closed

Conversation

stanvit
Copy link
Contributor

@stanvit stanvit commented Jan 9, 2013

This Pull Request resolves #50 adding HTTPS (CONNECT) proxy support to ProxyManager with minimal API and code changes.

A short summary of the changes:

  • ProxyManager is a subclass of PoolManager now and its logic has been changed completely, though its constructor still accepts HTTPConnectionPool as the main argument so it's backwards compatible
  • Tiny change to VerifiedHTTPSConnection.connect() that introduces CONNECT tunneling support (copied from HTTPConnection.connect() code)
  • HTTPSConnectionPool._new_conn() has been changed, but its logic for non-proxied connections stays the same
  • HTTPSConnectionPool._new_conn() actually establishes the connection for proxied requests by calling HTTPSConnection.connect(), otherwise Host: header is set improperly due to httplib logic. As stated above, the behavior for non-proxied requests hasn't changed
  • tornado-proxy has been added for tests
  • TornadoServerThread has been changed a bit - now it's possible to run multiple server threads in a single test case
  • Added proxy tests
  • Docs updated

All tests run fine with Python 2.6, 2.7 and 3.2 on Mac OS 10.7.5, except for sporadic timeouts and SSL errors that occur against the unpatched library as well (and should be fixed separately). The code doesn't introduce any new regressions.

Possible TODOs and improvements:

  • Proxy authentication (though that could be another libraries' responsibility, like requests)
  • When accepting proxy_pool as the first argument for ProxyManager, copy all pool properties and use them for other pool instances generation or even use it as the pool for HTTP proxy requests
  • Usage of HTTPConnectionPool and, especially, HTTPSConnectionPool against proxy isn't really clear and requires some external logic - much like ProxyManager.connection_from_host()

@shazow
Copy link
Member

shazow commented Jan 9, 2013

This looks very interesting, thank you!

I'm going to try and take a closer look at it over the next couple of weeks. Please bug me if I'm taking too long. Things have been busy lately. :)

@shazow
Copy link
Member

shazow commented Jan 19, 2013

@wolever is going to take a closer look at this for me. :)

@wolever
Copy link
Contributor

wolever commented Jan 19, 2013

First, as of 4a5f5f7, there is one failing test:

$ git show
commit 4a5f5f765e857e4378d1eb8678de8e4c37cb61d4
Author: Stanislav Vitkovskiy <stas.vitkovsky@gmail.com>
Date:   Wed Jan 9 12:33:47 2013 +1100

    Added old ProxyManager API compatibility
…
$ nosetests
........................F..................................................................
======================================================================
FAIL: test_set_ssl_version_to_sslv2 (test.with_dummyserver.test_https.TestHTTPS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wolever/code/urllib3/test/with_dummyserver/test_https.py", line 39, in test_set_ssl_version_to_sslv2
    fields={'method': 'GET'})
AssertionError: SSLError not raised
-------------------- >> begin captured logging << --------------------
urllib3.connectionpool: INFO: Starting new HTTPS connection (1): localhost
root: INFO: 200 GET /specific_method?method=GET (::1) 0.40ms
urllib3.connectionpool: DEBUG: "GET /specific_method?method=GET HTTP/1.1" 200 0
--------------------- >> end captured logging << ---------------------

Name                     Stmts   Miss  Cover   Missing
------------------------------------------------------
urllib3                     14      0   100%   
urllib3._collections        43      0   100%   
urllib3.connectionpool     182      2    99%   575-576
urllib3.contrib              0      0   100%   
urllib3.exceptions          32      0   100%   
urllib3.filepost            34      0   100%   
urllib3.poolmanager         73      1    99%   176
urllib3.request             25      0   100%   
urllib3.response            74      0   100%   
urllib3.util                86      5    94%   258-263
------------------------------------------------------
TOTAL                      563      8    99%   
----------------------------------------------------------------------
Ran 91 tests in 1.061s

FAILED (failures=1)
$ git checkout master
$ nosetests
…
Ran 71 tests in 0.775s

OK

@@ -96,6 +96,11 @@ def connect(self):
# Add certificate verification
sock = socket.create_connection((self.host, self.port), self.timeout)

if self._tunnel_host:
self.sock = sock
self._tunnel() # calls self._set_hostport(), so self.host is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these "inline" comments above:

# calls self._set_hostport() …
# …
self._tunnel()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, sentence-case please. :)

# Calls `self._set_hostport()`, ...

@wolever
Copy link
Contributor

wolever commented Jan 20, 2013

One bug: it looks like the ProxyManager fails when no port is specified in the URL:

In [14]: proxy = urllib3.ProxyManager('http://localhost:80/')

In [15]: proxy.request("GET", "https://google.ca")
Out[15]: <urllib3.response.HTTPResponse at 0x10a3fce10>

In [16]: proxy = urllib3.ProxyManager('http://localhost/')

In [17]: proxy.request("GET", "https://google.ca")
…
MaxRetryError: HTTPSConnectionPool(host='google.ca', port=443): Max retries exceeded with url: / (Caused by <class 'socket.error'>: [Errno 61] Connection refused)

@Anorov
Copy link

Anorov commented Mar 15, 2013

Tiny bug:

This actually might actually exist in master too and might not have been introduced by this branch, but either way this would be a good place to fix it.

If you comment out import ssl at the top of connectionpool.py, connection to HTTPS websites via a proxy does not work and raises a confusing exception. Regular connections to HTTPS websites work without a proxy, even if ssl is not imported. I'm not really sure who would be using urllib3/requests without the ssl module, but since it is supported, it should be accounted for.

Easiest fix, in my opinion: change line 560 in connectionpool.py from if not HTTPSConnection or HTTPSConnection is object: to if HTTPSConnection is object or self.proxy: so the appropriate exception can be raised. The if not HTTPSConnection seems like a code relic and can be removed, I think, since HTTPSConnection will never coerce to False and would instead always be set to object.

@schlamar
Copy link
Contributor

@stanvit If you do a rebase the Travis build will be triggered.

@stanvit
Copy link
Contributor Author

stanvit commented Mar 30, 2013

@schlamar, I probably will need some assistance with rebasing, I couldn't make it properly. Every change seems to be applied twice (due the previous rebase, I guess I should have squashed them) and it will probably result in doubling all commits from https-proxy branch again.

@schlamar
Copy link
Contributor

schlamar commented Apr 1, 2013

Mhh, that is strange. Could it be that you did the first rebase against upstream/master and the second against origin/master? Usually a second rebase against master should be fine. However, I'm no git expert...

@schlamar
Copy link
Contributor

schlamar commented Apr 5, 2013

I merged against master: stanvit#5

@schlamar schlamar mentioned this pull request Apr 5, 2013
11 tasks
@schlamar
Copy link
Contributor

schlamar commented Apr 5, 2013

Rebased version is here: #170
Not sure if you can force this back into this branch, so I made a new PR. If you can clean up this PR, I will close #170.

@schlamar
Copy link
Contributor

schlamar commented Apr 5, 2013

How I did this: I cherry-picked your first rebase to the master revision at that time (4ff6c97) and then did a clean rebase against master:

git checkout -b proxy-rebase-fix 4ff6c97
git cherry-pick f640206^..74444e9
git cherry-pick 6c19926
git cherry-pick 7d2ca06^..1aab973
git cherry-pick a54f794^..6c46167
git rebase master  
# resolve conflicts
git cherry-pick 5d987ce

@shazow
Copy link
Member

shazow commented Apr 5, 2013

Thank you all for continuing to hammer this out. :)

@stanvit
Copy link
Contributor Author

stanvit commented Apr 8, 2013

@schlamar , thank you very much, I'll look at that tomorrow, I was pretty busy lately.

@schlamar
Copy link
Contributor

@shazow @wolever What needs to be done for final approval? Can't wait to see this finally in master and requests :)

@schlamar
Copy link
Contributor

@Anorov Pinging you, too. Any further suggestions? (see #170).

@Anorov
Copy link

Anorov commented Apr 13, 2013

Everything looks good, assuming all tests pass after the recent SSL changes in master. Also eagerly awaiting the approval of this, so I can make my official pull for SOCKS support on top of this.

@schlamar
Copy link
Contributor

@Anorov A clean rebased version of this PR with Travis passing lives at #170.

@williamsjj
Copy link

This would be great to get merged in if it's passing tests now.

@sigmavirus24
Copy link
Contributor

Bump

@shazow
Copy link
Member

shazow commented Jul 19, 2013

Any brave champions want to shepherd this poor issue from the wilderness?

@schlamar
Copy link
Contributor

@shazow You made a few comments at #170. Which of them have to be addressed and which one are optional? Some are really hard to deal with without response from @stanvit

@stanvit
Copy link
Contributor Author

stanvit commented Jul 19, 2013

I'm sorry I haven't responded for a long time, been really busy with other things. I'll do my best to respond to questions asked by @shazow and others in #170. I can dedicate my time on this on Monday/Tuesday.

@koobs
Copy link

koobs commented Jul 19, 2013

+1 on getting this merged as soon as possible.

@shazow What it looks like your keen contributors need is your leadership and a clear list of exactly what is left to be addressed so that everyone who's interested can swarm on getting it finished and finally start enjoying it.

Let's not let the recent activity and interest wane again. Clear goals!

GO GO GO!

@schlamar
Copy link
Contributor

@stanvit Glad to here from you again. I added you as collaborator to my urllib3 fork, so you can directly push to #170. I think it's best to focus on the new PR and close this one. I'll be on vacation till 1st August, so you can take the lead on this.

@stanvit
Copy link
Contributor Author

stanvit commented Jul 19, 2013

Thank you, @shlamar . Could you please list the goals that, in your opinion, will lead to the PR acceptance?

@schlamar
Copy link
Contributor

Added a list to #170, no guarantee for completeness though :)

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Jul 19, 2013

A recommendation: can we keep the conversation in either this PR or the rebased one in #170? The conversation is difficult to follow when it's in two places. =)

@schlamar
Copy link
Contributor

👍 This is exactly what I wanted to say in my previous comment (#139 (comment)).

@shazow
Copy link
Member

shazow commented Jul 19, 2013

Closing this PR to avoid confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTPS requests fail through a proxy