SOCKS support #478

Closed
wants to merge 12 commits into
from

Projects

None yet

10 participants

@foxx
foxx commented Mar 9, 2012

Hi all,

I've now managed to add proper SOCKS proxy and HTTPS/SSL proxy support (took me about 6 hours, but was well worth the effort)

  • SOCKS proxy support is done via a slightly modified SocksIpy.
  • All unit tests seem to pass fine (have included tests/test_proxies.py as it's own file, as the test will only pass if you have the necessary proxy servers running - obv.)
  • proxiesDict should be backwards compatible (have tested it with multiple values) - now accepts 'socks4://' and 'socks5'://
  • HTTPS/SSL proxy support uses the _tunnel() method exposed from httplib
  • HTTPS/SSL proxy support forcibly disables SSL certificate validation
  • Has been tested extensively against 'Burp Proxy' and 'dante-server'
  • DNS lookups are still performed by native socket()

This code is going to be used extensively by one of our clients, so if we find any problems I'll let you know.

Hope this helps!

Cal


Example usage:


# SOCKS5 proxy for HTTP/HTTPS
proxiesDict = {
    'http' : "socks5://1.2.3.4:1080",
    'https' : "socks5://1.2.3.4:1080"
}

# SOCKS4 proxy for HTTP/HTTPS
proxiesDict = {
    'http' : "socks4://1.2.3.4:1080",
    'https' : "socks4://1.2.3.4:1080"
}

# HTTP proxy for HTTP/HTTPS
proxiesDict = {
    'http' : "1.2.3.4:1080",
    'https' : "1.2.3.4:1080"
}


test_requests.py before modifications:


FAILED (failures=1, errors=14)

test_requests.py after modifications:


FAILED (failures=1, errors=14)

Here is the result of the test_proxies.py (slightly trimmed):


HTTPS via SOCKS5
------------------------------------------------------------
<br><p>Your IP address is<br><b>173.255.xxx.xxx</b></p>

HTTP via SOCKS5
------------------------------------------------------------
<br><p>Your IP address is<br><b>173.255.xxx.xxx</b></p>

HTTPS via HTTP Proxy
------------------------------------------------------------
<br><p>Your IP address is<br><b>89.238.xxx.xxx</b></p>

HTTP via HTTP Proxy
------------------------------------------------------------
<br><p>Your IP address is<br><b>89.238.xxx.xxx</b></p>

HTTPS via no proxy
------------------------------------------------------------
<br><p>Your IP address is<br><b>82.30.xxx.xxx</b></p>

HTTP via no proxy
------------------------------------------------------------
<br><p>Your IP address is<br><b>82.30.xxx.xxx</b></p>


Cal Leeming added some commits Mar 9, 2012
Cal Leeming Added proper HTTPS/SSL proxy support using _tunnel() feature of httplib
Added SOCKS4/5 proxy support using socksipy
333048d
Cal Leeming Moved test file (it's been placed into its own file because the test …
…relies on the user having the necessary proxy servers configured etc)
06f286d
Cal Leeming test needed some path hacks. 9e635be
Cal Leeming tidied up commenting slightly 7b60986
Cal Leeming fixed dodgy import c496407
Cal Leeming added myself to authors 1e4dcaa
Cal Leeming Added better unit tests
Fixed fairly dumb bug where non proxied requests were failing.. lol.
810600b
Cal Leeming fixed test_proxies.py imports 1a1e6b0
Cal Leeming fixed another retarded naming error on socks.py.. argh! cdf1d07
@Anorov
Anorov commented Mar 9, 2012

I was working on doing this, but got a bit tangled with how to integrate it with connectionpool.py. Thank you for your work, this is something that's been needed for a while.

However, big warning: You are using an old and defective version of SocksiPy! If you run your test with SOCKS servers that do not respond, the program hangs indefinitely. This is due to a bug in __recvall(). If the socket sends no data, its while loop will run infinitely, so it needs a line like "if not d: raise SomeError"

The HTTP proxy negotiator has a similar bug, but it seems you're only using the SOCKS parts and sticking with requests' native HTTP proxy support.

I recommend you look at the latest branch on my page: https://github.com/Anorov/PySocks

It's currently just a copy of the original maintainer's branch. They fixed the __recvall() bug, some other bugs, and also made it Py3k. I personally changed a few things that I have yet to commit, but most of them are just changing in the exception naming and fixing __negotiatehttp(), which is unnecessary for requests. So the current branch should be fine.

On a separate note, do persistent connections and keep-alives work properly through both SOCKS and HTTP proxies, in this fork? That is a major component that I've been fighting with when using SocksiPy in conjunction with other libraries (not requests though).

@foxx
foxx commented Mar 10, 2012

Ahh, thanks for letting me know!

Although I don't fully understand some of the bugs you have patched in socksipy, I can confirm that unit tests are still passing after applying your modifications - so I have pushed this change up.

I've tried to make sure my patches fallback to previous behaviour where ever possible, and it only uses socksipy if it absolutely needs to.

Thanks for your help on this, if you can think of any other changes that should go up please let me know and I'd be happy to assist.

Cal

@foxx
foxx commented Mar 10, 2012

Also @kennethreitz my apologies that this wasn't branched from the 'develop' branch, by the time I had realised it was too late. On the plus side, it seems master and develop were in sync at the time of forking.

@foxx
foxx commented Mar 10, 2012

@Anorov just saw your question about keep-alive's. Currently, keep-alive's are forcibly disabled for SSL over HTTP proxy, however SOCKS4/5 and HTTP over HTTP proxy keep-alive's will work without problem.

@juanriaza
Contributor

+1

Cal Leeming fixed bug where forcibly disabling verify was causing SSLError except…
…ion to be mistaken for a Timeout()
a35f325
@vly
vly commented Mar 13, 2012

@foxx
Thanks for looking into this issue.
If you have a chance, would you be able to provide an example of authenticating to the proxy in case of https over http?

Cheers

@foxx
foxx commented Mar 13, 2012

Hi @vly,

Not a problem, I'll take a look at this today and update the test file to reflect this.

@foxx
foxx commented Mar 14, 2012

@vly I can confirm that currently the code does not currently have proxy authentication support (however afaik, proxy authentication is not currently in the original code either).

It does still however work for remote authentication (i.e. using auth= to authenticate to the site).

@kennethreitz can you confirm if you would be happy to merge the code in its current state, with proxy authentication being done at a later date?

@vly
vly commented Mar 15, 2012

@foxx
Thanks for the confirmation.
Was wondering if I just didn't know the correct syntax. http via http proxy auth works fine either passed via user:pass@ or Authorization basic header item.

@foxx
foxx commented Mar 15, 2012

@vly Excellent stuff - it would be great to have proxy authentication support, however this patch was contributed as open source from work done for a clients specific requirement - which sadly doesn't require proxy authentication. I'm hoping @kennethreitz will accept the patch without, and either myself or someone else will add proxy authentication support at a later date.

@kennethreitz
Owner

This looks awesome!

However, the urllib3 changes should be happening at the shazow/urllib3 repo

@kennethreitz
Owner

@shazow any comments on this? I know there was another SOCKS implementation submitted recently, right?

@foxx
foxx commented Mar 15, 2012

@shazow Let me know if you would consider merging in my changes, and if so, then I will create a new fork of urllib3 with the necessary modifications (as the patch in its current state might not cleanly apply to the urllib3 repo, or at the very least would only have the new features usable via undocumented attributes).
---edited---

@foxx
foxx commented Mar 15, 2012

@shazow Or - you could accept the patch as 'undocumented features', then apply your own changes in the future to use them in a documented fashion (as they should apply cleanly - they just won't be accessibly without setting some undocumented attributes). Then once this has been done, Python Requests would be modified to use the documented approach.

This would mean the features will be accessible quicker, without being blocked on many other issues. I was thinking of just applying the patches myself to urllib3, but it would appear you have quite a few other issues that might block this, as well as a design decision needed, which would probably take up too much time for me to do myself.

@shazow
Collaborator
shazow commented Mar 15, 2012

@foxx Hey there! @wolever has been helping me code review and manage with this particular feature request. Could you take a look at this @wolever?

Superquick glance at it looks alright. Our long-term redesign ambitions will make this code simpler but that's a way's out so let's not wait for that. Cleanup into urllib3's codebase and some unit tests in urllib3's suite would be great. :)

@foxx
foxx commented Mar 15, 2012

@shazow Sounds good man. @wolever can you make sure you are happy with this code and approach, if so I will make the necessary fork/mods. ---edited---

@JensRantil JensRantil referenced this pull request Mar 16, 2012
Closed

SOCKS proxy support #324

@foxx
foxx commented Mar 18, 2012

@wolever Any update on your thoughts for this patch??

@wolever wolever commented on the diff Mar 18, 2012
requests/packages/urllib3/connectionpool.py
@@ -72,6 +75,37 @@
'https': HTTPS_PORT,
}
+# Wrapped HTTPConnection object
+class HTTPConnection(_HTTPConnection):
+ _tunnel_host = None
+
+ def connect(self):
+ # Adds support for SOCKS wrapping
+ if self._proxy_scheme in ('socks4', 'socks5'):
+ # define the socks proxy type
+ if self._proxy_scheme == 'socks4':
+ _type = socksipy.socks.PROXY_TYPE_SOCKS4
+ elif self._proxy_scheme == 'socks5':
+ _type = socksipy.socks.PROXY_TYPE_SOCKS5
+ else:
+ assert False, "this should NEVER happen"
@wolever
wolever Mar 18, 2012

This should probably be:

raise AssertionError("bad proxy scheme: %r" %(self._proxy_scheme, ))

Or maybe "programmer error: bad proxy scheme: %r"?

@wolever wolever commented on the diff Mar 18, 2012
requests/packages/urllib3/connectionpool.py
+ if self._proxy_scheme in ('socks4', 'socks5'):
+ # define the socks proxy type
+ if self._proxy_scheme == 'socks4':
+ _type = socksipy.socks.PROXY_TYPE_SOCKS4
+ elif self._proxy_scheme == 'socks5':
+ _type = socksipy.socks.PROXY_TYPE_SOCKS5
+ else:
+ assert False, "this should NEVER happen"
+
+ self.sock = socksipy.socks.create_connection(
+ (self.host, self.port),
+ self.timeout,
+ proxy_type = _type,
+ proxy_host = self._proxy_host,
+ proxy_port = self._proxy_port
+ )
@wolever
wolever Mar 18, 2012

Instead of using the slightly magic _ properties, what about simply adding a proxy= argument to __init__, then calling scheme, host, port = get_host(self.proxy) inside connect()?

Alternatively, I suppose HTTPConnection.__init__ could accept a sock= argument. The set_tunnel call would still need to be made somewhere… But that can be done higher up the chain.

Actually, without reading through the code to double check that it's sensible, I think passing a sock= to HTTPConnection.__init__ is the way to go. Disregard the previous comment about the proxy= argument.

@foxx
foxx Mar 19, 2012

I'll need to double check the code, but this sounds like a sane approach to me

@wolever wolever commented on the diff Mar 18, 2012
requests/packages/urllib3/connectionpool.py
+ if self._proxy_scheme in ('socks4', 'socks5'):
+ # define the socks proxy type
+ if self._proxy_scheme == 'socks4':
+ _type = socksipy.socks.PROXY_TYPE_SOCKS4
+ elif self._proxy_scheme == 'socks5':
+ _type = socksipy.socks.PROXY_TYPE_SOCKS5
+ else:
+ assert False, "this should NEVER happen"
+
+ sock = socksipy.socks.create_connection(
+ (self.host, self.port),
+ self.timeout,
+ proxy_type = _type,
+ proxy_host = self._proxy_host,
+ proxy_port = self._proxy_port
+ )
@wolever
wolever Mar 18, 2012

This should no longer be necessary if [Verified]HTTPSConnection.__init__ accepts a socket=….

@wolever
wolever commented Mar 19, 2012

After some thought, I believe this should be possible (and, in fact, desirable) to implement this proxy support entirely in urllib3 by passing the proxy_url= string directly to the HTTP[S]ConnectionPool classes. This would imply that:

  • PoolManagerManager.__init__ should also accept a proxy= keyword
  • proxy_from_url would need to have the signature proxy_from_pool(proxy_url) and it would need to return a PoolManager(proxy_url=proxy_url).
  • The ProxyManager would be deprecated, as it can only handle one specific kind of proxy (regular HTTP proxies).
  • A bunc hof code will be added to HTTPConnectionPool. In general this would be suboptimal… But there are plans to do some large-scale refactoring of urllib3 anyway, and we can make the HTTPConnectionPool pretty then :)

The possible proxy_url schemes would then be something like:

  • http[s] — for "regular" HTTP proxies
  • http[s]+connect — for CONNECT HTTP proxies (ie, shazow/urllib3#56)
  • socks{4,5} — for SOCKS4/SOCKS5 proxies

Care will need to be taken so as not to confuse the scheme of the proxy with the scheme of the connection (ex, it should be possible to make an HTTP connection over an https or https+connect proxy, or an HTTPS connection over an http or http+connect proxy)… But apart from that, I feel like this will be a straight-forward change, especially given that you've got the tricky bits implemented already.

Below is a quick sketch of what I'm thinking of for HTTPConnectionPool. It's not exhaustive by any means, but between it, your SOCKSs stuff and @senko's CONNECT proxy stuff, I think this will be the best way of supporting proxies in urllib3 for now.

class HTTPConnectionPool(RequestMethods):
    def __init__(..., proxy_url=None):
        self.proxy_scheme, self.proxy_host, self.proxy_port = get_host(proxy_url)

    def _new_conn(self):
        if self.proxy_scheme.startswith('http'):
            host, port = self.proxy_host, self.proxy_port
        else:
            host, port = self.host, self.port

        if self.proxy_scheme.startswith('socks'):
            sock = create_a_new_socks_socket()
        else:
            sock = None

        conn = HTTPConnection(host=host, port=port, sock=sock)
        if self.proxy_scheme.endswith('+connect'):
            set_tunnel = getattr(conn, 'set_tunnel', None)
            set_tunnel = set_tunnel or getattr(conn, '_set_tunnel', None)
            if set_tunnel is not None:
                set_tunnel(self.proxy_host, self.proxy_port)

    def urlopen(...):
        ...
        if self.proxy_scheme in ['http', 'https']:
            headers = dict(headers)
            headers.update({
                'Accept': '*/*',
                'Proxy-Connection': 'Keep-Alive',
            })
        ...
@wolever
wolever commented Mar 19, 2012

Oh, yes — I don't know exactly how this will interact with requests (especially the HTTPProxyAuth)… So if anything I've suggested won't play well with requests, that's likely just my ignorance.

Please let me know if there's anything else I can do to help. I'm back to normal life now, so I should be able to reply in a more timely fashion.

@foxx
foxx commented Mar 19, 2012

@wolever Thank you for the feedback on this! It all looks fine, with the slight change that connections to HTTPS over HTTP proxy should default to the 'CONNECT' method (without needing to add +connect).

I've booked out some hours for Friday to get the modifications done, and I can also make the necessary Python Requests modifications afterwards too. However - if anyone else wants to have a shot at this then please do feel free!

@Anorov
Anorov commented Mar 21, 2012

Made a few updates to https://github.com/Anorov/PySocks but they shouldn't really affect urllib3 at all. I'll continue to maintain the branch as actively as possible.

Also, as was mentioned earlier, this pull request should probably be moved to the urllib3 repo instead of the requests one.

Hope you and wolever get this integrated nicely. This will be a great addition for urllib3 and requests.

@foxx
foxx commented Mar 21, 2012

@Anorov Many thanks for the updates! On Friday (if no one else has done the patch by then) - I will get a new urllib3 fork created with the necessary changes - assuming they are accepted I will then update my branch on this pull request too.

@shazow
Collaborator
shazow commented Mar 21, 2012

You guys are awesome, urllib3 and requests is lucky to have such contributors. :)

@kennethreitz
Owner

+1000.

Sorry for the delay on this stuff — I've been super slammed since PyCon. Will review soon :)

@foxx
foxx commented Mar 25, 2012

Apologies for the slight delay on getting this patch done, will have it ready by mid week.

@kennethreitz
Owner

Looking forward to it :)

@kennethreitz
Owner

Any update?

@foxx
foxx commented Apr 23, 2012

Apologies, this kept going further and further down the todo list - you know how it is :)

I've just bumped it back up again - will get it done in the next few days.

@kennethreitz
Owner

No rush — just wanted to make sure you weren't waiting on anything from anyone.

@foxx
foxx commented Apr 27, 2012

Working on this now

@kennethreitz
Owner

I can't wait to see this! Closing for now.

@foxx
foxx commented May 10, 2012

This work has now been completed in urllib3.

Although the actual functionality is working fine, the unit tests haven't been written yet, so it may be a little while before it reaches master.

For those who can't wait, please see the following pull request:
shazow/urllib3#68

Cal

@YS-
YS- commented Sep 23, 2013

why isn't it merged into current release of requests? how can I use it?

@Lukasa
Collaborator
Lukasa commented Sep 23, 2013

@YS- This issue was originally opened more than two years ago, and it was closed more than a year ago. Please try not to resurrect old issues. =)

As to your question, we have HTTPS proxy support in the 2.0 branch of Requests, which will be pushed out as our next release. This took a while to develop, which is why it took so long. We currently do not have SOCKS proxy support, we're waiting for that code to be included in urllib3. There are plans afoot for this, but no definitive schedule at this time.

@tyll
tyll commented Oct 28, 2015

Can you please rename this issue so it does not say that SOCKS support is done since it is not there?

@Lukasa Lukasa changed the title from SSL Proxy and SOCKS proxy support now done! to SOCKS support Oct 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment