SOCKS proxy support #284

Open
wants to merge 6 commits into from

8 participants

Anorov, Andrey Petrov, Cory Benfield, Marc Schlaich, Maurus Cuelenaere, Kevin Burke, Al Johri, a3nm
Anorov

Here's the SOCKS proxy support patch.

Usage is identical to setting HTTP proxies: ProxyManager("socks4://localhost:1080") or ProxyManager("socks5://localhost:1080") (or proxy_from_url()). Default ports for both are 1080.

I added new SOCKS Connection classes in connection.py, and added a _is_socks attribute to the proxy URL that is passed to ProxyManager (on the Url object). The _is_socks attribute is used by the connection pools so that HTTP proxy negotiation steps aren't taken when the proxy has a SOCKS scheme. This could possibly be refactored (separate socks_proxy and http_proxy attributes?); tell me how you feel about the _is_socks solution.
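
To make the scheme handling above concrete, here is a minimal standalone sketch. It is not the code from this patch: `classify_proxy` and `DEFAULT_PORTS` are hypothetical names, and the only values taken from the description are the four schemes and the 1080 default for SOCKS (80 and 443 are the usual HTTP and HTTPS defaults).

```python
from urllib.parse import urlsplit

# Hypothetical lookup table for illustration only; 1080 for both SOCKS
# schemes is the default stated in this pull request.
DEFAULT_PORTS = {"http": 80, "https": 443, "socks4": 1080, "socks5": 1080}


def classify_proxy(url):
    """Return (scheme, host, port, is_socks) for a proxy URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported proxy scheme: %r" % scheme)
    # Fall back to the scheme's default port when the URL gives none.
    port = parts.port or DEFAULT_PORTS[scheme]
    # A boolean like this plays the role of the _is_socks flag: callers can
    # skip HTTP proxy negotiation when it is True.
    return scheme, parts.hostname, port, scheme.startswith("socks")
```

A pool could branch on the last element of the tuple in the same way the patch branches on `_is_socks`.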

I also touched up some other parts of the codebase, as follows, which I noticed while working through files and debugging.

  • Fixed inconsistent indentation in a few places.
  • Fixed some typos in comments.
  • Changed __str__ methods to __repr__. __str__ automatically defers to __repr__ if it isn't already defined, but the reverse isn't true. So this keeps the same behavior while also allowing easier debugging from the REPL.
  • Added a newline in the middle of the proxy connection exception message, to make it a little easier to read.
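
The `__str__` / `__repr__` point in the list above can be checked with two throwaway classes (`OnlyRepr` and `OnlyStr` are made-up names for this sketch):

```python
class OnlyRepr(object):
    def __repr__(self):
        return "OnlyRepr()"


class OnlyStr(object):
    def __str__(self):
        return "OnlyStr()"


# str() falls back to __repr__ when __str__ is not defined...
assert str(OnlyRepr()) == "OnlyRepr()"

# ...but repr() never falls back to __str__; it uses the default
# object.__repr__, which looks like "<... OnlyStr object at 0x...>".
assert str(OnlyStr()) == "OnlyStr()"
assert "OnlyStr object at" in repr(OnlyStr())
```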

I believe this branch should currently work fine for regular usage; it seems to have no trouble with any SOCKS proxies.

However, I had many confusing issues and bugs while trying to modify the test suite. The tests are currently broken (I added raise SkipTest() in the proxy tests, for the time being). I am using Twisted for the SOCKS4 and SOCKS5 proxy servers, and added that to the test requirements.

The test issues seem to stem from the Tornado HTTP and HTTPS dummy servers. On my computer, they spat out odd SSL errors (like SSLError: [Errno 1] _ssl.c:504: error:1411B072:SSL routines:SSL3_GET_NEW_SESSION_TICKET:bad message type, and many others), and would hang seemingly randomly.

I upgraded the Tornado version in test-requirements.txt because the version listed before appeared to have a bug that prevented me (or others) from running in IPv6 mode, see here: tornadoweb/tornado#593; I had to upgrade to get IPv6-related tests to pass. I'm not sure if the upgrade has anything to do with the current issues.

I think it has something to do with trying to run Tornado in separate threads. I was not able to get multithreading to work with Twisted, so I spawned a new process for each SOCKS proxy server instead; Twisted seems to work fine, as I can connect to and use the proxy servers while the tests are running.

I would appreciate it if someone could modify the tests so the HTTP servers work properly, or give me some advice on how to fix the issue. And of course, I'd appreciate any suggestions or changes for the SOCKS patch itself.

Andrey Petrov
Owner

Thank you for your bravery, @Anorov. :)

I guess the first thing we should sort out is the testing. After that we can tackle code design tweaks (though I don't see anything superbad).

Some concerns:

  • You mentioned tests were broken for you. Were you able to get them working on a clean master? What platform are you testing on?
  • Sad to re-introduce Twisted into our testing framework. :( Maybe we can use something like this, just to stick with Tornado? Or maybe socket-level tests are viable with something like this? Personally I lean towards socket-level tests these days because they're usually the most bare-metal to what we're testing and introduce the least interfering cruft.
  • There are a couple of small things I think we should change, like where socks_connection_from_url lives and such, but let's deal with that after. :)
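
As a rough idea of what a socket-level test for the SOCKS4 handshake could look like, here is a sketch that uses only the standard library. It is not part of this patch or of the dummyserver: `fake_socks4_server` and `run_handshake` are hypothetical helpers, and the fake server only records the request and answers "granted" without relaying any traffic.

```python
import socket
import struct
import threading


def fake_socks4_server(listener, seen):
    """Accept one client, record its SOCKS4 request, and reply 'granted'."""
    conn, _ = listener.accept()
    try:
        request = b""
        # The request is an 8-byte header plus a NUL-terminated user id.
        while len(request) < 9 or not request.endswith(b"\x00"):
            chunk = conn.recv(64)
            if not chunk:
                break
            request += chunk
        seen.append(request)
        # VN=0, CD=90 (request granted), then 6 bytes the client ignores.
        conn.sendall(struct.pack("!BBH4s", 0, 90, 0, b"\x00" * 4))
    finally:
        conn.close()


def run_handshake(dest_ip, dest_port):
    """Send a SOCKS4 CONNECT to the fake server; return (request, reply)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    seen = []
    server = threading.Thread(target=fake_socks4_server, args=(listener, seen))
    server.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    try:
        # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP, empty user id, NUL.
        client.sendall(struct.pack("!BBH", 4, 1, dest_port)
                       + socket.inet_aton(dest_ip) + b"\x00")
        reply = b""
        while len(reply) < 8:
            chunk = client.recv(8 - len(reply))
            if not chunk:
                break
            reply += chunk
    finally:
        client.close()
        server.join()
        listener.close()
    return seen[0], reply
```

A real test would replace the hand-written client half with the connection class under test and assert on the bytes the fake server recorded.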

Also cc other people who helped on proxy-related work, I could really use more eyes on this: @schlamar @stanvit @brendoncrawford @foxx @lukasa @sigmavirus24 @t-8ch

Anorov

I initially put socks_connection_from_url in util.py but later decided that since it would only be used by the connection classes, I might as well put it in connection.py. Open to suggestions about that, though.

And yeah, I wasn't terribly keen on using Twisted for this but I could not find any other decent pure-Python SOCKS4 and SOCKS5 modules; the few that I found had serious issues with them. I'm aware of many non-Python ones; those would probably be the easiest to use, but would add a lot of additional dependencies to the tests.

I had issues with socks5.py. shuttle seems to work alright, but it only supports SOCKS5 and I did not find a good corresponding SOCKS4 equivalent. Still looking around for one.

I am able to run the tests on a clean master. Only after I rearranged the tests to accommodate SOCKS proxies did I get issues, and only with the HTTP, SOCKS4, and SOCKS5 proxy tests. I believe it has something to do with starting TornadoServerThreads. So the broken code is likely in dummyserver/testcase.py. I am running Xubuntu 64-bit, on an Intel CPU.

Cory Benfield
Collaborator

So, I can confirm the broken tests on OS X, and they're very dramatically broken. In fact, their brokenness appears to be non-deterministic, which is awesome. In three runs I got two hangs (in different tests) and one run that ran to completion but failed many tests. However, as @Anorov spotted, they all failed with SSL errors (or errors relating to SSL, or logs that indicate that HTTPS was involved).

Running each test file by itself reveals that the problem is coming from with_dummyserver/test_proxy.py (not really a shock since @Anorov already spotted that). Running just that gets way more exciting, occasionally dumping fun errors like this one:

(env)cory@corymbp:urllib3/ % ./env/bin/nosetests test/with_dummyserver/test_proxy.py
E.python2.7(5354,0x110012000) malloc: *** error for object 0x7fdf01f79790: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
[1]    5354 abort      ./env/bin/nosetests test/with_dummyserver/test_proxy.py

So I suspect, as @Anorov does, that Tornado and Twisted are getting in each other's way. I particularly wonder if they're accidentally using each other's sockets. My evidence for this is that in addition to the various SSL errors that pop up I frequently see errors moaning about file descriptors, like this one, wherein Tornado attempts to stop handling a file descriptor that it was never handling to begin with:

======================================================================
FAIL: test_basic_proxy (test.with_dummyserver.test_proxy.TestHTTPProxy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/cory/tmp/urllib3/test/with_dummyserver/test_proxy.py", line 42, in test_basic_proxy
    self.assertEqual(r.status, 200)
AssertionError: 500 != 200
-------------------- >> begin captured logging << --------------------
urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
tornado.general: ERROR: Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/iostream.py", line 330, in _handle_events
    self.io_loop.update_handler(self.fileno(), self._state)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/ioloop.py", line 529, in update_handler
    self._impl.modify(fd, events | self.ERROR)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 45, in modify
    self.unregister(fd)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 49, in unregister
    events = self._active.pop(fd)
KeyError: 15
tornado.access: INFO: 200 GET / (127.0.0.1) 1.60ms
tornado.application: ERROR: Exception in I/O handler for fd 15
Traceback (most recent call last):
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/ioloop.py", line 672, in start
    self._handlers[fd](fd, events)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/stack_context.py", line 331, in wrapped
    raise_exc_info(exc)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/iostream.py", line 330, in _handle_events
    self.io_loop.update_handler(self.fileno(), self._state)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/ioloop.py", line 529, in update_handler
    self._impl.modify(fd, events | self.ERROR)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 45, in modify
    self.unregister(fd)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 49, in unregister
    events = self._active.pop(fd)
KeyError: 15
tornado.application: ERROR: Uncaught exception GET http://localhost:56221/ (127.0.0.1)
HTTPRequest(protocol='http', host='localhost:56221', method='GET', uri='http://localhost:56221/', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Host': 'localhost:56221', 'Accept-Encoding': 'identity', 'Accept': '*/*'})
Traceback (most recent call last):
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/web.py", line 1115, in _stack_context_handle_exception
    raise_exc_info((type, value, traceback))
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/Users/cory/tmp/urllib3/dummyserver/httpproxy.py", line 53, in handle_response
    self.set_status(response.code)
  File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/web.py", line 284, in set_status
    raise ValueError("unknown status code %d", status_code)
ValueError: ('unknown status code %d', 599)
tornado.access: ERROR: 500 GET http://localhost:56221/ (127.0.0.1) 26.84ms
urllib3.connectionpool: DEBUG: "GET http://localhost:56221/ HTTP/1.1" 500 93
--------------------- >> end captured logging << ---------------------

Running just the proxy test and dumping the traffic with tcpdump reveals that we're not doing any SSL at all. I also don't see any traffic to or from ports 1080 or 1081, which should be where we're running the SOCKS proxies. That last bit of information seems most telling to me.

Andrey Petrov
Owner

If memory serves, we had all kinds of fun non-deterministic stuff back when we had Twisted instead of Tornado for our dummyserver. I don't think we ever quite got Twisted working properly.

Anorov

@Lukasa Yep, that's exactly the behavior I observed. I was pulling my hair out for a while. Once I started seeing malloc errors I threw in the towel.

I temporarily disabled the SOCKS tests (by adding raise SkipTest()), which may be why you aren't seeing 1080/1081 traffic if you didn't amend that. When testing on my own, I was able to manually connect to the SOCKS servers through my browser (I added a time.sleep(9999) immediately after the proxy server was started).

Though, if you did leave those SkipTests, that would mean Twisted wasn't running at all, which would indicate the core problem is elsewhere.

I don't actually think Tornado ioloops are supposed to be kicked off in separate threads; if so, that might explain quite a bit. I observed that TornadoServerThread._start_server() is called alone in some places: I don't think this actually starts the thread. And when I used just _start_server() in the _start_http_servers() class method, the HTTP servers would hang indefinitely when any request was sent to them.

_start_server() is already called in TornadoServerThread.run(), so I changed the calls (in the proxy tests) from _start_server() to .start(), which should spawn a thread as well as start the server. After I did that, my hanging issue disappeared but all the other non-deterministic problems began. So it's all a mess at this point.
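
The difference between calling _start_server() directly and calling .start() can be shown with a toy threading.Thread subclass. This is only a sketch: `ServerThread` is a hypothetical stand-in, not the dummyserver's TornadoServerThread, and no Tornado code is involved.

```python
import threading


class ServerThread(threading.Thread):
    """Toy stand-in for a server thread; records where setup code ran."""

    def __init__(self):
        super().__init__()
        self.ran_in = None
        self.ready = threading.Event()

    def _start_server(self):
        # Remember which thread executed the setup code.
        self.ran_in = threading.current_thread().name
        self.ready.set()

    def run(self):
        self._start_server()


direct = ServerThread()
direct._start_server()   # plain method call: runs in the caller's thread

spawned = ServerThread()
spawned.start()          # creates a new thread, which then invokes run()
spawned.ready.wait()
spawned.join()

# direct was never started as a thread; spawned ran in its own thread.
assert direct.ident is None
assert spawned.ran_in == spawned.name
```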

Either way, I definitely think we should try and find non-Twisted alternatives for both SOCKS servers. If worse comes to worst, I suppose one of us (maybe me) can extend shuttle (which runs on Tornado) to also do SOCKS4. The core code is already there, it's just slightly different protocol negotiation logic.

The repo is here: https://github.com/ccp0101/shuttle

In config.py, set upstream = "upstreams.local.LocalUpstream" for regular tunneling.
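
To show how small the negotiation difference between the two protocols is, here is a sketch of the client-side bytes for each, written with struct only. The function names are made up for this example and nothing here is taken from shuttle. SOCKS4 sends a single request, while SOCKS5 first sends a greeting to agree on an auth method and then sends the request.

```python
import socket
import struct


def socks4_connect_request(ip, port, user_id=b""):
    """SOCKS4 has no greeting: one packet carries the whole CONNECT request."""
    # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP, USERID, NUL terminator
    return (struct.pack("!BBH", 4, 1, port)
            + socket.inet_aton(ip) + user_id + b"\x00")


def socks5_greeting():
    """SOCKS5 step 1: VER=5, NMETHODS=1, METHOD=0 (no authentication)."""
    return struct.pack("!BBB", 5, 1, 0)


def socks5_connect_request(host, port):
    """SOCKS5 step 2, sent once the server has accepted a method."""
    name = host.encode("ascii")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), name length,
    # then the name itself and the port in network byte order.
    return (struct.pack("!BBBBB", 5, 1, 0, 3, len(name))
            + name + struct.pack("!H", port))
```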

Marc Schlaich

I don't actually think Tornado ioloops are supposed to be kicked off in separate threads;

No, that is not true: "Atypical applications may use more than one IOLoop, such as one IOLoop per thread, or per unittest case." (http://www.tornadoweb.org/en/stable/ioloop.html)

I observed that TornadoServerThread._start_server() is called alone in some places: I don't think this actually starts the thread. And when I used just _start_server() in the _start_http_servers() class method, the HTTP servers would hang indefinitely when any request was sent to them.

The code flow in the dummyserver test cases is really strange in some places, but the last time I touched the Tornado tests (#226) I was pretty sure that I had fixed all these issues with Tornado. But I'll have a look again.

Cory Benfield
Collaborator

BTW, Twisted installation via pip is not officially supported: http://twistedmatrix.com/trac/wiki/FrequentlyAskedQuestions#CanIinstallTwistedusingeasy_installorpip

Well that's crap.

Marc Schlaich

Well that's crap.

No, pip install twisted fails on Windows.

Marc Schlaich

Ok, the main issue is the upgrade to Tornado 3.x and its changes to IOLoop.instance.

On master and with Tornado 2.x, we use one global IOLoop (which we get by calling IOLoop.instance) and run it in one thread, so it is intentional that we call _start_server because this will link the web application with the global IOLoop, which is later started in the proxy thread (see c6629d4).

On Tornado 3.x IOLoop.instance returns a thread specific IOLoop, so the web application started in _start_server is linked to the MainThread's IOLoop (which is never started), so there is obviously a deadlock.

If you want to make the tests compatible with Tornado 3.x you should create a global IOLoop on your own and pass it explicitly to the calls of HTTPServer (see http://www.tornadoweb.org/en/branch3.1/httpserver.html#tornado.httpserver.HTTPServer). I don't think that the upgrade to Tornado 3.x is necessary at all. But if you want to do it, I would suggest doing so in a separate PR.

Cory Benfield
Collaborator

@schlamar I didn't mean 'crap' as in 'wrong', I meant 'crap' as in 'tragic'. =D

Marc Schlaich

Btw, it should be possible to run multiple threads with IOLoops but there are some race conditions which are hard to track down so I would strongly suggest keeping the single thread solution.

Cory Benfield
Collaborator

Can we just use one event loop for everything? http://www.tornadoweb.org/en/latest/twisted.html

Marc Schlaich

On Tornado 3.x IOLoop.instance returns a thread specific IOLoop, so the web application started in _start_server is linked to the MainThread's IOLoop (which is never started), so there is obviously a deadlock.

Despite these changes Tornado 3.x works on master (because we run the MainThread's IOLoop in the thread). So I'm not sure what you have done wrong. Probably you didn't start any thread at all?!

Marc Schlaich

I think I'm going to refactor the dummyserver test cases slightly (on master) to make more clear what's going on.

Andrey Petrov
Owner

@schlamar's very appreciated improvements to our dummyserver have been merged.

Feel free to rebase --force this branch onto master to see if we can get some extra sanity.

Anorov

@schlamar Thank you for investigating this. The reason I upgraded Tornado to version 3 is due to this bug: tornadoweb/tornado#593

With the older version used in the test suite, any unit test that involved IPv6 would cause bind_sockets to raise getaddrinfo exceptions on my computer. Upgrading immediately fixed that problem.

Will rebase.

Maurus Cuelenaere

@Anorov any updates on this? If not, I'm willing to look into rebasing this myself (but I'd rather not do duplicate work, hence the question).

Anorov

@mcuelenaere I had a bit of trouble with merging a few things, but the rebase has been made and I'll be updating this pull in the near future.

Kevin Burke

Any update here?

Andrey Petrov
Owner

@Anorov Anything we can help with?

Anorov

Sorry, my laptop died unfortunately. Just got a new one. I have to modify things a little bit more based on master changes, but I should have everything pushed here by next week.

At this point it might be easier for me to scrap this pull and re-do my changes to master, since I have to rewrite a fair bit of the tests, then make another pull. Is it okay if I do that, or should I just rebase again?

Andrey Petrov
Owner

Up to you. :) Retaining this thread is preferred, but if it's too much of a pain then a new PR is acceptable.

You could start a new branch, make your changes, rename it to the old branch's name and force-push it. That should keep this thread while making all the changes new.

Marc Schlaich

You could start a new branch, make your changes, rename it to the old branch's name and force-push it. That should keep this thread while making all the changes new.

:+1:

(It might even be possible to force-push the new branch without a local rename, with git push -f origin new-branch:old-branch)

Anorov

So, a few questions while I was mulling through connection.py.

VerifiedHTTPSConnection starts off with this connect method:

    try:
        sock = socket.create_connection(
            address=(self.host, self.port),
            timeout=self.timeout,
        )
    except SocketTimeout:
        raise ConnectTimeoutError(
            self, "Connection to %s timed out. (connect timeout=%s)" %
            (self.host, self.timeout))

As far as I can tell, neither HTTPConnection nor HTTPSConnection has that connect timeout wrapping.

Also, the socket.create_connection in VerifiedHTTPSConnection doesn't pass source_address here, despite the other 2 classes doing so. Both the other classes have:

    try:
        conn = socket.create_connection(
            (self.host, self.port),
            self.timeout,
            self.source_address,
        )
    except AttributeError: # Python 2.6
        conn = socket.create_connection(
            (self.host, self.port),
            self.timeout,
        )

This attribute existence check is only in VerifiedHTTPSConnection, but the same problem would affect every other Connection class:

    # the _tunnel_host attribute was added in python 2.6.3 (via
    # http://hg.python.org/cpython/rev/0f57b30a152f) so pythons 2.6(0-2) do
    # not have them.
    if getattr(self, '_tunnel_host', None):
        self.sock = sock
        # Calls self._set_hostport(), so self.host is
        # self._tunnel_host below.
        self._tunnel()

Basically, it seems like VerifiedHTTPSConnection is missing 2 fairly recent changes, and has 1 check (existence of _tunnel_host attribute) that everything else is missing. So it's kind of isolated from the rest of the file.

Should I try and remediate those 3 issues, including wrapping ConnectTimeout for all 3 classes? It would only be a few quick changes.
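
One possible shape for that cleanup is a single helper that every connection class calls, so the timeout wrapping and the source_address handling live in one place. The sketch below is an assumption about how that could look, not the code in connection.py: `open_socket` is a hypothetical name, `ConnectTimeoutError` is a local stand-in for urllib3's exception, and the Python 2.6 fallback is left out.

```python
import socket


class ConnectTimeoutError(Exception):
    """Local stand-in for urllib3's exception of the same name."""


def open_socket(host, port, timeout=None, source_address=None):
    """Open a TCP connection, translating a connect timeout."""
    extra = {}
    if source_address is not None:
        # Only forward source_address when the caller actually set one.
        extra["source_address"] = source_address
    try:
        return socket.create_connection((host, port), timeout, **extra)
    except socket.timeout:
        raise ConnectTimeoutError(
            "Connection to %s timed out. (connect timeout=%s)"
            % (host, timeout))
```

Each class's connect() would then call the helper and, where needed, run the _tunnel_host check on the returned socket.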

Anorov

Here is my new proposed version of connection.py:

https://gist.github.com/Anorov/9129376

I resolved the above issues, integrated SOCKS support, and also cleaned up some of the redundant conn, sock, and self.sock ambiguity in some parts, in favor of just modifying the instance variable each time. That last part is a bit ugly, but it accommodates _tunnel() better than the old version, I think.

I made the SOCKS modifications much simpler, without introducing any new classes; the disadvantage is that all HTTPConnection classes have a little bit of logic in their __init__ now. socks_connection_from_url was moved to util.

I also think we could simplify HTTPSConnection.__init__, see: https://gist.github.com/Anorov/9129376#file-connection-py-L137

Before I continue with the rest of the modifications and tests, I'd like comments on these changes.

Andrey Petrov
Owner

@Anorov The gist seems fine, some style comments aside. :)

Re: fixing extra things, up to you. If you'd like to fix them in a separate earlier PR, that's cool too. :)

Marc Schlaich

If you'd like to fix them in a separate earlier PR

That would be great. Or at least a separate commit :+1:

Cory Benfield Lukasa referenced this pull request in kennethreitz/requests
Closed

Adding socks5 support #1982

Al Johri

looking forward to this! :+1:

Andrey Petrov
Owner

Sorry this has stagnated over the months.

I'd still love your help @Anorov. I may have some time to invest in some urllib3 features next month, this will be near the top of my list. :)

a3nm

Hello, has any progress been made on this? Is it possible to help somehow?

Anorov

Sorry for basically going AWOL on this. I've been really busy with school and work in the past few months.

Basically I just need to hunker down, write the full test suite for this while taking into consideration some of the changes to the dummy servers, then re-apply something close to the commits I already have here.

a3nm
Nathan Van Gheem vangheem referenced this pull request from a commit in vangheem/urllib3
Nathan Van Gheem vangheem initial shot at revitalizing shazow#284 eb77363
Josh Schneier jschneier referenced this pull request from a commit in jschneier/urllib3
Nathan Van Gheem vangheem initial shot at revitalizing shazow#284 168fdc3
Commits on Nov 22, 2013
  1. Anorov

    Add SOCKS proxy support

    Anorov authored
  2. Anorov

    Update PySocks

    Anorov authored
  3. Anorov

    Fix test-requirements.txt

    Anorov authored
Commits on Nov 23, 2013
  1. Anorov

    Remove debugging lines

    Anorov authored
  2. Anorov

    Remove some old test artifacts

    Anorov authored
Commits on Nov 24, 2013
  1. Anorov

    And another

    Anorov authored
3  CONTRIBUTORS.txt
@@ -93,5 +93,8 @@ In chronological order:
* Peter Waller <p@pwaller.net>
* HTTPResponse.tell() for determining amount received over the wire
+* Pyotr Vorona <anorov.vorona@gmail.com>
+ * Added support for SOCKS proxies
+
* [Your name or handle] <[email or website]>
* [Brief summary of your changes]
2  dummyserver/proxy.py → dummyserver/httpproxy.py
@@ -37,7 +37,7 @@
__all__ = ['ProxyHandler', 'run_proxy']
-class ProxyHandler(tornado.web.RequestHandler):
+class HTTPProxyHandler(tornado.web.RequestHandler):
SUPPORTED_METHODS = ['GET', 'POST', 'CONNECT']
@tornado.web.asynchronous
15 dummyserver/server.py
@@ -9,6 +9,7 @@
import os
import sys
import threading
+import multiprocessing
import socket
from tornado import netutil
@@ -18,8 +19,7 @@
import tornado.web
from dummyserver.handlers import TestingApp
-from dummyserver.proxy import ProxyHandler
-
+from dummyserver.httpproxy import HTTPProxyHandler
log = logging.getLogger(__name__)
@@ -34,7 +34,6 @@
# Different types of servers we have:
-
class SocketServerThread(threading.Thread):
"""
:param socket_handler: Callable which receives a socket argument for one
@@ -106,10 +105,8 @@ def stop(self):
self.ioloop.add_callback(self.server.stop)
self.ioloop.add_callback(self.ioloop.stop)
-
-class ProxyServerThread(TornadoServerThread):
- app = tornado.web.Application([(r'.*', ProxyHandler)])
-
+class HTTPProxyServerThread(TornadoServerThread):
+ app = tornado.web.Application([(r'.*', HTTPProxyHandler)])
if __name__ == '__main__':
log.setLevel(logging.DEBUG)
@@ -123,6 +120,6 @@ class ProxyServerThread(TornadoServerThread):
print("Starting WSGI server at: %s" % url)
- scheme, host, port = get_host(url)
- t = TornadoServerThread(scheme=scheme, host=host, port=port)
+ scheme, host, _ = get_host(url)
+ t = TornadoServerThread(scheme=scheme, host=host)
t.start()
14 dummyserver/socks4proxy.py
@@ -0,0 +1,14 @@
+#!/usr/bin/env python
+from twisted.internet import reactor
+from twisted.protocols.socks import SOCKSv4Factory
+
+def run_socks4_proxy(host="127.0.0.1", port=1080):
+    reactor.listenTCP(port, SOCKSv4Factory("/dev/null"), interface=host)
+    try:
+        reactor.run()
+    except (KeyboardInterrupt, SystemExit):
+        reactor.stop()
+
+if __name__ == "__main__":
+    print("Starting SOCKS4 proxy server...")
+    run_socks4_proxy()
131 dummyserver/socks5proxy.py
@@ -0,0 +1,131 @@
+from twisted.internet import reactor, protocol
+import struct
+
+class remote_protocol(protocol.Protocol):
+ def connectionMade(self):
+ print 'Connection made'
+ self.socks5 = self.factory.socks5
+ # -- send success to client
+ self.socks5.send_connect_response(0)
+ self.socks5.remote = self.transport
+ self.socks5.state = 'communicate'
+ def dataReceived(self, data):
+ self.socks5.transport.write(data)
+
+class remote_factory(protocol.ClientFactory):
+ def __init__(self, socks5):
+ self.protocol = remote_protocol
+ self.socks5 = socks5
+ def clientConnectionFailed(self, connector, reason):
+ print 'failed:', reason.getErrorMessage()
+ self.socks5.send_connect_response(5)
+ self.socks5.transport.loseConnection()
+ def clientConnectionLost(self, connector, reason):
+ print 'con lost:', reason.getErrorMessage()
+ self.socks5.transport.loseConnection()
+
+class socks5_protocol(protocol.Protocol):
+ def connectionMade(self):
+ self.state = 'wait_hello'
+ def dataReceived(self, data):
+ method = getattr(self, self.state)
+ method(data)
+ #--------------------------------------------------
+ def wait_hello(self, data):
+ (ver, nmethods) = struct.unpack('!BB', data[:2])
+ print 'Got version = %x, nmethods = %x' % (ver,nmethods)
+ if ver!=5:
+ # we do SOCKS5 only
+ self.transport.loseConnection()
+ return
+ if nmethods<1:
+ # not SOCKS5 protocol?!
+ self.transport.loseConnection()
+ return
+ methods = data[2:2+nmethods]
+ for meth in methods:
+ print 'method=%x' % ord(meth)
+ if ord(meth)==0:
+ # no auth, neato, accept
+ resp = struct.pack('!BB', 5, 0)
+ self.transport.write(resp)
+ self.state = 'wait_connect'
+ return
+ if ord(meth)==255:
+ # disconnect
+ self.transport.loseConnection()
+ return
+ #-- we should have processed the request by now
+ self.transport.loseConnection()
+ #--------------------------------------------------
+ def wait_connect(self, data):
+ (ver, cmd, rsv, atyp) = struct.unpack('!BBBB', data[:4])
+ if ver!=5 or rsv!=0:
+ # protocol violation
+ self.transport.loseConnection()
+ return
+ data = data[4:]
+ if cmd==1:
+ print 'CONNECT'
+ host = None
+ if atyp==1: # IP V4
+ print 'ipv4'
+ (b1,b2,b3,b4) = struct.unpack('!BBBB', data[:4])
+ host = '%i.%i.%i.%i' % (b1,b2,b3,b4)
+ data = data[4:]
+ elif atyp==3: # domainname
+ print 'domain'
+ l = struct.unpack('!B', data[:1])
+ host = data[1:1+l]
+ data = data[1+l:]
+ elif atyp==4: # IP V6
+ print 'ipv6'
+ else:
+ # protocol violation
+ self.transport.loseConnection()
+ return
+ (port) = struct.unpack('!H', data[:2])
+ port=port[0]
+ data = data[2:]
+ print '* connecting %s:%d' % (host,port)
+ return self.perform_connect(host, port)
+ elif cmd==2:
+ print 'BIND'
+ elif cmd==3:
+ print 'UDP ASSOCIATE'
+ #-- we should have processed the request by now
+ self.transport.loseConnection()
+ #--------------------------------------------------
+ def send_connect_response(self, code):
+ try:
+ myname = self.transport.getHost().host
+ except:
+ # this might fail as no longer a socket
+ # is present
+ self.transport.loseConnection()
+ return
+ ip = [int(i) for i in myname.split('.')]
+ resp = struct.pack('!BBBB', 5, code, 0, 1 )
+ resp += struct.pack('!BBBB', ip[0], ip[1], ip[2], ip[3])
+ resp += struct.pack('!H', self.transport.getHost().port)
+ self.transport.write(resp)
+
+ def perform_connect(self, host, port):
+ factory = remote_factory(self)
+ reactor.connectTCP(host, port, factory)
+ #--------------------------------------------------
+ def communicate(self, data):
+ self.remote.write(data)
+
+
+def run_socks5_proxy(host="127.0.0.1", port=1081):
+ factory = protocol.ServerFactory()
+ factory.protocol = socks5_protocol
+ reactor.listenTCP(port, factory, interface=host)
+ try:
+ reactor.run()
+ except (KeyboardInterrupt, SystemExit):
+ reactor.stop()
+
+if __name__ == '__main__':
+ run_socks5_proxy()
80 dummyserver/testcase.py
@@ -1,14 +1,21 @@
import unittest
import socket
import threading
+import multiprocessing
+import time
from nose.plugins.skip import SkipTest
from dummyserver.server import (
- TornadoServerThread, SocketServerThread,
- DEFAULT_CERTS,
- ProxyServerThread,
+ TornadoServerThread,
+ SocketServerThread,
+ HTTPProxyServerThread,
+ DEFAULT_CERTS
)
+from dummyserver.httpproxy import HTTPProxyHandler
+from dummyserver.socks4proxy import run_socks4_proxy
+from dummyserver.socks5proxy import run_socks5_proxy
+
has_ipv6 = hasattr(socket, 'has_ipv6')
class SocketDummyServerTestCase(unittest.TestCase):
@@ -72,8 +79,7 @@ class HTTPSDummyServerTestCase(HTTPDummyServerTestCase):
certs = DEFAULT_CERTS
-class HTTPDummyProxyTestCase(unittest.TestCase):
-
+class DummyProxyTestCase(unittest.TestCase):
http_host = 'localhost'
http_host_alt = '127.0.0.1'
@@ -85,30 +91,80 @@ class HTTPDummyProxyTestCase(unittest.TestCase):
proxy_host_alt = '127.0.0.1'
@classmethod
- def setUpClass(cls):
- cls.http_thread = TornadoServerThread(host=cls.http_host,
- scheme='http')
- cls.http_thread._start_server()
+ def _start_http_servers(cls):
+ ready_event = threading.Event()
+ cls.http_thread = TornadoServerThread(
+ host=cls.http_host, scheme='http',
+ ready_event=ready_event)
+ cls.http_thread.start()
+ ready_event.wait()
cls.http_port = cls.http_thread.port
+ ready_event = threading.Event()
cls.https_thread = TornadoServerThread(
- host=cls.https_host, scheme='https', certs=cls.https_certs)
- cls.https_thread._start_server()
+ host=cls.https_host, scheme='https',
+ certs=cls.https_certs,
+ ready_event=ready_event)
+ cls.https_thread.start()
+ ready_event.wait()
cls.https_port = cls.https_thread.port
+ @classmethod
+ def _stop_http_servers(cls):
+ cls.http_thread.stop()
+ cls.https_thread.stop()
+
+class DummyHTTPProxyTestCase(DummyProxyTestCase):
+ @classmethod
+ def setUpClass(cls):
+ #raise SkipTest()
+ cls._start_http_servers()
ready_event = threading.Event()
- cls.proxy_thread = ProxyServerThread(
+ cls.proxy_thread = HTTPProxyServerThread(
host=cls.proxy_host, ready_event=ready_event)
cls.proxy_thread.start()
ready_event.wait()
cls.proxy_port = cls.proxy_thread.port
+
@classmethod
def tearDownClass(cls):
+ cls._stop_http_servers()
cls.proxy_thread.stop()
cls.proxy_thread.join()
+class DummySOCKS4ProxyTestCase(DummyProxyTestCase):
+ proxy_port = 1080
+
+ @classmethod
+ def setUpClass(cls):
+ raise SkipTest()
+ cls._start_http_servers()
+ # Twisted doesn't play along well with multithreading
+ cls.proxy_process = multiprocessing.Process(target=run_socks4_proxy, args=(cls.proxy_host, cls.proxy_port))
+ cls.proxy_process.start()
+ time.sleep(2)
+
+ @classmethod
+ def tearDownClass(cls):
+ cls._stop_http_servers()
+ cls.proxy_process.terminate()
+
+class DummySOCKS5ProxyTestCase(DummyProxyTestCase):
+ proxy_port = 1081
+
+ @classmethod
+ def setUpClass(cls):
+ raise SkipTest()
+ cls._start_http_servers()
+ cls.proxy_process = multiprocessing.Process(target=run_socks5_proxy, args=(cls.proxy_host, cls.proxy_port))
+ cls.proxy_process.start()
+
+ @classmethod
+ def tearDownClass(cls):
+ cls.proxy_process.terminate()
+
class IPv6HTTPDummyServerTestCase(HTTPDummyServerTestCase):
host = '::1'
3  test-requirements.txt
@@ -1,4 +1,5 @@
nose==1.3
mock==1.0.1
-tornado==2.4.1
+tornado==3.1.1
coverage==3.6
+twisted==13.2
10 test/test_proxymanager.py
@@ -1,9 +1,9 @@
import unittest
from urllib3.poolmanager import ProxyManager
+from urllib3.exceptions import ProxyError
-
-class TestProxyManager(unittest.TestCase):
+class TestProxyManagerParsing(unittest.TestCase):
def test_proxy_headers(self):
p = ProxyManager('http://something:1234')
url = 'http://pypi.python.org/test'
@@ -38,6 +38,12 @@ def test_default_port(self):
p = ProxyManager('https://something')
self.assertEqual(p.proxy.port, 443)
+ def test_proxy_scheme(self):
+ for t in ("http", "https", "socks4", "socks5"):
+ url = "%s://localhost" % t
+ self.assertTrue(ProxyManager(url) is not None)
+ self.assertRaises(ProxyError, ProxyManager, "invalid://localhost")
+
if __name__ == '__main__':
unittest.main()
81 test/with_dummyserver/test_proxy_poolmanager.py → test/with_dummyserver/test_proxy.py
@@ -2,15 +2,30 @@
import json
import socket
-from dummyserver.testcase import HTTPDummyProxyTestCase
+from dummyserver.testcase import (
+ DummyProxyTestCase,
+ DummyHTTPProxyTestCase,
+ DummySOCKS4ProxyTestCase,
+ DummySOCKS5ProxyTestCase
+)
from dummyserver.server import DEFAULT_CA, DEFAULT_CA_BAD
from urllib3.poolmanager import proxy_from_url, ProxyManager
from urllib3.exceptions import MaxRetryError, SSLError, ProxyError
-from urllib3.connectionpool import connection_from_url, VerifiedHTTPSConnection
-
-
-class TestHTTPProxyManager(HTTPDummyProxyTestCase):
+from urllib3.connectionpool import (
+ connection_from_url,
+ VerifiedHTTPSConnection,
+ HTTPConnectionPool,
+ HTTPSConnectionPool
+)
+from urllib3.connection import (
+ socks_connection_from_url,
+ SOCKSHTTPConnection,
+ SOCKSHTTPSConnection
+)
+from urllib3.util import parse_url
+
+class ProxyManagerTester(object):
def setUp(self):
self.http_url = 'http://%s:%d' % (self.http_host, self.http_port)
@@ -19,15 +34,14 @@ def setUp(self):
self.https_url = 'https://%s:%d' % (self.https_host, self.https_port)
self.https_url_alt = 'https://%s:%d' % (self.https_host_alt,
self.https_port)
- self.proxy_url = 'http://%s:%d' % (self.proxy_host, self.proxy_port)
+ self.proxy_url = '%s://%s:%d' % (self.proxy_type, self.proxy_host, self.proxy_port)
def test_basic_proxy(self):
http = proxy_from_url(self.proxy_url)
-
- r = http.request('GET', '%s/' % self.http_url)
+ r = http.request('GET', '%s/' % self.http_url, timeout=1)
self.assertEqual(r.status, 200)
- r = http.request('GET', '%s/' % self.https_url)
+ r = http.request('GET', '%s/' % self.https_url, timeout=1)
self.assertEqual(r.status, 200)
def test_proxy_conn_fail(self):
@@ -45,7 +59,6 @@ def test_proxy_conn_fail(self):
def test_oldapi(self):
http = ProxyManager(connection_from_url(self.proxy_url))
-
r = http.request('GET', '%s/' % self.http_url)
self.assertEqual(r.status, 200)
@@ -71,7 +84,8 @@ def test_proxy_verified(self):
self.https_port)
conn = https_pool._new_conn()
- self.assertEqual(conn.__class__, VerifiedHTTPSConnection)
+ # assertIsInstance would be clearer, but it is only available in Python 2.7+
+ self.assertTrue(isinstance(conn, VerifiedHTTPSConnection))
https_pool.request('GET', '/') # Should succeed without exceptions.
http = proxy_from_url(self.proxy_url, cert_reqs='REQUIRED',
@@ -245,6 +259,51 @@ def test_proxy_pooling_ext(self):
self.assertNotEqual(sc2,sc3)
self.assertEqual(sc3,sc4)
+class TestHTTPProxy(ProxyManagerTester, DummyHTTPProxyTestCase):
+ proxy_type = 'http'
+
+class SOCKSProxyTester(ProxyManagerTester):
+
+ def test_connection_from_url(self):
+ conn = socks_connection_from_url(parse_url(self.proxy_url), (self.http_host, self.http_port))
+ self.assertEqual(conn.get_proxy_sockname()[0], "127.0.0.1")
+
+ def test_socks_http_connection(self):
+ conn = SOCKSHTTPConnection(parse_url(self.proxy_url), self.http_host, self.http_port)
+ self.assertEqual(conn.proxy, parse_url(self.proxy_url))
+ conn.connect()
+ self.assertEqual(conn.sock.get_proxy_sockname()[0], "127.0.0.1")
+
+ def test_socks_https_connection(self):
+ conn = SOCKSHTTPSConnection(parse_url(self.proxy_url), self.https_host, self.https_port)
+ self.assertEqual(conn.proxy, parse_url(self.proxy_url))
+ conn.connect()
+ self.assertEqual(conn.sock.get_proxy_sockname()[0], "127.0.0.1")
+
+ def test_socks_http_connection_pool(self):
+ pool = HTTPConnectionPool(_proxy=parse_url(self.proxy_url))
+ self.assertEqual(pool.ConnectionCls, SOCKSHTTPConnection)
+
+ def test_socks_https_connection_pool(self):
+ pool = HTTPSConnectionPool(_proxy=parse_url(self.proxy_url))
+ self.assertEqual(pool.ConnectionCls, SOCKSHTTPSConnection)
+
+ def test_make_socket(self):
+ conn = SOCKSHTTPSConnection(parse_url(self.proxy_url), self.https_host, self.https_port)
+ sock = conn._make_socket()
+ self.assertEqual(sock.get_proxy_sockname()[0], "127.0.0.1")
+
+ def test_headers(self):
+ pass
+
+
+class TestSOCKS4Proxy(SOCKSProxyTester, DummySOCKS4ProxyTestCase):
+ proxy_type = 'socks4'
+
+
+class TestSOCKS5Proxy(SOCKSProxyTester, DummySOCKS5ProxyTestCase):
+ proxy_type = 'socks5'
+
if __name__ == '__main__':
unittest.main()
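The test classes above combine a plain-`object` mixin that holds the test methods (`ProxyManagerTester`, `SOCKSProxyTester`) with a `TestCase` subclass that supplies the fixtures. A minimal, network-free sketch of that layout follows; the names `SchemeChecks`, `FakeProxyFixture`, and `run_all` are made up for illustration:

```python
import unittest


class SchemeChecks(object):
    """Mixin holding tests. It is not a TestCase, so it is never collected alone."""
    proxy_type = None

    def test_url_has_scheme(self):
        self.assertTrue(self.proxy_url.startswith(self.proxy_type + "://"))


class FakeProxyFixture(unittest.TestCase):
    """Stand-in for the dummy-server test cases; it only builds a URL."""
    proxy_host = "localhost"
    proxy_port = 1080

    def setUp(self):
        self.proxy_url = "%s://%s:%d" % (
            self.proxy_type, self.proxy_host, self.proxy_port)


class TestSocks4(SchemeChecks, FakeProxyFixture):
    proxy_type = "socks4"


class TestSocks5(SchemeChecks, FakeProxyFixture):
    proxy_type = "socks5"


def run_all():
    """Run both concrete classes and return the unittest.TestResult."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(TestSocks4),
        loader.loadTestsFromTestCase(TestSocks5),
    ])
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Listing the mixin first in the bases lets its methods take precedence in the MRO, while each concrete class only has to set `proxy_type`.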
72 urllib3/connection.py
@@ -38,6 +38,7 @@ class BaseSSLError(BaseException):
ConnectTimeoutError,
)
from .packages.ssl_match_hostname import match_hostname
+from .packages import socks
from .util import (
assert_fingerprint,
resolve_cert_reqs,
@@ -45,11 +46,55 @@ class BaseSSLError(BaseException):
ssl_wrap_socket,
)
+def socks_connection_from_url(proxy_url, address, timeout=None):
+ """
+ Convenience function for connecting to a SOCKS proxy, tunneling
+ to the destination, and returning the connected socket object
+ based on a parsed proxy URL and a destination host and port.
+ """
+ proxy = proxy_url
+ proxy_type = socks.SOCKS5 if proxy.scheme == "socks5" else socks.SOCKS4
+
+ username = password = None
+ if proxy_type == socks.SOCKS5 and proxy.auth is not None:
+ username, password = proxy.auth.split(":", 1)
+
+ return socks.create_connection(dest_pair=address,
+ proxy_type=proxy_type, proxy_addr=proxy.host,
+ proxy_port=proxy.port, proxy_username=username,
+ proxy_password=password, timeout=timeout)
+
+class SOCKSHTTPConnection(HTTPConnection):
+ """
+ An HTTPConnection that tunnels through a SOCKS proxy.
+ """
+
+ def __init__(self, proxy, *args, **kwargs):
+ # A SOCKS proxy parsed as a Url object must be passed in
+ self.proxy = proxy
+ HTTPConnection.__init__(self, *args, **kwargs)
+
+ def connect(self):
+ self.sock = socks_connection_from_url(proxy_url=self.proxy,
+ address=(self.host, self.port),
+ timeout=self.timeout)
+
+class SOCKSHTTPSConnection(HTTPSConnection):
+ def __init__(self, proxy, *args, **kwargs):
+ self.proxy = proxy
+ HTTPSConnection.__init__(self, *args, **kwargs)
+
+ def connect(self):
+ sock = socks_connection_from_url(self.proxy, (self.host, self.port),
+ self.timeout)
+ self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
+
class VerifiedHTTPSConnection(HTTPSConnection):
"""
Based on httplib.HTTPSConnection but wraps the socket with
SSL certification.
"""
+
cert_reqs = None
ca_certs = None
ssl_version = None
@@ -65,17 +110,19 @@ def set_cert(self, key_file=None, cert_file=None,
self.assert_hostname = assert_hostname
self.assert_fingerprint = assert_fingerprint
+
+ def _make_socket(self):
+ return socket.create_connection(address=(self.host, self.port),
+ timeout=self.timeout)
+
def connect(self):
# Add certificate verification
try:
- sock = socket.create_connection(
- address=(self.host, self.port),
- timeout=self.timeout,
- )
+ sock = self._make_socket()
except SocketTimeout:
- raise ConnectTimeoutError(
- self, "Connection to %s timed out. (connect timeout=%s)" %
- (self.host, self.timeout))
+ raise ConnectTimeoutError(
+ self, "Connection to %s timed out. (connect timeout=%s)" %
+ (self.host, self.timeout))
resolved_cert_reqs = resolve_cert_reqs(self.cert_reqs)
resolved_ssl_version = resolve_ssl_version(self.ssl_version)
@@ -103,5 +150,16 @@ def connect(self):
self.assert_hostname or self.host)
+class SOCKSVerifiedHTTPSConnection(VerifiedHTTPSConnection):
+ def __init__(self, proxy, *args, **kwargs):
+ self.proxy = proxy
+ VerifiedHTTPSConnection.__init__(self, *args, **kwargs)
+
+ def _make_socket(self):
+ return socks_connection_from_url(proxy_url=self.proxy,
+ address=(self.host, self.port),
+ timeout=self.timeout)
+
if ssl:
HTTPSConnection = VerifiedHTTPSConnection
+ SOCKSHTTPSConnection = SOCKSVerifiedHTTPSConnection
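The `_make_socket` hook added to `VerifiedHTTPSConnection` lets `SOCKSVerifiedHTTPSConnection` swap in a proxied socket while reusing the certificate-verification logic in `connect()` unchanged. A stripped-down sketch of that pattern, using stand-in classes and tuples instead of real sockets (all names here are illustrative, not the patch's classes):

```python
class BaseConnection(object):
    """Stand-in for VerifiedHTTPSConnection."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.sock = None

    def _make_socket(self):
        # Hook: subclasses override this to change how the raw socket is made.
        return ("direct", self.host, self.port)

    def connect(self):
        raw = self._make_socket()
        # The real class would wrap `raw` with TLS here; the sketch just tags it.
        self.sock = ("tls",) + raw


class ProxiedConnection(BaseConnection):
    """Stand-in for SOCKSVerifiedHTTPSConnection."""

    def __init__(self, proxy, host, port):
        # The proxy is stored before delegating, mirroring the patch.
        self.proxy = proxy
        BaseConnection.__init__(self, host, port)

    def _make_socket(self):
        # Only socket creation changes; connect() is inherited as-is.
        return ("via " + self.proxy, self.host, self.port)
```

The design choice is that TLS wrapping stays in one place, so the proxied variant cannot drift from the direct one.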
58 urllib3/connectionpool.py
@@ -30,9 +30,13 @@
)
from .packages.ssl_match_hostname import CertificateError
from .packages import six
+from .packages import socks
from .connection import (
DummyConnection,
- HTTPConnection, HTTPSConnection, VerifiedHTTPSConnection,
+ HTTPConnection, HTTPSConnection,
+ VerifiedHTTPSConnection,
+ SOCKSHTTPConnection,
+ SOCKSHTTPSConnection,
HTTPException, BaseSSLError,
)
from .request import RequestMethods
@@ -54,9 +58,10 @@
port_by_scheme = {
'http': 80,
'https': 443,
+ 'socks4': 1080,
+ 'socks5': 1080,
}
-
## Pool objects
class ConnectionPool(object):
@@ -75,7 +80,7 @@ def __init__(self, host, port=None):
self.host = host
self.port = port
- def __str__(self):
+ def __repr__(self):
return '%s(host=%r, port=%r)' % (type(self).__name__,
self.host, self.port)
@@ -128,11 +133,11 @@ class HTTPConnectionPool(ConnectionPool, RequestMethods):
:param _proxy:
Parsed proxy URL, should not be used directly, instead, see
- :class:`urllib3.connectionpool.ProxyManager`"
+ :class:`urllib3.poolmanager.ProxyManager`
:param _proxy_headers:
A dictionary with proxy headers, should not be used directly,
- instead, see :class:`urllib3.connectionpool.ProxyManager`"
+ instead, see :class:`urllib3.poolmanager.ProxyManager`
"""
scheme = 'http'
@@ -156,9 +161,13 @@ def __init__(self, host, port=None, strict=False,
self.pool = self.QueueCls(maxsize)
self.block = block
+ if _proxy is not None and _proxy._is_socks:
+ self.ConnectionCls = SOCKSHTTPConnection
+
self.proxy = _proxy
self.proxy_headers = _proxy_headers or {}
+
# Fill the queue up so that doing get() on it will block properly
for _ in xrange(maxsize):
self.pool.put(None)
@@ -169,7 +178,7 @@ def __init__(self, host, port=None, strict=False,
def _new_conn(self):
"""
- Return a fresh :class:`httplib.HTTPConnection`.
+ Return a fresh :class:`httplib.HTTPConnection`, or subclass, instance.
"""
self.num_connections += 1
log.info("Starting new HTTP connection (%d): %s" %
@@ -178,6 +187,10 @@ def _new_conn(self):
extra_params = {}
if not six.PY3: # Python 2
extra_params['strict'] = self.strict
+
+ if self.proxy is not None and self.proxy._is_socks:
+ # SOCKSHTTPConnection takes one extra parameter: the proxy URL
+ extra_params['proxy'] = self.proxy
return self.ConnectionCls(host=self.host, port=self.port,
timeout=self.timeout.connect_timeout,
@@ -246,7 +259,9 @@ def _put_conn(self, conn):
conn.close()
def _get_timeout(self, timeout):
- """ Helper that always returns a :class:`urllib3.util.Timeout` """
+ """
+ Helper that always returns a :class:`urllib3.util.Timeout`
+ """
if timeout is _Default:
return self.timeout.clone()
@@ -516,8 +531,8 @@ def urlopen(self, method, url, body=None, headers=None, retries=3,
except (HTTPException, SocketError) as e:
if isinstance(e, SocketError) and self.proxy is not None:
- raise ProxyError('Cannot connect to proxy. '
- 'Socket error: %s.' % e)
+ raise ProxyError('Cannot connect to proxy.\n'
+ 'Socket error: %s' % e)
# Connection broken, discard. It will be replaced next _get_conn().
conn = None
@@ -596,12 +611,14 @@ def __init__(self, host, port=None,
self.assert_hostname = assert_hostname
self.assert_fingerprint = assert_fingerprint
+ if _proxy is not None and _proxy._is_socks:
+ self.ConnectionCls = SOCKSHTTPSConnection
+
def _prepare_conn(self, conn):
"""
Prepare the ``connection`` for :meth:`urllib3.util.ssl_wrap_socket`
and establish the tunnel if proxy is used.
"""
-
if isinstance(conn, VerifiedHTTPSConnection):
conn.set_cert(key_file=self.key_file,
cert_file=self.cert_file,
@@ -611,7 +628,7 @@ def _prepare_conn(self, conn):
assert_fingerprint=self.assert_fingerprint)
conn.ssl_version = self.ssl_version
- if self.proxy is not None:
+ if self.proxy is not None and not self.proxy._is_socks:
# Python 2.7+
try:
set_tunnel = conn.set_tunnel
@@ -626,7 +643,7 @@ def _prepare_conn(self, conn):
def _new_conn(self):
"""
- Return a fresh :class:`httplib.HTTPSConnection`.
+ Return a fresh :class:`httplib.HTTPSConnection`, or subclass, instance.
"""
self.num_connections += 1
log.info("Starting new HTTPS connection (%d): %s"
@@ -637,16 +654,21 @@ def _new_conn(self):
raise SSLError("Can't connect to HTTPS URL because the SSL "
"module is not available.")
- actual_host = self.host
- actual_port = self.port
- if self.proxy is not None:
- actual_host = self.proxy.host
- actual_port = self.proxy.port
-
extra_params = {}
if not six.PY3: # Python 2
extra_params['strict'] = self.strict
+ actual_host = self.host
+ actual_port = self.port
+
+ if self.proxy is not None:
+ if self.proxy._is_socks:
+ # SOCKSHTTPSConnection takes one extra parameter: the proxy URL
+ extra_params['proxy'] = self.proxy
+ else:
+ actual_host = self.proxy.host
+ actual_port = self.proxy.port
+
conn = self.ConnectionCls(host=actual_host, port=actual_port,
timeout=self.timeout.connect_timeout,
**extra_params)
8 urllib3/filepost.py
@@ -38,10 +38,10 @@ def iter_field_objects(fields):
i = iter(fields)
for field in i:
- if isinstance(field, RequestField):
- yield field
- else:
- yield RequestField.from_tuples(*field)
+ if isinstance(field, RequestField):
+ yield field
+ else:
+ yield RequestField.from_tuples(*field)
def iter_fields(fields):
483 urllib3/packages/socks.py
@@ -0,0 +1,483 @@
+"""
+SocksiPy - Python SOCKS module.
+Version 1.4
+
+Copyright 2006 Dan-Haim. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+3. Neither the name of Dan Haim nor the names of his contributors may be used
+ to endorse or promote products derived from this software without specific
+ prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY DAN HAIM "AS IS" AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+EVENT SHALL DAN HAIM OR HIS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA
+OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+This module provides a standard socket-like interface for Python
+for tunneling connections through SOCKS proxies.
+
+===============================================================================
+
+Minor modifications made by Christopher Gilbert (http://motomastyle.com/)
+for use in PyLoris (http://pyloris.sourceforge.net/)
+
+Minor modifications made by Mario Vilas (http://breakingcode.wordpress.com/)
+mainly to merge bug fixes found in Sourceforge
+
+Modifications made by Anorov (https://github.com/Anorov)
+-Forked and renamed to PySocks
+-Fixed issue with HTTP proxy failure checking (same bug that was in the old ___recvall() method)
+-Included SocksiPyHandler (sockshandler.py), to be used as a urllib2 handler,
+ courtesy of e000 (https://github.com/e000): https://gist.github.com/869791#file_socksipyhandler.py
+-Re-styled code to make it readable
+ -Aliased PROXY_TYPE_SOCKS5 -> SOCKS5 etc.
+ -Improved exception handling and output
+ -Removed irritating use of sequence indexes, replaced with tuple unpacked variables
+ -Fixed up Python 3 bytestring handling - chr(0x03).encode() -> b"\x03"
+ -Other general fixes
+-Added clarification that the HTTP proxy connection method only supports CONNECT-style tunneling HTTP proxies
+-Various small bug fixes
+"""
+
+__version__ = "1.4"
+
+import socket
+import struct
+
+PROXY_TYPE_SOCKS4 = SOCKS4 = 1
+PROXY_TYPE_SOCKS5 = SOCKS5 = 2
+PROXY_TYPE_HTTP = HTTP = 3
+
+PRINTABLE_PROXY_TYPES = {SOCKS4: "SOCKS4", SOCKS5: "SOCKS5", HTTP: "HTTP"}
+
+_orgsocket = _orig_socket = socket.socket
+
+class ProxyError(IOError): pass
+class GeneralProxyError(ProxyError): pass
+class SOCKS5AuthError(ProxyError): pass
+class SOCKS5Error(ProxyError): pass
+class SOCKS4Error(ProxyError): pass
+class HTTPError(ProxyError): pass
+
+SOCKS4_ERRORS = { 0x5B: "Request rejected or failed",
+ 0x5C: "Request rejected because SOCKS server cannot connect to identd on the client",
+ 0x5D: "Request rejected because the client program and identd report different user-ids"
+ }
+
+SOCKS5_ERRORS = { 0x01: "General SOCKS server failure",
+ 0x02: "Connection not allowed by ruleset",
+ 0x03: "Network unreachable",
+ 0x04: "Host unreachable",
+ 0x05: "Connection refused",
+ 0x06: "TTL expired",
+ 0x07: "Command not supported, or protocol error",
+ 0x08: "Address type not supported"
+ }
+
+DEFAULT_PORTS = { SOCKS4: 1080,
+ SOCKS5: 1080,
+ HTTP: 8080
+ }
+
+def set_default_proxy(proxy_type=None, addr=None, port=None, rdns=True, username=None, password=None):
+ """
+ set_default_proxy(proxy_type, addr[, port[, rdns[, username, password]]])
+
+ Sets a default proxy which all further socksocket objects will use,
+ unless explicitly changed.
+ """
+ socksocket.default_proxy = (proxy_type, addr.encode(), port, rdns,
+ username.encode() if username else None,
+ password.encode() if password else None)
+
+setdefaultproxy = set_default_proxy
+
+def get_default_proxy():
+ """
+ Returns the default proxy, set by set_default_proxy.
+ """
+ return socksocket.default_proxy
+
+getdefaultproxy = get_default_proxy
+
+def wrap_module(module):
+ """
+ Attempts to replace a module's socket library with a SOCKS socket. Must set
+ a default proxy using set_default_proxy(...) first.
+ This will only work on modules that import socket directly into the namespace;
+ most of the Python Standard Library falls into this category.
+ """
+ if socksocket.default_proxy:
+ module.socket.socket = socksocket
+ else:
+ raise GeneralProxyError("No default proxy specified")
+
+wrapmodule = wrap_module
+
+def create_connection(dest_pair, proxy_type=None, proxy_addr=None,
+ proxy_port=None, proxy_username=None,
+ proxy_password=None, timeout=None):
+ """create_connection(dest_pair, **proxy_args) -> socket object
+
+ Like socket.create_connection(), but connects to proxy
+ before returning the socket object.
+
+ dest_pair - 2-tuple of (IP/hostname, port).
+ **proxy_args - Same args passed to socksocket.set_proxy().
+ timeout - Optional socket timeout value, in seconds.
+ """
+ sock = socksocket()
+ if isinstance(timeout, (int, float)):
+ sock.settimeout(timeout)
+ sock.set_proxy(proxy_type, proxy_addr, proxy_port,
+ username=proxy_username, password=proxy_password)
+ sock.connect(dest_pair)
+ return sock
+
+class socksocket(socket.socket):
+ """socksocket([family[, type[, proto]]]) -> socket object
+
+ Open a SOCKS enabled socket. The parameters are the same as
+ those of the standard socket init. In order for SOCKS to work,
+ you must specify family=AF_INET, type=SOCK_STREAM and proto=0.
+ """
+
+ default_proxy = None
+
+ def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0, _sock=None):
+ _orig_socket.__init__(self, family, type, proto, _sock)
+
+ if self.default_proxy:
+ self.proxy = self.default_proxy
+ else:
+ self.proxy = (None, None, None, None, None, None)
+ self.proxy_sockname = None
+ self.proxy_peername = None
+
+ def _recvall(self, count):
+ """
+ Receive EXACTLY the number of bytes requested from the socket.
+ Blocks until the required number of bytes have been received.
+ """
+ data = b""
+ while len(data) < count:
+ d = self.recv(count - len(data))
+ if not d:
+ self.close()
+ raise GeneralProxyError("Connection closed unexpectedly")
+ data += d
+ return data
+
+ def set_proxy(self, proxy_type=None, addr=None, port=None, rdns=True, username=None, password=None):
+ """set_proxy(proxy_type, addr[, port[, rdns[, username[, password]]]])
+ Sets the proxy to be used.
+
+ proxy_type - The type of the proxy to be used. Three types
+ are supported: PROXY_TYPE_SOCKS4 (including socks4a),
+ PROXY_TYPE_SOCKS5 and PROXY_TYPE_HTTP
+ addr - The address of the server (IP or DNS).
+ port - The port of the server. Defaults to 1080 for SOCKS
+ servers and 8080 for HTTP proxy servers.
+ rdns - Should DNS queries be performed on the remote side
+ (rather than the local side). The default is True.
+ Note: This has no effect with SOCKS4 servers.
+ username - Username to authenticate with to the server.
+ The default is no authentication.
+ password - Password to authenticate with to the server.
+ Only relevant when username is also provided.
+ """
+ self.proxy = (proxy_type, addr.encode(), port, rdns,
+ username.encode() if username else None,
+ password.encode() if password else None)
+
+ setproxy = set_proxy
+
+ def get_proxy_sockname(self):
+ """
+ Returns the bound IP address and port number at the proxy.
+ """
+ return self.proxy_sockname
+
+ getproxysockname = get_proxy_sockname
+
+ def get_proxy_peername(self):
+ """
+ Returns the IP and port number of the proxy.
+ """
+ return _orig_socket.getpeername(self)
+
+ getproxypeername = get_proxy_peername
+
+ def get_peername(self):
+ """
+ Returns the IP address and port number of the destination
+ machine (note: get_proxy_peername returns the proxy)
+ """
+ return self.proxy_peername
+
+ getpeername = get_peername
+
+ def _negotiate_SOCKS5(self, dest_addr, dest_port):
+ """
+ Negotiates a connection through a SOCKS5 server.
+ """
+ proxy_type, addr, port, rdns, username, password = self.proxy
+
+ # First we'll send the authentication methods we support.
+ if username and password:
+ # The username/password details were supplied to the
+ # set_proxy method so we support the USERNAME/PASSWORD
+ # authentication (in addition to the standard none).
+ self.sendall(b"\x05\x02\x00\x02")
+ else:
+ # No username/password were entered, therefore we
+ # only support connections with no authentication.
+ self.sendall(b"\x05\x01\x00")
+
+ # We'll receive the server's response to determine which
+ # method was selected
+ chosen_auth = self._recvall(2)
+
+ if chosen_auth[0:1] != b"\x05":
+ # Note: string[i:i+1] is used because indexing of a bytestring
+ # via bytestring[i] yields an integer in Python 3
+ self.close()
+ raise GeneralProxyError("SOCKS5 proxy server sent invalid data")
+
+ # Check the chosen authentication method
+
+ if chosen_auth[1:2] == b"\x02":
+ # Okay, we need to perform a basic username/password
+ # authentication.
+ self.sendall(b"\x01" + struct.pack("B", len(username))
+ + username
+ + struct.pack("B", len(password))
+ + password)
+ auth_status = self._recvall(2)
+ if auth_status[0:1] != b"\x01":
+ # Bad response
+ self.close()
+ raise GeneralProxyError("SOCKS5 proxy server sent invalid data")
+ if auth_status[1:2] != b"\x00":
+ # Authentication failed
+ self.close()
+ raise SOCKS5AuthError("SOCKS5 authentication failed")
+
+ # Otherwise, authentication succeeded
+
+ # No authentication is required if 0x00
+ elif chosen_auth[1:2] != b"\x00":
+ # Reaching here is always bad
+ self.close()
+ if chosen_auth[1:2] == b"\xFF":
+ raise SOCKS5AuthError("All offered SOCKS5 authentication methods were rejected")
+ else:
+ raise GeneralProxyError("SOCKS5 proxy server sent invalid data")
+
+ # Now we can request the actual connection
+ req = b"\x05\x01\x00"
+ # If the given destination address is an IP address, we'll
+ # use the IPv4 address request even if remote resolving was specified.
+ try:
+ addr_bytes = socket.inet_aton(dest_addr)
+ req += b"\x01" + addr_bytes
+ except socket.error:
+ # Well it's not an IP number, so it's probably a DNS name.
+ if rdns:
+ # Resolve remotely
+ addr_bytes = None
+ req += b"\x03" + struct.pack("B", len(dest_addr.encode())) + dest_addr.encode()
+ else:
+ # Resolve locally
+ addr_bytes = socket.inet_aton(socket.gethostbyname(dest_addr))
+ req += b"\x01" + addr_bytes
+
+ req += struct.pack(">H", dest_port)
+ self.sendall(req)
+
+ # Get the response
+ resp = self._recvall(4)
+ if resp[0:1] != b"\x05":
+ self.close()
+ raise GeneralProxyError("SOCKS5 proxy server sent invalid data")
+
+ status = ord(resp[1:2])
+ if status != 0x00:
+ # Connection failed: server returned an error
+ self.close()
+ error = SOCKS5_ERRORS.get(status, "Unknown error")
+ raise SOCKS5Error("{:#04x}: {}".format(status, error))
+
+ # Get the bound address/port
+ if resp[3:4] == b"\x01":
+ bound_addr = self._recvall(4)
+ elif resp[3:4] == b"\x03":
+ resp += self.recv(1)
+ bound_addr = self._recvall(ord(resp[4:5]))
+ else:
+ self.close()
+ raise GeneralProxyError("SOCKS5 proxy server sent invalid data")
+
+ bound_port = struct.unpack(">H", self._recvall(2))[0]
+ self.proxy_sockname = bound_addr, bound_port
+ if addr_bytes:
+ self.proxy_peername = socket.inet_ntoa(addr_bytes), dest_port
+ else:
+ self.proxy_peername = dest_addr, dest_port
+
+ def _negotiate_SOCKS4(self, dest_addr, dest_port):
+ """
+ Negotiates a connection through a SOCKS4 server.
+ """
+ proxy_type, addr, port, rdns, username, password = self.proxy
+
+ # Check if the destination address provided is an IP address
+ remote_resolve = False
+ try:
+ addr_bytes = socket.inet_aton(dest_addr)
+ except socket.error:
+ # It's a DNS name. Check where it should be resolved.
+ if rdns:
+ addr_bytes = b"\x00\x00\x00\x01"
+ remote_resolve = True
+ else:
+ addr_bytes = socket.inet_aton(socket.gethostbyname(dest_addr))
+
+ # Construct the request packet
+ req = struct.pack(">BBH", 0x04, 0x01, dest_port) + addr_bytes
+
+ # The username parameter is considered userid for SOCKS4
+ if username:
+ req += username
+ req += b"\x00"
+
+ # DNS name if remote resolving is required
+ # NOTE: This is actually an extension to the SOCKS4 protocol
+ # called SOCKS4A and may not be supported in all cases.
+ if remote_resolve:
+ req += dest_addr.encode() + b"\x00"
+ self.sendall(req)
+
+ # Get the response from the server
+ resp = self._recvall(8)
+ if resp[0:1] != b"\x00":
+ # Bad data
+ self.close()
+ raise GeneralProxyError("SOCKS4 proxy server sent invalid data")
+
+ status = ord(resp[1:2])
+ if status != 0x5A:
+ # Connection failed: server returned an error
+ self.close()
+ error = SOCKS4_ERRORS.get(status, "Unknown error")
+ raise SOCKS4Error("{:#04x}: {}".format(status, error))
+
+ # Get the bound address/port
+ self.proxy_sockname = (socket.inet_ntoa(resp[4:]), struct.unpack(">H", resp[2:4])[0])
+ if remote_resolve:
+ self.proxy_peername = socket.inet_ntoa(addr_bytes), dest_port
+ else:
+ self.proxy_peername = dest_addr, dest_port
+
+ def _negotiate_HTTP(self, dest_addr, dest_port):
+ """
+ Negotiates a connection through an HTTP proxy server.
+ NOTE: This currently only supports HTTP CONNECT-style proxies.
+ """
+ proxy_type, addr, port, rdns, username, password = self.proxy
+
+ # If we need to resolve locally, we do this now
+ addr = dest_addr if rdns else socket.gethostbyname(dest_addr)
+
+ self.sendall(b"CONNECT " + addr.encode() + b":" + str(dest_port).encode() +
+ b" HTTP/1.1\r\n" + b"Host: " + dest_addr.encode() + b"\r\n\r\n")
+
+ resp = self.recv(4096)
+ while b"\r\n\r\n" not in resp and b"\n\n" not in resp:
+ d = self.recv(4096)
+ if not d:
+ self.close()
+ raise GeneralProxyError("Connection closed unexpectedly")
+ resp += d
+
+ # We just need the first line to check if the connection was successful
+ status_line = resp.splitlines()[0].split(b" ", 2)
+
+ if not status_line[0].startswith(b"HTTP/"):
+ self.close()
+ raise GeneralProxyError("Proxy server does not appear to be an HTTP proxy")
+
+ try:
+ status_code = int(status_line[1])
+ except ValueError:
+ self.close()
+ raise HTTPError("HTTP proxy server did not return a valid HTTP status")
+
+ if status_code != 200:
+ self.close()
+ error = "{}: {}".format(status_code, status_line[2].decode())
+ if status_code in (400, 403, 405):
+ # It's likely that the HTTP proxy server does not support the CONNECT tunneling method
+ error += ("\n[*] Note: The HTTP proxy server may not be supported by PySocks"
+ " (must be a CONNECT tunnel proxy)")
+ raise HTTPError(error)
+
+ self.proxy_sockname = (b"0.0.0.0", 0)
+ self.proxy_peername = addr, dest_port
+
+ def connect(self, dest_pair):
+ """
+ Connects to the specified destination through a proxy.
+ Uses the same API as socket's connect().
+ To select the proxy server, use set_proxy().
+
+ dest_pair - 2-tuple of (IP/hostname, port).
+ """
+ proxy_type, proxy_addr, proxy_port, rdns, username, password = self.proxy
+ dest_addr, dest_port = dest_pair
+
+ # Do a minimal input check first
+ if (not isinstance(dest_pair, (list, tuple))
+ or len(dest_pair) != 2
+ or not isinstance(dest_addr, type(""))
+ or not isinstance(dest_port, int)):
+ raise GeneralProxyError("Invalid destination-connection (host, port) pair")
+
+ try:
+ if proxy_type is None:
+ _orig_socket.connect(self, (dest_addr, dest_port))
+ else:
+ port = proxy_port or DEFAULT_PORTS.get(proxy_type)
+ if not port:
+ raise GeneralProxyError("Invalid proxy type")
+
+ _orig_socket.connect(self, (proxy_addr, port))
+
+ if proxy_type == SOCKS5:
+ self._negotiate_SOCKS5(dest_addr, dest_port)
+ elif proxy_type == SOCKS4:
+ self._negotiate_SOCKS4(dest_addr, dest_port)
+ elif proxy_type == HTTP:
+ self._negotiate_HTTP(dest_addr, dest_port)
+
+ except socket.error as error:
+ self.close()
+ proxy_server = "{}:{}".format(proxy_addr.decode(), proxy_port)
+ printable_type = PRINTABLE_PROXY_TYPES[proxy_type]
+ errno, msg = error.errno, error.strerror or str(error)
+ msg = "Error connecting to {} proxy {}: {}".format(printable_type,
+ proxy_server, msg)
+ raise socket.error(errno, msg)
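For reviewers who want to check the negotiation code against the wire format, here is a standalone sketch of the CONNECT requests that `_negotiate_SOCKS5` and `_negotiate_SOCKS4` assemble. The function names `build_socks5_connect` and `build_socks4_connect` are hypothetical and exist only in this note; local DNS resolution (`rdns=False`) and authentication are deliberately left out.

```python
import socket
import struct


def build_socks5_connect(dest_addr, dest_port):
    """Build a SOCKS5 CONNECT request for an IPv4 literal or a host name."""
    req = b"\x05\x01\x00"  # version 5, command CONNECT, reserved byte
    try:
        # Address type 0x01: literal IPv4 address, packed into 4 bytes
        req += b"\x01" + socket.inet_aton(dest_addr)
    except socket.error:
        # Address type 0x03: one length byte, then the name (resolved remotely)
        name = dest_addr.encode("ascii")
        req += b"\x03" + struct.pack("B", len(name)) + name
    return req + struct.pack(">H", dest_port)  # port in network byte order


def build_socks4_connect(dest_addr, dest_port, user_id=b""):
    """Build a SOCKS4 CONNECT request, falling back to SOCKS4a for host names."""
    remote_name = None
    try:
        addr_bytes = socket.inet_aton(dest_addr)
    except socket.error:
        # SOCKS4a: the placeholder IP 0.0.0.1 tells the server a name follows
        addr_bytes = b"\x00\x00\x00\x01"
        remote_name = dest_addr.encode("ascii")
    req = struct.pack(">BBH", 0x04, 0x01, dest_port) + addr_bytes
    req += user_id + b"\x00"  # user id field is always null-terminated
    if remote_name is not None:
        req += remote_name + b"\x00"
    return req
```

The length byte is packed with `struct.pack("B", ...)` because `chr(n).encode()` does not yield a single byte for values of 128 and above: on Python 3 it produces a two-byte UTF-8 sequence, and on Python 2 it raises `UnicodeDecodeError`.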
20 urllib3/poolmanager.py
@@ -16,6 +16,7 @@
from .connectionpool import port_by_scheme
from .request import RequestMethods
from .util import parse_url
+from .exceptions import ProxyError
__all__ = ['PoolManager', 'ProxyManager', 'proxy_from_url']
@@ -176,7 +177,7 @@ class ProxyManager(PoolManager):
Behaves just like :class:`PoolManager`, but sends all requests through
the defined proxy, using the CONNECT method for HTTPS URLs.
- :param poxy_url:
+ :param proxy_url:
The URL of the proxy to be used.
:param proxy_headers:
@@ -195,7 +196,6 @@ class ProxyManager(PoolManager):
>>> r4 = proxy.request('GET', 'https://twitter.com/')
>>> len(proxy.pools)
3
-
"""
def __init__(self, proxy_url, num_pools=10, headers=None,
@@ -208,17 +208,22 @@ def __init__(self, proxy_url, num_pools=10, headers=None,
if not proxy.port:
port = port_by_scheme.get(proxy.scheme, 80)
proxy = proxy._replace(port=port)
+
+ if proxy.scheme not in ("http", "https", "socks4", "socks5"):
+ raise ProxyError("Unsupported proxy scheme '%s'" % proxy.scheme)
+
+ proxy._is_socks = proxy.scheme.startswith("socks")
self.proxy = proxy
self.proxy_headers = proxy_headers or {}
- assert self.proxy.scheme in ("http", "https"), \
- 'Not supported proxy scheme %s' % self.proxy.scheme
+
connection_pool_kw['_proxy'] = self.proxy
connection_pool_kw['_proxy_headers'] = self.proxy_headers
super(ProxyManager, self).__init__(
num_pools, headers, **connection_pool_kw)
def connection_from_host(self, host, port=None, scheme='http'):
- if scheme == "https":
+ if scheme == "https" or self.proxy._is_socks:
return super(ProxyManager, self).connection_from_host(
host, port, scheme)
@@ -241,7 +246,9 @@ def _set_proxy_headers(self, url, headers=None):
return headers_
def urlopen(self, method, url, redirect=True, **kw):
- "Same as HTTP(S)ConnectionPool.urlopen, ``url`` must be absolute."
+ """
+ Same as HTTP(S)ConnectionPool.urlopen, ``url`` must be absolute.
+ """
u = parse_url(url)
if u.scheme == "http":
@@ -250,7 +257,6 @@ def urlopen(self, method, url, redirect=True, **kw):
# need to set 'Host' at the very least.
kw['headers'] = self._set_proxy_headers(url, kw.get('headers',
self.headers))
-
return super(ProxyManager, self).urlopen(method, url, redirect, **kw)
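The scheme check, default-port lookup, and `_is_socks` flag in `ProxyManager.__init__` can be summarised in a standalone sketch. It uses the standard library's `urlsplit` rather than urllib3's `parse_url`, and the helper name `describe_proxy` and the returned dictionary are assumptions made for illustration only:

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

# Same defaults the patch adds to port_by_scheme.
DEFAULT_PORTS = {"http": 80, "https": 443, "socks4": 1080, "socks5": 1080}


def describe_proxy(proxy_url):
    """Validate a proxy URL and fill in the default port for its scheme."""
    parts = urlsplit(proxy_url)
    scheme = parts.scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError("Unsupported proxy scheme %r" % scheme)
    return {
        "scheme": scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[scheme],
        # urlsplit splits the credentials at the first colon only
        "username": parts.username,
        "password": parts.password,
        "is_socks": scheme.startswith("socks"),
    }
```

Returning a new structure, rather than setting a private attribute on the parsed URL, is one way the `_is_socks` question raised in the description could be approached. It is offered as an option, not as what the patch does.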
3  urllib3/util.py
@@ -122,7 +122,7 @@ def __init__(self, total=None, connect=_Default, read=_Default):
self.total = self._validate_timeout(total, 'total')
self._start_connect = None
- def __str__(self):
+ def __repr__(self):
return '%s(connect=%r, read=%r, total=%r)' % (
type(self).__name__, self._connect, self._read, self.total)
@@ -606,7 +606,6 @@ def is_fp_closed(obj):
return obj.closed
-
if SSLContext is not None: # Python 3.2+
def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None,
ca_certs=None, server_hostname=None,
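The `__str__` to `__repr__` renames in this diff rely on a standard Python rule: `str()` falls back to `__repr__` when `__str__` is not defined, but `repr()` never falls back to `__str__`. A small demonstration with two throwaway classes:

```python
class OnlyStr(object):
    def __str__(self):
        return "OnlyStr()"


class OnlyRepr(object):
    def __repr__(self):
        return "OnlyRepr()"


# str() uses __repr__ when __str__ is missing, so printing is unchanged.
print(str(OnlyRepr()))

# repr() ignores __str__, so the REPL and containers show the default form.
print(repr(OnlyStr()))

# Containers format their items with repr(), so only OnlyRepr reads well here.
print([OnlyRepr(), OnlyStr()])
```

This is why objects such as `Timeout` and `ConnectionPool` become easier to inspect in the REPL and in log output after the rename.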