Thank you for your bravery, @Anorov. :)
I guess the first thing we should sort out is the testing. After that we can tackle code design tweaks (though I don't see anything superbad).
Some concerns:
Where socks_connection_from_url lives and such, but let's deal with that after. :)
Also cc other people who helped on proxy-related work, I could really use more eyes on this: @schlamar @stanvit @brendoncrawford @foxx @lukasa @sigmavirus24 @t-8ch
I initially put socks_connection_from_url in util.py but later decided that since it would only be used by the connection classes, I might as well put it in connection.py. Open to suggestions about that, though.
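For anyone skimming, here is a rough sketch of the kind of URL handling such a helper needs, wherever it ends up living. This is not the code from the PR: `parse_socks_url`, the `socks4`/`socks5` scheme names, and the returned tuple shape are all assumptions made for illustration. The default port of 1080 matches the SOCKS default used by PySocks.

```python
from urllib.parse import urlsplit

# Assumed scheme names for illustration only; the PR may use different ones.
SUPPORTED_SCHEMES = ("socks4", "socks5")
DEFAULT_SOCKS_PORT = 1080


def parse_socks_url(url):
    """Split a proxy URL into (scheme, host, port). Hypothetical helper."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported proxy scheme: %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("proxy URL has no host: %r" % url)
    # Fall back to the conventional SOCKS port when the URL omits one.
    return parts.scheme, parts.hostname, parts.port or DEFAULT_SOCKS_PORT
```

Because the function only depends on the URL string, it would sit equally well in util.py or connection.py.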
And yeah, I wasn't terribly keen on using Twisted for this but I could not find any other decent pure-Python SOCKS4 and SOCKS5 modules; the few that I found had serious issues with them. I'm aware of many non-Python ones; those would probably be the easiest to use, but would add a lot of additional dependencies to the tests.
I had issues with socks5.py. shuttle seems to work alright, but it only supports SOCKS5 and I did not find a good corresponding SOCKS4 equivalent. Still looking around for one.
I am able to run the tests on a clean master. Only after my re-arranging of the tests to accommodate SOCKS proxies did I get issues; the failures were limited to the HTTP, SOCKS4, and SOCKS5 proxy tests. I believe it has something to do with starting TornadoServerThreads. So the broken code is likely in dummyserver/testcase.py. I am running Xubuntu 64-bit, on an Intel CPU.
So, I can confirm the broken tests on OS X, and they're very dramatically broken. In fact, their brokenness appears to be non-deterministic, which is awesome. In three runs I got two hangs (in different tests) and one run that ran to completion but failed many tests. However, as @Anorov spotted, they all failed with SSL errors (or errors relating to SSL, or logs that indicate that HTTPS was involved).
Running each test file by itself reveals that the problem is coming from with_dummyserver/test_proxy.py (not really a shock since @Anorov already spotted that). Running just that gets way more exciting, occasionally dumping fun errors like this one:
(env)cory@corymbp:urllib3/ % ./env/bin/nosetests test/with_dummyserver/test_proxy.py
E.python2.7(5354,0x110012000) malloc: *** error for object 0x7fdf01f79790: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
[1] 5354 abort ./env/bin/nosetests test/with_dummyserver/test_proxy.py
So I suspect, as @Anorov does, that Tornado and Twisted are getting in each other's way. I particularly wonder if they're accidentally using each other's sockets. My evidence for this is that in addition to the various SSL errors that pop up I frequently see errors moaning about file descriptors, like this one, wherein Tornado attempts to stop handling a file descriptor that it was never handling to begin with:
======================================================================
FAIL: test_basic_proxy (test.with_dummyserver.test_proxy.TestHTTPProxy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/cory/tmp/urllib3/test/with_dummyserver/test_proxy.py", line 42, in test_basic_proxy
self.assertEqual(r.status, 200)
AssertionError: 500 != 200
-------------------- >> begin captured logging << --------------------
urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
tornado.general: ERROR: Uncaught exception, closing connection.
Traceback (most recent call last):
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/iostream.py", line 330, in _handle_events
self.io_loop.update_handler(self.fileno(), self._state)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/ioloop.py", line 529, in update_handler
self._impl.modify(fd, events | self.ERROR)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 45, in modify
self.unregister(fd)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 49, in unregister
events = self._active.pop(fd)
KeyError: 15
tornado.access: INFO: 200 GET / (127.0.0.1) 1.60ms
tornado.application: ERROR: Exception in I/O handler for fd 15
Traceback (most recent call last):
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/ioloop.py", line 672, in start
self._handlers[fd](fd, events)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/stack_context.py", line 331, in wrapped
raise_exc_info(exc)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
ret = fn(*args, **kwargs)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/iostream.py", line 330, in _handle_events
self.io_loop.update_handler(self.fileno(), self._state)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/ioloop.py", line 529, in update_handler
self._impl.modify(fd, events | self.ERROR)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 45, in modify
self.unregister(fd)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/platform/kqueue.py", line 49, in unregister
events = self._active.pop(fd)
KeyError: 15
tornado.application: ERROR: Uncaught exception GET http://localhost:56221/ (127.0.0.1)
HTTPRequest(protocol='http', host='localhost:56221', method='GET', uri='http://localhost:56221/', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Host': 'localhost:56221', 'Accept-Encoding': 'identity', 'Accept': '*/*'})
Traceback (most recent call last):
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/web.py", line 1115, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/stack_context.py", line 302, in wrapped
ret = fn(*args, **kwargs)
File "/Users/cory/tmp/urllib3/dummyserver/httpproxy.py", line 53, in handle_response
self.set_status(response.code)
File "/Users/cory/tmp/urllib3/env/lib/python2.7/site-packages/tornado/web.py", line 284, in set_status
raise ValueError("unknown status code %d", status_code)
ValueError: ('unknown status code %d', 599)
tornado.access: ERROR: 500 GET http://localhost:56221/ (127.0.0.1) 26.84ms
urllib3.connectionpool: DEBUG: "GET http://localhost:56221/ HTTP/1.1" 500 93
--------------------- >> end captured logging << ---------------------
Running just the proxy test and dumping the traffic with tcpdump reveals that we're not doing any SSL at all. I also don't see any traffic to or from ports 1080 or 1081, which should be where we're running the SOCKS proxies. That last bit of information seems most telling to me.
If memory serves, we had all kinds of fun non-deterministic stuff back when we had Twisted instead of Tornado for our dummyserver. I don't think we ever quite got Twisted working properly.
@Lukasa Yep, that's exactly the behavior I observed. I was pulling my hair out for a while. Once I started seeing malloc errors I threw in the towel.
I temporarily "commented out" the SOCKS tests (by adding raise SkipTest to them), which may be why you aren't seeing 1080/1081 traffic if you didn't amend that. When testing on my own, I was able to manually connect to the SOCKS servers through my browser (I added a time.sleep(9999) immediately after the proxy server was started).
Though, if you did leave those SkipTests, that would mean Twisted wasn't running at all, which would indicate the core problem is elsewhere.
I don't actually think Tornado ioloops are supposed to be kicked off in separate threads; if so, that might explain quite a bit. I observed that TornadoServerThread._start_server() is called alone in some places: I don't think this actually starts the thread. And when I used just _start_server() in the _start_http_servers() class method, the HTTP servers would hang indefinitely when any request was sent to them.
_start_server() is already called in TornadoServerThread.run(), so I changed the calls (in the proxy tests) from _start_server() to .start(), which should spawn a thread as well as start the server. After I did that, my hanging issue disappeared but all the other non-deterministic problems began. So it's all a mess at this point.
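To make the distinction concrete, here is a minimal sketch of calling a setup method directly versus calling .start() on a thread. `ServerThread` below is a hypothetical stand-in, not the dummyserver class, and it only records which thread ran the setup code.

```python
import threading


class ServerThread(threading.Thread):
    """Hypothetical stand-in for a server thread (not TornadoServerThread)."""

    def __init__(self):
        super().__init__()
        self.daemon = True
        self.started_in = None
        self.ready = threading.Event()

    def _start_server(self):
        # Record the name of the thread that executed the setup code.
        self.started_in = threading.current_thread().name
        self.ready.set()

    def run(self):
        self._start_server()


direct = ServerThread()
direct._start_server()  # plain method call: runs in the caller's thread

spawned = ServerThread()
spawned.start()         # spawns a new thread, which then calls run()
spawned.ready.wait(5)
spawned.join(5)
```

After this runs, `direct.started_in` is the caller's thread name and `direct` was never alive as a thread, while `spawned.started_in` is the new thread's own name.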
Either way, I definitely think we should try and find non-Twisted alternatives for both SOCKS servers. If worse comes to worst, I suppose one of us (maybe me) can extend shuttle (which runs on Tornado) to also do SOCKS4. The core code is already there, it's just slightly different protocol negotiation logic.
The repo is here: https://github.com/ccp0101/shuttle
In config.py, set upstream = "upstreams.local.LocalUpstream" for regular tunneling.
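To show how small the negotiation difference is, here is a sketch of the client-side packets for an IPv4 CONNECT in each protocol. The function names are made up for illustration and none of this is shuttle's code; the byte layouts follow the SOCKS4 and SOCKS5 (RFC 1928) request formats.

```python
import socket
import struct


def socks4_connect_request(ip, port, user_id=b""):
    # VN=4, CD=1 (CONNECT), DSTPORT, DSTIP, USERID, NUL terminator
    return (struct.pack("!BBH", 4, 1, port)
            + socket.inet_aton(ip) + user_id + b"\x00")


def socks5_greeting(methods=(0,)):
    # VER=5, NMETHODS, METHODS (0 = no auth, 2 = username/password)
    return struct.pack("!BB", 5, len(methods)) + bytes(bytearray(methods))


def socks5_connect_request(ip, port):
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=1 (IPv4), DST.ADDR, DST.PORT
    return (struct.pack("!BBBB", 5, 1, 0, 1)
            + socket.inet_aton(ip) + struct.pack("!H", port))
```

SOCKS4 sends a single request with no method negotiation, so a server that already parses the SOCKS5 request mostly needs a second, simpler parser plus the SOCKS4 reply format.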
I don't actually think Tornado ioloops are supposed to be kicked off in separate threads;
No, that is not true: "Atypical applications may use more than one IOLoop, such as one IOLoop per thread, or per unittest case." (http://www.tornadoweb.org/en/stable/ioloop.html)
I observed that TornadoServerThread._start_server() is called alone in some places: I don't think this actually starts the thread. And when I used just _start_server() in the _start_http_servers() class method, the HTTP servers would hang indefinitely when any request was sent to them.
The code flow in the dummyserver test cases is really strange in some places, but last time I touched the Tornado tests (#226) I was pretty sure that I fixed all these issues with Tornado. But I'll have a look again.
BTW, Twisted installation via pip is not officially supported: http://twistedmatrix.com/trac/wiki/FrequentlyAskedQuestions#CanIinstallTwistedusingeasy_installorpip
Well that's crap.
No, pip install twisted fails on Windows.
Ok, the main issue is the upgrade to Tornado 3.x and its changes to IOLoop.instance.
On master and with Tornado 2.x, we use one global IOLoop (which we get by calling IOLoop.instance) and run it in one thread, so it is intentional that we call _start_server because this will link the web application with the global IOLoop, which is later started in the proxy thread (see c6629d4).
On Tornado 3.x IOLoop.instance returns a thread-specific IOLoop, so the web application started in _start_server is linked to the MainThread's IOLoop (which is never started), so there is obviously a deadlock.
If you want to make the tests compatible with Tornado 3.x you should create a global IOLoop on your own and pass it explicitly to the calls of HTTPServer (see http://www.tornadoweb.org/en/branch3.1/httpserver.html#tornado.httpserver.HTTPServer). I don't think that the upgrade to Tornado 3.x is necessary at all. But if you want to, I would suggest that you do this in a separate PR.
Btw, it should be possible to run multiple threads with IOLoops but there are some race conditions which are hard to track down so I would strongly suggest keeping the single thread solution.
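A toy model of the failure mode described above, and of the "create one loop and pass it explicitly" fix. This is deliberately not Tornado code: `ToyLoop` and `thread_loop` are hypothetical names, and the per-thread lookup only imitates the behaviour assumed in this comment.

```python
import threading


class ToyLoop:
    """Toy event loop: runs whatever callbacks were registered on it."""

    def __init__(self):
        self.callbacks = []

    def add_callback(self, fn):
        self.callbacks.append(fn)

    def run_once(self):
        for fn in self.callbacks:
            fn()


_local = threading.local()


def thread_loop():
    """Return a loop private to the calling thread (assumed behaviour)."""
    if not hasattr(_local, "loop"):
        _local.loop = ToyLoop()
    return _local.loop


def run_in_thread(loop_getter):
    worker = threading.Thread(target=lambda: loop_getter().run_once())
    worker.start()
    worker.join()


served = []

# Failure mode: the callback is registered on the calling thread's loop,
# but the worker thread looks up (and runs) its own, empty loop.
thread_loop().add_callback(lambda: served.append("implicit"))
run_in_thread(thread_loop)
implicit_result = list(served)

# Fix: build one loop up front and hand that exact object to the worker.
shared = ToyLoop()
shared.add_callback(lambda: served.append("explicit"))
run_in_thread(lambda: shared)
```

In the first case nothing is ever served, which is the toy equivalent of the hang; in the second the worker runs the loop the application was attached to.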
Can we just use one event loop for everything? http://www.tornadoweb.org/en/latest/twisted.html
On Tornado 3.x IOLoop.instance returns a thread-specific IOLoop, so the web application started in _start_server is linked to the MainThread's IOLoop (which is never started), so there is obviously a deadlock.
Despite these changes Tornado 3.x works on master (because we run the MainThread's IOLoop in the thread). So I'm not sure what you have done wrong. Probably you didn't start any thread at all?!
I think I'm going to refactor the dummyserver test cases slightly (on master) to make more clear what's going on.
@schlamar Thank you for investigating this. The reason I upgraded Tornado to version 3 is due to this bug: tornadoweb/tornado#593
With the older version used in the test suite, any unit test that involved IPv6 would cause bind_sockets to raise getaddrinfo exceptions on my computer. Upgrading immediately fixed that problem.
Will rebase.
@Anorov any updates on this? If not, I'm willing to look into rebasing this myself (but I'd rather not do duplicate work, hence the question).
@mcuelenaere I had a bit of trouble with merging a few things, but the rebase has been made and I'll be updating this pull in the near future.
Any update here?
Sorry, my laptop died unfortunately. Just got a new one. I have to modify things a little bit more based on master changes, but I should have everything pushed here by next week.
At this point it might be easier for me to scrap this pull and re-do my changes to master, since I have to rewrite a fair bit of the tests, then make another pull. Is it okay if I do that, or should I just rebase again?
Up to you. :) Retaining this thread is preferred, but if it's too much of a pain then a new PR is acceptable.
You could start a new branch, make your changes, rename it to the old branch's name and force-push it. That should keep this thread while making all the changes new.
(It might be even possible to force push the new branch without a local rename with git push -f origin old-branch)
So, a few questions while I was mulling through connection.py.
VerifiedHTTPSConnection starts off with this connect method:
try:
    sock = socket.create_connection(
        address=(self.host, self.port),
        timeout=self.timeout,
    )
except SocketTimeout:
    raise ConnectTimeoutError(
        self, "Connection to %s timed out. (connect timeout=%s)" %
        (self.host, self.timeout))
As far as I can tell, neither HTTPConnection nor HTTPSConnection has that connect timeout wrapping.
Also, the socket.create_connection in VerifiedHTTPSConnection doesn't pass source_address here, despite the other 2 classes doing so. Both the other classes have:
try:
    conn = socket.create_connection(
        (self.host, self.port),
        self.timeout,
        self.source_address,
    )
except AttributeError: # Python 2.6
    conn = socket.create_connection(
        (self.host, self.port),
        self.timeout,
    )
This attribute existence check is only in VerifiedHTTPSConnection, but the same problem would affect every other Connection class:
# the _tunnel_host attribute was added in python 2.6.3 (via
# http://hg.python.org/cpython/rev/0f57b30a152f) so pythons 2.6(0-2) do
# not have them.
if getattr(self, '_tunnel_host', None):
    self.sock = sock
    # Calls self._set_hostport(), so self.host is
    # self._tunnel_host below.
    self._tunnel()
Basically, it seems like VerifiedHTTPSConnection is missing 2 fairly recent changes, and has 1 check (existence of _tunnel_host attribute) that everything else is missing. So it's kind of isolated from the rest of the file.
Should I try and remediate those 3 issues, including wrapping ConnectTimeout for all 3 classes? It would only be a few quick changes.
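Roughly what I have in mind for the shared part, as a standalone sketch rather than the actual patch. `open_socket` and its `create_connection` parameter are hypothetical (the parameter exists only so the behaviour can be exercised without a network), and `ConnectTimeoutError` here is a local stand-in for the library's exception.

```python
import socket


class ConnectTimeoutError(Exception):
    """Local stand-in for the library's exception of the same name."""


def open_socket(host, port, timeout=None, source_address=None,
                create_connection=socket.create_connection):
    """Open a TCP connection, translating connect timeouts.

    Hypothetical helper: all three connection classes could call something
    like this so that source_address and the timeout wrapping are applied
    in one place.
    """
    kwargs = {}
    if timeout is not None:
        kwargs["timeout"] = timeout
    if source_address is not None:
        kwargs["source_address"] = source_address
    try:
        return create_connection((host, port), **kwargs)
    except socket.timeout:
        raise ConnectTimeoutError(
            "Connection to %s timed out. (connect timeout=%s)"
            % (host, timeout))
```

Each connect() would then do its own post-processing (tunnelling, TLS wrapping) on the returned socket.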
Here is my new proposed version of connection.py:
https://gist.github.com/Anorov/9129376
I resolved the above issues, integrated SOCKS support, and also cleaned up some of the redundant conn, sock, and self.sock ambiguity in some parts, in favor of just modifying the instance variable each time. That last part is a bit ugly, but it accommodates _tunnel() better than the old version, I think.
I made the SOCKS modifications much simpler, without introducing any new classes; the disadvantage is that all HTTPConnection classes have a little bit of logic in their __init__ now. socks_connection_from_url was moved to util.
I also think we could simplify HTTPSConnection.__init__, see: https://gist.github.com/Anorov/9129376#file-connection-py-L137
Before I continue with the rest of the modifications and tests, I'd like comments on these changes.
If you'd like to fix them in a separate earlier PR
That would be great. Or at least a separate commit
looking forward to this!
Hello, has any progress been made on this? Is it possible to help somehow?
Sorry for basically going AWOL on this. I've been really busy with school and work in the past few months.
Basically I just need to hunker down, write the full test suite for this while taking into consideration some of the changes to the dummy servers, then re-apply something close to the commits I already have here.
| @@ -0,0 +1,14 @@ | ||
| +#!/usr/bin/env python | ||
| +from twisted.internet import reactor | ||
| +from twisted.protocols.socks import SOCKSv4Factory | ||
| + | ||
| +def run_socks4_proxy(host="127.0.0.1", port=1080): | ||
| + reactor.listenTCP(port, SOCKSv4Factory("/dev/null"), interface=host) | ||
| + try: | ||
| + reactor.run() | ||
| + except (KeyboardInterrupt, SystemExit): | ||
| + reactor.stop() | ||
| + | ||
| +if __name__ == "__main__": | ||
| + print("Starting SOCKS4 proxy server...") | ||
| + run_socks4_proxy() |
| @@ -0,0 +1,131 @@ | ||
| +from twisted.internet import reactor, protocol | ||
| +import struct | ||
| + | ||
| +class remote_protocol(protocol.Protocol): | ||
| + def connectionMade(self): | ||
| + print 'Connection made' | ||
| + self.socks5 = self.factory.socks5 | ||
| + # -- send success to client | ||
| + self.socks5.send_connect_response(0) | ||
| + self.socks5.remote = self.transport | ||
| + self.socks5.state = 'communicate' | ||
| + def dataReceived(self, data): | ||
| + self.socks5.transport.write(data) | ||
| + | ||
| +class remote_factory(protocol.ClientFactory): | ||
| + def __init__(self, socks5): | ||
| + self.protocol = remote_protocol | ||
| + self.socks5 = socks5 | ||
| + def clientConnectionFailed(self, connector, reason): | ||
| + print 'failed:', reason.getErrorMessage() | ||
| + self.socks5.send_connect_response(5) | ||
| + self.socks5.transport.loseConnection() | ||
| + def clientConnectionLost(self, connector, reason): | ||
| + print 'con lost:', reason.getErrorMessage() | ||
| + self.socks5.transport.loseConnection() | ||
| + | ||
| +class socks5_protocol(protocol.Protocol): | ||
| + def connectionMade(self): | ||
| + self.state = 'wait_hello' | ||
| + def dataReceived(self, data): | ||
| + method = getattr(self, self.state) | ||
| + method(data) | ||
| + #-------------------------------------------------- | ||
| + def wait_hello(self, data): | ||
| + (ver, nmethods) = struct.unpack('!BB', data[:2]) | ||
| + print 'Got version = %x, nmethods = %x' % (ver,nmethods) | ||
| + if ver!=5: | ||
| + # we do SOCKS5 only | ||
| + self.transport.loseConnection() | ||
| + return | ||
| + if nmethods<1: | ||
| + # not SOCKS5 protocol?! | ||
| + self.transport.loseConnection() | ||
| + return | ||
| + methods = data[2:2+nmethods] | ||
| + for meth in methods: | ||
| + print 'method=%x' % ord(meth) | ||
| + if ord(meth)==0: | ||
| + # no auth, neato, accept | ||
| + resp = struct.pack('!BB', 5, 0) | ||
| + self.transport.write(resp) | ||
| + self.state = 'wait_connect' | ||
| + return | ||
| + if ord(meth)==255: | ||
| + # disconnect | ||
| + self.transport.loseConnection() | ||
| + return | ||
| + #-- we should have processed the request by now | ||
| + self.transport.loseConnection() | ||
| + #-------------------------------------------------- | ||
| + def wait_connect(self, data): | ||
| + (ver, cmd, rsv, atyp) = struct.unpack('!BBBB', data[:4]) | ||
| + if ver!=5 or rsv!=0: | ||
| + # protocol violation | ||
| + self.transport.loseConnection() | ||
| + return | ||
| + data = data[4:] | ||
| + if cmd==1: | ||
| + print 'CONNECT' | ||
| + host = None | ||
| + if atyp==1: # IP V4 | ||
| + print 'ipv4' | ||
| + (b1,b2,b3,b4) = struct.unpack('!BBBB', data[:4]) | ||
| + host = '%i.%i.%i.%i' % (b1,b2,b3,b4) | ||
| + data = data[4:] | ||
| + elif atyp==3: # domainname | ||
| + print 'domain' | ||
| + (l,) = struct.unpack('!B', data[:1]) | ||
| + host = data[1:1+l] | ||
| + data = data[1+l:] | ||
| + elif atyp==4: # IP V6 | ||
| + print 'ipv6' | ||
| + else: | ||
| + # protocol violation | ||
| + self.transport.loseConnection() | ||
| + return | ||
| + (port) = struct.unpack('!H', data[:2]) | ||
| + port=port[0] | ||
| + data = data[2:] | ||
| + print '* connecting %s:%d' % (host,port) | ||
| + return self.perform_connect(host, port) | ||
| + elif cmd==2: | ||
| + print 'BIND' | ||
| + elif cmd==3: | ||
| + print 'UDP ASSOCIATE' | ||
| + #-- we should have processed the request by now | ||
| + self.transport.loseConnection() | ||
| + #-------------------------------------------------- | ||
| + def send_connect_response(self, code): | ||
| + try: | ||
| + myname = self.transport.getHost().host | ||
| + except: | ||
| + # this might fail as no longer a socket | ||
| + # is present | ||
| + self.transport.loseConnection() | ||
| + return | ||
| + ip = [int(i) for i in myname.split('.')] | ||
| + resp = struct.pack('!BBBB', 5, code, 0, 1 ) | ||
| + resp += struct.pack('!BBBB', ip[0], ip[1], ip[2], ip[3]) | ||
| + resp += struct.pack('!H', self.transport.getHost().port) | ||
| + self.transport.write(resp) | ||
| + | ||
| + def perform_connect(self, host, port): | ||
| + factory = remote_factory(self) | ||
| + reactor.connectTCP(host, port, factory) | ||
| + #-------------------------------------------------- | ||
| + def communicate(self, data): | ||
| + self.remote.write(data) | ||
| + | ||
| + | ||
| +def run_socks5_proxy(host="127.0.0.1", port=1081): | ||
| + factory = protocol.ServerFactory() | ||
| + factory.protocol = socks5_protocol | ||
| + reactor.listenTCP(port, factory, interface=host) | ||
| + try: | ||
| + reactor.run() | ||
| + except (KeyboardInterrupt, SystemExit): | ||
| + reactor.stop() | ||
| + | ||
| +if __name__ == '__main__': | ||
| + run_socks5_proxy() |
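For reference, a standalone sketch of parsing the SOCKS5 CONNECT request that wait_connect handles in the file above, written for Python 3. It is not part of the PR: `parse_socks5_connect` is a hypothetical name and the layout follows RFC 1928. One detail it makes explicit is that struct.unpack always returns a tuple, so the domain length has to be unpacked before it can be used as a slice bound.

```python
import socket
import struct


def parse_socks5_connect(data):
    """Return (host, port) from a SOCKS5 request with CMD=CONNECT."""
    ver, cmd, rsv, atyp = struct.unpack("!BBBB", data[:4])
    if ver != 5 or rsv != 0 or cmd != 1:
        raise ValueError("not a SOCKS5 CONNECT request")
    rest = data[4:]
    if atyp == 1:    # IPv4: 4 raw bytes
        host, rest = socket.inet_ntoa(rest[:4]), rest[4:]
    elif atyp == 3:  # domain name: 1 length byte, then the name
        (length,) = struct.unpack("!B", rest[:1])
        host = rest[1:1 + length].decode("ascii")
        rest = rest[1 + length:]
    elif atyp == 4:  # IPv6: 16 raw bytes
        host, rest = socket.inet_ntop(socket.AF_INET6, rest[:16]), rest[16:]
    else:
        raise ValueError("unknown address type %d" % atyp)
    (port,) = struct.unpack("!H", rest[:2])
    return host, port
```

Unlike the test server above, this sketch also fills in the host for the IPv6 address type.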
| @@ -1,4 +1,5 @@ | ||
| nose==1.3 | ||
| mock==1.0.1 | ||
| -tornado==2.4.1 | ||
| +tornado==3.1.1 | ||
| coverage==3.6 | ||
| +twisted==13.2 |
| @@ -0,0 +1,483 @@ | ||
| +""" | ||
| +SocksiPy - Python SOCKS module. | ||
| +Version 1.4 | ||
| + | ||
| +Copyright 2006 Dan-Haim. All rights reserved. | ||
| + | ||
| +Redistribution and use in source and binary forms, with or without modification, | ||
| +are permitted provided that the following conditions are met: | ||
| +1. Redistributions of source code must retain the above copyright notice, this | ||
| + list of conditions and the following disclaimer. | ||
| +2. Redistributions in binary form must reproduce the above copyright notice, | ||
| + this list of conditions and the following disclaimer in the documentation | ||
| + and/or other materials provided with the distribution. | ||
| +3. Neither the name of Dan Haim nor the names of his contributors may be used | ||
| + to endorse or promote products derived from this software without specific | ||
| + prior written permission. | ||
| + | ||
| +THIS SOFTWARE IS PROVIDED BY DAN HAIM "AS IS" AND ANY EXPRESS OR IMPLIED | ||
| +WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF | ||
| +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO | ||
| +EVENT SHALL DAN HAIM OR HIS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, | ||
| +INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT | ||
| +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA | ||
| +OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF | ||
| +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT | ||
| +OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMANGE. | ||
| + | ||
| + | ||
| +This module provides a standard socket-like interface for Python | ||
| +for tunneling connections through SOCKS proxies. | ||
| + | ||
| +=============================================================================== | ||
| + | ||
| +Minor modifications made by Christopher Gilbert (http://motomastyle.com/) | ||
| +for use in PyLoris (http://pyloris.sourceforge.net/) | ||
| + | ||
| +Minor modifications made by Mario Vilas (http://breakingcode.wordpress.com/) | ||
| +mainly to merge bug fixes found in Sourceforge | ||
| + | ||
| +Modifications made by Anorov (https://github.com/Anorov) | ||
| +-Forked and renamed to PySocks | ||
| +-Fixed issue with HTTP proxy failure checking (same bug that was in the old ___recvall() method) | ||
| +-Included SocksiPyHandler (sockshandler.py), to be used as a urllib2 handler, | ||
| + courtesy of e000 (https://github.com/e000): https://gist.github.com/869791#file_socksipyhandler.py | ||
| +-Re-styled code to make it readable | ||
| + -Aliased PROXY_TYPE_SOCKS5 -> SOCKS5 etc. | ||
| + -Improved exception handling and output | ||
| + -Removed irritating use of sequence indexes, replaced with tuple unpacked variables | ||
| + -Fixed up Python 3 bytestring handling - chr(0x03).encode() -> b"\x03" | ||
| + -Other general fixes | ||
| +-Added clarification that the HTTP proxy connection method only supports CONNECT-style tunneling HTTP proxies | ||
| +-Various small bug fixes | ||
| +""" | ||
| + | ||
| +__version__ = "1.4" | ||
| + | ||
| +import socket | ||
| +import struct | ||
| + | ||
| +PROXY_TYPE_SOCKS4 = SOCKS4 = 1 | ||
| +PROXY_TYPE_SOCKS5 = SOCKS5 = 2 | ||
| +PROXY_TYPE_HTTP = HTTP = 3 | ||
| + | ||
| +PRINTABLE_PROXY_TYPES = {SOCKS4: "SOCKS4", SOCKS5: "SOCKS5", HTTP: "HTTP"} | ||
| + | ||
| +_orgsocket = _orig_socket = socket.socket | ||
| + | ||
| +class ProxyError(IOError): pass | ||
| +class GeneralProxyError(ProxyError): pass | ||
| +class SOCKS5AuthError(ProxyError): pass | ||
| +class SOCKS5Error(ProxyError): pass | ||
| +class SOCKS4Error(ProxyError): pass | ||
| +class HTTPError(ProxyError): pass | ||
| + | ||
| +SOCKS4_ERRORS = { 0x5B: "Request rejected or failed", | ||
| + 0x5C: "Request rejected because SOCKS server cannot connect to identd on the client", | ||
| + 0x5D: "Request rejected because the client program and identd report different user-ids" | ||
| + } | ||
| + | ||
| +SOCKS5_ERRORS = { 0x01: "General SOCKS server failure", | ||
| + 0x02: "Connection not allowed by ruleset", | ||
| + 0x03: "Network unreachable", | ||
| + 0x04: "Host unreachable", | ||
| + 0x05: "Connection refused", | ||
| + 0x06: "TTL expired", | ||
| + 0x07: "Command not supported, or protocol error", | ||
| + 0x08: "Address type not supported" | ||
| + } | ||
| + | ||
| +DEFAULT_PORTS = { SOCKS4: 1080, | ||
| + SOCKS5: 1080, | ||
| + HTTP: 8080 | ||
| + } | ||
| + | ||
| +def set_default_proxy(proxy_type=None, addr=None, port=None, rdns=True, username=None, password=None): | ||
| + """ | ||
| + set_default_proxy(proxy_type, addr[, port[, rdns[, username, password]]]) | ||
| + | ||
| + Sets a default proxy which all further socksocket objects will use, | ||
| + unless explicitly changed. | ||
| + """ | ||
| + socksocket.default_proxy = (proxy_type, addr.encode(), port, rdns, | ||
| + username.encode() if username else None, | ||
| + password.encode() if password else None) | ||
| + | ||
| +setdefaultproxy = set_default_proxy | ||
| + | ||
| +def get_default_proxy(): | ||
| + """ | ||
| + Returns the default proxy, set by set_default_proxy. | ||
| + """ | ||
| + return socksocket.default_proxy | ||
| + | ||
| +getdefaultproxy = get_default_proxy | ||
| + | ||
| +def wrap_module(module): | ||
| + """ | ||
| + Attempts to replace a module's socket library with a SOCKS socket. Must set | ||
| + a default proxy using set_default_proxy(...) first. | ||
| + This will only work on modules that import socket directly into the namespace; | ||
| + most of the Python Standard Library falls into this category. | ||
| + """ | ||
| + if socksocket.default_proxy: | ||
| + module.socket.socket = socksocket | ||
| + else: | ||
| + raise GeneralProxyError("No default proxy specified") | ||
| + | ||
| +wrapmodule = wrap_module | ||
| + | ||
| +def create_connection(dest_pair, proxy_type=None, proxy_addr=None, | ||
| + proxy_port=None, proxy_username=None, | ||
| + proxy_password=None, timeout=None): | ||
| + """create_connection(dest_pair, **proxy_args) -> socket object | ||
| + | ||
| + Like socket.create_connection(), but connects to proxy | ||
| + before returning the socket object. | ||
| + | ||
| + dest_pair - 2-tuple of (IP/hostname, port). | ||
| + **proxy_args - Same args passed to socksocket.set_proxy(). | ||
| + timeout - Optional socket timeout value, in seconds. | ||
| + """ | ||
| + sock = socksocket() | ||
| + if isinstance(timeout, (int, float)): | ||
| + sock.settimeout(timeout) | ||
| + sock.set_proxy(proxy_type, proxy_addr, proxy_port, | ||
| + proxy_username, proxy_password) | ||
| + sock.connect(dest_pair) | ||
| + return sock | ||
| + | ||
| +class socksocket(socket.socket): | ||
| + """socksocket([family[, type[, proto]]]) -> socket object | ||
| + | ||
| + Open a SOCKS enabled socket. The parameters are the same as | ||
| + those of the standard socket init. In order for SOCKS to work, | ||
| + you must specify family=AF_INET, type=SOCK_STREAM and proto=0. | ||
| + """ | ||
| + | ||
| + default_proxy = None | ||
| + | ||
| + def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0, _sock=None): | ||
| + _orig_socket.__init__(self, family, type, proto, _sock) | ||
| + | ||
| + if self.default_proxy: | ||
| + self.proxy = self.default_proxy | ||
| + else: | ||
| + self.proxy = (None, None, None, None, None, None) | ||
| + self.proxy_sockname = None | ||
| + self.proxy_peername = None | ||
| + | ||
| + def _recvall(self, count): | ||
| + """ | ||
| + Receive EXACTLY the number of bytes requested from the socket. | ||
| + Blocks until the required number of bytes have been received. | ||
| + """ | ||
| + data = b"" | ||
| + while len(data) < count: | ||
| + d = self.recv(count - len(data)) | ||
| + if not d: | ||
| + self.close() | ||
| + raise GeneralProxyError("Connection closed unexpectedly") | ||
| + data += d | ||
| + return data | ||
| + | ||
| + def set_proxy(self, proxy_type=None, addr=None, port=None, rdns=True, username=None, password=None): | ||
| + """set_proxy(proxy_type, addr[, port[, rdns[, username[, password]]]]) | ||
| + Sets the proxy to be used. | ||
| + | ||
| + proxy_type - The type of the proxy to be used. Three types | ||
| + are supported: PROXY_TYPE_SOCKS4 (including socks4a), | ||
| + PROXY_TYPE_SOCKS5 and PROXY_TYPE_HTTP | ||
| + addr - The address of the server (IP or DNS). | ||
| + port - The port of the server. Defaults to 1080 for SOCKS | ||
| + servers and 8080 for HTTP proxy servers. | ||
| + rdns - Should DNS queries be performed on the remote side | ||
| + (rather than the local side). The default is True. | ||
| + Note: This has no effect with SOCKS4 servers. | ||
| + username - Username to authenticate with to the server. | ||
| + The default is no authentication. | ||
| + password - Password to authenticate with to the server. | ||
| + Only relevant when username is also provided. | ||
| + """ | ||
| + self.proxy = (proxy_type, addr.encode(), port, rdns, | ||
| + username.encode() if username else None, | ||
| + password.encode() if password else None) | ||
| + | ||
| + setproxy = set_proxy | ||
| + | ||
| + def get_proxy_sockname(self): | ||
| + """ | ||
| + Returns the bound IP address and port number at the proxy. | ||
| + """ | ||
| + return self.proxy_sockname | ||
| + | ||
| + getproxysockname = get_proxy_sockname | ||
| + | ||
| + def get_proxy_peername(self): | ||
| + """ | ||
| + Returns the IP and port number of the proxy. | ||
| + """ | ||
| + return _orig_socket.getpeername(self) | ||
| + | ||
| + getproxypeername = get_proxy_peername | ||
| + | ||
| + def get_peername(self): | ||
| + """ | ||
| + Returns the IP address and port number of the destination | ||
| + machine (note: get_proxy_peername returns the proxy) | ||
| + """ | ||
| + return self.proxy_peername | ||
| + | ||
| + getpeername = get_peername | ||
| + | ||
| + def _negotiate_SOCKS5(self, dest_addr, dest_port): | ||
| + """ | ||
| + Negotiates a connection through a SOCKS5 server. | ||
| + """ | ||
| + proxy_type, addr, port, rdns, username, password = self.proxy | ||
| + | ||
| + # First we'll send the authentication packages we support. | ||
| + if username and password: | ||
| + # The username/password details were supplied to the | ||
| + # set_proxy method so we support the USERNAME/PASSWORD | ||
| + # authentication (in addition to the standard none). | ||
| + self.sendall(b"\x05\x02\x00\x02") | ||
| + else: | ||
| + # No username/password were entered, therefore we | ||
| + # only support connections with no authentication. | ||
| + self.sendall(b"\x05\x01\x00") | ||
| + | ||
| + # We'll receive the server's response to determine which | ||
| + # method was selected | ||
| + chosen_auth = self._recvall(2) | ||
| + | ||
| + if chosen_auth[0:1] != b"\x05": | ||
| + # Note: string[i:i+1] is used because indexing of a bytestring | ||
| + # via bytestring[i] yields an integer in Python 3 | ||
| + self.close() | ||
| + raise GeneralProxyError("SOCKS5 proxy server sent invalid data") | ||
| + | ||
| + # Check the chosen authentication method | ||
| + | ||
| + if chosen_auth[1:2] == b"\x02": | ||
| + # Okay, we need to perform a basic username/password | ||
| + # authentication. | ||
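| + # Each length prefix must be exactly one byte; chr(n).encode() | ||
| + # is not a single byte for n >= 128, so pack it explicitly. | ||
| + self.sendall(b"\x01" + struct.pack("B", len(username)) | ||
| + + username | ||
| + + struct.pack("B", len(password)) | ||
| + + password) | ||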
| + auth_status = self._recvall(2) | ||
| + if auth_status[0:1] != b"\x01": | ||
| + # Bad response | ||
| + self.close() | ||
| + raise GeneralProxyError("SOCKS5 proxy server sent invalid data") | ||
| + if auth_status[1:2] != b"\x00": | ||
| + # Authentication failed | ||
| + self.close() | ||
| + raise SOCKS5AuthError("SOCKS5 authentication failed") | ||
| + | ||
| + # Otherwise, authentication succeeded | ||
| + | ||
| + # No authentication is required if 0x00 | ||
| + elif chosen_auth[1:2] != b"\x00": | ||
| + # Reaching here is always bad | ||
| + self.close() | ||
| + if chosen_auth[1:2] == b"\xFF": | ||
| + raise SOCKS5AuthError("All offered SOCKS5 authentication methods were rejected") | ||
| + else: | ||
| + raise GeneralProxyError("SOCKS5 proxy server sent invalid data") | ||
| + | ||
| + # Now we can request the actual connection | ||
| + req = b"\x05\x01\x00" | ||
| + # If the given destination address is an IP address, we'll | ||
| + # use the IPv4 address request even if remote resolving was specified. | ||
| + try: | ||
| + addr_bytes = socket.inet_aton(dest_addr) | ||
| + req += b"\x01" + addr_bytes | ||
| + except socket.error: | ||
| + # Well it's not an IP number, so it's probably a DNS name. | ||
| + if rdns: | ||
| + # Resolve remotely | ||
| + addr_bytes = None | ||
| + req += b"\x03" + struct.pack("B", len(dest_addr.encode())) + dest_addr.encode() | ||
| + else: | ||
| + # Resolve locally | ||
| + addr_bytes = socket.inet_aton(socket.gethostbyname(dest_addr)) | ||
| + req += b"\x01" + addr_bytes | ||
| + | ||
| + req += struct.pack(">H", dest_port) | ||
| + self.sendall(req) | ||
| + | ||
| + # Get the response | ||
| + resp = self._recvall(4) | ||
| + if resp[0:1] != b"\x05": | ||
| + self.close() | ||
| + raise GeneralProxyError("SOCKS5 proxy server sent invalid data") | ||
| + | ||
| + status = ord(resp[1:2]) | ||
| + if status != 0x00: | ||
| + # Connection failed: server returned an error | ||
| + self.close() | ||
| + error = SOCKS5_ERRORS.get(status, "Unknown error") | ||
| + raise SOCKS5Error("{:#04x}: {}".format(status, error)) | ||
| + | ||
| + # Get the bound address/port | ||
| + if resp[3:4] == b"\x01": | ||
| + bound_addr = self._recvall(4) | ||
| + elif resp[3:4] == b"\x03": | ||
| + resp += self._recvall(1) | ||
| + bound_addr = self._recvall(ord(resp[4:5])) | ||
| + else: | ||
| + self.close() | ||
| + raise GeneralProxyError("SOCKS5 proxy server sent invalid data") | ||
| + | ||
| + bound_port = struct.unpack(">H", self._recvall(2))[0] | ||
| + self.proxy_sockname = bound_addr, bound_port | ||
| + if addr_bytes: | ||
| + self.proxy_peername = socket.inet_ntoa(addr_bytes), dest_port | ||
| + else: | ||
| + self.proxy_peername = dest_addr, dest_port | ||
| + | ||
| + def _negotiate_SOCKS4(self, dest_addr, dest_port): | ||
| + """ | ||
| + Negotiates a connection through a SOCKS4 server. | ||
| + """ | ||
| + proxy_type, addr, port, rdns, username, password = self.proxy | ||
| + | ||
| + # Check if the destination address provided is an IP address | ||
| + remote_resolve = False | ||
| + try: | ||
| + addr_bytes = socket.inet_aton(dest_addr) | ||
| + except socket.error: | ||
| + # It's a DNS name. Check where it should be resolved. | ||
| + if rdns: | ||
| + addr_bytes = b"\x00\x00\x00\x01" | ||
| + remote_resolve = True | ||
| + else: | ||
| + addr_bytes = socket.inet_aton(socket.gethostbyname(dest_addr)) | ||
| + | ||
| + # Construct the request packet | ||
| + req = struct.pack(">BBH", 0x04, 0x01, dest_port) + addr_bytes | ||
| + | ||
| + # The username parameter is considered userid for SOCKS4 | ||
| + if username: | ||
| + req += username | ||
| + req += b"\x00" | ||
| + | ||
| + # DNS name if remote resolving is required | ||
| + # NOTE: This is actually an extension to the SOCKS4 protocol | ||
| + # called SOCKS4A and may not be supported in all cases. | ||
| + if remote_resolve: | ||
| + req += dest_addr.encode() + b"\x00" | ||
| + self.sendall(req) | ||
| + | ||
| + # Get the response from the server | ||
| + resp = self._recvall(8) | ||
| + if resp[0:1] != b"\x00": | ||
| + # Bad data | ||
| + self.close() | ||
| + raise GeneralProxyError("SOCKS4 proxy server sent invalid data") | ||
| + | ||
| + status = ord(resp[1:2]) | ||
| + if status != 0x5A: | ||
| + # Connection failed: server returned an error | ||
| + self.close() | ||
| + error = SOCKS4_ERRORS.get(status, "Unknown error") | ||
| + raise SOCKS4Error("{:#04x}: {}".format(status, error)) | ||
| + | ||
| + # Get the bound address/port | ||
| + self.proxy_sockname = (socket.inet_ntoa(resp[4:]), struct.unpack(">H", resp[2:4])[0]) | ||
| + if remote_resolve: | ||
| + # addr_bytes is only the 0.0.0.1 SOCKS4A placeholder here | ||
| + self.proxy_peername = dest_addr, dest_port | ||
| + else: | ||
| + self.proxy_peername = socket.inet_ntoa(addr_bytes), dest_port | ||
| + | ||
| + def _negotiate_HTTP(self, dest_addr, dest_port): | ||
| + """ | ||
| + Negotiates a connection through an HTTP server. | ||
| + NOTE: This currently only supports HTTP CONNECT-style proxies. | ||
| + """ | ||
| + proxy_type, addr, port, rdns, username, password = self.proxy | ||
| + | ||
| + # If we need to resolve locally, we do this now | ||
| + addr = dest_addr if rdns else socket.gethostbyname(dest_addr) | ||
| + | ||
| + self.sendall(b"CONNECT " + addr.encode() + b":" + str(dest_port).encode() + | ||
| + b" HTTP/1.1\r\n" + b"Host: " + dest_addr.encode() + b"\r\n\r\n") | ||
| + | ||
| + resp = self.recv(4096) | ||
| + while b"\r\n\r\n" not in resp and b"\n\n" not in resp: | ||
| + d = self.recv(4096) | ||
| + if not d: | ||
| + self.close() | ||
| + raise GeneralProxyError("Connection closed unexpectedly") | ||
| + resp += d | ||
| + | ||
| + # We just need the first line to check if the connection was successful | ||
| + status_line = resp.splitlines()[0].split(b" ", 2) | ||
| + | ||
| + if not status_line[0].startswith(b"HTTP/"): | ||
| + self.close() | ||
| + raise GeneralProxyError("Proxy server does not appear to be an HTTP proxy") | ||
| + | ||
| + try: | ||
| + status_code = int(status_line[1]) | ||
| + except ValueError: | ||
| + self.close() | ||
| + raise HTTPError("HTTP proxy server did not return a valid HTTP status") | ||
| + | ||
| + if status_code != 200: | ||
| + self.close() | ||
| + error = "{}: {}".format(status_code, status_line[2].decode()) | ||
| + if status_code in (400, 403, 405): | ||
| + # It's likely that the HTTP proxy server does not support the CONNECT tunneling method | ||
| + error += ("\n[*] Note: The HTTP proxy server may not be supported by PySocks" | ||
| + " (must be a CONNECT tunnel proxy)") | ||
| + raise HTTPError(error) | ||
| + | ||
| + self.proxy_sockname = (b"0.0.0.0", 0) | ||
| + self.proxy_peername = addr, dest_port | ||
| + | ||
| + def connect(self, dest_pair): | ||
| + """ | ||
| + Connects to the specified destination through a proxy. | ||
| + Uses the same API as socket's connect(). | ||
| + To select the proxy server, use set_proxy(). | ||
| + | ||
| + dest_pair - 2-tuple of (IP/hostname, port). | ||
| + """ | ||
| + proxy_type, proxy_addr, proxy_port, rdns, username, password = self.proxy | ||
| + dest_addr, dest_port = dest_pair | ||
| + | ||
| + # Do a minimal input check first | ||
| + if (not isinstance(dest_pair, (list, tuple)) | ||
| + or len(dest_pair) != 2 | ||
| + or not isinstance(dest_addr, type("")) | ||
| + or not isinstance(dest_port, int)): | ||
| + raise GeneralProxyError("Invalid destination-connection (host, port) pair") | ||
| + | ||
| + try: | ||
| + if proxy_type is None: | ||
| + _orig_socket.connect(self, (dest_addr, dest_port)) | ||
| + else: | ||
| + port = proxy_port or DEFAULT_PORTS.get(proxy_type) | ||
| + if not port: | ||
| + raise GeneralProxyError("Invalid proxy type") | ||
| + | ||
| + _orig_socket.connect(self, (proxy_addr, port)) | ||
| + | ||
| + if proxy_type == SOCKS5: | ||
| + self._negotiate_SOCKS5(dest_addr, dest_port) | ||
| + elif proxy_type == SOCKS4: | ||
| + self._negotiate_SOCKS4(dest_addr, dest_port) | ||
| + elif proxy_type == HTTP: | ||
| + self._negotiate_HTTP(dest_addr, dest_port) | ||
| + | ||
| + except socket.error as error: | ||
| + self.close() | ||
| + proxy_server = "{}:{}".format(proxy_addr.decode(), proxy_port) | ||
| + printable_type = PRINTABLE_PROXY_TYPES[proxy_type] | ||
| + errno, msg = error.args | ||
| + msg = "Error connecting to {} proxy {}: {}".format(printable_type, | ||
| + proxy_server, msg) | ||
| + raise socket.error(errno, msg) |
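For anyone reviewing the SOCKS5 negotiation above, here is a standalone sketch of the two client packets involved. It is not part of the patch: the helper names build_userpass_auth and build_connect_request are made up for illustration, local DNS resolution is left out, and only IPv4 and domain-name targets are covered. Length fields are packed as single bytes with struct.pack("B", ...).

```python
import socket
import struct


def build_userpass_auth(username, password):
    """Username/password sub-negotiation packet (RFC 1929). Hypothetical helper."""
    # Sub-negotiation version 0x01, then each field prefixed by a one-byte length.
    return (b"\x01"
            + struct.pack("B", len(username)) + username
            + struct.pack("B", len(password)) + password)


def build_connect_request(dest_addr, dest_port, rdns=True):
    """SOCKS5 CONNECT request (RFC 1928). Hypothetical helper."""
    # VER=0x05, CMD=0x01 (CONNECT), RSV=0x00
    req = b"\x05\x01\x00"
    try:
        # ATYP=0x01: literal IPv4 address, sent as 4 raw bytes
        req += b"\x01" + socket.inet_aton(dest_addr)
    except socket.error:
        if not rdns:
            raise ValueError("local resolution is left out of this sketch")
        # ATYP=0x03: domain name, prefixed by a one-byte length
        name = dest_addr.encode("idna")
        req += b"\x03" + struct.pack("B", len(name)) + name
    # Destination port in network byte order
    return req + struct.pack(">H", dest_port)


print(build_connect_request("example.com", 80))
```

Usernames, passwords, or hostnames longer than 255 bytes make struct.pack raise struct.error, which seems preferable to silently sending a malformed packet.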
Here's the SOCKS proxy support patch.
Usage is identical to setting HTTP proxies: ProxyManager("socks4://localhost:1080") or ProxyManager("socks5://localhost:1080") (or proxy_from_url()). The default port for both is 1080.
I added new SOCKS Connection classes in connection.py, and added a _is_socks attribute to the proxy URL that is passed to ProxyManager (on the Url object). The _is_socks attribute is used by the connection pools so that HTTP proxy negotiation steps aren't taken when the proxy has a SOCKS scheme. This could possibly be refactored (separate socks_proxy and http_proxy attributes?); tell me how you feel about the _is_socks solution.
I also touched up some other parts of the codebase that I noticed while working through files and debugging, as follows.
Changed __str__ methods to __repr__. __str__ automatically defers to __repr__ if it isn't already defined, but the reverse isn't true. So this keeps the same behavior while also allowing easier debugging from the REPL.
I believe this branch should currently work fine for regular usage; it seems to have no trouble with any SOCKS proxies.
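On the __str__ to __repr__ point, a tiny illustration of the fallback in case it helps. The Pool class here is a stand-in invented for the example, not one of the classes touched by the patch.

```python
class Pool(object):
    """Stand-in class; the name and fields are invented for this example."""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    # Only __repr__ is defined; there is deliberately no __str__.
    def __repr__(self):
        return "Pool(host={!r}, port={!r})".format(self.host, self.port)


pool = Pool("localhost", 1080)

# str() falls back to __repr__ because no __str__ is defined,
# so print() and the REPL show the same text.
assert str(pool) == repr(pool)
print(pool)
```

Had the class defined only __str__, repr(pool) would still show the default object representation, which is what made REPL debugging awkward.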
However, I had many confusing issues and bugs while trying to modify the test suite. The tests are currently broken (I added raise SkipTest() in the proxy tests, for the time being). I am using Twisted for the SOCKS4 and SOCKS5 proxy servers, and added that to the test requirements.
The test issues seem to stem from the Tornado HTTP and HTTPS dummy servers. On my computer, they spat out odd SSL errors (like
SSLError: [Errno 1] _ssl.c:504: error:1411B072:SSL routines:SSL3_GET_NEW_SESSION_TICKET:bad message type
and many others), and would hang seemingly randomly.
I upgraded the Tornado version in test-requirements.txt because the version listed before appeared to have a bug that prevented me (or others) from running in IPv6 mode (see tornadoweb/tornado#593); I had to upgrade to get IPv6-related tests to pass. I'm not sure if the upgrade has anything to do with the current issues.
I think it has something to do with trying to run Tornado in separate threads. I was not able to get multithreading to work with Twisted, so I spawned a new process for each SOCKS proxy server instead; Twisted seems to work fine, as I can connect to and use the proxy servers while the tests are running.
I would appreciate it if someone could modify the tests so the HTTP servers work properly, or give me some advice on how to fix the issue. And of course, I'd appreciate any suggestions or changes for the SOCKS patch itself.
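To make the _is_socks question easier to discuss, here is a rough sketch of the kind of scheme check involved. It is only an illustration, not the code in the branch: parse_proxy_url and the constants are hypothetical names, and it uses the standard library URL parser rather than the Url object.

```python
from urllib.parse import urlparse

# Hypothetical constants for this sketch; 1080 is the default SOCKS port.
SOCKS_SCHEMES = ("socks4", "socks5")
DEFAULT_SOCKS_PORT = 1080


def parse_proxy_url(url):
    """Return (scheme, host, port, is_socks) for a proxy URL."""
    parts = urlparse(url)
    scheme = (parts.scheme or "http").lower()
    is_socks = scheme in SOCKS_SCHEMES
    # Fall back to 1080 only for SOCKS schemes; leave other ports alone.
    port = parts.port or (DEFAULT_SOCKS_PORT if is_socks else None)
    return scheme, parts.hostname, port, is_socks


print(parse_proxy_url("socks5://localhost"))
print(parse_proxy_url("http://proxy.example:3128"))
```

Separate socks_proxy and http_proxy attributes would carry the same information as the boolean returned here, just split across two fields.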