Add rtunnel() context manager to open a reverse tunnel (server to client) #821

Closed
wants to merge 7 commits into
from

Projects

None yet

2 participants

@rasky
Contributor
rasky commented Jan 13, 2013

Hello,

this pull request implements a rtunnel() context manager that allows to setup a remote forwarding tunnel (listening on the server side, tunneling over SSH and the forwarding to a locally-reachable host:port).

It implements half of issue #78 (the half I was interested in :). I've tested the code and am already using it in my deploy scripts.

Note: while working on this patch, some exceptions would trigger a weird error message: No handlers could be found for logger "paramiko.transport" with no information about the actual exception. I had to add this code to see the exceptions:

    import logging
    log = logging.getLogger("paramiko.transport")
    sh = logging.StreamHandler()
    sh.setLevel(logging.DEBUG)
    log.addHandler(sh)

This is probably an unrelated bug: fabric does not configure paramiko logger correctly. You can reproduce the error by adding a typo within the function accept() in my code.

@rasky
Contributor
rasky commented Jan 13, 2013

I forgot to mention that I based this patch upon 1.5.0 rather than master, because that's what I'm using. I hope forward-porting isn't a problem since the new function is quite self-contained.

@bitprophet bitprophet and 1 other commented on an outdated diff Jan 28, 2013
fabric/context_managers.py
@@ -454,6 +455,100 @@ def shell_env(**kw):
return _setenv({'shell_env': kw})
+@documented_contextmanager
+def rtunnel(rport, lport=None):
@bitprophet
bitprophet Jan 28, 2013 Member

I think this would be better defined as something like:

def remote_tunnel(remote_port, local_port=None, remote_host=None, local_host='localhost'):

Specifically:

  • Somewhat more readable/clean name
  • More explicit arguments; having 'polymorphic' arguments that can take various values/meanings is fine (we use this approach frequently) but this feels like it goes a little too far, especially with 'lport' taking three sorts of strings, two of which don't even overlap.
  • I would change from your current behavior such that the default remote_host is be the host part of env.host_string (technically env.host but referring to env.host_string is always safer).
    • Speaking of: I'm not sure why you'd default the remote host to localhost? Maybe I'm missing a use case -- let me know.
@rasky
rasky Jan 28, 2013 Contributor

The main problem I can see with the 4 arguments is that there is no right order for them, and it's tricky to remember whichever you pick up. For instance, the order when using the SSH command line is remotehost : remoteport : localhost : localport.

In my implementation, lport can be either non specified (in which case we assume localhost and the same port of the local one), or a host:port string or a port number. So it's three variations if you include the default value, otherwise it's just two: host:port and port. The implement looks complicated, but the usage is indeed quite simple, and people just need to remember to choose between host:port and port.

So I don't think it's overcomplicated; it's easier to use rather than to implement. Please think again of it. If you're still convinced otherwise, I'll change it.

The default for rhost is localhost because:

  • OpenSSH servers by default do not allow a remote tunnel to bind to other interfaces than localhost. The main idea for it is that you can have a program on the remote side to open a connection to a demon in your local environment (either in your computer or your LAN); this client program will most likely run on the same computer you are logged in (the one you run deploy to), so it can happily connect to localhost. For instance, you might want to run a command to dump a copy of your local database (running on your laptop) and use it to fill up the remote staging DB.
  • There is no other reasonable default. OpenSSH allows to specify either a IP bound to an interface (so that can't be a default) or * which means "bind to all interfaces" (equivalent to 0:0:0:0). Even if the SSH server was configured to allow binding to non-localhost interfaces, I think * as a default is too aggressive and could cause security issues; you might be opening a port on a public global IP that connects back to your computer... I'm sure it is useful sometimes, but I wouldn't want it as an implicit default.

Can you please explain better the env.host_string suggestion? I don't think I understand what you are aiming to.

@bitprophet
bitprophet Jan 28, 2013 Member

Lots of stuff to go over here, bear with me :)

there is no right order for them

Correct, but in most cases I expect users to use keyword arguments for nontrivial APIs like this -- relying on argument order is and always will be fragile/hard to remember/etc, and one of the main benefits for using Python (IMO) over e.g. shell / whatever, is that we can use keyword arguments when invoking subroutines.

Your approach of moving to fewer arguments that have multiple meanings is certainly one way of combating the 'remembering order' problem, but IMO it's not the best one. Kwargs are :)

Another argument one can make in favor of your 'overloaded/compacted args' tactic is that it's less typing when used frequently -- except in this case I do not see setting up remote tunnels as a frequently-written code snippet. I'd expect maybe a handful of calls to this function in any given fabfile. Given that assumption, a clearer/more explicit API feels like the right tradeoff.

Finally, I could make an appeal to authority of a sort, and state that "multiple keyword arguments" is a much more commonly used Python idiom than "string arguments that effectively serialize multiple values" -- it's more Pythonic. (FWIW I severely regret making Fabric's own host string approach the primary API for that stuff -- it's become an enormous maintenance headache!)

and people just need to remember to choose between host:port and port.

That's one of the reasons I argue against this, actually -- I've dealt with a large userbase for a while now and users frequently get tripped up with this sort of thing :( Again, in some situations where the function is invoked frequently (or e.g. as a secondary/optional API on top of a more explicit one) I would accept that tradeoff. Here, I don't think the potential for confusion is worth the savings.

The default for lhost is localhost

Was that a typo or a misread? :) I was talking about the default of the remote host value, rhost, not lhost: link. A default local host value of localhost seems quite reasonable and I wouldn't change that.

Can you please explain better the env.host_string suggestion? I don't think I understand what you are aiming to.

What I meant was that when users do not give an explicit remote host argument, it should default to what Fabric is currently targeting. Fabric's current target host is always stored in env.host_string (it is the state value determining current target).

This way, one can invoke the function like so:

with remote_tunnel(port=8000):
    # stuff

and it defaults to forwarding port 8000 from the current target host to port 8000 on localhost. But one can still override it to choose a different remote target, e.g.:

with remote_tunnel(port=8000, remote_host='dbserver'):
    # stuff
@bitprophet
bitprophet Jan 28, 2013 Member

P.S. in my recommended API, and despite what I said above about preferring explicit kwarg invocation, a "just the port" use is one spot where I could see users using just-an-arg:

with remote_tunnel(8000):
    # stuff

because that's very clearly unambiguous in my opinion -- it says "Within this block, forward port 8000 from my target to my localhost". That is (AFAIK) a common use case for this kind of reverse port forwarding.

@rasky
rasky Jan 28, 2013 Contributor

OK I'll change to multiple arguments.

The default for lhost is localhost

Was that a typo or a misread? :) I was talking about the default of the remote host value, rhost, not lhost:

A typo (now fixed). The following paragraphs were correct and are correctly describing the rationale for using localhost as default for rhost.

and it defaults to forwarding port 8000 from the current target host to port 8000 on localhost. But one can still override it to choose a different remote target,
with remote_tunnel(port=8000, remote_host='dbserver'):

That wouldn't work. remote_host is a misnomer, it should be remote_bind_address or something like that. It has to be either an IP address assigned to a local interface, or a DNS name that resolves to such an IP (or * or 0:0:0:0 to indicate all the interfaces). This is why I can't see how I can use env.host_string as a default.

Speaking of this, local_host as per your suggestion of parameter name is really confusingly similar to localhost, but I can't find a better word, so I'll go with this for one.

@rasky
rasky Jan 28, 2013 Contributor

P.S. in my recommended API, and despite what I said above about preferring explicit kwarg invocation, a "just the port" use is one spot where I could see users using just-an-arg:

That's correct, but you would have a hard game convincing me that any of these are not clear (assuming the reader understands what a remote tunnel is in the first place):

remote_tunnel(8000)
remote_tunnel(8000, 4000)
remote_tunnel(8000, "dbserver:4000")
remote_tunnel(8000, "dbserver")
remote_tunnel("*:8000", 4000)
@bitprophet
bitprophet Jan 28, 2013 Member

Yea, I totally got creamed by that misunderstanding, which is irritating as I've run into it before when using ssh -R. Sorry about that :( I agree we should find a different name for it if possible. remote_bind_address isn't so awful and I can't think of anything that's much shorter while not being more ambiguous.

Ditto on local_host, given it's the same meaning (what IP/iface to bind on). So maybe we make them remote_bind_addr and local_bind_addr, and (since they're IMO less likely to need overriding than either port arg) turn it into e.g.:

def remote_tunnel(remote_port, local_port=None, remote_bind_addr=None, local_bind_addr=None):
    # impl

Honestly I could see us naming the bind-addr args any of:

  • *_bind: shortest but not super intuitive/readable
  • *_bind_addr as above: middle ground between length & ambiguity
  • *_bind_address: most explicit but certainly a bit long

I'll leave that decision up to you since I've been so opinionated on the rest :)

@rasky
rasky Jan 28, 2013 Contributor

To clarify my previous post: I think the API I designed is easy to read on existing code. It might be harder to undertand in documentation and thus harder for writing code though.

@bitprophet
bitprophet Jan 28, 2013 Member

For sure, I don't mean to imply it's bad per se, it's certainly usable, I just feel the more explicit/Pythonic approach is preferable, especially given my experience supporting a wide range of users.

@rasky
rasky Jan 28, 2013 Contributor

I agree we should find a different name for it if possible. remote_bind_address isn't so awful and I can't think of anything that's much shorter while not being more ambiguous.

remote_bind_address is fine, IMO. I'll push it.

Ditto on local_host, given it's the same meaning (what IP/iface to bind on).

No, this can be any IP/DNS name that is resolvable by the computer Fabric is running on (including www.google.com if you feel so). I think local_host is fine in this case, since most of the time you want either 127.0.0.1 or a firewalled computer/server not publicly accessible outside of your local network.

@bitprophet bitprophet and 1 other commented on an outdated diff Jan 28, 2013
fabric/context_managers.py
+ lport = rport
+ elif isinstance(lport, basestring):
+ if ':' in lport:
+ lhost, lport = lport.split(':')
+ lport = int(lport)
+ else:
+ lhost = lport
+ lport = rport
+ else:
+ lhost = "localhost"
+
+ sockets = []
+ channels = []
+ threads = []
+
+ def forwarder(chan):
@bitprophet
bitprophet Jan 28, 2013 Member

I wonder if we can't rejigger this to be a top-level function which is then handed to ThreadHandler as a partially applied function, e.g. functools.partial(forwarder, sockets, host, port)? Again, inner functions are a tactic used elsewhere in Fabric but as I look forward to version 2 (wherein this sort of network code would ideally get reused, not rewritten) I'm hoping to get rid of this kind of closure abuse if possible.

@rasky
rasky Jan 28, 2013 Contributor

That's basically style, so up to you. My general opinion on the matter is that I prefer less global functions until there specifically is a place to reuse them, because IMO it makes code easier to follow (with a global function, you need to grep to see where and how many times is used).

This said, would you prefer a global function in this same module, or somewhere else?

@bitprophet
bitprophet Jan 28, 2013 Member

In this module is fine -- here, I want the functions to be global not so they can be accessed by client code, but just so that following the code is easier (it's more functional, less closure use, more thread/concurrency safe, etc). They're not so large or complex that they benefit from being moved elsewhere, anyways.

Also, I would probably suggest naming them with Python's private convention, e.g. def _forwarder(...), which makes it clear to client code these are not part of the public API of the context_managers module.

@bitprophet bitprophet commented on an outdated diff Jan 28, 2013
fabric/context_managers.py
+ while True:
+ r, w, x = select.select([sock, chan], [], [])
+ if sock in r:
+ data = sock.recv(1024)
+ if len(data) == 0:
+ break
+ chan.send(data)
+ if chan in r:
+ data = chan.recv(1024)
+ if len(data) == 0:
+ break
+ sock.send(data)
+ chan.close()
+ sock.close()
+
+ def accept(channel, (src_addr, src_port), (dest_addr, dest_port)):
@bitprophet
bitprophet Jan 28, 2013 Member

Ditto re: possibly moving this to a top level def accept(<existing args>, channels, threads) + use of functools.partial

@bitprophet bitprophet commented on an outdated diff Jan 28, 2013
fabric/context_managers.py
+ break
+ sock.send(data)
+ chan.close()
+ sock.close()
+
+ def accept(channel, (src_addr, src_port), (dest_addr, dest_port)):
+ channels.append(channel)
+ th = ThreadHandler('fwd', forwarder, channel)
+ threads.append(th)
+
+ transport = connections[env.host_string].get_transport()
+ transport.request_port_forward(rhost, rport, handler=accept)
+ try:
+ yield
+ finally:
+ for sock,chan,th in zip(sockets, channels, threads):
@bitprophet
bitprophet Jan 28, 2013 Member

PEP8 nitpick, should be for sock, chan, th (spaces).

@bitprophet
Member

Hi @rasky, thanks for this! I left some comments inline that I'd love to see addressed before this gets merged. It also needs a changelog entry, which should follow existing entries near the top of docs/changelog.rst and credit yourself (hopefully ;)).

Finally -- new features should get based on master, so if you can rebase against the latest master (not 1.5, which is now a release branch) that would be awesome. I can handle that when I merge if required, though.

@rasky
Contributor
rasky commented Jan 28, 2013

One question: when I'm ready with the new version, do you prefer a clean patch to be force-pushed into this branch (so that you don't get the modification history merged), or you prefer to see and merge changes in addition to this patch?

I know different projects have different conventions, this is why I'm asking.

@bitprophet
Member

I have no clear preference re: force-push/rebase vs merge; ideally either one will look the same if I did e.g. git diff master <your-branch>. I guess given how Github handles line notes / comments, merging is probably safer, as it will make sure we don't accidentally remove access to some of the back and forth above. So if that works for you, merging sounds best. Thanks for asking!!

@rasky
Contributor
rasky commented Jan 28, 2013

Updated as per your review.

_accept looks awful, though, with its 10 arguments. My suggestion is to revert back to a nested function that it much more readable. Writing a docstring would be a mess.

_forwarder is far better. In fact, if we move the socket connection code (and the print calls) into _accept, it wouldn't even require extraneous arguments like all_sockets and would become a reusable function to bidirectionally forward data between a Paramiko channel and a socket. Does it make sense to you to modify it this way?

@bitprophet
Member

_accept looks awful, though, with its 10 arguments. My suggestion is to revert back to a nested function that it much more readable.

Yea, I hadn't realized it was using that many closure-scoped names; I agree reverting it is an acceptable tradeoff.

Writing a docstring would be a mess.

FWIW I don't usually require/recommend docstrings for _privately_named functions as they're never going to be part of a documented/public API spec. (If it makes sense to, great, but if it would be a PITA as here, it's OK to just put the non-obvious gotchas into a comment or something and skip a "publicly worded" docstring.)

if we move the socket connection code (and the print calls) into _accept, it [...] would become a reusable function

I like this idea, let's do that!


Minor nitpick: wouldn't it make sense / be consistent to rename local_host to local_bind_address, since it's effectively the same sort of value as remote_bind_address? Or am I still thinking about this wrong?

@rasky
Contributor
rasky commented Jan 28, 2013

Minor nitpick: wouldn't it make sense / be consistent to rename local_host to local_bind_address, since it's effectively the same sort of value as remote_bind_address? Or am I still thinking about this wrong?

In fact, that's a real remote host (see my other reply on the matter), so the name makes sense IMO.

@bitprophet
Member

Oh, sorry -- I didn't see that reply originally because your commit 'hid' it in the UI. Oh github :) So I did have this backwards still, it's making a locally-visible resource available on the server, not vice versa. In that case, yes, the local argument is not a binding but is the "local host to forward a connection from" or whatever. So, it's fine staying local_host.

OK, from here:

  • I have some minor language nits to pick in the docstring but I'll edit them in after I merge (it's not efficient to have a back and forth about such things)
  • Still needs a changelog entry, but if I'm tweaking docs anyway I can add it -- just let me know how you want to be credited (I assume as "Giovanni Bajo" but some folks prefer their handle instead).
  • Unless I am missing a commit, it looks like _accept hasn't been reverted as we discussed -- are you still planning on doing that or is it good as-is? (Just don't want to merge/tweak until you feel you're 100% done on your end, is all.)

Thanks again/a ton for your patience with all of this discussion, really appreciate it!

@rasky
Contributor
rasky commented Jan 29, 2013

"Giovanni Bajo" in the changelog is fine. I didn't provide a diff because I didn't rebase the branch off master.

Thanks for your work on fabric and... make sure to add forward tunnels, one day :)

@rasky
Contributor
rasky commented Jan 30, 2013

BTW, not sure if you noticed, but I completed the work

@bitprophet
Member

Yup, just haven't had time to actually merge it yet. Thanks & sorry :) Will do so soon.

@bitprophet
Member

Have a local integration branch:

  • Rebased to latest master (was out of date)
  • Updated docstring
  • Added changelog entry

However when I went to test it out again it's not working for me, could swear it was earlier in the process. Digging now.

@bitprophet bitprophet added a commit that referenced this pull request Feb 15, 2013
@bitprophet bitprophet Changelog + docstring updates re #821 ed47dc6
@bitprophet
Member

OK, the problems were mostly my fault. It is certainly possible to get unhelpful error messages out of this feature; e.g. I was first getting timeouts that were swallowed by the bare except Exception clause -- changing that might be wise but we can change it in bugfix releases if necessary -- but that was because I chose an "open but not really listening" port.

Then I selected a good (HTTP based) port, but was testing with nc -z localhost <port> and that triggered a Bad file selector out of select.select() -- but after the program successfully exited.

Then tried wget instead of nz and that works just fine. So my guess is it had to do with how nc just "rudely" slaps the port and doesn't clean up properly, or something. If this starts hitting lots of users we can dig further but I am OK with this going out now as it's purely an optional, new feature, and does seem to work in a non contrived case.

@bitprophet
Member

I have merged this to master and pushed. Thanks again @rasky! I will put out Fab 1.6 by this weekend and this feature will be included.

@rasky
Contributor
rasky commented Feb 15, 2013

I got other problems with error reporting as well (see the note in the original pull request).

Thanks for merging!

@rasky rasky closed this Feb 15, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment