New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rtunnel() context manager to open a reverse tunnel (server to client) #821
Conversation
I forgot to mention that I based this patch upon 1.5.0 rather than master, because that's what I'm using. I hope forward-porting isn't a problem since the new function is quite self-contained. |
@@ -454,6 +455,100 @@ def shell_env(**kw): | |||
return _setenv({'shell_env': kw}) | |||
|
|||
|
|||
@documented_contextmanager | |||
def rtunnel(rport, lport=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be better defined as something like:
def remote_tunnel(remote_port, local_port=None, remote_host=None, local_host='localhost'):
Specifically:
- Somewhat more readable/clean name
- More explicit arguments; having 'polymorphic' arguments that can take various values/meanings is fine (we use this approach frequently) but this feels like it goes a little too far, especially with 'lport' taking three sorts of strings, two of which don't even overlap.
- I would change from your current behavior such that the default
remote_host
is be the host part ofenv.host_string
(technicallyenv.host
but referring toenv.host_string
is always safer).- Speaking of: I'm not sure why you'd default the remote host to localhost? Maybe I'm missing a use case -- let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem I can see with the 4 arguments is that there is no right order for them, and it's tricky to remember whichever you pick up. For instance, the order when using the SSH command line is remotehost : remoteport : localhost : localport
.
In my implementation, lport
can be either non specified (in which case we assume localhost and the same port of the local one), or a host:port
string or a port
number. So it's three variations if you include the default value, otherwise it's just two: host:port
and port
. The implement looks complicated, but the usage is indeed quite simple, and people just need to remember to choose between host:port
and port
.
So I don't think it's overcomplicated; it's easier to use rather than to implement. Please think again of it. If you're still convinced otherwise, I'll change it.
The default for rhost
is localhost
because:
- OpenSSH servers by default do not allow a remote tunnel to bind to other interfaces than localhost. The main idea for it is that you can have a program on the remote side to open a connection to a demon in your local environment (either in your computer or your LAN); this client program will most likely run on the same computer you are logged in (the one you run deploy to), so it can happily connect to localhost. For instance, you might want to run a command to dump a copy of your local database (running on your laptop) and use it to fill up the remote staging DB.
- There is no other reasonable default. OpenSSH allows to specify either a IP bound to an interface (so that can't be a default) or
*
which means "bind to all interfaces" (equivalent to0:0:0:0
). Even if the SSH server was configured to allow binding to non-localhost interfaces, I think*
as a default is too aggressive and could cause security issues; you might be opening a port on a public global IP that connects back to your computer... I'm sure it is useful sometimes, but I wouldn't want it as an implicit default.
Can you please explain better the env.host_string
suggestion? I don't think I understand what you are aiming to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of stuff to go over here, bear with me :)
there is no right order for them
Correct, but in most cases I expect users to use keyword arguments for nontrivial APIs like this -- relying on argument order is and always will be fragile/hard to remember/etc, and one of the main benefits for using Python (IMO) over e.g. shell / whatever, is that we can use keyword arguments when invoking subroutines.
Your approach of moving to fewer arguments that have multiple meanings is certainly one way of combating the 'remembering order' problem, but IMO it's not the best one. Kwargs are :)
Another argument one can make in favor of your 'overloaded/compacted args' tactic is that it's less typing when used frequently -- except in this case I do not see setting up remote tunnels as a frequently-written code snippet. I'd expect maybe a handful of calls to this function in any given fabfile. Given that assumption, a clearer/more explicit API feels like the right tradeoff.
Finally, I could make an appeal to authority of a sort, and state that "multiple keyword arguments" is a much more commonly used Python idiom than "string arguments that effectively serialize multiple values" -- it's more Pythonic. (FWIW I severely regret making Fabric's own host string approach the primary API for that stuff -- it's become an enormous maintenance headache!)
and people just need to remember to choose between host:port and port.
That's one of the reasons I argue against this, actually -- I've dealt with a large userbase for a while now and users frequently get tripped up with this sort of thing :( Again, in some situations where the function is invoked frequently (or e.g. as a secondary/optional API on top of a more explicit one) I would accept that tradeoff. Here, I don't think the potential for confusion is worth the savings.
The default for lhost is localhost
Was that a typo or a misread? :) I was talking about the default of the remote host value, rhost
, not lhost
: link. A default local host value of localhost
seems quite reasonable and I wouldn't change that.
Can you please explain better the env.host_string suggestion? I don't think I understand what you are aiming to.
What I meant was that when users do not give an explicit remote host argument, it should default to what Fabric is currently targeting. Fabric's current target host is always stored in env.host_string
(it is the state value determining current target).
This way, one can invoke the function like so:
with remote_tunnel(port=8000):
# stuff
and it defaults to forwarding port 8000 from the current target host to port 8000 on localhost. But one can still override it to choose a different remote target, e.g.:
with remote_tunnel(port=8000, remote_host='dbserver'):
# stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. in my recommended API, and despite what I said above about preferring explicit kwarg invocation, a "just the port" use is one spot where I could see users using just-an-arg:
with remote_tunnel(8000):
# stuff
because that's very clearly unambiguous in my opinion -- it says "Within this block, forward port 8000 from my target to my localhost". That is (AFAIK) a common use case for this kind of reverse port forwarding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'll change to multiple arguments.
The default for lhost is localhost
Was that a typo or a misread? :) I was talking about the default of the remote host value, rhost, not lhost:
A typo (now fixed). The following paragraphs were correct and are correctly describing the rationale for using localhost
as default for rhost
.
and it defaults to forwarding port 8000 from the current target host to port 8000 on localhost. But one can still override it to choose a different remote target,
with remote_tunnel(port=8000, remote_host='dbserver'):
That wouldn't work. remote_host
is a misnomer, it should be remote_bind_address
or something like that. It has to be either an IP address assigned to a local interface, or a DNS name that resolves to such an IP (or *
or 0:0:0:0
to indicate all the interfaces). This is why I can't see how I can use env.host_string
as a default.
Speaking of this, local_host
as per your suggestion of parameter name is really confusingly similar to localhost
, but I can't find a better word, so I'll go with this for one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. in my recommended API, and despite what I said above about preferring explicit kwarg invocation, a "just the port" use is one spot where I could see users using just-an-arg:
That's correct, but you would have a hard game convincing me that any of these are not clear (assuming the reader understands what a remote tunnel is in the first place):
remote_tunnel(8000)
remote_tunnel(8000, 4000)
remote_tunnel(8000, "dbserver:4000")
remote_tunnel(8000, "dbserver")
remote_tunnel("*:8000", 4000)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I totally got creamed by that misunderstanding, which is irritating as I've run into it before when using ssh -R
. Sorry about that :( I agree we should find a different name for it if possible. remote_bind_address
isn't so awful and I can't think of anything that's much shorter while not being more ambiguous.
Ditto on local_host
, given it's the same meaning (what IP/iface to bind on). So maybe we make them remote_bind_addr
and local_bind_addr
, and (since they're IMO less likely to need overriding than either port arg) turn it into e.g.:
def remote_tunnel(remote_port, local_port=None, remote_bind_addr=None, local_bind_addr=None):
# impl
Honestly I could see us naming the bind-addr args any of:
*_bind
: shortest but not super intuitive/readable*_bind_addr
as above: middle ground between length & ambiguity*_bind_address
: most explicit but certainly a bit long
I'll leave that decision up to you since I've been so opinionated on the rest :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify my previous post: I think the API I designed is easy to read on existing code. It might be harder to undertand in documentation and thus harder for writing code though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, I don't mean to imply it's bad per se, it's certainly usable, I just feel the more explicit/Pythonic approach is preferable, especially given my experience supporting a wide range of users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should find a different name for it if possible. remote_bind_address isn't so awful and I can't think of anything that's much shorter while not being more ambiguous.
remote_bind_address
is fine, IMO. I'll push it.
Ditto on local_host, given it's the same meaning (what IP/iface to bind on).
No, this can be any IP/DNS name that is resolvable by the computer Fabric is running on (including www.google.com
if you feel so). I think local_host
is fine in this case, since most of the time you want either 127.0.0.1
or a firewalled computer/server not publicly accessible outside of your local network.
Hi @rasky, thanks for this! I left some comments inline that I'd love to see addressed before this gets merged. It also needs a changelog entry, which should follow existing entries near the top of Finally -- new features should get based on master, so if you can rebase against the latest master (not 1.5, which is now a release branch) that would be awesome. I can handle that when I merge if required, though. |
One question: when I'm ready with the new version, do you prefer a clean patch to be force-pushed into this branch (so that you don't get the modification history merged), or you prefer to see and merge changes in addition to this patch? I know different projects have different conventions, this is why I'm asking. |
I have no clear preference re: force-push/rebase vs merge; ideally either one will look the same if I did e.g. |
Updated as per your review.
|
Yea, I hadn't realized it was using that many closure-scoped names; I agree reverting it is an acceptable tradeoff.
FWIW I don't usually require/recommend docstrings for
I like this idea, let's do that! Minor nitpick: wouldn't it make sense / be consistent to rename |
|
Oh, sorry -- I didn't see that reply originally because your commit 'hid' it in the UI. Oh github :) So I did have this backwards still, it's making a locally-visible resource available on the server, not vice versa. In that case, yes, the local argument is not a binding but is the "local host to forward a connection from" or whatever. So, it's fine staying OK, from here:
Thanks again/a ton for your patience with all of this discussion, really appreciate it! |
"Giovanni Bajo" in the changelog is fine. I didn't provide a diff because I didn't rebase the branch off master. Thanks for your work on fabric and... make sure to add forward tunnels, one day :) |
BTW, not sure if you noticed, but I completed the work |
Yup, just haven't had time to actually merge it yet. Thanks & sorry :) Will do so soon. |
Have a local integration branch:
However when I went to test it out again it's not working for me, could swear it was earlier in the process. Digging now. |
OK, the problems were mostly my fault. It is certainly possible to get unhelpful error messages out of this feature; e.g. I was first getting timeouts that were swallowed by the bare Then I selected a good (HTTP based) port, but was testing with Then tried |
I have merged this to master and pushed. Thanks again @rasky! I will put out Fab 1.6 by this weekend and this feature will be included. |
I got other problems with error reporting as well (see the note in the original pull request). Thanks for merging! |
Hello,
this pull request implements a rtunnel() context manager that allows to setup a remote forwarding tunnel (listening on the server side, tunneling over SSH and the forwarding to a locally-reachable host:port).
It implements half of issue #78 (the half I was interested in :). I've tested the code and am already using it in my deploy scripts.
Note: while working on this patch, some exceptions would trigger a weird error message:
No handlers could be found for logger "paramiko.transport"
with no information about the actual exception. I had to add this code to see the exceptions:This is probably an unrelated bug: fabric does not configure paramiko logger correctly. You can reproduce the error by adding a typo within the function
accept()
in my code.