Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/ssh: ssh.Channel should provide idle timeouts #21420

Closed
glycerine opened this issue Aug 12, 2017 · 23 comments
Closed

x/crypto/ssh: ssh.Channel should provide idle timeouts #21420

glycerine opened this issue Aug 12, 2017 · 23 comments
Milestone

Comments

@glycerine
Copy link

@glycerine glycerine commented Aug 12, 2017

It would be incredibly useful for securing connections everywhere if ssh.Channel implemented the full net.Conn interace. The only missing methods are for time-outs based on setting read and write deadlines.

In addition, presently when using ssh.Channel, it is common to leak Goroutines. Without deadlines for read and write, those blocking Read/Write calls on ssh.Channel will never let the goroutine return. I have found that ssh.Channel.Close() from a separate goroutine will unblock Reads, but this is quite cumbersome and would still require every piece of code currently using net.Conn to need much additional logic and elaborate coordination just to have simple read and write deadlines.

A related but distinct api request: #20288

I inquired of Han-wen and over email and he suggesting filing a bug, hence this ticket.

Emerson suggested the context.Context support might be nice as well, but that's not in the net.Conn interface. However once implemented, the net.Conn interface would make adding context support easier.

@hanwen @Gh0u1L5

@gopherbot gopherbot added this to the Unreleased milestone Aug 12, 2017
@glycerine
Copy link
Author

@glycerine glycerine commented Aug 12, 2017

As an example of deadlock prone code, the channel receive <-ch.msg at line 322 of mux.go will deadlock when the remote side dies. If you ignore the deadlock, the reading goroutine leaks.

https://github.com/golang/crypto/blob/master/ssh/mux.go#L322

    switch msg := (<-ch.msg).(type) {  // deadlock happens here
    case *channelOpenConfirmMsg:
        return ch, nil
    case *channelOpenFailureMsg:
        return nil, &OpenChannelError{msg.Reason, msg.Message}
    default:
        return nil, fmt.Errorf("ssh: unexpected packet in response to channel open: %T", m\
sg)
    }

update: Here unfortunately there is no recourse to Close()-ing the OpenChannel call, since there is no channel yet. Even the top level ssh.Client is no help, for even if one Close()-es the ssh.Client on which the call was made, the deadlock at line 322 remains.

@glycerine
Copy link
Author

@glycerine glycerine commented Aug 26, 2017

Upon prototyping the net.Conn approach, I discovered that the deadline is too low level to be useful with io.Copy(), in paritcular for sending big files that need some time to complete. Instead I've implemented a high level idle-timeout facility in my branch
https://github.com/glycerine/xcryptossh See that README for discussion. I'll change the title here to reflect this discovery.

@glycerine glycerine changed the title x/crypto/ssh: ssh.Channel should implement net.Conn x/crypto/ssh: ssh.Channel should provide idle timeouts Aug 26, 2017
@nhooyr
Copy link
Contributor

@nhooyr nhooyr commented Aug 29, 2017

What you point out as a deadlock isn't really a deadlock if the remote side dies. That channel will be closed if the remote site dies, see https://github.com/golang/crypto/blob/81e90905daefcd6fd217b62423c0908922eadb30/ssh/mux.go#L194 .

However, I still think there should be idle timeouts, we shouldn't wait forever for the remote side to answer.

@nhooyr
Copy link
Contributor

@nhooyr nhooyr commented Aug 29, 2017

Just realized this issue doesn't really address my problem, I'll open a new issue.

edit: never mind, realized you already opened a very similar one so I added a comment there

@hanwen
Copy link
Contributor

@hanwen hanwen commented Aug 30, 2017

"It would be incredibly useful for securing connections everywhere if ssh.Channel implemented the full net.Conn interace. The only missing methods are for time-outs based on setting read and write deadlines."

what does 'securing' connections mean?

OpenChannel() is a global request, so if we elect to ignore the answer (if we think it takes too long), it will upset the state machine, meaning that you have close the whole ssh.Conn.

@glycerine
Copy link
Author

@glycerine glycerine commented Aug 30, 2017

what does 'securing' connections mean?

I meant that all the Go programs everywhere that use net.Conn to do TCP... could transparently change to use ssh. And those would be much more secure connections.

OpenChannel() is a global request, so if we elect to ignore the answer (if we think it takes too long), it will upset the state machine, meaning that you have close the whole ssh.Conn.

I don't see why the upset would be intrinsic. Is it just an artifact of the implementation status quo?

@hanwen
Copy link
Contributor

@hanwen hanwen commented Aug 30, 2017

let me rephrase. Suppose the timeout triggers, how does your program proceed?

It cannot try to open another channel, since the global request lock has been taken. (the global request lock is implied by the RFC)

@glycerine
Copy link
Author

@glycerine glycerine commented Aug 30, 2017

btw I love this library, and I'm so glad that you work on it. I don't mean to be critical at all. Ok, back to the conversation:

Suppose the timeout triggers, how does your program proceed?

Interesting. I had been assuming I as an application programmer would only care about timeouts on channels that had been successfully established. I assumed the channel open would succeed or fail fairly quickly one way or the other as it stands. Is that not a correct assumption? i.e. can OpenChannel() hang?

@hanwen
Copy link
Contributor

@hanwen hanwen commented Aug 30, 2017

my question is specifically about the hang in l.322 . Your assumption in the last comment is correct, but I was under the impression you wanted to have a way to set a timeout on that (ie. on the OpenChannel method). My response is that this mostly useless, if OpenChannel is stuck, the protocol is organized such that you have to throw away the whole connection rather than just the (pending) channel.

We could try to add timeouts to SSH to mimick TCP, but the timeouts would be on the ssh.Conn, not ssh.Channel, as there is usually no way for the channel to fail without the connection itself being broken too.

@glycerine
Copy link
Author

@glycerine glycerine commented Aug 30, 2017

usually no way for the channel to fail without the connection itself being broken too

So my use case actually sees this alot, basically because it is forwarding a stream from a third party. In a diagram:

 P3 -> P2 -> P1
        ^
        |
       P4

So the connection from P2 to P1 is fine, but when the connection from P3 to P2 dies, the channel that is forwarding over P2 to P1 ends up needing an idle timeout, because P3 is gone. And in the meantime P4 to P1 traffic can be fine, so the whole connection from P2 to P1 should be preserved.

I appreciate the discussion, but the long and short is that I've got this all working on my branch now; i.e. per Channel timeouts work fine (for reads, never got writes of course). So you're welcome to pull from it as you like to add features. And we can probably close these tickets.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 4, 2017

@glycerine - to me, your diagram is not an argument for adding timeouts to the Channel but rather that forwarding code should detect TCP timeouts and result in closing the Channel that provides the forwarded stream.

@glycerine
Copy link
Author

@glycerine glycerine commented Sep 4, 2017

forwarding code should detect TCP timeouts

a) nobody wants to rely upon TCP timeouts; they aren't always available or reliable, moreover a unix domain socket won't have them. And logically it doesn't work because the timeout isn't going to be channel specific. If you did get a timeout, which of many channels should it apply to? No way to tell. You could be forwarding a foward and the (inductively) the drop was before you, possibly many steps back.

b) The suggestion breaks the io.Writer abstraction. Imagine you are inside io.Copy: that would require that each Write() know the Read() that its source material came from. Ugh.

Currently my Channels implement net.Conn, which I am happy about.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 4, 2017

I still don't get it. If P3 -> P2 is gone, then the copy goroutine that forwards over p2 -> p1 will error out, presumably leading to the forwarding channel being closed. Why does that not work as expected?

@glycerine
Copy link
Author

@glycerine glycerine commented Sep 4, 2017

the copy goroutine that forwards over p2 -> p1 will error out

Hmm... how does that error out happen without a timeout on the Channel?

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 4, 2017

Sorry, misphrased.

P2 is the remote server that runs some flavor of sshd. If the connection P3 -> P2 dies, then the server at P2 will notice this at some point. When that happens, it should close down the channel.

What sshd is on the other side?

@glycerine
Copy link
Author

@glycerine glycerine commented Sep 4, 2017

I thought we were discussing is the converse... Can one specific Channel need a timeout while the underlying Connection is okay? I provided one specific example, but the case is more general, and its not necessarily that an SSH server is involved at all upstream.

Channel timeouts handle more general "lack of response" as well as full connection drop somewhere. i.e. someone's goroutine is hung due to logic error, waiting on user input, deadlock, livelock, a long DNS resolve, etc.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 4, 2017

I guess we have an impedance mismatch. I'm trying to understand problems that you have that require changes to the library to be solved.

Is there a reason you cannot use a wrapper around ssh.Channel to mimick a net.Conn?

@glycerine
Copy link
Author

@glycerine glycerine commented Sep 4, 2017

Is there a reason you cannot use a wrapper around ssh.Channel to mimick a net.Conn?

Several. I wouldn't have forked otherwise, of course. Notice the changes in xcryptossh for details, but in summary: one needs notification or a callback at the start of every Read, every Write, every packet receive, and during continuous operation. And more importantly, one must be able to interrupt the sync.Cond based blocks without closing the connection, e.g. buffer.go (https://github.com/glycerine/xcryptossh/blob/master/buffer.go#L69) and https://github.com/glycerine/xcryptossh/blob/master/channel.go#L504 ).

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 5, 2017

I think you're making it too complicated. See https://play.golang.org/p/o_d3T-3irc for an idea.

@glycerine
Copy link
Author

@glycerine glycerine commented Sep 5, 2017

That's a deadline, not an idle timeout. Surely by now you've read the discussion above and at xcyptossh for why that's insufficient. Also refer to the title of this ticket.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 5, 2017

Do you want functionality that is like

// Read until buf is full or timeout has passed, whichever comes first
ReadWithTimeout(buf []byte, timeout time.Duration)

sorry for being dense, but I still haven't fully grasped what you need.

@hanwen
Copy link
Contributor

@hanwen hanwen commented Sep 5, 2017

(with the function implying that the channel stays alive if the timeout is hit.)

@glycerine
Copy link
Author

@glycerine glycerine commented Sep 5, 2017

So I'm going to close this out. It's been a useful and informative discussion. I'm happy with my https://github.com/glycerine/xcryptossh fork and continue to experiment with it, and it makes sense to me that x/crypto/ssh should stay simpler and with a fully backwards compatible API.

Do you want functionality that is like

The current API looks like this https://github.com/glycerine/xcryptossh/blob/master/channel.go#L94 (lines 94-164). I'm still working on the application layer that uses it, so it may evolve further. Status() is rather new, and I've currently exposed the IdleTimers as a part of the API because I was thinking that was needed. However I'm refactoring at the moment to split the application into two layers, a lower layer that simply maintains ssh.Connections, reconnecting whenever needed, and an upper layer that only deals in ssh.Channel.

@glycerine glycerine closed this Sep 5, 2017
@golang golang locked and limited conversation to collaborators Sep 5, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.