Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/ssh: ssh connections and channels should be cancelable #21423

Open
glycerine opened this Issue Aug 13, 2017 · 14 comments

Comments

Projects
None yet
5 participants
@glycerine
Copy link

glycerine commented Aug 13, 2017

When using x/crypto/ssh in a long-running program for communicating with many nodes that may come and go, it is necessary to be able to shutdown started SSH connections and channels cleanly. Currently there are many goroutines leaked.

For example, https://github.com/golang/crypto/blob/master/ssh/connection.go#L79 the DiscardRequests function is launched in its own go routine for each new connection, and there is no way, currently, to shut it down. It thus leaks memory.

Moreover in order to make the internals of the (brilliant, fabulous) ssh library cancellable, each instance of a raw send and receive will need to be put with a select statement that includes a cancellation signal. I suggest implenting cancellation by adding context.CancelFunc and context.Context to the Config structure.

I've made these changes in a branch, here https://github.com/glycerine/crypto/tree/cancelable

These likely are not perfect or to anyone else's taste exactly, but feel free to mold to taste.

@hanwen this may be of interest.

@gopherbot gopherbot added this to the Unreleased milestone Aug 13, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Aug 14, 2017

@glycerine

This comment has been minimized.

Copy link
Author

glycerine commented Aug 15, 2017

In addition to the branch cited above, my latest working copy is here (https://github.com/glycerine/xcryptossh). Caveat: be aware that this is my current working branch, and as I work on it, it may or may not be build-able. Nonetheless, it is a better and later version (no races, no known leaks, context.Context threaded through the API so this is a non-backcompatible version). If you wish to pull from a repo, the glycerine/xcryptossh one should be preferred over the glycerine/crypto in general. Be aware that it may have debugging printfs in.

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Aug 30, 2017

The example you cite is not a leaked goroutine because the reqs channel will be closed when the connection closes.

I think we should have deadlines on channel receives (not go channels).

As of now, each new session will call the go session.wait() function when the remote remote command/shell is run. This is to provide a exit status via the public session.Wait() function. But my issue is that even if I call session.Close() (whose sole purpose is to close the underlying channel), it's possible for the remote server to never respond to the channel being closed, causing the goroutine that called session.Wait() and the goroutine for go session.wait() to deadlock. Its definitely very unlikely, but possible.

Another possible issue is with channel.SendRequest, it could block if wantReply is true but the server decides to never send back a reply. Unlikely, but possible.

My two examples are special because the server could handle other channels fine, so TCP keep alive and any other lower level timeout would not work. Waiting for a channel response from the server should have deadlines to prevent blockages like this.

@glycerine

This comment has been minimized.

Copy link
Author

glycerine commented Aug 30, 2017

update: The idle timeouts work quite well. They are very solid. I returned briefly to an experiment with write deadlines, with the speculative aim of implementing net.Conn. Read deadlines were quickly done and worked fine. However, on the write side, I could not get write deadlines to work. It appears that SSH will happily Write() to an ssh.Channel and return nil error, even if there is never any corresponding Read() on the other end.

If someone smarter than me wants to try their hand at getting write deadlines to work, I invite you to try :) See the branch write_deadlines. https://github.com/glycerine/xcryptossh/tree/write_deadlines

In particular, run the TestSimpleWriteDeadline test in timeout_test.go, e.g.

$ go test -v -timeout 120m -race -run WriteDeadline
@glycerine

This comment has been minimized.

Copy link
Author

glycerine commented Aug 30, 2017

@nhooyr On master of https://github.com/glycerine/xcryptossh/ I shipped SetReadDeadline() as well as SetIdleTimeout(). Give them a try and see if they meet your needs. update: As I don't use sessions much, if you have tests for missing functionality, they would be welcome.

@hanwen

This comment has been minimized.

Copy link
Contributor

hanwen commented Aug 30, 2017

"It appears that SSH will happily Write() to an ssh.Channel and return nil error, even if there is never any corresponding Read() on the other end."

Did you expect anything else? In order to know that something is reading on the other end, you'd have to do a network round trip, which is expensive.

go SSH has a 2MB buffer (https://go.googlesource.com/crypto/+/81e90905daefcd6fd217b62423c0908922eadb30/ssh/channel.go#23) similar to OpenSSH.

@hanwen

This comment has been minimized.

Copy link
Contributor

hanwen commented Aug 30, 2017

I don't fully understand the bug.

If a ssh.Channel is closed, all the associated goroutines should exit. If a ssh.Conn is closed, all associated goroutines should exit too. If you find cases where this is not happen, please file bug reports about them.

Similar on close of the channel or connection, all reads/writes will be interrupted, please file bug reports if you find cases where this does not happen.

About cancelation, adding Context throughout will be an invasive API change, which I would rather combine with other API changes, and the net effect would be that the package calls conn.Close() or channel.Close() on timeout. You might as well do this in the piece of code that has access to the incoming context.

@hanwen

This comment has been minimized.

Copy link
Contributor

hanwen commented Aug 30, 2017

" if I call session.Close() (whose sole purpose is to close the underlying channel), it's possible for the remote server to never respond to the channel being closed"

the channel close should cause session.Wait to return. Can you file a bug if this is not the case?

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Aug 30, 2017

@glycerine Nice work but it doesn't fit my needs. What I think needs to happen is that every single read from the channel should have a deadline with it so that it does not block forever, if the deadline expires before the read finishes, then the entire connection should be closed. I don't think a idle timeout is necessary if we go with what I described.

@hanwen filed #21699

@glycerine

This comment has been minimized.

Copy link
Author

glycerine commented Aug 30, 2017

every single read from the channel should have a deadline with it so that it does not block forever

exactly what SetIdleTimeout() does

if the deadline expires before the read finishes, then the entire connection should be closed.

trivial to add in your application logic when you receive a timeout error from a Channel read.

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Aug 30, 2017

SetIdleTimeout() doesn't do that because it resets on writes.

@glycerine

This comment has been minimized.

Copy link
Author

glycerine commented Aug 30, 2017

SetIdleTimeout() doesn't do that because it resets on writes.

So if a write succeeds but a read doesn't, then that should be a timeout... hmm. maybe. after all, write success (today I learned) doesn't really mean anything.... thinking...

@glycerine

This comment has been minimized.

Copy link
Author

glycerine commented Aug 30, 2017

@nhooyr I added a parameter to SetIdleTimeout() for your scenario. It is in the latest, v3.0.1.

glycerine/xcryptossh@83b535b

The API now looks like:

SetIdleTimeout(dur time.Duration, writesBump bool) error

where writesBump can be set to false for your use case: writes will not bump the idle timer, and reads will timeout independently of write success. Test of the functionality is in writebump_test.go. Your use case should probably be the default. After more experience, we may find we don't need or want writesBump == true ever. But for now I've left that option.

Try it out and let me know if it works for you.

@nhooyr

This comment has been minimized.

Copy link
Contributor

nhooyr commented Sep 12, 2017

I appreciate you putting in the work in to create a prototype. Unfortunately I don't have the time to test it out right now. I'll try and take a look later.

Though, I'm 100% certain we don't need a writesBump bool.

I'm not certain whether SetIdleTimeout is the best API to expose for this. Will think about it and get back to you later. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.