Fire `LostPeer` as soon as network interface becomes inaccessible #578
Comments
I think unless the TCP stream actually times out we should not close the connection. It does cause problems, as with ssh sessions, so I see the issue here for sure. If we wait then it may appear very strange to a user. I think we may need an event
It's not only a UI change. The API should back off and not schedule/write new messages until the connection is okay again. #cc: @Viv-Rajkumar, @afck, @canndrew, @madadam (what is your opinion on this matter?)
It would be easy enough to use a shorter TCP timeout and fire a This Also, is this
Not sure it's wise to do that. We'd just be making things more difficult for users that experience temporary internet connectivity problems (downloading a torrent, I don't know).
Not sure it's the correct behavior here either. So if one network interface goes down, all peers are disconnected? What if I disable my virtual network interface that is only up when I start my LXC/container jail? What happens to nodes connected via ethernet if I disable the wifi?
Indeed. It is more complicated.
But this third state could just mean "hold on and do not send messages for a little bit, because it won't work now". It saves crust and the operating system from buffering several messages that will never be delivered. Also, if the connection is really lost, the sent messages would be lost. With this change, the user has the chance to know which messages would be lost if they try to send before the network is recovered. Anyway, I'm waiting for feedback from the routing guys (@afck, @madadam) to know what they're willing to use.
Why? This third state doesn't promise to reestablish connections. If the network is recovered, but connections are not, the operating system itself will time out every one of the remaining connections. If we're really concerned about knowing which messages were delivered, we'd have a design closer to asio where there is a callback fired when the message is written.
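To make the "third state" idea concrete, here is a minimal sketch in Rust. Everything in it (`LinkState`, `SendError`, `Peer`) is a hypothetical name made up for illustration, not crust's actual API. It assumes that while a connection is suspended, `send` refuses the message and hands it back, so the caller knows exactly which messages were never queued.

```rust
use std::collections::VecDeque;

/// Hypothetical connection states; `Suspended` is the proposed third state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkState {
    Connected,
    Suspended,
    Lost,
}

/// Why a message was refused. The message itself is handed back to the caller.
#[derive(Debug, PartialEq, Eq)]
pub enum SendError {
    Suspended(Vec<u8>),
    Lost(Vec<u8>),
}

/// Hypothetical per-peer handle with an outgoing queue.
pub struct Peer {
    state: LinkState,
    outbox: VecDeque<Vec<u8>>,
}

impl Peer {
    pub fn new() -> Self {
        Peer { state: LinkState::Connected, outbox: VecDeque::new() }
    }

    /// Called by whatever detects that the network went away or came back.
    pub fn set_state(&mut self, state: LinkState) {
        self.state = state;
    }

    /// Queue a message only while the link is usable; otherwise return it.
    pub fn send(&mut self, msg: Vec<u8>) -> Result<(), SendError> {
        match self.state {
            LinkState::Connected => {
                self.outbox.push_back(msg);
                Ok(())
            }
            LinkState::Suspended => Err(SendError::Suspended(msg)),
            LinkState::Lost => Err(SendError::Lost(msg)),
        }
    }

    /// Number of messages accepted and waiting to be written.
    pub fn queued(&self) -> usize {
        self.outbox.len()
    }
}
```

Returning the message inside the error is one possible design choice: nothing is buffered while the link is down, and the upper layer decides whether to retry later or drop it.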
I agree that a I'm not sure whether Routing should act upon it at all, apart from passing the message on to Clients/Vaults. (Maybe it should try reestablishing any connections lost during that time, instead of immediately considering the peers lost.)
I wasn't suggesting that we should do those things, merely that we could. They seem to be along the lines of what's being suggested here. I'm not exactly sure what is being suggested here though. @vinipsmaker is this how you're imagining it?: If a tcp/utp connection disconnects then we fire a
This is exactly what I'm proposing. It's the best thing I've come up with for now to address @Viv-Rajkumar's initial request.
So my current thoughts on this are: Fiddling with the TCP keepalive settings seems like a bad idea. It'll make connections unstable for people experiencing temporary connectivity problems. Also it's impossible to do with OSX or uTP so we'd have to implement one of the other options anyway. The other options are: send regular ping/pong packets on all connections and raise a warning to the upper layers if we stop getting replies. The downside is that this creates overhead, as all connections will have a constant (although small) amount of activity on them. The other option is to only raise the warning when an interface goes down. The downside is that this only works for one side of the connection; the other peer won't know that the connection is unusable. Thoughts?
I like the ping-pong idea. Maybe we could implement activity detection and only ping the channel if we haven't received other events. We could have a timeout of 2 seconds to trigger the ping-pong mechanism and a timeout of 10 seconds to trigger the warning to upper layers. It's worth noticing it's not mutually exclusive with the
This crate had a major re-write/re-design. Please raise an issue again if required, as almost everything in the library and the way it handles things has changed. Closing this.
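A rough sketch of how the activity detection with the two timeouts could look in Rust, assuming one reading of the suggestion: after 2 seconds without any received data we send a single ping, and after 10 seconds without any received data we warn the upper layers. `Liveness` and `Action` are hypothetical names, and the caller is assumed to call `poll` periodically and `on_activity` whenever anything (including a pong) arrives.

```rust
use std::time::{Duration, Instant};

/// What the caller should do after polling the tracker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Nothing,
    SendPing,
    WarnUpperLayer,
}

/// Hypothetical per-connection idle tracker.
pub struct Liveness {
    last_activity: Instant,
    ping_sent: bool,
    ping_after: Duration,
    warn_after: Duration,
}

impl Liveness {
    pub fn new(now: Instant) -> Self {
        Liveness {
            last_activity: now,
            ping_sent: false,
            ping_after: Duration::from_secs(2),  // idle time before pinging
            warn_after: Duration::from_secs(10), // idle time before warning
        }
    }

    /// Any received data counts as activity and resets the tracker.
    pub fn on_activity(&mut self, now: Instant) {
        self.last_activity = now;
        self.ping_sent = false;
    }

    /// Decide what to do at time `now`. At most one ping is requested per
    /// idle period; the warning is returned on every poll past the threshold.
    pub fn poll(&mut self, now: Instant) -> Action {
        let idle = now.saturating_duration_since(self.last_activity);
        if idle >= self.warn_after {
            Action::WarnUpperLayer
        } else if idle >= self.ping_after && !self.ping_sent {
            self.ping_sent = true;
            Action::SendPing
        } else {
            Action::Nothing
        }
    }
}
```

Passing `now` in explicitly keeps the tracker independent of any event loop and makes it easy to test. A busy connection never pings at all, which limits the overhead mentioned above to idle connections.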
A certain behavior was observed on `osx`: when the wifi is turned off, you don't get a disconnect (`LostPeer`) raised on the `osx` machine nor on the peer it's connecting to, even if that's a different platform.

For communication purposes (within this thread only), let `client` be the peer who turns off the wifi. `server` will be the other end.

TCP is reliable and connection-oriented and could be able to recover these connections if `client` re-enables its wifi soon enough. For `server`, the only option to get the `LostPeer` event will be to wait until the timeout is reached. For `client`, the network interfaces could be watched and notification could be made immediately. In crust, we want `client` to be notified of connection loss immediately (this means watching network interfaces and ungracefully closing the TCP connection to avoid the connection being restored by the OS).

Related issue: maidsafe-archive/get_if_addrs#17