look for updated DNS when connection to server is lost #455

Open
wants to merge 7 commits into
from

6 participants

@maltek

I implemented issue 212:
Instead of resolving the host name in the mosh wrapper script and giving mosh-client the resolved IPv4 address, I modified the script to pass down the host name, and changed the client to repeatedly re-resolve that host name while the connection is lost.

To do so, I modified Connection::hop_port() to make an asynchronous DNS request. The DNS requests are IPv4-only for now, but since I used getaddrinfo, adding IPv6 support should not be too hard.

Only GNU libc has an asynchronous getaddrinfo_a, so I had to build one on top of pthreads.
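A minimal sketch of that pattern, assuming a hand-rolled resolver rather than the PR's actual code: a worker pthread runs the blocking getaddrinfo() and the caller polls for the result without ever blocking. The class name `AsyncResolver` and its `start()`/`poll()` methods are hypothetical; like the PR, it asks only for IPv4 (`AF_INET`) results.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch (not the PR's code): a blocking getaddrinfo() runs on
// its own pthread, and the caller polls for the result without blocking.
class AsyncResolver {
public:
  explicit AsyncResolver( const std::string &host )
    : host_( host ), started_( false ), done_( false ), error_( 0 ) {
    pthread_mutex_init( &mutex_, NULL );
  }

  ~AsyncResolver() {
    if ( started_ ) {
      pthread_join( thread_, NULL );  // wait for the worker before freeing state
    }
    pthread_mutex_destroy( &mutex_ );
  }

  AsyncResolver( const AsyncResolver & ) = delete;
  AsyncResolver &operator=( const AsyncResolver & ) = delete;

  bool start() {
    started_ = ( pthread_create( &thread_, NULL, &AsyncResolver::run, this ) == 0 );
    return started_;
  }

  // Non-blocking check; returns true (and fills the outputs) once the lookup is done.
  bool poll( std::vector<std::string> &addrs, int &error ) {
    pthread_mutex_lock( &mutex_ );
    bool finished = done_;
    if ( finished ) {
      addrs = addrs_;
      error = error_;  // 0 on success, otherwise a getaddrinfo EAI_* code
    }
    pthread_mutex_unlock( &mutex_ );
    return finished;
  }

private:
  static void *run( void *arg ) {
    AsyncResolver *self = static_cast<AsyncResolver *>( arg );
    struct addrinfo hints;
    std::memset( &hints, 0, sizeof( hints ) );
    hints.ai_family = AF_INET;  // IPv4 only, as in the PR description
    hints.ai_socktype = SOCK_DGRAM;

    struct addrinfo *res = NULL;
    int rc = getaddrinfo( self->host_.c_str(), NULL, &hints, &res );

    // Collect ALL returned addresses, not just the first one.
    std::vector<std::string> found;
    if ( rc == 0 ) {
      for ( struct addrinfo *p = res; p != NULL; p = p->ai_next ) {
        const struct sockaddr_in *sin =
          reinterpret_cast<const struct sockaddr_in *>( p->ai_addr );
        char buf[ INET_ADDRSTRLEN ];
        if ( inet_ntop( AF_INET, &sin->sin_addr, buf, sizeof( buf ) ) != NULL ) {
          found.push_back( buf );
        }
      }
      freeaddrinfo( res );
    }

    pthread_mutex_lock( &self->mutex_ );
    self->addrs_ = found;
    self->error_ = rc;
    self->done_ = true;
    pthread_mutex_unlock( &self->mutex_ );
    return NULL;
  }

  std::string host_;
  pthread_t thread_;
  pthread_mutex_t mutex_;
  bool started_;
  bool done_;
  int error_;
  std::vector<std::string> addrs_;
};
```

Usage would be to call `start()` once when the connection is deemed lost and then call `poll()` from the client's main loop (for example from the place where it decides to hop ports), so the terminal stays responsive while the lookup is in flight.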

Note also that I changed AC_CHECK_FUNCS to print an error message when a function isn't found; otherwise a missing getaddrinfo would just be ignored. I'm far from an autotools expert, so I'm not 100% sure this doesn't break anything.

I tested this patch on Linux, OS X and FreeBSD x86_64.

@andersk
Mosh (mobile shell) member

As explained in #81, there’s a tricky issue here that will require a bit more work. Some server hostnames (such as athena.dialup.mit.edu) are load-balanced by a round-robin DNS record where different entries point to different servers. If the hostname is resolved twice, once in the wrapper script and once inside mosh-client, then there’s no guarantee that the results will be the same, so mosh-client may try to connect to a different server than mosh-server is actually running on.

We really do need to resolve the IP address once and pass it along. It may work to also pass the hostname along for future re-resolution, as long as the client does not rely on it by throwing away the working IP address.

Malte Kraus added some commits Aug 25, 2013
Malte Kraus: pass down hostname *and* IP from mosh wrapper to mosh-client, in order to handle round-robin DNS
9bf8ffc
Malte Kraus: only hop to another IP when the old one isn't among ALL the addresses returned by getaddrinfo (handle round robin DNS gracefully)
b9e4daf
Malte Kraus: also keep trying the last known good IP address (because of round-robin DNS)
66673fd
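The rule in b9e4daf can be sketched as a single check. This is a hypothetical helper (`should_hop` is not a name from the PR), and it assumes addresses are compared as strings, such as those produced by inet_ntop().

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: a round-robin name may return several A records, so
// only hop when the address in use is not among ANY of them.
bool should_hop( const std::string &current, const std::vector<std::string> &resolved ) {
  if ( resolved.empty() ) {
    return false;  // lookup failed or returned nothing: keep the current address
  }
  return std::find( resolved.begin(), resolved.end(), current ) == resolved.end();
}
```

Checking membership in the whole result set, rather than comparing against the first entry, is what keeps the client from bouncing between servers just because the round-robin order changed between lookups.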
@maltek

Alright, I changed things to pass down both the IP and the host name. When re-resolving, if the DNS response contains several IPs at once, the code now only tries to switch IPs if none of them matches the old one. And instead of completely throwing away the previously working IP after a re-resolution, the code now alternates between the new IP and the last one that got a server response.
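A hypothetical sketch of that alternation, assuming the client tracks plain address strings; `TargetRotation` and its methods are illustrative names, not the PR's API.

```cpp
#include <string>

// Hypothetical sketch: after a re-resolution, keep the last address that
// produced a server response and alternate between it and the newly
// resolved address until the server answers on one of them.
class TargetRotation {
public:
  explicit TargetRotation( const std::string &initial )
    : last_good_( initial ), candidate_( initial ), use_candidate_( false ) {}

  // Called with a fresh DNS result that did not contain the current address.
  void set_candidate( const std::string &addr ) { candidate_ = addr; }

  // Address to send the next packet to; flips between candidate and last good.
  const std::string &next_target() {
    use_candidate_ = !use_candidate_;
    return use_candidate_ ? candidate_ : last_good_;
  }

  // Called when a packet from the server arrives from `addr`.
  void on_server_response( const std::string &addr ) {
    last_good_ = addr;
    candidate_ = addr;
  }

  const std::string &last_good() const { return last_good_; }

private:
  std::string last_good_;
  std::string candidate_;
  bool use_candidate_;
};
```

Once the server answers on either address, both slots collapse to that address, so the client stops alternating until the next re-resolution produces a new candidate.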

@ibukanov

The latest patch compiles fine on Ubuntu 12.04, on both x64 and ARM. As far as I can see it does not break anything; tomorrow I will check whether it can reconnect from inside a firewall.

@ibukanov

The patch works nicely with a server that is reachable via different IPs from inside and outside the firewall.

@payco referenced this pull request on Oct 22, 2013 in #81 (Missing IPv6-support, open)

@m0yellow

I came here from #81, but I find the idea fascinating:

Could mosh-server add all of its local IPs to the client's list? That would pollute the client's list with RFC 1918 addresses, but it would make roaming between internal and external IPs possible without adding them to DNS (which would break nearly every protocol other than mosh). If accepted, this could open the door to finally solving #81 by adding v6 addresses to the list in the future.
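Telling whether an advertised address is an RFC 1918 private one is straightforward, and any such scheme would need that test in order to treat those entries differently. A minimal sketch; the helper name `is_rfc1918` is hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

#include <string>

// Hypothetical helper: true if `ip` (dotted-quad IPv4) is in one of the
// RFC 1918 private ranges 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
// Returns false for public addresses and for strings that are not IPv4.
bool is_rfc1918( const std::string &ip ) {
  struct in_addr a;
  if ( inet_pton( AF_INET, ip.c_str(), &a ) != 1 ) {
    return false;
  }
  uint32_t h = ntohl( a.s_addr );  // host byte order for the mask checks
  return ( h & 0xFF000000u ) == 0x0A000000u     // 10.0.0.0/8
      || ( h & 0xFFF00000u ) == 0xAC100000u     // 172.16.0.0/12
      || ( h & 0xFFFF0000u ) == 0xC0A80000u;    // 192.168.0.0/16
}
```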

@ibukanov

@m0yellow, that does not work in general: the mosh client may hit a different server on another private network that happens to use the same address. DNS names, by contrast, are supposed to be unique.

@m0yellow

I know, but that is the crux with IPs: they were intended to be unique, too.
Having read about the AS112 project, I would suggest we trust the strength of the key and let the client hit the wrong server: it would cost one UDP packet and one RST, and then the election goes on. Combined with a numbered priority based on the clients connected to the server, the address could still stay on the list as a last resort.
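One way to read the "last resort" part of this proposal, sketched under assumptions: each candidate carries a private/public flag and a priority number whose computation (for instance from the number of connected clients) is left open. `Candidate`, `try_before` and `order_candidates` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model: every advertised address stays on the list, but
// private addresses are tried last, and within each group an assumed
// priority number decides the order.
struct Candidate {
  std::string addr;
  bool is_private;  // e.g. an RFC 1918 address advertised by the server
  int priority;     // lower = try earlier; how it is computed is left open
};

static bool try_before( const Candidate &a, const Candidate &b ) {
  if ( a.is_private != b.is_private ) {
    return !a.is_private;  // public addresses first, private ones as last resort
  }
  return a.priority < b.priority;
}

std::vector<Candidate> order_candidates( std::vector<Candidate> list ) {
  std::stable_sort( list.begin(), list.end(), try_before );
  return list;
}
```

A stable sort keeps the advertised order among candidates with equal flags and priority, so ties do not reshuffle between reconnect attempts.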

BTW, I already have this problem: when the client dies, the mosh-server process runs indefinitely, and the client happily opens another session. So on servers with long uptimes I have dozens of mosh-server processes, all for the same user. From the server side, this might even reduce the load, since more sessions could be recovered instead of new ones being created.

@ibukanov

I filed #469, which should make it possible to implement any fancy server reconnection algorithm in a few lines of, say, Perl. It may also make it possible to support IPv6.

@jashank

The latest version of this doesn't compile with GCC or Clang on OS X 10.9. With GCC:

```
> gmake V=1
g++-apple-4.2 -DHAVE_CONFIG_H -I. -I../..  -I./../util -I./../crypto -I../protobufs -D_THREAD_SAFE -I/opt/local/include   -Wall  -fno-strict-overflow -D_FORTIFY_SOURCE=2 -fstack-protector-all -Wstack-protector --param ssp-buffer-size=1 -fPIE -fno-default-inline -pipe -g -O2 -MT network.o -MD -MP -MF .deps/network.Tpo -c -o network.o network.cc
network.cc: In member function 'Network::Connection::DNSResolverAsync::Status Network::Connection::DNSResolverAsync::try_start_stop(Network::AddrLen&)':
network.cc:271: error: invalid use of nonstatic data member 'Network::Connection::remote_addr'
gmake: *** [network.o] Error 1
```

With clang:

```
> gmake V=1 CXX=clang++
clang++ -DHAVE_CONFIG_H -I. -I../..  -I./../util -I./../crypto -I../protobufs -D_THREAD_SAFE -I/opt/local/include   -Wall  -fno-strict-overflow -D_FORTIFY_SOURCE=2 -fstack-protector-all -Wstack-protector --param ssp-buffer-size=1 -fPIE -fno-default-inline -pipe -g -O2 -MT network.o -MD -MP -MF .deps/network.Tpo -c -o network.o network.cc
clang: warning: argument unused during compilation: '-fno-default-inline'
network.cc:271:53: error: use of non-static data member 'remote_addr' of 'Connection' from nested type 'DNSResolverAsync'
        fatal_assert( result->ai_addrlen <= sizeof( remote_addr.addr ) );
                                                    ^~~~~~~~~~~
./../util/fatal_assert.h:47:5: note: expanded from macro 'fatal_assert'
  ((expr)                                                               \
    ^
1 error generated.
gmake: *** [network.o] Error 1
```

I couldn't come up with a quick fix for this. 66673fd still compiles, and I'm about to check that it still works. If it does, I'll bisect 66673fd..ba1fb85 and play whack-a-bug.

Update: it works excellently! It took ~40 seconds to fail over, but compared to not failing over at all, that's very acceptable.
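For reference, here is a reduced and hypothetical model of the build error in the logs above (these are not mosh's real class definitions). In C++98/03, a nested class may not name a non-static data member of its enclosing class, even inside `sizeof`; C++11 relaxed this for unevaluated operands (N2253). One C++03-compatible workaround is to reach the member through a pointer of the right type, which is fine because `sizeof` never evaluates its operand.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstddef>

// Hypothetical, reduced model of the failing code (not mosh's real types).
struct AddrLen {
  union {
    struct sockaddr sa;
    struct sockaddr_in sin;
    struct sockaddr_storage ss;
  } addr;
  socklen_t len;
};

class Connection {
  AddrLen remote_addr;

public:
  class DNSResolverAsync {
  public:
    static std::size_t max_addr_len() {
      // Rejected in C++98/03 (names an enclosing class's non-static member):
      //   return sizeof( remote_addr.addr );
      // Accepted: sizeof does not evaluate its operand, so the null pointer
      // is never dereferenced; only the member's type is used.
      return sizeof( static_cast<AddrLen *>( 0 )->addr );
    }
  };
};
```

Using the member's type directly (here the union inside `AddrLen`) avoids depending on the enclosing class at all, which is why this variant builds with both older g++ and clang in C++98 mode.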

@cgull
Mosh (mobile shell) member

Looking at athena.dialup.mit.edu, I notice something interesting: that hostname is actually a short-lived, often-changing CNAME pointing to a long-lived A record, whose apparently stable name refers to a specific machine. So Athena is clearly load-balancing via the CNAME. It seems to me that for this particular host, a nice answer would be for mosh to resolve through any CNAME/DNAME records to an actual A/AAAA record, then remember and reuse that hostname. I don't know how widely applicable this solution would be, though.
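A minimal sketch of the first half of that idea, assuming the system resolver reports the CNAME target: `getaddrinfo()` with the `AI_CANONNAME` flag fills in `ai_canonname` on the first result, which on common resolvers such as glibc's is the name reached after following any CNAME chain. `canonical_name` is a hypothetical helper; whether the name it returns is stable enough to reuse depends entirely on the site's DNS setup.

```cpp
#include <netdb.h>
#include <sys/socket.h>

#include <cstring>
#include <string>

// Hypothetical helper: ask the resolver for the canonical name of `host`
// (the target of any CNAME chain), so a client could remember it instead of
// a short-lived load-balancing alias. Returns "" on failure.
std::string canonical_name( const std::string &host ) {
  struct addrinfo hints;
  std::memset( &hints, 0, sizeof( hints ) );
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_DGRAM;
  hints.ai_flags = AI_CANONNAME;  // request ai_canonname on the first result

  struct addrinfo *res = NULL;
  if ( getaddrinfo( host.c_str(), NULL, &hints, &res ) != 0 ) {
    return "";
  }
  std::string name = ( res->ai_canonname != NULL ) ? res->ai_canonname : "";
  freeaddrinfo( res );
  return name;
}
```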

Were these records set up this way in 2013, or is this an improvement over whatever was in place then?

[edit]

Ah, never mind. In #81 it's described as having been that way in 2012. Nothing new here.
