I implemented issue 212:
Instead of resolving the host name in the mosh wrapper script and giving the mosh-client the resolved IPv4 address, I modified the script to pass down the host name, then changed the client to constantly re-resolve that host name when the connection is lost.
To do so, I modified Connection::hop_port() to make an asynchronous DNS request. The DNS requests are IPv4 only right now, but I used getaddrinfo, so changing it to support IPv6 should not be too hard.
Only GNU libc has an asynchronous getaddrinfo_a, so I had to build one myself with pthreads.
Note also that I changed AC_CHECK_FUNCS to emit an error message when a function isn't found; otherwise a missing getaddrinfo would just be ignored. I'm far from an automake expert, so I'm not 100% sure this doesn't break anything.
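For readers who want to see the general shape of a pthreads-based asynchronous lookup, here is a minimal sketch. It is not the code from this patch: the class name `AsyncResolver` and its methods are hypothetical, and it assumes an IPv4-only UDP lookup as described above. The worker thread calls the blocking getaddrinfo while the caller polls or waits.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <netdb.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (not the class from the patch): runs the blocking
// getaddrinfo() call on a worker thread so the caller is not stalled.
class AsyncResolver {
public:
  AsyncResolver(const std::string &host, const std::string &port)
    : host_(host), port_(port), result_(NULL),
      error_(EAI_AGAIN), done_(false), started_(false)
  {
    pthread_mutex_init(&lock_, NULL);
  }

  ~AsyncResolver()
  {
    if (started_) pthread_join(thread_, NULL);  // never free state under a live worker
    if (result_) freeaddrinfo(result_);
    pthread_mutex_destroy(&lock_);
  }

  // Launch the lookup; returns false if it was already started or the
  // thread could not be created.
  bool start()
  {
    if (started_) return false;
    started_ = (pthread_create(&thread_, NULL, &AsyncResolver::run, this) == 0);
    return started_;
  }

  // Non-blocking poll: has the worker stored its result yet?
  bool finished()
  {
    pthread_mutex_lock(&lock_);
    bool d = done_;
    pthread_mutex_unlock(&lock_);
    return d;
  }

  // Block until the worker is done. Returns the getaddrinfo() error code
  // (0 on success); *out stays owned by this object.
  int wait(struct addrinfo **out)
  {
    if (started_) { pthread_join(thread_, NULL); started_ = false; }
    *out = result_;
    return error_;
  }

private:
  static void *run(void *arg)
  {
    AsyncResolver *self = static_cast<AsyncResolver *>(arg);
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       // IPv4 only, as in the description above
    hints.ai_socktype = SOCK_DGRAM;  // UDP
    struct addrinfo *res = NULL;
    int err = getaddrinfo(self->host_.c_str(), self->port_.c_str(), &hints, &res);
    pthread_mutex_lock(&self->lock_);
    self->result_ = res;
    self->error_ = err;
    self->done_ = true;
    pthread_mutex_unlock(&self->lock_);
    return NULL;
  }

  AsyncResolver(const AsyncResolver &);             // not copyable
  AsyncResolver &operator=(const AsyncResolver &);  // not copyable

  std::string host_, port_;
  struct addrinfo *result_;
  int error_;
  bool done_, started_;
  pthread_mutex_t lock_;
  pthread_t thread_;
};
```

A caller would call start() when the connection looks dead, check finished() each time it is about to hop ports, and only call wait() once finished() is true, so the main loop never blocks on DNS.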
I tested this patch on Linux, OS X and FreeBSD x86_64.
look for updated DNS when connection to server is lost
As explained in #81, there’s a tricky issue here that will require a bit more work. Some server hostnames (such as athena.dialup.mit.edu) are load-balanced by a round-robin DNS record where different entries point to different servers. If the hostname is resolved twice, once in the wrapper script and once inside mosh-client, then there’s no guarantee that the results will be the same, so mosh-client may try to connect to a different server than mosh-server is actually running on.
We really do need to resolve the IP address once and pass it along. It may work to also pass the hostname along for future re-resolution, as long as the client does not rely on it by throwing away the working IP address.
pass down hostname *and* IP from mosh wrapper to mosh-client, in order to handle round-robin DNS
only hop to another IP when the old one isn't among ALL the addresses returned by getaddrinfo (handle round robin DNS gracefully)
also keep trying the last known good IP address (because of round-rob…
Alright, I changed things to pass down both the IP and the host name. When re-resolving, if a DNS response contains several IPs at once, the code now only tries to switch IPs if none of them match the old one. And instead of completely throwing away the previously working IP after a re-resolution, the code now alternates between the new IP and the last one that got a server response.
The latest patch compiles fine on Ubuntu 12.04 on both x64 and ARM. As far as I can see it does not break anything. I will see tomorrow whether it can reconnect from inside the firewall.
The patch works nicely with a server that is reachable at different IPs inside and outside the firewall.
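The selection rule described in that comment can be modelled roughly as follows. This is only a sketch of the idea, not the patch's code: `RemoteChooser` and its members are hypothetical names, addresses are plain strings for brevity, and picking the first returned address as the new candidate is an assumption.

```cpp
#include <cassert>
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the rule; real code would compare sockaddr values.
struct RemoteChooser {
  std::string current;    // address we are sending to right now
  std::string last_good;  // last address that produced a server response
  std::string candidate;  // address from the latest re-resolution, if any

  explicit RemoteChooser(const std::string &initial)
    : current(initial), last_good(initial), candidate() {}

  // Feed in the full answer of a re-resolution. A new candidate is adopted
  // only when the last known good address is absent from ALL answers, so a
  // round-robin record that still lists our server changes nothing.
  void on_resolved(const std::vector<std::string> &answers)
  {
    if (answers.empty()) return;
    if (std::find(answers.begin(), answers.end(), last_good) != answers.end()) {
      candidate.clear();
      return;
    }
    candidate = answers.front();  // assumption: take the first new address
  }

  // Called on each hop while the connection is down: alternate between the
  // candidate and the last known good address instead of discarding either.
  const std::string &next_hop()
  {
    if (!candidate.empty())
      current = (current == candidate) ? last_good : candidate;
    return current;
  }

  // The server answered from addr, so it becomes the known good address.
  void on_response(const std::string &addr)
  {
    last_good = addr;
    current = addr;
    candidate.clear();
  }
};
```

With this shape, a round-robin answer that still contains the working address never causes a switch, and a genuinely moved server is only committed to once it has actually responded.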
Coming here from #81 but find the idea fascinating:
Could mosh-server add all of its local IPs to the list on the client? This does pollute the client list with RFC 1918 IPs, but it makes roaming between internal and external IPs possible without adding them to DNS (which would break nearly all protocols other than mosh). If accepted, this could open the door to finally solving #81 by adding v6 addresses to the list in the future.
@m0yellow - that does not work in general, as the mosh client may hit another server on a different local private net with the same address. DNS names, by contrast, are supposed to be unique.
I know, but this is the crux with IPs, they were intended to be unique, too.
Reading about the AS112 project, I would suggest we trust in the strength of the key and let the client hit another server. It would cost one UDP packet and one RST, and then the election goes on. Combined with a numbered priority based on the clients connected to the server, the address could still stay on the list as a last resort.
BTW: I already have this problem. When the client dies, the mosh-server process keeps running indefinitely, and the client happily opens another session. So on servers with long uptime, I have dozens of mosh-server processes, all for the same user. From the server side, this might even reduce the load, as more sessions could be recovered instead of new ones being created.
I filed #469, which should make it possible to implement whatever fancy server reconnection algorithm you like in a few lines of, say, Perl. It may also make it possible to support IPv6.
fix compiler warning with -Wno-missing-field-initializers
merge with upstream (IPv6 support)
The latest version of this doesn't compile with GCC or Clang on OS X 10.9. With GCC:
> gmake V=1
g++-apple-4.2 -DHAVE_CONFIG_H -I. -I../.. -I./../util -I./../crypto -I../protobufs -D_THREAD_SAFE -I/opt/local/include -Wall -fno-strict-overflow -D_FORTIFY_SOURCE=2 -fstack-protector-all -Wstack-protector --param ssp-buffer-size=1 -fPIE -fno-default-inline -pipe -g -O2 -MT network.o -MD -MP -MF .deps/network.Tpo -c -o network.o network.cc
network.cc: In member function 'Network::Connection::DNSResolverAsync::Status Network::Connection::DNSResolverAsync::try_start_stop(Network::AddrLen&)':
network.cc:271: error: invalid use of nonstatic data member 'Network::Connection::remote_addr'
gmake: *** [network.o] Error 1
> gmake V=1 CXX=clang++
clang++ -DHAVE_CONFIG_H -I. -I../.. -I./../util -I./../crypto -I../protobufs -D_THREAD_SAFE -I/opt/local/include -Wall -fno-strict-overflow -D_FORTIFY_SOURCE=2 -fstack-protector-all -Wstack-protector --param ssp-buffer-size=1 -fPIE -fno-default-inline -pipe -g -O2 -MT network.o -MD -MP -MF .deps/network.Tpo -c -o network.o network.cc
clang: warning: argument unused during compilation: '-fno-default-inline'
network.cc:271:53: error: use of non-static data member 'remote_addr' of 'Connection' from nested type 'DNSResolverAsync'
fatal_assert( result->ai_addrlen <= sizeof( remote_addr.addr ) );
./../util/fatal_assert.h:47:5: note: expanded from macro 'fatal_assert'
1 error generated.
gmake: *** [network.o] Error 1
I couldn't come up with a quick fix to this. 66673fd still compiles, and I'm just about to check it still works. If it does, I'll bisect 66673fd..ba1fb85 and play whack-a-bug.
Update: it works excellently! Took ~40 seconds to fail over, but compared to not failing over at all, that's very acceptable.
fix compile error with clang (referenced undefined variable in assert)
Looking at athena.dialup.mit.edu, I notice something interesting: That hostname is actually a short-lived, often-changing CNAME pointing to a long-lived A record, with an apparently stable name referring to a specific machine. So pretty clearly Athena is load-balancing via the CNAME. Seems to me that for this particular host, a nice answer would simply be to have mosh resolve through any CNAME/DNAME records to an actual A/AAAA record, then remember and reuse that hostname. I don't know how widely applicable this solution would be, though.
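One way to try that idea is to ask getaddrinfo for the canonical name via AI_CANONNAME and reuse that name for later lookups. The sketch below is illustrative only: `canonical_name` is a hypothetical helper, and whether the name returned is really the end of the CNAME chain depends on the system resolver, so treat that as an assumption to verify.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: returns the canonical name the resolver reports for
// host, or host itself if the lookup fails or no canonical name is given.
std::string canonical_name(const std::string &host)
{
  struct addrinfo hints;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;     // accept A or AAAA records
  hints.ai_socktype = SOCK_DGRAM;  // UDP, as mosh uses
  hints.ai_flags = AI_CANONNAME;   // ask the resolver for the canonical name

  struct addrinfo *res = NULL;
  if (getaddrinfo(host.c_str(), NULL, &hints, &res) != 0)
    return host;

  // Only the first entry of the result list carries ai_canonname.
  std::string name = (res != NULL && res->ai_canonname != NULL)
                       ? std::string(res->ai_canonname)
                       : host;
  freeaddrinfo(res);
  return name;
}
```

The wrapper could then resolve the load-balanced name once, remember the returned name, and use only that name for any later re-resolution.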
Were these records set up this way in 2013, or is this an improvement over whatever was in place then?
Ah, never mind. In #81 it's described as having been that way in 2012. Nothing new here.