-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor and add socket activity check for internet #312
Conversation
Deprecation warnings are emitted for anyone using the old locations. They'll still work. The reason for moving the code out of interface is that connectivity checking is becoming complicated and will have some additions soon. It feels easier to manage outside of the already fairly complicated interface handling code.
This also simplifies what happens when the connectivity check fails. All errors are handled the same and there are no shortcuts to `:lan` even if it looks like the interface is removed. The expectation is that the interface will be properly removed shortly if it's for real and everything else is suspect. Moving the check handling logic into its own module is intended to make it easier to adjust it and to simplify addng the Inspector check to the InternetChecker.
FYI - I need to fix some warnings and test failures. Hopefully we'll have green lights soon. I've been verifying this on test devices. It's a big update, so depending on the comments, I'm ready to break it up into smaller chunks to help reviews if needed. |
72ee7cd
to
67475f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a new module that can determine whether a network interface is connected to the internet by watching traffic on TCP sockets known to Erlang/OTP. If a device is using long-lived TCP connections (like for MQTT) that send keepalive messages, then it can declare internet-connectivity just by watching to the tx and rx counters on the long-lived connection. This helps reduce use of metered connections and benefits devices behind restrictive firewalls that drop the pings. This commit only adds the module. It is hooked up in a subsequent commit.
This majorly reduces internet traffic for checking status on those network interfaces with long-lived TCP connections. Basically, if a TCP connection on the interface that's to connecting to a host on a different LAN has sent and received bytes since the last internet check, then the check is skipped. Internet is available. AWS IoT and NervesHub connections are sufficient since their keepalive messages will satisfy this check.
The connectivity checker is pickier now on whether the interface is up or down so the unit tests can't take as many shortcuts. This also fixes an issue where `ifup` would be called and then a connectivity check done immediately afterwards. The intended behavior is for the CheckLogic to decide rather than this to be assumed. The result of this is that the check will be done 500 ms later. This seems good since it's short and 500 ms allows for a little time for the interface to settle after it just declared that it was up.
@jjcarstens confirmed that this makes a very tightly firewalled device report internet connectivity now. |
This commit series deals with improving internet connectivity checking. This is
not the last PR on this topic, but it's an improvement to where VintageNet has
been on this topic.
Internet connectivity checking is important for multi-homed devices so that the
routing tables can be set up to prefer faster and cheaper interfaces. When those
interfaces lose internet connectivity, the routing tables should be updated so
reconnects go to the new best interface.
The first two commits move code around so that it's easier to improve internet
connectivity checking. All of the feedback on connectivity checking has been
nontrivial enough that mixing it in with the general network interface code is
just too much. The "ping" code is now called
TCPPing
to accurately reflectwhat it does. Moving the "3-strikes" ping failure state machine to its own
module was also needed since it was complicated enough that I was worried about
messing it up accidentally. Now the state machine is completely pure code and
easier to test.
The third and fourth commits add a socket inspector. The socket inspector looks
at TCP connections going through the network interface of interest and going
somewhere off-LAN. If the tx and rx counters on those sockets both go up between
checks, then there was internet connectivity over the past sampling period. This
means that a ping of an external host can be skipped. The keepalive messages on
MQTT and websocket connections will satisfy this check. This has two advantages:
less bandwidth usage and it works on tightly firewalled installs.
The final commit makes the internet connectivity checker try the socket
inspector first and if that fails, it tries to connect to an external host like
before.
These updates are backwards compatible, but they'll log a warning to update code
that refers to
VintageNet.Interface.InternetConnectivityChecker
toVintageNet.Connectivity.InternetChecker
.