Unrecoverable disconnects when using NetworkManager+OpenVPN #8
I'm not sure if I'm the only one with this issue, but when using NetworkManager + OpenVPN and the configuration files provided by this package, I get regular disconnects every few hours after which OpenVPN doesn't manage to reconnect.
I suspect it is because NetworkManager thinks the VPN tunnel device is still connected and when OpenVPN attempts to reconnect to the VPN server it fails. This can happen for two reasons:
I am pretty sure this is a NetworkManager-only issue. It might be worth considering using the pia Python script to automatically remove or comment out the "persist-tun" setting in case NetworkManager is used on the system it is running on.
To elaborate, "persist-tun" does the following:
However, closing and reopening the VPN tunnel would almost certainly work around this issue in NetworkManager. An alternative would be a custom script that monitors the log and somehow tells NetworkManager over D-Bus to drop the tunnel connection, although I'm not sure if this is really possible.
Alternatively, one might consider not using "dev tun" at all and let NetworkManager+OpenVPN configure the routing accordingly without using the VPN tunnel device (I haven't tested this at all!).
From what I can see, it may or may not be a NM problem. I haven't used the VPN for a very long 12hr+ session. I could just add something in the configuration file to allow toggling it on or off if people have odd drop-out issues. I wouldn't want to change the way PIA is set up to work out-of-the-box without user interaction, unless something is seriously wrong with the configurations; in that case, the information would need to be provided to PIA.
This sounds exactly like what I had in mind. I'll try the modified configuration files for a bit on my system now and let you know if it improves the situation.
The issue still exists:
At a loss as to what to do, honestly. The output changed slightly: it at least attempts to do the handshake now, but because all traffic is still being routed over tun0 (which is down) it won't ever succeed.
I can't believe I am the only one with this issue, especially because my setup isn't really special, however I have a hard time finding similar bug reports. This whole thing would actually fail even a step before all this, if I weren't using a local DNS resolver (Unbound) because NetworkManager would attempt to resolve the VPN domain in the configuration file using the broken tunnel...
EDIT: Regarding the DNS resolution, adding this:
to the client configuration should help; however, the other issue (actually establishing a connection to the VPN IP) still exists.
EDIT 2: I created an issue with NetworkManager here, which goes into a bit more detail.
EDIT 3: After some more investigation NetworkManager already adds a direct route to the VPN IP via the default gateway:
The only thing that I can imagine happening right now that would prevent reconnecting to the server, is that the DNS resolver returns a different A-record for the VPN domain at the time of reconnection, which does not have an extra route configured -- thus it would try to connect to it over the VPN tunnel instead of the previous default gateway.
If this is true, adding
No, I didn't get a chance to test it yet after adding
This happens because the configured DNS resolver returns any of the specified A-records for the VPN domain randomly for rudimentary load-balancing. This means that it isn't guaranteed that the IP in the added static route will be used to reconnect to the VPN -- a different one might be chosen (for which no such route exists), and the connection attempt to it will then be routed through the broken VPN tunnel.
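To illustrate the failure mode, here is a minimal Python sketch (not part of this package): it collects the IPv4 addresses a hostname resolves to and builds one host route per address via the pre-VPN default gateway, so that whichever address is picked on reconnect already has a route outside the tunnel. The function names, the placeholder hostname and the gateway address are hypothetical, and the sketch assumes the resolver returns all A-records in a single answer.

```python
import socket


def resolve_a_records(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def host_route_commands(addresses, gateway):
    """Build one `ip route replace` command per address (hypothetical helper)."""
    return [["ip", "route", "replace", f"{addr}/32", "via", gateway]
            for addr in addresses]


if __name__ == "__main__":
    # Placeholder hostname and gateway; substitute your VPN domain and
    # your real default gateway.
    addresses = resolve_a_records("example.com")
    for cmd in host_route_commands(addresses, "192.168.1.1"):
        print(" ".join(cmd))  # printed only; applying them requires root
```

The commands are only printed here; actually applying them needs root privileges and has to happen before the reconnect attempt.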
So specifying a static IP in the configuration file is one solution, a different one is specifying
For now I'll try using
And this is after I added
This works if OpenVPN is used from the command-line, since the script is properly executed. When this configuration file is used through NetworkManager-OpenVPN the script instructions are completely ignored by the way! So now I have to find a different way to make this work with NetworkManager.
Slowly getting sick of this...
Alright. So dispatcher scripts for NetworkManager get a fraction of the information OpenVPN scripts get through the environment variables, which makes running this script basically impossible.
When using OpenVPN directly, only a single initially resolved IP address is added to the routes. I am not sure how far using OpenVPN directly changes things here (maybe
What I ended up doing was using the parts from here prefixed with
I am going to try running NetworkManager+OpenVPN for a while now once again and see if it works. But yes, this definitely is a bug with NetworkManager+OpenVPN as far as I can tell and upstream just doesn't seem to care -- if it wasn't, I'm sure OpenVPN would have long since fixed it since it's probably getting used directly much more.
Sorry for all this spam, I just needed to vent my frustrations a bit and explain the process of me figuring this out.
I ended up writing fix-networkmanager-openvpn, which actually worked -- however due to networkmanager-openvpn dropping privileges, it later can't modify the tunnel anymore when it is needed. I might be able to work around this by allowing the
Let it be noted that I think the OpenVPN plugin for NetworkManager is horrible.
I solved a part of the problem, but discovered another problem with NetworkManager-OpenVPN in the process: fix-networkmanager-openvpn is started as a systemd service on boot and monitors the nm-openvpn journal. As soon as an IP address for a remote link (i.e. the chosen IP address of the VPN server) appears, a route is added that directs all traffic for this IP over the default gateway. The script is basically doing what the OpenVPN plugin should have done itself.
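The idea can be sketched roughly as follows. This is not the actual fix-networkmanager-openvpn code, only a minimal illustration: it assumes the log contains OpenVPN lines of the form `UDP link remote: [AF_INET]203.0.113.7:1198`, and the function names and gateway address are hypothetical.

```python
import re

# Assumed log format: "... link remote: [AF_INET]<ip>:<port>".
# Adjust the pattern if your OpenVPN version logs this differently.
REMOTE_RE = re.compile(r"link remote: \[AF_INET\](\d{1,3}(?:\.\d{1,3}){3}):\d+")


def remote_ip(log_line):
    """Return the remote VPN server IP found in a log line, or None."""
    match = REMOTE_RE.search(log_line)
    return match.group(1) if match else None


def routes_for_log(lines, gateway):
    """Build one `ip route replace` command per newly seen remote IP."""
    seen = set()
    commands = []
    for line in lines:
        ip = remote_ip(line)
        if ip and ip not in seen:
            seen.add(ip)
            commands.append(
                ["ip", "route", "replace", f"{ip}/32", "via", gateway])
    return commands
```

In a real service the lines would come from following the journal, and each command would be executed as root as soon as it is produced rather than collected in a list.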
However, after having fixed this and the connection finally being established, NetworkManager-OpenVPN runs into a problem because it downgraded its UID and GID after creating the tunnel, preventing read/write access to
I "solved" the problem by not using the honestly bad and unmaintained plugin anymore, and instead simply enabled OpenVPN at boot using systemd:
Turns out using
I adapted the last script to do this here. Alternatively adding
Here is an excerpt from a reddit post I made concerning these obvious problems:
If the TTL of the VPN's domain is low (it usually is ~300 seconds) and you don't re-resolve the domain at these intervals to keep it cached, [openvpn] will still fail to resolve the domain to an address, because your DNS resolver has invalidated the record and subsequently attempts to resolve it by contacting external servers (root DNS server or whatever forward-zones you configured).
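Re-resolving at intervals shorter than the TTL could look like the following minimal Python sketch. It is an illustration only: the function name, the domain list and the 240-second interval are assumptions chosen to stay below a ~300-second TTL.

```python
import socket
import time


def warm_cache(domains, resolve=socket.gethostbyname):
    """Resolve every domain once; return {domain: address or None}."""
    results = {}
    for domain in domains:
        try:
            results[domain] = resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[domain] = None
    return results


if __name__ == "__main__":
    # Hypothetical domain list; use the VPN domains from your configuration.
    domains = ["example.com"]
    while True:
        warm_cache(domains)
        time.sleep(240)  # assumed interval, shorter than a ~300 s TTL
```

The `resolve` parameter exists only so the lookup can be swapped out, for example in tests; by default the system resolver is used, which is what keeps the local cache populated.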
To work around this I added this to my unbound configuration:
This forces Unbound to cache all domain A-records for at least 1 hour and to re-fetch them when the domain is resolved with less than 10% of the TTL remaining (360 seconds or 6 minutes in this case). In addition to this, I run these scripts to keep my DNS cache "warmed up":
I didn't make these just for this problem, but because I wanted to always resolve domains as fast as possible -- however they are useful to work around this problem. You can see in the service file that the domains in a file
So the configured
I have since removed the heavy plumbum dependency from the script. I also made sure that my current OpenVPN configuration really only works after disconnects with it running: it does -- it's not a coincidence.
You can see exactly what is happening here. My ISP disconnects my connection every few hours; this is what you see here, resulting in the inactivity timeout. It then attempts to reconnect, but fails with the TLS error below it, because the IP can't be reached -- no surprise there: at the beginning it was 188.8.131.52 and then 184.108.40.206, and there is no explicit route for the new IP forwarding traffic through the default gateway.
You can then see another attempt at connecting, failing with the same error. At this point I manually run
So the problem seems obvious: the route is added too late. It would have to be added before the connection attempt is made, not afterwards. This problem only occurs if you actually use a domain with several A-records, i.e. multiple IP addresses, in your VPN configuration; when a static IP is used you will never run into this issue -- which is the only way I can explain why this bug still exists or exists in the first place. I would wager this is the source of most problem descriptions you can find on the internet that claim that OpenVPN doesn't automatically reconnect after a disconnect.
The script also theoretically works with NetworkManager:
This, by the way, removes all routes added by OpenVPN from the routing table and allows all subsequent traffic through the previous default gateway -- a huge security issue if you haven't set up a firewall limiting traffic only to the VPN when it is active. This is why I just recently claimed to you that "NetworkManager is still shit"; however, there probably is some way to prevent NetworkManager from dropping root privileges, or to let it re-acquire them, which I haven't looked into as of yet. The root problem, requiring