[Bug] OpenVPN - unable to contact Daemon / device busy #1931

Closed
ghost opened this issue Nov 14, 2017 · 16 comments

@ghost

ghost commented Nov 14, 2017

Hey guys :)

This ticket is created for @fichtner based on my thread here:

https://forum.opnsense.org/index.php?topic=6376.0

Priority: low
(I don't think this problem will affect many people, since it requires VPN/WAN interface flapping to occur)
Impact: low
(One VPN interface seems to survive the flapping, while all the other VPN daemons are unable to reconnect)

My Setup:

  • 3 Client OpenVPN tunnels configured to my VPN Provider. (I have no control over the VPN Server)
  • All three gateways form a failover group with "member down" as the failover trigger
  • Every VPN connection has its own virtual private subnet and gateway, which it gets "dynamically"
  • Huge ISP Latency spike issues leading to member down events every few seconds from time to time.

Questions from Franco:

Are you sure all instances are up?

Every virtual private gateway was up. Sadly I can't tell if my tier1 was working when this happened, since the physical endpoint of my VPN routing is incomprehensible (Neural Network). Only my tier2 connection was up as far as the GUI told me. Another weird thing was that the connection status showed my outgoing connection using only ~15 KB, when in fact I could browse and watch YouTube perfectly. I even checked for an IP leak, to find out whether my WAN connection was routing the traffic instead, which it wasn't.

Is this a problem of clients particularly or servers and clients alike?

I can't reliably answer that question, since I only use the OpenVPN client configuration to get my traffic out into the world. So from my point of view, yes, this is only happening for clients. :-P

[ . . . ] or the old process is still running by the time the new instance is being brought up.

That's my guess as well. Every time my ISP has latency or packet loss problems, all my VPN connections reload many, many times or stay "offline".


Just to be clear: I had to enable a few options to get my setup working (see #1912 for the reason why I checked these 3 options):
Kill states [ X ] Disable State Killing on Gateway Failure
Skip rules [ X ] Skip rules when gateway is down
Gateway switching [ X ] Allow default gateway switching

I have no idea whether one of these options can cause such an error, but I wanted to let you know.


I am not using all of them any more, but I added 3 OpenVPN advanced options the day before the problem occurred. These options are marked big.

My Advanced Settings:

persist-remote-ip
tun-mtu 1500
fragment 1300
mssfix 1379
#float
hand-window 120
tran-window 3600
inactive 604800
mute-replay-warnings
ns-cert-type server
redirect-gateway def1
resolv-retry 60
tls-cipher TLS-DHE-RSA-WITH-AES-256-GCM-SHA384:TLS-DHE-RSA-WITH-AES-256-CBC-SHA256:TLS-DHE-RSA-WITH-CAMELLIA-256-CBC-SHA:TLS-DHE-RSA-WITH-AES-256-CBC-SHA:TLS-RSA-WITH-CAMELLIA-256-CBC-SHA:TLS-RSA-WITH-AES-256-CBC-SHA
tls-timeout 5
key-direction 1
auth-nocache
auth-retry interact
key-method 2
tls-version-min 1.2
fast-io
mlock
route 0.0.0.0 192.0.0.0 net_gateway
route 64.0.0.0 192.0.0.0 net_gateway
route 128.0.0.0 192.0.0.0 net_gateway
route 192.0.0.0 192.0.0.0 net_gateway

// The problem still occurs after removing fast-io, mlock and the 1379 value from mssfix.
// So those were not the cause either.
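
As a side note on the four `route ... 192.0.0.0 net_gateway` lines above: each one uses the netmask 192.0.0.0 (a /2 prefix), and together they span the whole IPv4 address space. A minimal sketch to check that with Python's standard `ipaddress` module follows. The helper names are made up for illustration, and how these routes interact with `redirect-gateway def1` or with policy routing on the firewall is not examined here.

```python
import ipaddress

# The four "route <network> <netmask> net_gateway" lines from the settings above.
ROUTES = [
    ("0.0.0.0", "192.0.0.0"),
    ("64.0.0.0", "192.0.0.0"),
    ("128.0.0.0", "192.0.0.0"),
    ("192.0.0.0", "192.0.0.0"),
]

def to_networks(routes):
    """Convert (network, netmask) pairs into IPv4Network objects."""
    return [ipaddress.IPv4Network(f"{net}/{mask}") for net, mask in routes]

def covered_addresses(networks):
    """Collapse adjacent networks and count the addresses they cover."""
    collapsed = list(ipaddress.collapse_addresses(networks))
    return collapsed, sum(n.num_addresses for n in collapsed)

networks = to_networks(ROUTES)
collapsed, total = covered_addresses(networks)

print([str(n) for n in networks])    # the four /2 networks
print([str(n) for n in collapsed])   # what they merge into
print(total == 2**32)                # True if all of IPv4 is covered
```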


If there are any more questions, don't hesitate to ask. :)
Happy to help.

Best regards,
Oxy / PitchBendStretch

@ghost
Author

ghost commented Nov 17, 2017

Hi Franco,

Lucky me that my ISP is still having the most insane connection issues ever witnessed by mankind. *sigh*
I can now confirm a lot more details than before, because this problem is now a daily reminder and business as usual.


What happens before the error/bug appears?

Increased WAN interface flapping, with latency spikes of up to 24,000 ms every 3 seconds, leading to restarts and reloads of the WAN interface and of every VPN interface connected to the WAN default gateway. In addition, my VPN failover gateway group obviously fails as well, because every VPN gateway is "offline" or "unknown", which means that the whole gateway group gets reloaded every 5 seconds.

Is this problem considered to be a GUI/Dashboard problem only? (visual bug) ?

Yes. Even after several hours of waiting and monitoring the dashboard, every VPN daemon is still shown as "Offline" in the GUI. The gateways also keep flapping, switching between "online" and "offline" every few seconds, although the connection is already stable and working again. Restarting the apinger daemon does not stop the gateway flapping shown in the dashboard overview either. Pinging works without any packet loss while the dashboard tells me that I am offline.
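
To put a number on "flapping", one rough approach is to count how often the reported status changes within a window of samples. The sketch below is only an illustration: the sample values and the threshold are made up, and it is not how apinger or the dashboard decide anything.

```python
def count_transitions(samples):
    """Count how often two consecutive status samples differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)

def is_flapping(samples, max_transitions=3):
    """Treat a window as flapping if it has more than max_transitions changes.

    The default threshold of 3 is an arbitrary assumption for this example.
    """
    return count_transitions(samples) > max_transitions

# Made-up status samples, e.g. one reading every few seconds.
samples = ["online", "offline", "online", "offline", "online", "online"]

print(count_transitions(samples))  # number of status changes in the window
print(is_flapping(samples))        # whether the window exceeds the threshold
```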

Is the VPN connection successfully established even though the dashboard says otherwise?

Yes. The VPN connection I use to browse the internet, for example, is working perfectly and can be pinged as well, although the dashboard shows every VPN daemon as "offline".
If the dashboard were right, I would not be able to write this comment, since every VPN gateway is "offline" right now. ;)

Are there any error messages while the latency spikes are hitting my connection?

In the top right corner of my GUI I have 12+ unread notices, all saying the same:
<Time Date> [There were error(s) loading the rules: no IP address found for ovpncX]
The number of notices stops increasing as soon as my ISP connection is stable again.


So in conclusion / in short:

  • I have: 3 VPN daemons / 3 VPN gateways / 3 VPN interfaces
  • All VPN Daemons are down in the dashboard
  • All VPN gateways are flapping constantly as long as the VPN daemons are "down" in the dashboard
  • 3 out of 3 VPN Interfaces are online in the dashboard.

Dashboard: Every part of my VPN connection is down, except the VPN interfaces.
Reality: everything works fine and all 3 VPN daemons have their own PID.
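
For anyone who wants to cross-check the "every daemon has its own PID" part from a shell, here is a minimal sketch. It assumes a POSIX system, and the pid file paths are hypothetical placeholders that are not taken from this thread; adjust them to wherever your system keeps them.

```python
import os

def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def read_pid(pidfile):
    """Read an integer PID from a file; return None if missing or malformed."""
    try:
        with open(pidfile) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None

# Hypothetical pid file locations, one per VPN client instance.
PIDFILES = [f"/var/run/openvpn_client{i}.pid" for i in (1, 2, 3)]

for path in PIDFILES:
    pid = read_pid(path)
    running = pid is not None and pid_is_running(pid)
    print(path, pid, "running" if running else "not running")
```

A daemon that shows up as running here while the dashboard reports it as down would match what is described above.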


Best regards,
Oxy / PitchBendStretch

@fichtner fichtner self-assigned this Nov 18, 2017
@fichtner fichtner added the cleanup Low impact changes label Nov 18, 2017
@fichtner fichtner added this to the 18.1 milestone Nov 18, 2017
@visualstation

Hello team,
I have the same issue with a fresh NanoBSD installation.

The OpenVPN client keeps restarting all the time.

When I restart the firewall while connected to the console,
the VPN is stable until this point (Starting NTP service...deferred.
Generating RRD graphs...done.):

Configuring CRON...done.
Setting up routes...done.
Starting Unbound DNS...done.
Generating /etc/hosts...done.
Configuring firewall......done.
Starting NTP service...deferred.
Generating RRD graphs...done.
Starting syslog...done.

After that, the client service keeps restarting all the time.

@ghost
Author

ghost commented Nov 23, 2017

Hey @visualstation,

Your description is a completely different error.
My error is not based on any NTP failure behaviour and my VPN is NOT restarting because the Firewall fails to reboot correctly.
My VPN Connections are flapping because my ISP WAN connection is not stable all the time.
Even after the connection is stable again, the VPN services are marked "down" in the dashboard but successfully connected and working in the "background".

@fichtner
Member

Sorry, no time at the moment, trying to get 18.1 out the door with heavy QA...

@ghost
Author

ghost commented Jan 19, 2018

No worries. 18.1 should have priority, please. :)
I'm just glad my two issues are still on your list. :)

@ghost
Author

ghost commented Apr 24, 2018

Hi Franco,

I monitored this issue for a long time and it has never happened again since 18.1.
I am also no longer able to reproduce this error. I can shut down and re-enable all VPN gateways without any problems. (Currently running VER.18.1.6)
I saw that @MUSHROOMHOME is still facing this issue at #2243, but for me it's "solved".
You can close this ticket if you want, or keep it open if it's still on your "to-do" list. :)
Ask me anything if you need more answers or want me to try something.

@fichtner fichtner added this to the 18.7 milestone Jun 12, 2018
@fichtner
Member

@PitchBendStretch alright, let's close this for now. A couple of service handling related changes went into the OpenVPN code for 18.7 and it has indeed been quieter, which may also have to do with recent OpenVPN software updates.

@esoleyman

I'd like to re-open this ticket, as I just ran into this issue about 5 minutes ago. I'm running OPNsense 18.7.8. Please let me know what details you require.

@MUSHROOMHOME

MUSHROOMHOME commented Dec 13, 2018 via email

@mimugmail
Member

@MUSHROOMHOME sorry, but did you provide any logs or anything else? On all the issues I saw there was plenty of feedback from the devs.

So if you want to invest some time, try to set it up again with the current version and reproduce it.

I (and thousands of others) run OpenVPN fine, also with multiple instances, also with multiple WANs, also with failover, also for Site2Site or Remote Access.

@ghost
Author

ghost commented Dec 15, 2018

Hi everyone,

To everyone still having issues: please first check whether your problem can be fixed simply by using dpinger instead of apinger. At least for me, after completely abandoning apinger, I never experienced any more problems. This should be your first and "easy" fix to try. Most of the time the problem is not OpenVPN but apinger/dpinger related.

Firewall > Settings > Advanced > Gateway Monitoring > Monitoring daemon [ X ] Prefer Dpinger over Apinger

@esoleyman

esoleyman commented Dec 15, 2018 via email

@topaDev

topaDev commented Jan 11, 2019

Confirmed for 18.7.10, but I will give Dpinger a try

@Alphakilo

Switching to dpinger worked for me on 18.7.10

@topaDev

topaDev commented Feb 16, 2019

No luck on my side, I'm afraid (dpinger & 19.1.1)

@fichtner
Member

Hello, this bug is still closed. See #3223 or provide a new full detail bug report...
