OpenVPN process status says not running when it is. #3223

Closed
marjohn56 opened this issue Feb 11, 2019 · 30 comments

@marjohn56
Member

https://forum.opnsense.org/index.php?topic=11562.0

This seems to be a stuck PID issue. We are testing a solution for it.

@AdSchellevis
Member

The status check connects to the socket; if that fails, this might happen. (I've seen something similar over here too.)

function openvpn_get_server_status($server, $socket)

If it reports an error, we might want to check for a running PID before returning "not active". If I'm not mistaken, you can't restart the server in these cases (since it's reported running), right?
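A minimal sketch of the fallback suggested above, written in shell rather than the PHP that OPNsense itself uses. The function names `pidfile_process_running` and `status_fallback` are hypothetical and not part of OPNsense; the idea is only to consult the PID file when the socket query fails.

```sh
# Hypothetical helper (not an OPNsense function): succeed only if the PID
# stored in the given file belongs to a process that currently exists.
pidfile_process_running() {
    pidfile=$1
    # A missing or empty PID file cannot confirm a running process.
    [ -s "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Accept only a plain positive integer.
    case $pid in
        ''|0|*[!0-9]*) return 1 ;;
    esac
    # kill -0 delivers no signal; it only tests that the PID can be signalled.
    # Run as root, otherwise a process owned by another user looks dead.
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical fallback for a failed socket query: check the PID file
# before reporting the service as stopped.
status_fallback() {
    if pidfile_process_running "$1"; then
        echo "running"
    else
        echo "stopped"
    fi
}
```

For example, `status_fallback /var/run/openvpn_server1.pid` would print `running` only if the PID recorded in that file is alive. Note that this does not help when the PID file itself holds the wrong PID.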

@marjohn56
Member Author

The PIDs don't match. The running process PID and the one in /var/run are different. Thus the dashboard says nay but OpenVPN says yay!

You can't restart, no, as the PID is wrong. Manually killing it will not help either, as you also need to delete the stuck PID file; do that and all is good. What I've done in my test is to force a cleanup of the stuck PID file in the OpenVPN restart function; at present it just deletes it. What really should be done is to check that the process is actually not running, THEN delete the PID file. However, let's see what happens with my tester and what he reports.
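The "check first, then delete" order described above could look roughly like this in shell. This is a sketch under assumptions, not the code from the patch under test: `cleanup_stale_pidfile` and its output strings are made up for illustration.

```sh
# Hypothetical helper: delete a PID file only when its process is gone.
# Prints "absent", "kept" or "removed" to say what happened.
cleanup_stale_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo "absent"; return 0; }
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|0|*[!0-9]*) pid='' ;;   # unusable content is treated as stale
    esac
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "kept"                # process still exists: leave the file alone
    else
        rm -f "$pidfile"           # no such process: the file is stale
        echo "removed"
    fi
}
```

A liveness test like this only proves that some process has that PID, not that it is OpenVPN, so a stricter version would also compare the process name.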

@AdSchellevis
Member

sounds like a plan :)

@fichtner
Member

fichtner commented Feb 12, 2019 via email

@marjohn56
Member Author

No, and I cannot reproduce it either, or at least only on rare occasions. Ned, however, can reproduce it at will. He's had the issue since the early 18 series, but now it's starting to annoy him, so I said I'd take a look.

I think it only happens on restart; I'm not sure whether it also happens on interface down/up. Whichever way this occurs, the only usual way a process leaves its PID file behind is if it exits abnormally.

@Space2Man

Hi,

Some things I found out. This always happens after a restart:

root@OPNvirt:~ # ls -lT /var/run/openvpn_server1.pid 
-rw-r--r--  1 root  wheel  6 Feb 12 20:31:58 2019 /var/run/openvpn_server1.pid

After reboot I see a Resync three times:

Feb 12 20:31:45 OPNvirt opnsense: /usr/local/etc/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP6.
Feb 12 20:31:45 OPNvirt opnsense: /usr/local/etc/rc.openvpn: OpenVPN: Resync server1 SpaceNet OpenVPN Server
Feb 12 20:31:47 OPNvirt opnsense: /usr/local/etc/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_IPv4_Dokom21.
Feb 12 20:31:51 OPNvirt opnsense: /usr/local/etc/rc.openvpn: OpenVPN: Resync server1 SpaceNet OpenVPN Server
Feb 12 20:31:57 OPNvirt opnsense: /usr/local/etc/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP6.
Feb 12 20:31:57 OPNvirt opnsense: /usr/local/etc/rc.openvpn: OpenVPN: Resync server1 SpaceNet OpenVPN Server

My guess is that the restarts are too close to each other, so one gets started while the previous one has not finished yet. And maybe the third one created the PID file but then failed to start ...
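If overlapping resyncs really are the cause (this is only a guess at this point in the thread), one common way to rule them out is to serialize the restart behind a lock. The sketch below uses an atomic `mkdir` as the lock; `run_exclusive` and the lock path are hypothetical and not something rc.openvpn is known to do.

```sh
# Hypothetical wrapper: run a command only if no other caller holds the lock.
# mkdir is atomic, so exactly one caller can create the lock directory.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "busy"                # another resync is still in progress
        return 1
    fi
    # Run the command and remember its exit status.
    rc=0
    "$@" || rc=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$lockdir"
    return "$rc"
}

# Assumed usage, with a made-up lock path and restart command:
#   run_exclusive /tmp/openvpn_server1.lock restart_command
```

This version skips a resync that arrives while another is running. That keeps the example short, but a real fix might prefer to wait for the lock so the last configuration change is not lost.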

@marjohn56
Member Author

#3229 possible fix ... needs testing

fichtner added a commit that referenced this issue Feb 14, 2019
@fichtner fichtner added the bug Production bug label Feb 14, 2019
@fichtner fichtner added this to the 19.7 milestone Feb 14, 2019
@fichtner
Member

@BPtLNfxZWo @Space2Man can you please try c217bee ?

# opnsense-patch c217bee

@fichtner
Member

Some patches from pfSense:

pfsense/pfsense@8845e13
pfsense/pfsense@ce98375

@BPtLNfxZWo

Seems to work right now.
GUI information correct, error messages gone, tunnel could be established
Thanks

@Space2Man

Seems to work fine! After reboot service is listed as started!

@fichtner
Member

Ok, I cannot guarantee that this will not fail later anyway. Patches posted earlier suggest that OpenVPN may be issuing its PID files too late, but we'll see when we get there.

Thanks everyone! ❤️

fichtner added a commit that referenced this issue Feb 15, 2019
(cherry picked from commit c217bee)
(cherry picked from commit 156d6f7)
(cherry picked from commit f10b710)
@BPtLNfxZWo

Hi,
checked today, same issue again :-(

@marjohn56
Member Author

@Space2Man & @BPtLNfxZWo

Looking at some debug info that I have been sent. Are you using a Dyn DNS service and are you using dhcp6 for WAN?

@BPtLNfxZWo

Hi, in my case I'm using Dynamic DNS but no dhcpv6 on WAN.

@nivek1612

nivek1612 commented Feb 20, 2019

@marjohn56 I'll be in France from Friday; my setup there is Dynamic DNS and dhcp6c on WAN. Want me to test anything? I'm currently on 18.7.9 without issues and planning to take the test system out and try 19.1.1.

@marjohn56
Member Author

marjohn56 commented Feb 20, 2019 via email

@marjohn56
Member Author

marjohn56 commented Feb 20, 2019

@BPtLNfxZWo > Hi, in my case I'm using Dynamic DNS but no dhcpv6 on WAN.

so no ipv6?

@BPtLNfxZWo

Hi, I have IPv6 activated, but only for (internal) testing purposes and not on the WAN interface.

@marjohn56
Member Author

Thanks, that rules something out, so no IPv6. Can you disable your Dyn DNS update and see if that has any effect on your OpenVPN issue? I'm trying to rule things out; I have no issues at all and I'm trying to replicate it.

@kkohio

kkohio commented Feb 21, 2019

I just wanted to add that I am on a fresh setup on 19.1, LibreSSL side, with very little configured. No IPv6 or DynDNS, just two WANs with failover and IPv4 DHCP on the LAN. A single port forwarding rule, and pretty much everything else is pristine.

@Benqer0

Benqer0 commented Mar 7, 2019

I have the same problem:

Mar 7 21:14:33 openvpn[34459]: Exiting due to fatal error
Mar 7 21:14:33 openvpn[34459]: Cannot open TUN/TAP dev /dev/tun1: Device busy (errno=16)
Mar 7 21:14:33 openvpn[34459]: TUN/TAP device ovpns1 exists previously, keep at program end
Mar 7 21:14:33 openvpn[34459]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Mar 7 21:14:33 openvpn[34138]: library versions: OpenSSL 1.0.2q 20 Nov 2018, LZO 2.10
Mar 7 21:14:33 openvpn[34138]: OpenVPN 2.4.7 amd64-portbld-freebsd11.2 [SSL (OpenSSL)] [LZO] [LZ4] [MH/RECVDA] [AEAD] built on Feb 27 2019

OPNsense 19.1.2-amd64
FreeBSD 11.2-RELEASE-p9-HBSD
OpenSSL 1.0.2q 20 Nov 2018

@JasMan78

Same issue here. It suddenly appeared after I changed all my LAN subnet addresses by replacing them in the config backup file and restoring it.

OPNsense 19.1.4-amd64
Intel(R) Celeron(R) CPU J1900 @ 1.99GHz (4 cores)

@Space2Man

FYI: I am on 19.1.4 and issue is back ... yes, I am using IPv6 on WAN but no DynDNS.

@Space2Man

Funny, right after posting this 19.1.5 was released ... I just updated and OpenVPN is in status green again ... I will monitor.

Thanks!

@fichtner
Member

fichtner commented Apr 5, 2019

I'll be back eventually. This is a race condition inside OpenVPN and its PID file creation in which OpenVPN backgrounds itself before the PID file is written. Someone needs to write safeguard code here in order to fix this. I don't think OpenVPN will....
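One possible shape for such safeguard code, sketched in shell: after starting the daemon, wait until the PID file actually exists and is non-empty before trusting it. `wait_for_pidfile` and its retry count are assumptions for illustration, not the fix that was eventually committed.

```sh
# Hypothetical helper: wait until a PID file exists and is non-empty,
# giving up after a number of one-second attempts (default 10).
wait_for_pidfile() {
    pidfile=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # -s is true only once the file exists and has content.
        if [ -s "$pidfile" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1                       # the daemon never wrote its PID in time
}
```

A caller would start the daemon, then run something like `wait_for_pidfile /var/run/openvpn_server1.pid 10` before querying status or allowing the next restart.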

@Space2Man

Hi, can confirm ... after "saving" the WAN interface to trigger a reload, the issue is back ... it really seems to be some race condition. Nevertheless, thanks for all the support ... I mean, it's working ... so it's only a display issue.

@fichtner
Member

fichtner commented Apr 5, 2019

I'll try to look at it for 19.7. At the moment, however, priorities lie elsewhere.

fichtner added a commit that referenced this issue Apr 29, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
@Seethaar

I tested this on a VM. When I built the new OPNsense firewall on version 19.7 (not the latest but just 19.7), there were no issues. But I took a VMware snapshot and spun a new firewall up from that template, which carried the CA, self-signed cert, VPN server and OTP server and I observed that the firewall is exhibiting the same behaviour mentioned on this issue.

  1. Dashboard says OPENVPN not working.
  2. Server error log : openvpn[66505]: Exiting due to fatal error
  3. VPN server 'Connection Status' indeterminable.
  4. Clients can connect successfully.

Thought it was worth sharing.

@fichtner
Member

fichtner commented Jan 27, 2020 via email
