Remote peer's TCP disconnect is not detected #9
A fix has been pushed to the master branch. Userspace does not need to be patched, although I have sent a patch to the mailing list to improve logging in this specific scenario. @lstipakov, want to give it a go?
I can confirm that it works. After stopping the client I got this in the server log:
I was able to reconnect without any issues. With this patch I see the proper log message:
Hi,
I can also confirm that this looks much better now - I hit the TCP
server with a gremlin test for ~15 minutes, and when I killed all
client side OpenVPN processes, the server very quickly got rid of
most peers (funny enough, the kernel logs show up in syslog only
*after* openvpn has logged "peer gone")
```
Jan 12 21:55:48 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: ovpn-dco: received CMD_DEL_PEER, ifindex: 4, peer-id 22, reason: 4
Jan 12 21:55:48 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin57686/194.97.140.21:44160 peer-id=22 SIGTERM[soft,ovpn-dco: transport disconnected] received, client-instance exiting
Jan 12 21:55:48 ubuntu2004 kernel: [ 693.974398] tun0: deleting peer with id 22, reason 4
Jan 12 21:55:50 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: ovpn-dco: received CMD_DEL_PEER, ifindex: 4, peer-id 2, reason: 4
Jan 12 21:55:50 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin52972/2001:608:0:814::f000:21 peer-id=2 SIGTERM[soft,ovpn-dco: transport disconnected] received, client-instance exiting
Jan 12 21:55:50 ubuntu2004 kernel: [ 695.527141] tun0: deleting peer with id 2, reason 4
Jan 12 21:55:52 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: ovpn-dco: received CMD_DEL_PEER, ifindex: 4, peer-id 21, reason: 4
Jan 12 21:55:52 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin60162/194.97.140.21:44159 peer-id=21 SIGTERM[soft,ovpn-dco: transport disconnected] received, client-instance exiting
Jan 12 21:55:52 ubuntu2004 kernel: [ 697.191423] tun0: deleting peer with id 21, reason 4
Jan 12 21:55:54 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: ovpn-dco: received CMD_DEL_PEER, ifindex: 4, peer-id 3, reason: 4
Jan 12 21:55:54 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin58919/2001:608:0:814::f000:21 peer-id=3 SIGTERM[soft,ovpn-dco: transport disconnected] received, client-instance exiting
Jan 12 21:55:54 ubuntu2004 kernel: [ 699.631608] tun0: deleting peer with id 3, reason 4
Jan 12 21:55:55 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: ovpn-dco: received CMD_DEL_PEER, ifindex: 4, peer-id 19, reason: 4
Jan 12 21:55:55 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin60164/194.97.140.21:44157 peer-id=19 SIGTERM[soft,ovpn-dco: transport disconnected] received, client-instance exiting
Jan 12 21:55:55 ubuntu2004 kernel: [ 700.831083] tun0: deleting peer with id 19, reason 4
```
... but... something between OpenVPN and kernel still gets desynched...
```
fbsd-TC.ov:~/gremlin$ ps axwu |grep openvpn
gert 62267 0.0 0.2 11292 2028 0 S+ 21:57 0:00.00 grep openvpn
fbsd-TC.ov:~/gremlin$
```
so, no more openvpn clients...
```
Jan 12 21:57:43 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49449/2001:608:0:814::f000:21 peer-id=4 dco_update_keys: peer_id=4
Jan 12 21:57:44 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49590/2001:608:0:814::f000:21 peer-id=8 dco_update_keys: peer_id=8
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_update_keys: peer_id=35
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 TCPv6_SERVER WRITE [405] to [AF_INET6]2001:608:0:814::f000:21:43517: P_CONTROL_V1 kid=0 [ 5 4 3 2 ] pid=5 DATA len=367
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_do_write: peer-id 35, len=405
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_do_write: netlink reports error (-1): Unspecific failure: No such file or directory (errno=2)
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_do_write: failed to send netlink message: No route to host (-113)
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 write TCPv6_SERVER []: No such file or directory (fd=-1,code=2)
Jan 12 21:57:51 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_update_keys: peer_id=35
Jan 12 21:57:54 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49590/2001:608:0:814::f000:21 peer-id=8 dco_update_keys: peer_id=8
Jan 12 21:57:57 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49746/2001:608:0:814::f000:21 peer-id=42 dco_update_keys: peer_id=42
Jan 12 21:57:58 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49449/2001:608:0:814::f000:21 peer-id=4 dco_update_keys: peer_id=4
Jan 12 21:57:59 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49590/2001:608:0:814::f000:21 peer-id=8 dco_update_keys: peer_id=8
Jan 12 21:58:06 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_update_keys: peer_id=35
Jan 12 21:58:12 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49746/2001:608:0:814::f000:21 peer-id=42 dco_update_keys: peer_id=42
Jan 12 21:58:13 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49449/2001:608:0:814::f000:21 peer-id=4 dco_update_keys: peer_id=4
Jan 12 21:58:14 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49590/2001:608:0:814::f000:21 peer-id=8 dco_update_keys: peer_id=8
Jan 12 21:58:21 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_update_keys: peer_id=35
Jan 12 21:58:27 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49746/2001:608:0:814::f000:21 peer-id=42 dco_update_keys: peer_id=42
Jan 12 21:58:28 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49449/2001:608:0:814::f000:21 peer-id=4 dco_update_keys: peer_id=4
Jan 12 21:58:29 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49590/2001:608:0:814::f000:21 peer-id=8 dco_update_keys: peer_id=8
...
Jan 12 22:00:14 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49449/2001:608:0:814::f000:21 peer-id=4 dco_update_keys: peer_id=4
Jan 12 22:00:15 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49590/2001:608:0:814::f000:21 peer-id=8 dco_update_keys: peer_id=8
Jan 12 22:00:23 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49673/2001:608:0:814::f000:21 peer-id=35 dco_update_keys: peer_id=35
Jan 12 22:00:27 ubuntu2004 tun-tcp-p2mp-username-cn[2988]: gremlin49746/2001:608:0:814::f000:21 peer-id=42 dco_update_keys: peer_id=42
```
... but userland still thinks "oah, lots of them, must log something about
keys!" (but if it goes out and tries to actually send tls renegotiation,
kernel tells it "nah, go away" - so it will eventually recover, but it
*should* not have reached this state ever)
It's only 4 peers out of ~600 connections, so it "mostly works", but
it seems there still is a race condition somewhere.
gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress
Gert Doering - Munich, Germany ***@***.***
@cron2 may it be related to the other issue? I.e., the netlink buffer got full and some messages got lost?
Hi,
On Sun, Jan 15, 2023 at 12:16:11PM -0800, Antonio Quartulli wrote:
> @cron2 may it be related to the other issue? I.e. netlink buffer got full and some messages got lost?
Might be, but shouldn't we see netlink errors then? Somewhere?
(Which I've never seen)
gert
Good question, but I don't know. DEL_PEER is delivered via multicast, so there is no "read a unicast message on your socket". And I am not sure whether we perform any read at all, since we are not notified of any new message. But yeah, I would also expect some error at that point... So maybe this was not the right hint.
@cron2 from the log, would you be able to check whether you received a DEL_PEER for all peers, or whether we are indeed missing some of them? This would help us understand where the issue is (i.e. the event was not sent/received at all, or the event was not handled properly).
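One way to run that check against a saved server log is sketched below. It is only a rough helper, not part of OpenVPN or ovpn-dco: the function name `peers_without_del_peer` is hypothetical, and the two regular expressions are assumptions derived solely from the log lines quoted earlier in this thread. It lists peer-ids that still produce `dco_update_keys` lines although no `CMD_DEL_PEER` was ever logged for them. It does not account for peer-id reuse, so it is best fed only the part of the log from the moment the clients were killed.

```python
import re

# Assumed patterns, taken from the syslog excerpts quoted in this thread.
DEL_PEER_RE = re.compile(r"received CMD_DEL_PEER, ifindex: \d+, peer-id (\d+)")
UPDATE_KEYS_RE = re.compile(r"dco_update_keys: peer_id=(\d+)")


def peers_without_del_peer(lines):
    """Return sorted peer-ids that log dco_update_keys but never got CMD_DEL_PEER."""
    deleted = set()   # peer-ids for which userspace logged a CMD_DEL_PEER
    active = set()    # peer-ids that userspace still touches via dco_update_keys
    for line in lines:
        match = DEL_PEER_RE.search(line)
        if match:
            deleted.add(int(match.group(1)))
            continue
        match = UPDATE_KEYS_RE.search(line)
        if match:
            active.add(int(match.group(1)))
    return sorted(active - deleted)


if __name__ == "__main__":
    import sys

    # Usage: python check_del_peer.py < server.log
    for peer_id in peers_without_del_peer(sys.stdin):
        print(f"peer-id {peer_id}: dco_update_keys seen, no CMD_DEL_PEER logged")
```

If the list is empty, every peer that userspace still handles did get a DEL_PEER at some point, which would point at event handling rather than event delivery.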
The issue has been reported as fixed. The latest ovpn-dco release moves control packets back to the transport sockets rather than using netlink, and thanks to this, disconnection detection is again handled directly in userspace.
Steps to reproduce:
[...] 0.1.20221107) and proto tcp
Expected results:
[...] OVPN_CMD_DEL_PEER to userspace with the reason OVPN_DEL_PEER_REASON_TEARDOWN or OVPN_DEL_PEER_REASON_TRANSPORT_ERROR
Actual results:
[...] cannot send TCP packet to peer 0: -104)
Client log:
Server log:
As can be seen from the logs above, the client disconnected at 14:19:14 and the server removed the client instance at 14:19:35, when ovpn-dco wasn't able to send a keepalive message.
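For readers decoding the numeric error codes that appear in the logs in this thread (-104, -113 and errno=2), the small sketch below maps them to their Linux names. The table is hard-coded with the standard Linux values for just these three codes, and the helper name `decode_errno` is hypothetical; it is only meant as a reading aid, not as anything ovpn-dco or OpenVPN provides.

```python
# Standard Linux errno values for the codes quoted in this thread.
# Hard-coded so the result does not depend on the platform running the script.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    104: ("ECONNRESET", "Connection reset by peer"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def decode_errno(code):
    """Describe a (possibly negative, kernel-style) errno value from the logs."""
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({text})"


if __name__ == "__main__":
    for code in (-104, -113, 2):
        print(decode_errno(code))
```

Read this way, the -104 in "cannot send TCP packet to peer 0: -104" is a connection reset, which fits the report that the kernel only noticed the dead TCP connection when it tried to send a keepalive.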