Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

event_wait : Interrupted system call (code=4) #1462

Closed
steve-spiteri opened this issue Nov 5, 2020 · 11 comments
Closed

event_wait : Interrupted system call (code=4) #1462

steve-spiteri opened this issue Nov 5, 2020 · 11 comments

Comments

@steve-spiteri
Copy link

steve-spiteri commented Nov 5, 2020

Describe the problem
Intermitently, but quite frequently, the following error occurs:
event_wait : Interrupted system call (code=4)

Add your docker run command
root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='Transmission_VPN' --net='bridge' --privileged=true -e TZ="America/New_York" -e HOST_OS="Unraid" -e 'OPENVPN_USERNAME'='*********' -e 'OPENVPN_PASSWORD'='*********' -e 'OPENVPN_CONFIG'='CA Toronto' -e 'OPENVPN_PROVIDER'='PIA' -e 'LOCAL_NETWORK'='192.168.1.0/24' -e 'TRANSMISSION_RPC_USERNAME'='admin' -e 'TRANSMISSION_RPC_PASSWORD'='password' -e 'OPENVPN_OPTS'='--inactive 3600 --ping 10 --ping-exit 60 --txqueuelen 10000' -e 'PUID'='99' -e 'PGID'='100' -e 'TRANSMISSION_DOWNLOAD_DIR'='/downloads' -e 'TRANSMISSION_RPC_AUTHENTICATION_REQUIRED'='false' -e 'TRANSMISSION_RATIO_LIMIT'='0' -e 'TRANSMISSION_RATIO_LIMIT_ENABLED'='true' -e 'TRANSMISSION_ALT_SPEED_DOWN'='0' -e 'TRANSMISSION_ALT_SPEED_UP'='0' -e 'TRANSMISSION_ALT_SPEED_TIME_BEGIN'='420' -e 'TRANSMISSION_ALT_SPEED_TIME_END'='120' -e 'TRANSMISSION_ALT_SPEED_TIME_ENABLED'='true' -e 'TRANSMISSION_DOWNLOAD_QUEUE_SIZE'='15' -e 'TRANSMISSION_CACHE_SIZE_MB'='10' -e 'TRANSMISSION_INCOMPLETE_DIR'='/downloads/incomplete' -e 'TRANSMISSION_WEB_UI'='transmission-web-control' -e 'GLOBAL_APPLY_PERMISSIONS'='false' -p '9091:9091/tcp' -p '1198:1198/udp' -v '/mnt/user/downloads/':'/data':'rw' -v '/mnt/user/downloads/':'/downloads':'rw' -v '/mnt/user/downloads/watch/':'/watch':'rw' -v '/mnt/user/downloads/pull/':'/mnt/user/T_Media/Torrent/':'rw' -v '/mnt/user/appdata/Transmission_VPN':'/config':'rw' --restart=always --log-opt max-size=50m --log-opt max-file=1 -v '/etc/localtime':'/etc/localtime':'ro' --dns=8.8.8.8 --dns=1.1.1.1 'haugene/transmission-openvpn'

Logs

Starting container with revision: 1970f54b88424bf92bc782c3df87310336adf8f4
mknod: /dev/net/tun: File exists
Using OpenVPN provider: PIA
Provider PIA has a custom startup script, executing it
Downloading OpenVPN config bundle openvpn-nextgen into temporary file /tmp/tmp.oJjgKO
Extract OpenVPN config bundle into PIA directory /etc/openvpn/pia
Modify configs for this container
Starting OpenVPN using config CA Toronto.ovpn
Setting OpenVPN credentials...
adding route to local network 192.168.1.0/24 via 172.17.0.1 dev eth0
Wed Nov 4 21:46:21 2020 OpenVPN 2.4.9 x86_64-alpine-linux-musl [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Apr 20 2020
Wed Nov 4 21:46:21 2020 library versions: OpenSSL 1.1.1g 21 Apr 2020, LZO 2.10
Wed Nov 4 21:46:21 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Nov 4 21:46:21 2020 CRL: loaded 1 CRLs from file [[INLINE]]
Wed Nov 4 21:46:21 2020 TCP/UDP: Preserving recently used remote address: [AF_INET]154.3.42.52:1198
Wed Nov 4 21:46:21 2020 UDP link local: (not bound)
Wed Nov 4 21:46:21 2020 UDP link remote: [AF_INET]154.3.42.52:1198
Wed Nov 4 21:46:21 2020 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Wed Nov 4 21:46:21 2020 [toronto423] Peer Connection Initiated with [AF_INET]154.3.42.52:1198
Wed Nov 4 21:46:22 2020 OpenVPN ROUTE6: OpenVPN needs a gateway parameter for a --route-ipv6 option and no default was specified by either --route-ipv6-gateway or --ifconfig-ipv6 options
Wed Nov 4 21:46:22 2020 OpenVPN ROUTE: failed to parse/resolve route for host/network: 2000::/3
Wed Nov 4 21:46:22 2020 TUN/TAP device tun0 opened
Wed Nov 4 21:46:22 2020 /sbin/ip link set dev tun0 up mtu 1500
Wed Nov 4 21:46:22 2020 /sbin/ip addr add dev tun0 10.19.112.181/24 broadcast 10.19.112.255
Wed Nov 4 21:46:22 2020 /etc/openvpn/tunnelUp.sh tun0 1500 1553 10.19.112.181 255.255.255.0 init
Up script executed with tun0 1500 1553 10.19.112.181 255.255.255.0 init
Updating TRANSMISSION_BIND_ADDRESS_IPV4 to the ip of tun0 : 10.19.112.181
Using Transmission Web Control UI, overriding TRANSMISSION_WEB_HOME
Updating Transmission settings.json with values from env variables
Using existing settings.json for Transmission /data/transmission-home/settings.json
Overriding alt-speed-down because TRANSMISSION_ALT_SPEED_DOWN is set to 0
Overriding alt-speed-time-begin because TRANSMISSION_ALT_SPEED_TIME_BEGIN is set to 420
Overriding alt-speed-time-enabled because TRANSMISSION_ALT_SPEED_TIME_ENABLED is set to True
Overriding alt-speed-time-end because TRANSMISSION_ALT_SPEED_TIME_END is set to 120
Overriding alt-speed-up because TRANSMISSION_ALT_SPEED_UP is set to 0
Overriding bind-address-ipv4 because TRANSMISSION_BIND_ADDRESS_IPV4 is set to 10.19.112.181
Overriding cache-size-mb because TRANSMISSION_CACHE_SIZE_MB is set to 10
Overriding download-dir because TRANSMISSION_DOWNLOAD_DIR is set to /downloads
Overriding download-queue-size because TRANSMISSION_DOWNLOAD_QUEUE_SIZE is set to 15
Overriding incomplete-dir because TRANSMISSION_INCOMPLETE_DIR is set to /downloads/incomplete
Overriding ratio-limit because TRANSMISSION_RATIO_LIMIT is set to 0
Overriding ratio-limit-enabled because TRANSMISSION_RATIO_LIMIT_ENABLED is set to True
Overriding rpc-authentication-required because TRANSMISSION_RPC_AUTHENTICATION_REQUIRED is set to False
Overriding rpc-password because TRANSMISSION_RPC_PASSWORD is set to password
Overriding rpc-port because TRANSMISSION_RPC_PORT is set to 9091
Overriding rpc-username because TRANSMISSION_RPC_USERNAME is set to admin
Overriding watch-dir because TRANSMISSION_WATCH_DIR is set to /data/watch
sed'ing True to true
Enforcing ownership on transmission config directories
Applying permissions to transmission config directories

-------------------------------------
Transmission will run as
-------------------------------------
User name: abc
User uid: 99
User gid: 100
-------------------------------------

STARTING TRANSMISSION
Provider PIA has a script for automatic port forwarding. Will run it now.
If you want to disable this, set environment variable DISABLE_PORT_UPDATER=yes
Transmission startup script complete.
Wed Nov 4 21:46:23 2020 WARNING: OpenVPN was configured to add an IPv6 route over tun0. However, no IPv6 has been configured for this interface, therefore the route installation may fail or may not work as expected.
Wed Nov 4 21:46:23 2020 Initialization Sequence Completed



yes: Broken pipe
port is 25820
the port has been bound to 25820 Wed Nov 4 21:46:29 EST 2020
transmission auth not required
waiting for transmission to become responsive
transmission became responsive
ID Done Have ETA Up Down Ratio Status Name
Sum: None 0.0 0.0
setting transmission port to 25820
localhost:9091/transmission/rpc/ responded: "success"
Checking port...
Port is open: Yes

initial setup complete!

waiting for rebind loop.................
token expiry 1609962385
remaining = 5417985
Wed Nov 4 22:00:58 2020 event_wait : Interrupted system call (code=4)
Wed Nov 4 22:00:58 2020 SIGUSR1[hard,] received, process restarting
Wed Nov 4 22:01:03 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Nov 4 22:01:08 2020 RESOLVE: Cannot resolve host address: ca-toronto.privacy.network:1198 (Try again)
Wed Nov 4 22:01:13 2020 RESOLVE: Cannot resolve host address: ca-toronto.privacy.network:1198 (Try again)
Wed Nov 4 22:01:13 2020 Could not determine IPv4/IPv6 protocol
Wed Nov 4 22:01:13 2020 SIGUSR1[soft,init_instance] received, process restarting
Wed Nov 4 22:01:13 2020 RESOLVE: Cannot resolve host address: ca-toronto.privacy.network:1198 (Try again)
Wed Nov 4 22:01:13 2020 Could not determine IPv4/IPv6 protocol
Wed Nov 4 22:01:13 2020 SIGUSR1[soft,init_instance] received, process restarting
Wed Nov 4 22:01:18 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Nov 4 22:01:23 2020 RESOLVE: Cannot resolve host address: ca-toronto.privacy.network:1198 (Try again)
Wed Nov 4 22:01:23 2020 RESOLVE: Cannot resolve host address: ca-toronto.privacy.network:1198 (Try again)
Wed Nov 4 22:01:28 2020 RESOLVE: Cannot resolve host address: ca-toronto.privacy.network:1198 (Try again)
Wed Nov 4 22:01:28 2020 Could not determine IPv4/IPv6 protocol
Wed Nov 4 22:01:28 2020 SIGUSR1[soft,init_instance] received, process restarting
Wed Nov 4 22:01:33 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts

Host system:
Unraid 6.8.3 with latest pull from haugene/transmission-openvpn

/ # cat etc/resolv.conf
nameserver 8.8.8.8
nameserver 1.1.1.1
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
^C
--- 8.8.8.8 ping statistics ---
10 packets transmitted, 0 packets received, 100% packet loss
@xoverride
Copy link
Contributor

xoverride commented Nov 5, 2020

Can you post the following?
After openvpn is up, inside container:
ip r
ping google.com
ip r get 8.8.8.8
netstat -an
ifconfig
ip -s link
Also, post the result when the error occurs, the current version seems to have the healthcheck script run in an interval of 1 min.
If you can: try to add a line exec &>>healthcheck.log in the /etc/scripts/healthcheck.sh file after the first line #!/bin/bash
This will redirect the output of the healthcheck script to the healthcheck.log file, post this log after the error occurs as well.
Want to see if this is related to an issue that ipv6 is not compatible with the healthcheck script. As mentioned in #1416

@steve-spiteri
Copy link
Author

steve-spiteri commented Nov 5, 2020

Can you post the following?
After openvpn is up, inside container:
ip r
ping google.com
ip r get 8.8.8.8
netstat -an
ifconfig
ip -s link
Also, post the result when the error occurs, the current version seems to have the healthcheck script run in an interval of 1 min.
If you can: try to add a line exec &>>healthcheck.log in the /etc/scripts/healthcheck.sh file after the first line #!/bin/bash
This will redirect the output of the healthcheck script to the healthcheck.log file, post this log after the error occurs as well.
Want to see if this is related to an issue that ipv6 is not compatible with the healthcheck script. As mentioned in #1416

OpenVPN Up

/ # ip r
0.0.0.0/1 via 10.27.112.1 dev tun0 
default via 172.17.0.1 dev eth0 
10.27.112.0/24 dev tun0 proto kernel scope link src 10.27.112.5 
66.115.142.221 via 172.17.0.1 dev eth0 
128.0.0.0/1 via 10.27.112.1 dev tun0 
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.6 
192.168.1.0/24 via 172.17.0.1 dev eth0 
/ # ping google.com
PING google.com (172.217.1.14): 56 data bytes
64 bytes from 172.217.1.14: seq=0 ttl=120 time=9.029 ms
64 bytes from 172.217.1.14: seq=1 ttl=120 time=8.762 ms
64 bytes from 172.217.1.14: seq=2 ttl=120 time=9.009 ms
64 bytes from 172.217.1.14: seq=3 ttl=120 time=9.218 ms
^C
--- google.com ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 8.762/9.004/9.218 ms
/ # ip r get 8.8.8.8
8.8.8.8 via 10.27.112.1 dev tun0 src 10.27.112.5 uid 0 
    cache 
/ # netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 10.27.112.5:26529       0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:9091            0.0.0.0:*               LISTEN      
tcp        0      0 10.27.112.5:60807       87.98.162.88:443        ESTABLISHED 
tcp        0      0 127.0.0.1:43006         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 10.27.112.5:52625       50.67.20.207:56070      ESTABLISHED 
tcp        0      0 172.17.0.6:9091         192.168.1.64:59078      ESTABLISHED 
tcp        0      0 10.27.112.5:51753       79.67.220.215:32391     TIME_WAIT   
tcp        0      0 127.0.0.1:43016         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 10.27.112.5:56957       109.66.196.221:27633    TIME_WAIT   
tcp        0    156 10.27.112.5:58355       62.44.138.145:51413     FIN_WAIT1   
tcp        0      0 127.0.0.1:43012         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 10.27.112.5:44807       74.82.146.151:60541     ESTABLISHED 
tcp        0      0 10.27.112.5:39815       101.174.236.201:57760   ESTABLISHED 
tcp        0      1 10.27.112.5:35867       67.230.48.237:47946     SYN_SENT    
tcp        0      1 10.27.112.5:58873       41.202.219.69:1551      SYN_SENT    
tcp        0      0 10.27.112.5:26529       190.17.64.4:56200       ESTABLISHED 
tcp        0      0 127.0.0.1:43004         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 10.27.112.5:42637       79.67.220.215:32391     ESTABLISHED 
tcp        0      1 10.27.112.5:55945       24.51.67.28:6881        SYN_SENT    
tcp        0      0 10.27.112.5:35579       45.121.88.33:58938      FIN_WAIT2   
tcp        0      0 10.27.112.5:42235       94.250.115.35:39715     ESTABLISHED 
tcp        0      1 10.27.112.5:41533       86.99.152.22:6881       SYN_SENT    
tcp        0      1 10.27.112.5:59071       14.192.213.220:3866     SYN_SENT    
tcp        0      0 10.27.112.5:43229       213.136.79.7:6881       ESTABLISHED 
tcp        0      0 127.0.0.1:43014         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 127.0.0.1:43002         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 10.27.112.5:60553       23.16.67.70:6881        ESTABLISHED 
tcp        0      0 10.27.112.5:26529       125.167.115.213:14619   ESTABLISHED 
tcp        0      0 10.27.112.5:43819       103.247.48.40:35526     FIN_WAIT2   
tcp        0      1 10.27.112.5:35171       67.230.57.36:18550      SYN_SENT    
tcp        0      1 10.27.112.5:55565       188.197.7.187:64076     SYN_SENT    
tcp        0      0 127.0.0.1:43010         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 10.27.112.5:51659       54.86.208.240:49455     TIME_WAIT   
tcp        0      1 10.27.112.5:46089       63.143.100.33:54229     SYN_SENT    
tcp        0      0 10.27.112.5:36459       109.66.196.221:27633    ESTABLISHED 
tcp        0      0 10.27.112.5:52183       24.54.93.138:6881       ESTABLISHED 
tcp        0      1 10.27.112.5:36411       112.135.254.174:6881    SYN_SENT    
tcp        0      1 10.27.112.5:45447       188.128.255.251:2710    SYN_SENT    
tcp        0      0 127.0.0.1:43008         127.0.0.1:9091          TIME_WAIT   
tcp        0      0 :::26529                :::*                    LISTEN      
udp        0      0 10.27.112.5:34261       0.0.0.0:*                           
udp        0      0 0.0.0.0:36909           0.0.0.0:*                           
udp        0      0 0.0.0.0:36987           0.0.0.0:*                           
udp        0      0 0.0.0.0:57128           0.0.0.0:*                           
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  3      [ ]         STREAM     CONNECTED     11067291 
unix  3      [ ]         STREAM     CONNECTED     11067290 
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:06  
          inet addr:172.17.0.6  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:732 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1110 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:411820 (402.1 KiB)  TX bytes:208981 (204.0 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:82 errors:0 dropped:0 overruns:0 frame:0
          TX packets:82 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12175 (11.8 KiB)  TX bytes:12175 (11.8 KiB)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.27.112.5  P-t-P:10.27.112.5  Mask:255.255.255.0
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:506 errors:0 dropped:0 overruns:0 frame:0
          TX packets:948 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000 
          RX bytes:73537 (71.8 KiB)  TX bytes:83626 (81.6 KiB)
/ # ip -s link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX: bytes  packets  errors  dropped overrun mcast   
    12175      82       0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    12175      82       0       0       0       0       
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
3: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
4: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
5: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
6: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
7: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
8: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq state UNKNOWN mode DEFAULT group default qlen 10000
    link/none 
    RX: bytes  packets  errors  dropped overrun mcast   
    75386      523      0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    85728      970      0       0       0       0       
550: eth0@if551: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:11:00:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    RX: bytes  packets  errors  dropped overrun mcast   
    419501     753      0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    213456     1134     0       0       0       0       
/ # 

OpenVPN Down

/ # ip r
0.0.0.0/1 via 10.36.112.1 dev tun0 
default via 172.17.0.1 dev eth0 
10.36.112.0/24 dev tun0 proto kernel scope link src 10.36.112.37 
128.0.0.0/1 via 10.36.112.1 dev tun0 
154.3.40.251 via 172.17.0.1 dev eth0 
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.6 
192.168.1.0/24 via 172.17.0.1 dev eth0 
/ # ping google.com
ping: bad address 'google.com'
/ # ip r get 8.8.8.8
8.8.8.8 via 10.36.112.1 dev tun0 src 10.36.112.37 uid 0 
    cache 
/ # netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 10.36.112.37:34261      0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:9091            0.0.0.0:*               LISTEN      
tcp        0  17112 10.36.112.37:34261      41.202.219.69:10053     ESTABLISHED 
tcp        0    655 10.36.112.37:34261      120.29.79.53:41985      ESTABLISHED 
tcp        0    171 10.36.112.37:34261      89.99.133.140:56489     ESTABLISHED 
tcp        0    842 10.36.112.37:34261      99.151.203.185:56930    ESTABLISHED 
tcp        0    289 10.36.112.37:34261      110.74.214.126:2086     ESTABLISHED 
tcp        0    188 10.36.112.37:34261      101.174.236.201:62028   ESTABLISHED 
tcp        0    239 10.36.112.37:34261      111.223.170.184:25227   ESTABLISHED 
tcp        0    263 10.36.112.37:34261      41.217.79.209:30978     ESTABLISHED 
tcp        0    783 10.36.112.37:34261      130.105.239.55:64287    ESTABLISHED 
tcp        0    187 10.36.112.37:34261      125.167.115.213:51319   ESTABLISHED 
tcp        0    326 10.36.112.37:42953      62.44.138.55:51413      FIN_WAIT1   
tcp        0    851 10.36.112.37:34261      119.94.5.112:23097      ESTABLISHED 
tcp        0    178 10.36.112.37:34261      223.25.63.214:2177      ESTABLISHED 
tcp        0    574 10.36.112.37:45465      197.210.47.94:27364     FIN_WAIT1   
tcp        0    434 10.36.112.37:41049      24.54.93.138:6881       ESTABLISHED 
tcp        0    752 10.36.112.37:34261      139.255.55.10:63383     ESTABLISHED 
tcp        0    209 10.36.112.37:56209      94.250.115.35:39715     ESTABLISHED 
tcp        0    297 10.36.112.37:34261      95.168.121.88:55396     ESTABLISHED 
tcp        0    309 10.36.112.37:34261      49.145.3.114:20726      ESTABLISHED 
tcp        0      1 10.36.112.37:52587      197.246.84.186:49526    SYN_SENT    
tcp        0    196 10.36.112.37:34261      190.17.64.4:57848       ESTABLISHED 
tcp        0   7350 10.36.112.37:34261      156.0.230.7:9660        ESTABLISHED 
tcp        0    535 10.36.112.37:34261      124.83.45.91:60234      ESTABLISHED 
tcp        0      1 10.36.112.37:44927      213.136.79.238:6881     FIN_WAIT1   
tcp        0    204 10.36.112.37:34261      37.37.41.112:52997      ESTABLISHED 
tcp        0      1 10.36.112.37:51531      79.67.220.215:32391     FIN_WAIT1   
tcp        0    622 10.36.112.37:34261      120.29.101.229:36866    ESTABLISHED 
tcp        0    222 10.36.112.37:40881      125.238.170.80:11766    ESTABLISHED 
tcp        0    331 10.36.112.37:34261      152.32.111.60:63210     ESTABLISHED 
tcp        0      1 10.36.112.37:58009      50.67.20.207:56070      SYN_SENT    
tcp        0      1 10.36.112.37:57335      50.67.20.207:56070      LAST_ACK    
tcp        0      1 10.36.112.37:58203      89.147.163.25:15496     SYN_SENT    
tcp        0      1 10.36.112.37:34513      103.247.48.40:35526     FIN_WAIT1   
tcp        0    179 10.36.112.37:34261      49.147.199.4:48129      ESTABLISHED 
tcp        0    426 10.36.112.37:34261      197.232.29.138:56051    ESTABLISHED 
tcp        0    130 10.36.112.37:36195      62.44.138.145:51413     FIN_WAIT1   
tcp        0    399 10.36.112.37:34261      117.20.70.246:14102     ESTABLISHED 
tcp        0      1 10.36.112.37:36929      112.198.204.184:64457   SYN_SENT    
tcp        0     52 10.36.112.37:37479      74.82.146.151:60541     ESTABLISHED 
tcp        0 118208 10.36.112.37:34261      61.245.171.5:47828      ESTABLISHED 
tcp        0      1 10.36.112.37:44063      50.67.20.207:56070      LAST_ACK    
tcp        0      1 10.36.112.37:44447      172.103.151.38:62282    SYN_SENT    
tcp        0    215 10.36.112.37:34261      180.253.165.53:13968    ESTABLISHED 
tcp        0    213 10.36.112.37:34261      37.111.228.76:29969     ESTABLISHED 
tcp        0      0 172.17.0.6:9091         192.168.1.64:58489      ESTABLISHED 
tcp        0    366 10.36.112.37:54139      197.210.64.129:1718     FIN_WAIT1   
tcp        0      0 :::34261                :::*                    LISTEN      
udp        0      0 0.0.0.0:51474           0.0.0.0:*                           
udp        0      0 0.0.0.0:37223           0.0.0.0:*                           
udp        0      0 10.36.112.37:44628      0.0.0.0:*                           
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  3      [ ]         STREAM     CONNECTED     10968151 
unix  3      [ ]         STREAM     CONNECTED     10968152 
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:06  
          inet addr:172.17.0.6  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:376111 errors:0 dropped:0 overruns:0 frame:0
          TX packets:286873 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:460240865 (438.9 MiB)  TX bytes:88039947 (83.9 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:86 errors:0 dropped:0 overruns:0 frame:0
          TX packets:86 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12241 (11.9 KiB)  TX bytes:12241 (11.9 KiB)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.36.112.37  P-t-P:10.36.112.37  Mask:255.255.255.0
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:373218 errors:0 dropped:0 overruns:0 frame:0
          TX packets:284800 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000 
          RX bytes:430953669 (410.9 MiB)  TX bytes:67529182 (64.4 MiB)
/ # ip -s link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX: bytes  packets  errors  dropped overrun mcast   
    12241      86       0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    12241      86       0       0       0       0       
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
3: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
4: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
5: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
6: ip_vti0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
7: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
8: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq state UNKNOWN mode DEFAULT group default qlen 10000
    link/none 
    RX: bytes  packets  errors  dropped overrun mcast   
    430953669  373218   0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    67529182   284800   0       0       0       0       
486: eth0@if487: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:ac:11:00:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    RX: bytes  packets  errors  dropped overrun mcast   
    460281533  376143   0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    88056754   286899   0       0       0       0       
/ #

healthcheck.log after failure

--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 293.573/293.573/293.573 ms
Network is up
Openvpn and transmission-daemon processes are running
PING google.com (172.217.165.14): 56 data bytes

--- google.com ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
Network is down
Resetting TUN
Error: either "dev" is duplicate, or "tunl0" is a garbage.
Error: either "dev" is duplicate, or "tunl0" is a garbage.
Sent kill SIGUSR1 to openvpn
ping: bad address 'google.com'
Network is still down
ping: bad address 'google.com'
Network is down
Resetting TUN
Error: either "dev" is duplicate, or "tunl0" is a garbage.
Error: either "dev" is duplicate, or "tunl0" is a garbage.
Sent kill SIGUSR1 to openvpn
ping: bad address 'google.com'
Network is still down
ping: bad address 'google.com'
Network is down
Resetting TUN
Error: either "dev" is duplicate, or "tunl0" is a garbage.
Error: either "dev" is duplicate, or "tunl0" is a garbage.
Sent kill SIGUSR1 to openvpn

@xoverride
Copy link
Contributor

xoverride commented Nov 5, 2020

Ok, so somehow you have 2 tun dev, tun0 and tunl0 where tunl0 is down and not used. The healthcheck script did not pick up the correct interface and didn't bring down the tun0 before restart using SIGUSR1 so it will not recover.
Workaround for now:
For this, you should either remove the unused interface tun0 or edit the healthcheck.sh change INTERFACE=$(ls /sys/class/net | grep tun) to INTERFACE=tun0
You may also take a look at the updated version of the healthcheck.sh in #1459 and maybe use that. (Still need the interface change even with the updated version)
If not you can also change pkill -SIGUSR1 openvpn to pkill -HUP openvpn in the current version, this should make the recover work better.

Now it would still be an issue that the ping failed, need to find out why it's failing, it is not because of ipv6 given by the result you posted. If this is not addressed, your container will still keep restarting. Tying the new script or change ping -c 1 $HOST to ping -c 5 $HOST maybe a good start to see if it's only a glitch in the network, but I don't think that is the reason with --txqueuelen 10000 being set and no dropped packet seen. But the one above the failed ping, the success one took a long time 293.573ms, so maybe it timeout. The ping you did to google.com when you start the openvpn seems to be less than 10ms, does your container running heavy traffic? Can you upload the full log from healthcheck.log.

In your first log, I can also see the PIA's update-port.sh script did not finish the port rebinding loop and OpenVPN is restarted by the health check. May need to fix this add a delay or something, but I am not sure if the rebinding process is the reason that caused ping failed. Well, at least I think it should not.

@steve-spiteri
Copy link
Author

steve-spiteri commented Nov 5, 2020

Ok, so somehow you have 2 tun dev, tun0 and tunl0 where tunl0 is down and not used. The healthcheck script did not pick up the correct interface and didn't bring down the tun0 before restart using SIGUSR1 so it will not recover.
Workaround for now:
For this, you should either remove the unused interface tun0 or edit the healthcheck.sh change INTERFACE=$(ls /sys/class/net | grep tun) to INTERFACE=tun0
You may also take a look at the updated version of the healthcheck.sh in #1459 and maybe use that. (Still need the interface change even with the updated version)
If not you can also change pkill -SIGUSR1 openvpn to pkill -HUP openvpn in the current version, this should make the recover work better.

Now it would still be an issue that the ping failed, need to find out why it's failing, it is not because of ipv6 given by the result you posted. If this is not addressed, your container will still keep restarting. Tying the new script or change ping -c 1 $HOST to ping -c 5 $HOST maybe a good start to see if it's only a glitch in the network, but I don't think that is the reason with --txqueuelen 10000 being set and no dropped packet seen. But the one above the failed ping, the success one took a long time 293.573ms, so maybe it timeout. The ping you did to google.com when you start the openvpn seems to be less than 10ms, does your container running heavy traffic? Can you upload the full log from healthcheck.log.

In your first log, I can also see the PIA's update-port.sh script did not finish the port rebinding loop and OpenVPN is restarted by the health check. May need to fix this add a delay or something, but I am not sure if the rebinding process is the reason that caused ping failed. Well, at least I think it should not.

Using the latest health check script with tun0 explicitly defined

/ # cat logfile.txt 
2020-11-05 13:00:32: Starting Health check script 
PING google.com (172.217.23.46): 56 data bytes
64 bytes from 172.217.23.46: seq=0 ttl=114 time=195.454 ms
64 bytes from 172.217.23.46: seq=1 ttl=114 time=202.339 ms
64 bytes from 172.217.23.46: seq=2 ttl=114 time=195.422 ms
64 bytes from 172.217.23.46: seq=3 ttl=114 time=195.677 ms
64 bytes from 172.217.23.46: seq=4 ttl=114 time=195.615 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 195.422/196.901/202.339 ms
Ping success, checking if using tun
RTNETLINK answers: Network unreachable
Not using tun0!
Sent kill HUP to openvpn

Using updated health check script with no modifications

/ # cat logfile.txt 
2020-11-05 13:05:04: Starting Health check script 
PING google.com (172.217.20.238): 56 data bytes
64 bytes from 172.217.20.238: seq=0 ttl=114 time=197.570 ms
64 bytes from 172.217.20.238: seq=1 ttl=114 time=197.466 ms
64 bytes from 172.217.20.238: seq=2 ttl=114 time=197.412 ms
64 bytes from 172.217.20.238: seq=3 ttl=114 time=197.218 ms
64 bytes from 172.217.20.238: seq=4 ttl=114 time=201.325 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 197.218/198.198/201.325 ms
Ping success, checking if using tun
RTNETLINK answers: Network unreachable
Not using tun0
tunl0!
Error: argument "tun0
tunl0" is wrong: "dev" not a valid ifname
Error: argument "tun0
tunl0" is wrong: "dev" not a valid ifname
Sent kill HUP to openvpn

For completion, updated health check script with tunl0 explicitly defined

/ # cat logfile.txt 
2020-11-05 13:42:43: Starting Health check script 
PING google.com (172.217.23.46): 56 data bytes
64 bytes from 172.217.23.46: seq=0 ttl=114 time=196.919 ms
64 bytes from 172.217.23.46: seq=1 ttl=114 time=198.207 ms
64 bytes from 172.217.23.46: seq=2 ttl=114 time=444.277 ms
64 bytes from 172.217.23.46: seq=3 ttl=114 time=195.945 ms
64 bytes from 172.217.23.46: seq=4 ttl=114 time=195.455 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 195.455/246.160/444.277 ms
Ping success, checking if using tun
RTNETLINK answers: Network unreachable
Not using tunl0!
Sent kill HUP to openvpn

Still trying to outright remove tunl0 or find out where its coming from.

@xoverride
Copy link
Contributor

xoverride commented Nov 5, 2020

Hmm, so with the updated health check script with tun0 explicitly defined, the ping went though but the checking of tun route did not. When ping google.com is ok, can you do getent hosts google.com and see what it give? Expected output would be like 172.217.13.238 google.com google.com, also do getent ahosts google.com see if any ipv6 address show up then do ip r get 172.217.13.238 change the ip to what you get from the getent output.

@steve-spiteri
Copy link
Author

steve-spiteri commented Nov 5, 2020

Hmm, so with the updated health check script with tun0 explicitly defined, the ping went though but the checking of tun route did not. When ping google.com is ok, can you do getent hosts google.com and see what it give? Expected output would be like 172.217.13.238 google.com google.com, also do getent ahosts google.com see if any ipv6 address show up then do ip r get 172.217.13.238 change the ip to what you get from the getent output.

/ # getent hosts google.com
2a00:1450:4001:81f::200e  google.com  google.com

/ # getent ahosts google.com
172.217.23.174  STREAM google.com
172.217.23.174  DGRAM  google.com

/ # ip r get 2a00:1450:4001:81f::200e
RTNETLINK answers: Network unreachable

/ # ip r get 172.217.13.238
172.217.13.238 via 10.7.112.1 dev tun0 src 10.7.112.7 uid 0 
    cache 

@xoverride
Copy link
Contributor

xoverride commented Nov 5, 2020

Ok, it is the ipv6 dns causing the issue here, now as posted in #1459 you can change the precedence in the gai.conf (should add an option to make this automatic in a later update, https://askubuntu.com/questions/32298/prefer-a-ipv4-dns-lookups-before-aaaaipv6-lookups suggest look at the man page ofgai.conf, some say for it to work properly needs to un-comment all the lines if precedence is added), I didn't test that yet so I am not sure or for now change the line IP=$(getent hosts ${HOST} | awk '{print $1}' | head -1) to IP=$(getent ahostsv4 ${HOST} | awk '{print $1}' | head -1) and see if the updated health check script runs ok now.

@steve-spiteri
Copy link
Author

steve-spiteri commented Nov 5, 2020

Ok, it is the ipv6 dns causing the issue here, now as posted in #1459 you can change the precedence in the gai.conf (should add an option to make this automatic in a later update, https://askubuntu.com/questions/32298/prefer-a-ipv4-dns-lookups-before-aaaaipv6-lookups suggest look at the man page ofgai.conf, some say for it to work properly needs to un-comment all the lines if precedence is added), I didn't test that yet so I am not sure or for now change the line IP=$(getent hosts ${HOST} | awk '{print $1}' | head -1) to IP=$(getent ahostsv4 ${HOST} | awk '{print $1}' | head -1) and see if the updated health check script runs ok now.

I have explicitly defined tun0 and added IP=$(getent ahostsv4 ${HOST} | awk '{print $1}' | head -1) and it looks promising.
I will monitor and report back.

/ # cat logfile.txt 
2020-11-05 17:23:53: Starting Health check script 
PING google.com (172.217.19.110): 56 data bytes
64 bytes from 172.217.19.110: seq=0 ttl=117 time=212.509 ms
64 bytes from 172.217.19.110: seq=1 ttl=117 time=171.939 ms
64 bytes from 172.217.19.110: seq=2 ttl=117 time=187.219 ms
64 bytes from 172.217.19.110: seq=3 ttl=117 time=176.366 ms
64 bytes from 172.217.19.110: seq=4 ttl=117 time=176.166 ms

--- google.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 171.939/184.839/212.509 ms
Ping success, checking if using tun
172.217.19.110 via 10.9.112.1 dev tun0 src 10.9.112.2 uid 0 
Using tun0, OK!
PING google.com (172.217.19.110): 56 data bytes
64 bytes from 172.217.19.110: seq=0 ttl=117 time=219.108 ms
64 bytes from 172.217.19.110: seq=1 ttl=117 time=222.338 ms
64 bytes from 172.217.19.110: seq=3 ttl=117 time=162.810 ms
64 bytes from 172.217.19.110: seq=4 ttl=117 time=188.785 ms

--- google.com ping statistics ---
5 packets transmitted, 4 packets received, 20% packet loss
round-trip min/avg/max = 162.810/198.260/222.338 ms
Network is up
Openvpn and transmission-daemon processes are running

@steve-spiteri
Copy link
Author

Changes made to healthcheck.sh resolved the issue.
Specifically changing INTERFACE=$(ls /sys/class/net | grep tun) to INTERFACE=tun0 due to my container inexplicably creating multiple tunnel interfaces and IP=$(getent hosts ${HOST} | awk '{print $1}' | head -1) to IP=$(getent ahostsv4 ${HOST} | awk '{print $1}' | head -1) due to IPv6 address being returned from getent.

@xoverride
Copy link
Contributor

xoverride commented Nov 6, 2020

Glad that works. I will update the health check script for the ipv6 issue, and post a warning for multiple tun device detected. I will be trying to find a way to get the correct tun interface being used to the script.

@whatserface
Copy link

For anyone having this problem. You might want to look into the latest running processes through "ps -aux" on Linux. Some of them might be trying to kill openvpn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants