Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2023.1.1] Tunneldigger causes high CPU Load when enabled but no WAN connection available #3024

Open
dariks opened this issue Oct 15, 2023 · 3 comments

Comments

@dariks
Copy link

dariks commented Oct 15, 2023

Bug report

What is the problem?
With enabled Tunneldigger but no WAN connection (e.g. Cable is disconnected on WAN) the node is experiencing high CPU Load Average (1.5-2.0) because of Tunneldigger's effort to restart and reconnect. This also causes super high flowrate of logs.
At the moment we have recreated this issue on TP-Link Archer C7 v2, Archer C7 v5 and Xiaomi Mi 4a Gigabit, so this seems to be independent of the device.

Log output (this log absolutely SPAMS logread...):

Sun Oct 15 14:35:46 2023 daemon.err tunneldigger[10258]: td-client: [gw05.ffnordhessen.net:20008] Reinitializing tunnel context.
Sun Oct 15 14:35:46 2023 daemon.err td-client: [gw05.ffnh.net:20008] Failed to connect to remote endpoint - check WAN connectivity!
Sun Oct 15 14:35:46 2023 daemon.err tunneldigger[10258]: td-client: [gw05.ffnh.net:20008] Failed to connect to remote endpoint - check WAN connectivity!
Sun Oct 15 14:35:46 2023 daemon.err td-client: [gw05.ffnordhessen.net:20008] Failed to connect to remote endpoint - check WAN connectivity!
Sun Oct 15 14:35:46 2023 daemon.info td-client: [gw05.ffnh.net:20008] Reinitializing tunnel context.
Sun Oct 15 14:35:46 2023 daemon.err tunneldigger[10258]: td-client: [gw05.ffnordhessen.net:20008] Failed to connect to remote endpoint - check WAN connectivity!
Sun Oct 15 14:35:46 2023 daemon.err tunneldigger[10258]: td-client: [gw05.ffnh.net:20008] Reinitializing tunnel context.
Sun Oct 15 14:35:46 2023 daemon.err td-client: No suitable brokers found. Retrying in 5 seconds
Sun Oct 15 14:35:46 2023 daemon.err tunneldigger[10258]: td-client: No suitable brokers found. Retrying in 5 seconds

Devices with said issues:
https://map.freifunk-nordhessen.de/#!/de/map/60e32727c0fe
https://map.freifunk-nordhessen.de/#!/de/map/8cdef9a91277

What is the expected behaviour?
Tunneldigger should retry, but it could make sense to reduce the number of retries after X amount of tries (e.g. every 5 minutes after 10 fails).

Gluon Version:
Gluon v2023.1.1
seems like Commit ID 7feebe9

Site Configuration:
https://gitlab.com/freifunk-nordhessen/ffnh-site

Custom patches:
no custom patches applied to Tunneldigger and/or dnsmasq-wan.

@dariks
Copy link
Author

dariks commented Oct 15, 2023

Link to Screen capture of the issue:
https://rippen-cloud.de/index.php/s/XK3RztbWmMWHQDN

Screenshot of this issue:
Bildschirmfoto 2023-10-15 um 15 42 18

Screenshot of Grafana Monitoring when this issue occurs:
ffnh-firmware-dri-03-high-load-tunneldigger

@neocturne
Copy link
Member

Thanks for the report.

Is tunneldigger retrying more frequently than every 5 seconds as printed in the logs, or does it cause such high load even with when only retrying every 5 seconds? What is the load average when you stop tunneldigger (/etc/init.d/tunneldigger stop?

@dariks
Copy link
Author

dariks commented Oct 16, 2023

Thank you for taking your time to create such an useful tool.

It doesn't seem to run more often than in the 5 second interval. It rather seems as if the "trying to connect - erroring out ("Failed to connect to remote endpoint - check WAN connectivity!") - Reinitializing tunnel context - trying to connect next gateway" process is running way too quick and causing the load (take a look at the attached link to see a video of the issue.). It seems as if there should be some kind of rate limit at which the Tunneldigger should connect.
I can confirm running /etc/init.d/tunneldigger stop mitigates the high load average for a short period of time (Tunneldigger gets restartet automatically, probably by the Tunneldigger-Watchdog).
The 1min load average dropped to 0.02 after 3 minutes of running with stopped Tunneldigger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants