Statuspage-error on some TP-Link WR1043v1 with gluon-v2021.1 #2256
Comments
The question is whether we want to relax the timeout a bit, or whether that has downsides. See gluon/package/gluon-status-page/luasrc/lib/gluon/status-page/controller/status-page.lua, line 64 in ab4c998.
I was about to report a similar bug on my WR841N v13, but the two might be related. Whenever the router boots without WiFi (disabled via the physical button), I am unable to find any status-page/provider processes:
That, in turn, causes gluon-neighbour-info to return error code 1 and print no output.
In comparison to a working machine, I can see plenty of status-page/provider processes:
Where is this status page provider started? How can I continue debugging here?
There is actually a second bug overlapping with the above, which makes the browser error message so cryptic at first. The HTML is malformed: it contains half an HTML page, which ends where the nodeinfo call is, followed by a second (well-formed) HTML page containing the error message (probably this one). The two together cannot be rendered, of course. I would open a second bug report for this if there are no objections.
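The half-page-plus-error-page output described above is what typically happens when a template streams its output and fails midway. A minimal sketch of one way to avoid it, written in Python for illustration only: it is not Gluon's actual rendering code, and `render_page`, `render_body` and `render_error` are hypothetical names. The idea is to render into a buffer first and only emit it if rendering succeeded.

```python
def render_page(render_body, render_error):
    """Render the page into a buffer and return it as one string.

    render_body(write) is expected to call write(chunk) for each piece of
    output. If it raises, the partial output is discarded and only the
    error page from render_error(exc) is returned.
    """
    chunks = []
    try:
        render_body(chunks.append)
    except Exception as exc:
        # Drop the half-rendered page instead of sending both documents.
        return render_error(exc)
    return "".join(chunks)
```

With this shape, the browser receives either the complete status page or a single well-formed error page, never both concatenated.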
@mweinelt et al.: Can you open a PR for increasing the timeout, so we can evaluate on a common base whether or not it improves our situation?
This issue is now tracked as #2260.
Please try 19381a2, it should fix the segfault. freifunk-gluon/packages@825aa0c#commitcomment-53000911 is still valid.
It was found that a one-second timeout for nodeinfo data may be too low: when a node is otherwise occupied, the timeout may be reached too often. The nodeinfo query response is also vital to the status-page base template, so when the query times out, the page is left in a broken state from which it cannot recover. Fixes: #2256
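To illustrate the failure mode in that commit message, here is a minimal Python sketch (not the Lua controller code) of querying nodeinfo with a configurable timeout and treating a timeout, a non-zero exit code, or empty output as "no data" instead of letting the template fail. The helper names `query_nodeinfo` and `page_title` and the 3-second default are assumptions for illustration; the command is passed in as a parameter, and the `gluon-neighbour-info` invocation in the comment is only a guess at what the controller runs.

```python
import json
import subprocess


def query_nodeinfo(argv, timeout_s=3.0):
    """Run a query command and return its parsed JSON output, or None.

    argv is the command to run. On a node this would presumably be
    something like
    ["gluon-neighbour-info", "-d", "::1", "-p", "1001", "-t", "3",
     "-c", "1", "-r", "nodeinfo"]
    (assumed, not taken from the controller source).
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        # Command took too long or could not be started at all.
        return None
    if proc.returncode != 0 or not proc.stdout.strip():
        # Matches the symptom reported above: exit code 1 and no output.
        return None
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None


def page_title(nodeinfo):
    """Pick a title for the page, with a fallback when nodeinfo is missing."""
    if nodeinfo is None:
        return "Status page temporarily unavailable"
    return nodeinfo.get("hostname", "unknown node")
```

A longer timeout makes the failure rarer, while the `None` check keeps the page renderable when the query still fails, for example because respondd has crashed.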
@blocktrron The problem with the missing nodeinfo is fixed with the most recent commit. I was also able to confirm that it was all due to a crashing respondd.
The same timeout change was also applied as a cherry-pick (cherry picked from commit 76185e3).
On TP-Link WR1043v1 nodes running gluon-v2021.1-1-g0f9a6334+, the request for the status page randomly ends with:
curl reports:
It seems that nodeinfo is sometimes undefined because a gluon-neighbour-info call timed out.
We usually see this only on small, older node hardware under higher load, but this node was nearly idle.