uchiwa needs restarting to update results #438
Comments
What is your refresh parameter in the uchiwa config set to? http://docs.uchiwa.io/en/latest/configuration/uchiwa/ |
@jaxxstorm Not sure what you mean by "until I start the daemon". Are you referring to the Uchiwa daemon? I also recommend checking your browser console and the Uchiwa logs. |
@jaxxstorm As suggested, try to consult your browser logs and the Uchiwa logs. You might also want to try a lower log level (see the |
@jaxxstorm - I've seen this issue too and find that I have to restart Uchiwa before I start seeing latest results. The data looks okay if I drill in to a client but the summaries seem out of whack. We've just restarted our instance of Uchiwa to clear the problem but I'll play around with log levels and browser logs to see whether we can find anything out. Might be worth noting it only started happening in the last few versions, though I'm unable to pin it down to appearing in a specific version. |
Thanks @neilprosser glad I'm not the only one experiencing this. |
@palourde this is still ongoing for me and I see nothing in the logs to indicate why this is happening. I have a screencap of the issue happening and then being resolved by a restart. Is there somewhere I can send it to you outside this issue? I don't want the hostnames made public. |
I am having the same issue. I have to restart uchiwa for the dashboard to refresh. Dashboard is not refreshing as configured by uchiwa.json file! Nothing in the log indicating any issues. |
@jaxxstorm You can send me an email at simon.plourde@gmail.com |
I'm having the same issue as well, and there's nothing helpful in the logs... |
Hi @palourde I've sent over the video of this as promised, sorry for the delay |
Any update on this? I'm having to restart uchiwa once an hour at the moment |
I too have to restart Uchiwa more frequently these days. There are stale alerts, and after a restart it works fine. I checked the logs but found nothing helpful. |
We ran into the issue today as well. We have it fronting 3 datacenters and it didn't refresh for over 16 hours. When looking at specific clients it was OK, but the 'checks' and 'clients' dashboards were stale. We did not have this issue until upgrading from 12.x to 14.x. |
I experienced this issue too, but was able to resolve it by reducing the During my testing I cleared the Sensu Redis instance, and reinstalled the uchiwa deb package. |
+1 also experiencing this issue, will see if I can reproduce it in my dev environment and shim in some hopefully useful debugging text |
The issue reoccurred for me today. Both prod and dev instances stopped updating at roughly the same time, around 16:20 CDT. Both instances are configured with the same Sensu datacenter, so it's plausible that they were receiving the same data at the time that they stopped updating. I have the @palourde, Any recommendations on debugging text or similar that might help us diagnose this? Some additional information:
|
Another datapoint: In the logs of one of our Sensu API servers, we can see that every 5 seconds it makes two paginated /clients API calls, one for the first thousand clients, the second for the second thousand. This is as expected, as this datacenter has just over one thousand Sensu clients in it. However, at 16:19:30, the same time that we see the logs for the Uchiwa daemon stop updating, we see the following in the Sensu API logs:
It continues to flood the API server with scores of requests per second until Uchiwa (in this case, the dev instance) was restarted at 17:30:51. By that time, it was attempting to get clients 192402000 through 192403000 :)
EDIT: We see the same behavior on another Sensu API (there are multiple behind haproxy) from the production instance, which got to fetching clients 100873000 to 100874000 before it was restarted at 16:55:
|
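As a rough check on the figures in the comment above, the average request rate can be worked out from the last offset reached and the elapsed time. This is a back-of-the-envelope sketch under stated assumptions: each request is assumed to advance the offset by one page of 1000 clients, the production start time is not given in the thread and is assumed to match the 16:19:30 seen for dev, and 16:55 is taken as 16:55:00. `requests_per_second` is a name made up for this example.

```python
from datetime import datetime

PAGE_SIZE = 1000  # clients per paginated request, per the log description


def requests_per_second(last_offset: int, start: str, end: str) -> float:
    """Average request rate, assuming one page-sized step per request."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(end, fmt)
               - datetime.strptime(start, fmt)).total_seconds()
    return (last_offset / PAGE_SIZE) / elapsed


# Dev instance: stopped updating at 16:19:30, restarted at 17:30:51.
dev = requests_per_second(192_402_000, "16:19:30", "17:30:51")

# Prod instance: start time assumed (not stated), restarted at 16:55.
prod = requests_per_second(100_873_000, "16:19:30", "16:55:00")
```

Under those assumptions both instances come out somewhere in the mid-40s of requests per second, which is in line with the "scores of requests per second" described above.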
I've added the following as debugging text for now to see if there's some unexpected value being returned in the initial call that breaks the
Hopefully we'll get to reproduce the issue soon :) |
Dev stopped updating again about 4 hours ago. Prod didn't go with it this time. The logs got spammed incredibly hard by the debugging text I added, so I don't have the beginning where the infinite loop of /clients GETs started. What I have is filled with this:
It looks like Not sure what could have caused this. Perhaps a client was in the process of being added or removed and this caused the API to very briefly return inconsistent data. It being a brief, transient state might also explain why the prod instance didn't experience the issue at the same time. Any thoughts on how we might mitigate this, or further troubleshooting steps? EDIT: I should clarify that the location of those debugging lines was moved into the
|
As a temporary solution, running with this change to disable paging of client GETs. |
It seems like a slightly more elegant solution might be to break out of the loop if the subsequent paginated requests do not contain any additional elements. It's not pretty, but it would mitigate the infinite loop scenario while still allowing paginated requests. |
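The guard suggested above can be sketched roughly as follows. This is a minimal Python illustration of the idea only, not Uchiwa's actual Go code and not the change in #464: `fetch_all`, `fetch_page`, `stuck_server`, and the `name` key are hypothetical names, and `fetch_page(limit, offset)` stands in for a paginated GET against the clients endpoint.

```python
from typing import Callable, Dict, List

Client = Dict[str, str]


def fetch_all(fetch_page: Callable[[int, int], List[Client]],
              limit: int = 1000) -> List[Client]:
    """Collect all clients page by page without looping forever."""
    seen = set()
    clients: List[Client] = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        added = 0
        for client in page:
            if client["name"] not in seen:
                seen.add(client["name"])
                clients.append(client)
                added += 1
        # Stop on an empty page, or on a page that only repeats clients
        # we already have; the latter is the guard against a runaway loop.
        if added == 0:
            break
        # A short page means the server has nothing further to return.
        if len(page) < limit:
            break
        offset += limit
    return clients


def stuck_server(limit: int, offset: int) -> List[Client]:
    """Hypothetical misbehaving API: ignores offset, always returns page one."""
    return [{"name": f"host-{i}"} for i in range(limit)]
```

With this guard, `fetch_all(stuck_server, limit=3)` gives up after the second request, because that page adds nothing new, instead of requesting ever larger offsets. The trade-off is one extra request when the total count is an exact multiple of the page size.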
Been running with Contegix@2f4def0 in prod and #464 in dev for over two weeks now. Neither has locked up. In the meantime, since @palourde looks to be MIA, I've gone ahead and updated master of my fork to run with #464 and with the necessary import string changes to point the imports to the right namespace. https://github.com/rmc3/uchiwa If you need a fix before #464 gets merged and a new version is pushed, you might want to try running Uchiwa from source with my fork. See http://docs.uchiwa.io/en/latest/getting-started/#from-source Just keep in mind that anywhere it says "github.com/sensu/uchiwa", you'll use "github.com/rmc3/uchiwa" instead. |
First of all, thanks so much @rmc3 for supplying a fix for this. I built a binary like so, it basically consists of the instructions to build from source, but minimal impact on my current system Ensure This should create a file
Hopefully someone from sensu sees this soon. Ping @palourde @calebhailey or @portertech |
Good thought @jaxxstorm. It's my understanding that you should just be able to use
This should create an executable binary named "uchiwa" in your current working directory, which you can then put in place as suggested by @jaxxstorm |
Great job everyone for troubleshooting this issue, I'll take a look at the related PR today and hopefully have a new release ready ASAP. |
I pushed a simple fix for this issue without disabling pagination: #478 |
+1 having the same issue on the latest release 0.14.5-1. Will try and implement the fix outlined by @palourde |
@harishbsrinivas I don't know if there have been any packages built for 0.14.4, but the release was tagged 10 days ago and it's the only release since @palourde committed the fix. If packages are available, upgrading to that version should fix the issue. |
@harishbsrinivas This fix is included in 0.14.5-1; make sure to restart Uchiwa after the upgrade. |
This has regressed in 0.19.0-1 Exact behavior with the individual client showing correct data while the summary page is stale until restart. |
👍 Also having this same issue with 0.22.0. The main "events" page shows some stale checks (check results I have deleted). If I click on the stale result, it takes me to the client summary page instead of the alert page (indicating it was actually deleted). Restarting uchiwa clears this up. |
Also experiencing the same issue as @hany |
@hany @user9384732902 Thank you for reporting this problem. Could you open a new issue so we can properly track it, and include both browser console logs and Uchiwa logs? Also, if possible, could you include some screenshots of the experienced behaviour? I'm unable to reproduce this problem so it would help us a lot! Thanks |
I am currently experiencing the issue in EDIT: Never mind, it was a very high refresh value that came from somewhere. |
This is really hard for me to pin down, but I want to open this to see if anyone else is having a similar problem.
Uchiwa is showing me stale results until I start the daemon. Even if I shift+refresh the page and completely clear my browser cache, the stale results still remain in the client.
Is there anything I can do to get some debug output for this? It's very frustrating for my users, who think there are issues, log in to a box to debug them, and find the issue was resolved ages ago.