Monitoring sentinel when "Transitions are disabled" is difficult #8162
Comments
This can be fixed in app configuration by disabling the muting transition. There are no muting forms for this project.
Is sentinel doing nothing in this state? What is the purpose of having sentinel run but do nothing like this? The log output makes things look pretty healthy. Would a better pattern be to fail fast? Or could we at least re-print this error every 5 minutes so this isn't a needle in a haystack?
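To make the "re-print this error every 5 minutes" idea concrete, here is a minimal sketch of a rate-limited repeating warning. It is written in Python purely for illustration and is not Sentinel's code; the class name `RepeatingWarning`, its parameters, and the message text are all hypothetical.

```python
import time
from typing import Callable, Optional


class RepeatingWarning:
    """Re-emit a warning at most once per interval while a problem persists.

    Hypothetical helper for illustration only; not part of Sentinel.
    """

    def __init__(
        self,
        message: str,
        interval_seconds: float = 300.0,  # 5 minutes, as suggested in the thread
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.message = message
        self.interval_seconds = interval_seconds
        self._clock = clock
        self._last_emitted: Optional[float] = None

    def maybe_emit(self, log: Callable[[str], None] = print) -> bool:
        """Log the message if it was never logged or the interval has elapsed.

        Returns True when the message was logged on this call.
        """
        now = self._clock()
        due = (
            self._last_emitted is None
            or now - self._last_emitted >= self.interval_seconds
        )
        if due:
            log(self.message)
            self._last_emitted = now
        return due
```

A caller would invoke `maybe_emit()` on every pass of its processing loop while the disabled state lasts, so the warning resurfaces periodically without flooding the log.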
This is correct: Sentinel is doing no transition processing in this state. The backlog value is true.
I think this is a great suggestion. Having a way to quickly check whether sentinel is indeed processing transitions over documents is very important. My preferred solution for this is monitoring service logs, which is on the Infrastructure Focus Group OKRs as a goal for the near future.
Another option is a boolean flag for
Are there more details on what this means? Even a high-level vision would be helpful.
That works too. Adding a health check to Sentinel has been something we've discussed several times in the past. Right now, Sentinel doesn't communicate its state except for the two "seq" docs, which are already used by the monitoring API.
So far, nothing is settled. The high-level vision is that all app logs (api, sentinel, haproxy, nginx, couchdb, etc.) are scanned for errors and other significant entries, with the filtered entries funneled into a digestible dashboard of sorts.
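As a rough illustration of the log-scanning idea described above, the following Python sketch filters log lines down to the "significant" ones. Nothing here reflects a settled design: the function name, the pattern list, and the matched phrases are assumptions chosen for the example (the "Transitions are disabled" phrase is taken from this ticket's title).

```python
import re
from typing import Iterable, List, Pattern

# ASSUMPTION: these patterns are illustrative; a real deployment would
# maintain its own list of significant entries per service.
SIGNIFICANT_PATTERNS: List[Pattern[str]] = [
    re.compile(r"transitions are disabled", re.IGNORECASE),
    re.compile(r"\berror\b", re.IGNORECASE),
]


def significant_entries(
    lines: Iterable[str],
    patterns: List[Pattern[str]] = SIGNIFICANT_PATTERNS,
) -> List[str]:
    """Return only the log lines that match at least one pattern."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in patterns)
    ]
```

The same function could be applied to the output of each service in turn, with the filtered lines forwarded to whatever dashboard is eventually chosen.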
Thanks for the cleanup of title and labels of this ticket @kennsippell - much appreciated!
Ready for AT on To test this - update an
@kennsippell - to be a bit more explicit for QA on how to test, "unknown or crashing transition" could be achieved with
How hard would it be to add a small e2e test to the transitions tests @kennsippell? It would go a long way towards our new quality assistance regime.
Testing details
Config: Default
Test scenario
Fixed on
Merged. Spoke with @ngaruko about the e2e test suggestion. He said he'd look at existing tests and file a followup ticket as needed.
Describe the bug
On a production instance we're seeing /api/v2/monitoring return an ever-growing value for sentinel backlog. The sentinel container logs show that the backlog is up to date, though.
To Reproduce
Steps to reproduce the behavior:
/api/v2/monitoring shows backlog going up
Expected behavior
/api/v2/monitoring and actual backlog are in sync
Logs
Environment
Additional context
App Monitoring showed (private GH issue) that this happened EXACTLY after we upgraded from 4.0.1 to 4.1.0 on Mar 9th:
Possibly related to the last time this bug happened in #7113?
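For anyone wanting to catch this kind of ever-growing backlog early, a small check over successive /api/v2/monitoring readings might look like the sketch below. It is only an illustration: the function names are hypothetical, and the assumed response shape (`{"sentinel": {"backlog": <number>}}`) should be verified against the actual API output before relying on it.

```python
from typing import Any, Mapping, Sequence


def sentinel_backlog(monitoring_response: Mapping[str, Any]) -> int:
    """Extract the sentinel backlog from a parsed monitoring response.

    ASSUMPTION: the value is nested as {"sentinel": {"backlog": <number>}}.
    """
    return int(monitoring_response["sentinel"]["backlog"])


def is_steadily_growing(samples: Sequence[int], min_samples: int = 3) -> bool:
    """Return True when every sample is strictly larger than the previous one.

    Fewer than `min_samples` readings are treated as inconclusive (False).
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))
```

Polling the endpoint at a fixed interval and feeding the collected values to `is_steadily_growing` would flag a backlog that only ever increases, which can then be compared against what the sentinel container logs report.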