New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(apps/mysql): add error-state to non-responsive mysql-servers #13993
feat(apps/mysql): add error-state to non-responsive mysql-servers #13993
Conversation
@@ -99,6 +99,9 @@ | |||
]; | |||
|
|||
$data = explode("\n", $mysql); | |||
if (sizeof($data) > 0 && strpos($data[0], 'ERROR') === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the correct place to update is https://github.com/librenms/librenms/blob/master/includes/polling/functions.inc.php#L447
and change it to check if the string starts with the error message and then mark it as ERROR. That way we don't delete the additional context after the ERROR message.
The other option would be to set $state directly to the update_application() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From how I understand that function there is no real way to save an error-message in the application
table. Also the app_state
would only hold 32 chars and the app_state
8... So that'd not fit with a full error-message.
We could modify the log_event function that is later called to include the error-message from the response, but i'm not sure what implication that would have for all the other apps really.
If I move it to the $status parameter of the update_application function it would not trigger the alerts (I think) since there is no comparison here https://github.com/librenms/librenms/blob/master/includes/polling/functions.inc.php#L474 that references $status. Should I add that so it also triggers when I update the $status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying inside update_applicatuon(), status is detected. You need to improve the detection to see if it starts with one of the listed strings set the status to the exact string. The full message should be in the response field.
Hope that is clear enough, I'm on my phone.
includes/polling/functions.inc.php
Outdated
$data['app_status'] = $response; | ||
$data['app_state'] = 'ERROR'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now would result in an Error-State even if the application returns an UNSUPPORTED
or LEGACY
. Is this something we can live with or should that also be set correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to bring this up, but it seems as though you have bumped into some deep issues in the application code.
I just looked through the code a bit and currently it seems to confuse app_state and app_status a lot.
app_state should be a fixed set of values 'OK', 'ERROR', 'LEGACY', 'UNSUPPORTED', and 'UNKNOWN' (maybe more) to make alerting easy.
app_status seems mostly to be the output of the script, but not 100%
The update_application() function does not line up with that, confusing things.
/**
* ...
* @param array $app app from the db, including app_id
* @param string $response This should be the return state of Application polling
* @param array $metrics an array of additional metrics to store in the database for alerting
* @param string $status This is the current value for alerting
*/
function update_application($app, $response, $metrics = [], $status = '')
It probably should be this:
/**
* ...
* @param array $app app from the db, including app_id (Note: we should probably be using models instead of arrays, but that is too big of change)
* @param string $status This should be the output from the application script. Shown when an ERROR state is set.
* @param array $metrics a key value array of additional metrics to store in the database for alerting. Must manually set _prev fields if you want them.
* @param string $state The state for alerting. If not set, will attempt to detect from $status, or set to UNKNOWN
*/
function update_application($app, $status, $metrics = [], $state = 'UNKNOWN')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! That's what I was thinking... Should I rebuild that and make a sepearate PR based on this one that updates the behaviour for all other applications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at doing this, and it did not work out nicely. Another idea may to have a separate function to set an error state to an application. Putting both in the same function might be why it is so messy.
This pull request has been mentioned on LibreNMS Community. There might be relevant details there: |
…brenms#13993) * feat(apps/mysql): add error-state to non-responsive mysql-servers * feat(alerting): add alert-rule for offline mysql-servers * fixup! feat(alerting): add alert-rule for offline mysql-servers * chore(style): replaced ticks * fixup! chore(style): replaced ticks * feat: added migration to make app_status field bigger * feat: moved error-detection to the application_update function * fixup! feat: moved error-detection to the application_update function * chore: updated schema * chore: fix styling * Cleaner way to handle the error states * regex should not include OK Co-authored-by: Tony Murray <murraytony@gmail.com>
…brenms#13993) * feat(apps/mysql): add error-state to non-responsive mysql-servers * feat(alerting): add alert-rule for offline mysql-servers * fixup! feat(alerting): add alert-rule for offline mysql-servers * chore(style): replaced ticks * fixup! chore(style): replaced ticks * feat: added migration to make app_status field bigger * feat: moved error-detection to the application_update function * fixup! feat: moved error-detection to the application_update function * chore: updated schema * chore: fix styling * Cleaner way to handle the error states * regex should not include OK Co-authored-by: Tony Murray <murraytony@gmail.com>
This PR updates the
app_state
field for mysql-Apps that returned an error during polling.This can be caused by an offline server or invalid user configuration.
I also added an alert template to show how to filter these incidents.
Please note
Testers
If you would like to test this pull request then please run:
./scripts/github-apply <pr_id>
, i.e./scripts/github-apply 5926
After you are done testing, you can remove the changes with
./scripts/github-remove
. If there are schema changes, you can ask on discord how to revert.