Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(apps/mysql): add error-state to non-responsive mysql-servers #13993

Merged
merged 12 commits into from Jun 10, 2022

Conversation

bennet-esyoil
Copy link
Contributor

@bennet-esyoil bennet-esyoil commented May 25, 2022

This PR updates the app_state field for mysql-Apps that returned an error during polling.
This can be caused by an offline server or invalid user configuration.

I also added an alert template to show how to filter these incidents.

Please note

Please read this information carefully. You can run ./lnms dev:check to check your code before submitting.

  • Have you followed our code guidelines?
  • If my Pull Request does some changes/fixes/enhancements in the WebUI, I have inserted a screenshot of it.
  • If my Pull Request makes discovery/polling/yaml changes, I have added/updated test data.

Testers

If you would like to test this pull request then please run: ./scripts/github-apply <pr_id>, i.e ./scripts/github-apply 5926
After you are done testing, you can remove the changes with ./scripts/github-remove. If there are schema changes, you can ask on discord how to revert.

ottorei
ottorei previously approved these changes May 27, 2022
@@ -99,6 +99,9 @@
];

$data = explode("\n", $mysql);
if (sizeof($data) > 0 && strpos($data[0], 'ERROR') === 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the correct place to update is https://github.com/librenms/librenms/blob/master/includes/polling/functions.inc.php#L447

and change it to check if the string starts with the error message and then mark it as ERROR. That way we don't delete the additional context after the ERROR message.

The other option would be to set $state directly to the update_application() call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From how I understand that function there is no real way to save an error-message in the application table. Also the app_state would only hold 32 chars and the app_state 8... So that'd not fit with a full error-message.

We could modify the log_event function that is later called to include the error-message from the response, but i'm not sure what implication that would have for all the other apps really.

If I move it to the $status parameter of the update_application function it would not trigger the alerts (I think) since there is no comparison here https://github.com/librenms/librenms/blob/master/includes/polling/functions.inc.php#L474 that references $status. Should I add that so it also triggers when I update the $status?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying inside update_applicatuon(), status is detected. You need to improve the detection to see if it starts with one of the listed strings set the status to the exact string. The full message should be in the response field.

Hope that is clear enough, I'm on my phone.

Comment on lines 472 to 473
$data['app_status'] = $response;
$data['app_state'] = 'ERROR';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now would result in an Error-State even if the application returns an UNSUPPORTED or LEGACY. Is this something we can live with or should that also be set correctly?

Copy link
Member

@murrant murrant Jun 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to bring this up, but it seems as though you have bumped into some deep issues in the application code.

I just looked through the code a bit and currently it seems to confuse app_state and app_status a lot.

app_state should be a fixed set of values 'OK', 'ERROR', 'LEGACY', 'UNSUPPORTED', and 'UNKNOWN' (maybe more) to make alerting easy.

app_status seems mostly to be the output of the script, but not 100%

The update_application() function does not line up with that, confusing things.

/**
 * ...
 * @param  array  $app  app from the db, including app_id
 * @param  string  $response  This should be the return state of Application polling
 * @param  array  $metrics  an array of additional metrics to store in the database for alerting
 * @param  string  $status  This is the current value for alerting
 */
function update_application($app, $response, $metrics = [], $status = '')

It probably should be this:

/**
 * ...
 * @param  array  $app  app from the db, including app_id (Note: we should probably be using models instead of arrays, but that is too big of change)
 * @param  string  $status  This should be the output from the application script. Shown when an ERROR state is set.
 * @param  array  $metrics  a key value array of additional metrics to store in the database for alerting.  Must manually set _prev fields if you want them.
 * @param  string  $state The state for alerting. If not set, will attempt to detect from $status, or set to UNKNOWN
 */
function update_application($app, $status, $metrics = [], $state = 'UNKNOWN')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! That's what I was thinking... Should I rebuild that and make a sepearate PR based on this one that updates the behaviour for all other applications?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at doing this, and it did not work out nicely. Another idea may to have a separate function to set an error state to an application. Putting both in the same function might be why it is so messy.

@murrant murrant merged commit 8fdd1be into librenms:master Jun 10, 2022
@librenms-bot
Copy link

This pull request has been mentioned on LibreNMS Community. There might be relevant details there:

https://community.librenms.org/t/22-6-0-changelog/18968/1

@bennet-esyoil bennet-esyoil deleted the feat/mysql-error-state branch June 15, 2022 06:25
Npeca75 pushed a commit to Npeca75/librenms that referenced this pull request Jun 22, 2022
…brenms#13993)

* feat(apps/mysql): add error-state to non-responsive mysql-servers

* feat(alerting): add alert-rule for offline mysql-servers

* fixup! feat(alerting): add alert-rule for offline mysql-servers

* chore(style): replaced ticks

* fixup! chore(style): replaced ticks

* feat: added migration to make app_status field bigger

* feat: moved error-detection to the application_update function

* fixup! feat: moved error-detection to the application_update function

* chore: updated schema

* chore: fix styling

* Cleaner way to handle the error states

* regex should not include OK

Co-authored-by: Tony Murray <murraytony@gmail.com>
Npeca75 pushed a commit to Npeca75/librenms that referenced this pull request Jul 25, 2022
…brenms#13993)

* feat(apps/mysql): add error-state to non-responsive mysql-servers

* feat(alerting): add alert-rule for offline mysql-servers

* fixup! feat(alerting): add alert-rule for offline mysql-servers

* chore(style): replaced ticks

* fixup! chore(style): replaced ticks

* feat: added migration to make app_status field bigger

* feat: moved error-detection to the application_update function

* fixup! feat: moved error-detection to the application_update function

* chore: updated schema

* chore: fix styling

* Cleaner way to handle the error states

* regex should not include OK

Co-authored-by: Tony Murray <murraytony@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants