[Uptime] Simple monitor status alert fix for PagerDuty and other connectors #87460
Conversation
@elasticmachine merge upstream
Pinging @elastic/uptime (Team:uptime)
@elasticmachine merge upstream
LGTM - everything seems to be working ok. Had a few questions about the future of the code we're adding here.
Smoke testing process
- I created connectors for PagerDuty, ServiceNow, Slack, and an extra server log connector to use for control purposes.
- I defined a local monitor that was up, and allowed data to accumulate.
- I killed the process that was running and waited for the alert to fire.
Slack:
Server log:
PagerDuty:
ServiceNow: I must have misconfigured something. I got an error like:
server log [13:48:19.750] [warning][actions][actions][plugins] action execution failure: .servicenow:01f234e0-5505-11eb-ab10-f71ddcabade6: service-now: an error occurred while running the action executor: [Action][ServiceNow]: Unable to create incident. Error: Request failed with status code 401
- I attempted to create a new user in ServiceNow.
- I re-started my service to resolve the current alert instance, and then stopped the service once more.
- After re-configuring the ServiceNow connector and testing it in the management portal, it was a "successful" run:
  TeamsActionTypeId,
  WebhookActionTypeId,
  // eslint-disable-next-line @kbn/eslint/no-restricted-paths
} from '../../../../actions/server/builtin_action_types';
Could we maybe ask Alerting to export these in the future to avoid busting the linter?
Yeah, another idea was to make an uptime endpoint for this. A sketch of the re-export idea follows below.
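For illustration only, a minimal sketch of the export idea, assuming a hypothetical public re-export in the actions plugin (the file location and the re-export itself are assumptions, not the actual Kibana change):

// Hypothetical: re-export the built-in action type IDs from the actions
// plugin's public server entry point, so consumers can import them
// without a deep path and without disabling @kbn/eslint/no-restricted-paths.
export {
  TeamsActionTypeId,
  WebhookActionTypeId,
} from './builtin_action_types';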
incident: {
  short_description: MonitorStatusTranslations.defaultActionMessage,
  description: MonitorStatusTranslations.defaultActionMessage,
  impact: '2',
Should we plan to make these fields configurable for this, Jira, PagerDuty, etc.?
Yeah, makes sense, let's follow up on this. A rough sketch of the idea is below.
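As a rough sketch of what that follow-up could look like (all names below are hypothetical, not the actual Kibana implementation), the hardcoded incident fields could become defaults that user-supplied settings override:

// Hypothetical sketch: merge user-configured overrides with the current
// hardcoded defaults, so impact, description, etc. become configurable.
interface IncidentFields {
  short_description: string;
  description: string;
  impact: string;
}

const defaultIncidentFields: IncidentFields = {
  short_description: 'Monitor is down',
  description: 'Monitor is down',
  impact: '2',
};

function buildIncidentFields(overrides: Partial<IncidentFields> = {}): IncidentFields {
  // User settings win; anything left unspecified keeps its default.
  return { ...defaultIncidentFields, ...overrides };
}

// e.g. buildIncidentFields({ impact: '1' }) escalates only the impact.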
Summary
Fixes: #84459
This PR makes sure that the alert auto-generated by one click uses the correct params for each connector.
Param validity is enforced by type checking.
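As a minimal sketch of the idea (the interfaces and the helper below are illustrative, not the actual Kibana source), each connector type ID maps to a params object whose shape the compiler verifies:

// Illustrative only: per-connector default action params, type-checked
// at compile time via a union return type.
interface SlackActionParams {
  message: string;
}

interface PagerDutyActionParams {
  summary: string;
  severity: 'critical' | 'error' | 'warning' | 'info';
  eventAction: 'trigger' | 'resolve' | 'acknowledge';
}

interface ServiceNowActionParams {
  subActionParams: {
    incident: {
      short_description: string;
      description: string;
      impact: string;
    };
  };
}

// Placeholder default message; the real text lives in MonitorStatusTranslations.
const defaultMessage = 'Monitor status alert: the monitor is down';

// Returning a union means a field that does not belong to the matching
// connector's params fails at compile time, rather than at alert
// execution time as it did before this fix.
function getDefaultActionParams(
  actionTypeId: string
): SlackActionParams | PagerDutyActionParams | ServiceNowActionParams | undefined {
  switch (actionTypeId) {
    case '.slack':
      return { message: defaultMessage };
    case '.pagerduty':
      return { summary: defaultMessage, severity: 'error', eventAction: 'trigger' };
    case '.servicenow':
      return {
        subActionParams: {
          incident: {
            short_description: defaultMessage,
            description: defaultMessage,
            impact: '2',
          },
        },
      };
    default:
      return undefined;
  }
}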
Testing:
To test this PR, you will need to create trial PagerDuty and ServiceNow accounts and make sure the alert works with those connectors.
After adding the connectors, set them as default connectors in the Uptime settings. Then try creating an alert from the monitor list for a few monitors and check that it triggers on downtime.
Contact me if you want to test those connectors; I can provide my credentials for them.
Note: I wanted to write functional tests, but it turns out mocking them isn't that simple. Considering the limited time frame, I have created a follow-up issue:
elastic/uptime#279