
Rewriting alerts best practices doc
kaileyhaynes committed Jul 12, 2021
1 parent fafc9ef commit 76acd90
Showing 1 changed file with 74 additions and 23 deletions.
@@ -16,36 +16,86 @@ redirects:
- /docs/alerts-applied-intelligence/new-relic-alerts/get-started/alerts-best-practices
---

This document describes some best practices and examples to help you get the most out of your alerts configuration.
Improve your alert coverage by implementing the following recommendations.

Read on to learn the best practices for:
- [Policies](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#policy-practices)
- [Notification channels](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#channel-practices)
- [Incident preferences](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#incident-practices)
- [Thresholds and violations](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#threshold-practices)
- [Muting rules](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#mute-practices)

<Callout variant="important">
Before reading this, we recommend you read [Alerts concepts and workflow](/docs/alerts/new-relic-alerts/getting-started/alert-policy-workflow).
Before reading this, we recommend you also read [Alerts concepts and workflow](/docs/alerts/new-relic-alerts/getting-started/alert-policy-workflow).
</Callout>

## Define policies for entities or people [#entities-people]
## Recommended alerts [#recommend-alerts]

Use [recommended alerts](https://discuss.newrelic.com/t/announcing-alert-recommendations-for-apm-applications/154542) if you are new to Alerts or if you want suggestions that optimize your alert coverage.

## Policies [#policy-practices]

A policy is a container for similar conditions.

<Callout variant="tip">
Learn how to [create, edit, or find policies](/docs/alerts-applied-intelligence/new-relic-alerts/alert-policies/create-edit-or-find-alert-policy/) if you’re new to Alerts.
</Callout>

When designing your policies, consider:
Organize each policy by scoping its conditions to a single entity and to the specific team that needs to be notified when a problem arises within that entity.

* The parts of your architecture that need personnel to be responsible for them
* The individuals who are responsible for one or more parts of your infrastructure
If a team monitors several groups of the same entity type, combine those entity clusters (like servers) into one policy. This way, your team is notified from one policy rather than having to navigate several policies at once.

An organization may have multiple entities monitored by APM, browser, infrastructure, and synthetic monitoring. Examples of considerations for different teams:
Consider each team’s role when assigning it to policies:

* **Software developers** may need [notifications](/docs/alerts/new-relic-alerts/managing-notification-channels/notification-channels-controlling-where-send-alerts) for both front-end and back-end performance, such as webpage response time and page load JavaScript errors.
* **Operations personnel** may need notifications for poor back-end performance, such as server memory and load averages.
* **The product owner** may need notifications for positive front-end performance, such as improved end user Apdex scores or sales being monitored in [dashboards](/docs/query-your-data/explore-query-data/dashboards/introduction-new-relic-one-dashboards).

By following these best practices, key personnel will receive actionable notifications for the metrics that matter to them, and overall, the organization will be able to identify and respond to trends or patterns more efficiently.
To keep policies focused, refrain from creating policies that span multiple teams.
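
If you manage policies programmatically, the following is a minimal sketch of creating one policy per team and entity group through the NerdGraph GraphQL API. It is an illustration, not a drop-in solution: the `alertsPolicyCreate` mutation arguments, the `NEW_RELIC_API_KEY` environment variable name, and the account ID and policy name are assumptions to adapt to your own account.

```python
# Minimal sketch: create one policy per team and entity group via NerdGraph.
# Assumes the alertsPolicyCreate mutation and a User API key stored in the
# hypothetical NEW_RELIC_API_KEY environment variable; IDs and names are examples.
import os

import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"  # US endpoint; EU accounts use a different host
API_KEY = os.environ["NEW_RELIC_API_KEY"]
ACCOUNT_ID = 1234567  # replace with your account ID

POLICY_CREATE = """
mutation($accountId: Int!, $name: String!) {
  alertsPolicyCreate(accountId: $accountId, policy: {
    name: $name,
    incidentPreference: PER_CONDITION
  }) {
    id
    name
  }
}
"""


def create_policy(name: str) -> dict:
    """Create an alert policy and return the NerdGraph response."""
    response = requests.post(
        NERDGRAPH_URL,
        headers={"API-Key": API_KEY, "Content-Type": "application/json"},
        json={"query": POLICY_CREATE, "variables": {"accountId": ACCOUNT_ID, "name": name}},
    )
    response.raise_for_status()
    return response.json()


# One policy per team and entity group, for example "Checkout team - API servers".
print(create_policy("Checkout team - API servers"))
```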

## Notification channels [#channel-practices]

Tailor notifications to the most useful channel and policy so you can avoid alert fatigue and help the right personnel receive and respond to incidents they care about in a systematic way.

<Callout variant="tip">
Learn how to [set up notification channels](/docs/alerts-applied-intelligence/new-relic-alerts/alert-notifications/notification-channels-control-where-send-alerts/) if you’re new to Alerts.
</Callout>

Notify teams and individuals who:
- Can resolve a problem when an incident arises
- Need to stay updated when an incident arises

For staying updated, select a less intrusive notification channel, such as email.

## Control how many notifications you get [#more-or-less]
For vital notifications that require a response, select a notification channel that is immediate and reliable, such as PagerDuty or Slack. PagerDuty and OpsGenie also give you much more control over who gets notified and when.

The more [conditions](/docs/alerts/new-relic-alerts-beta/configuring-alert-policies/define-alert-conditions) you define, the more [incidents](/docs/alerts/new-relic-alerts-beta/reviewing-alert-incidents/explore-incident-history) can be triggered and monitored. For example, your organization may need an alerting solution to accommodate extensive IT systems. Create policies with multiple conditions for [multiple monitored entities](/docs/alerts/new-relic-alerts-beta/configuring-alert-policies/select-product-targets-alert-condition) that notify you through one or more notification channels. Set [incident preference](/docs/alerts/new-relic-alerts/configuring-alert-policies/specify-when-new-relic-creates-incidents) to determine how violations lead to notifications.
Do not rely on email for time-sensitive notifications: mail can be delayed, and an email list will bounce if any of its addresses is no longer valid.

On the other hand, your organization may not need an extensive alerting structure. The fewer conditions you define and the more minimal your incident preference is, the fewer violations will be opened. For example, for a simple alerting solution, you could create a simple policy with only an email [notification channel](/docs/alerts/new-relic-alerts/managing-notification-channels/notification-channels-controlling-where-send-alerts).
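
As a sketch of wiring channels to a policy through the API, you might attach a PagerDuty channel for responders and an email channel for stakeholders to the same policy. This is hedged: the `alertsNotificationChannelsAddToPolicy` mutation name, its argument types, and the channel and policy IDs below are assumptions to verify in the NerdGraph API explorer.

```python
# Minimal sketch: attach existing notification channels to a policy so responders
# get an immediate channel (PagerDuty) and stakeholders get a low-noise one (email).
# The mutation name and argument types are assumptions; verify them in the
# NerdGraph API explorer. IDs are illustrative.
import os

import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
API_KEY = os.environ["NEW_RELIC_API_KEY"]  # hypothetical environment variable name
ACCOUNT_ID = 1234567

ADD_CHANNELS = """
mutation($accountId: Int!, $policyId: Int!, $channelIds: [Int!]!) {
  alertsNotificationChannelsAddToPolicy(
    accountId: $accountId,
    policyId: $policyId,
    notificationChannelIds: $channelIds
  ) {
    notificationChannels {
      id
      name
    }
  }
}
"""


def add_channels_to_policy(policy_id: int, channel_ids: list) -> dict:
    """Associate one or more notification channels with an alert policy."""
    response = requests.post(
        NERDGRAPH_URL,
        headers={"API-Key": API_KEY, "Content-Type": "application/json"},
        json={
            "query": ADD_CHANNELS,
            "variables": {
                "accountId": ACCOUNT_ID,
                "policyId": policy_id,
                "channelIds": channel_ids,
            },
        },
    )
    response.raise_for_status()
    return response.json()


# Example: channel 111 is a PagerDuty channel, channel 222 is an email channel.
print(add_channels_to_policy(policy_id=987654, channel_ids=[111, 222]))
```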

## Set thresholds for conditions [#conditions]
## Incident preferences [#incident-practices]

Set the [thresholds](/docs/alerts/new-relic-alerts-beta/configuring-alert-policies/define-thresholds-trigger-alert) for your policy's conditions to meaningful levels for your environment. Here are some suggested guidelines:
Decide when you receive incident notifications so you can respond to incidents as they happen.

<Callout variant="tip">
Learn more about your [incident preference options](https://discuss.newrelic.com/t/relic-solution-alert-incident-preferences-are-the-key-to-consistent-alert-notifications/40867) if you’re new to Alerts.
</Callout>

The default incident preference setting combines all conditions within a policy into one incident. Change the default setting when you need better visibility into the exact scope and number of violations.

Each organization will have different needs, and each policy within an organization will have different needs too. Ask your team two important questions when deciding your incident preferences:
- Do we want to be notified every time something goes wrong?
- Do we want to group all similar notifications together and be notified once?

When a policy and its conditions have a broader scope (like managing several entities), increase your incident preference granularity. You will need more notifications because two violations will not necessarily relate to each other.

When a policy and its conditions have a focused scope (like managing one entity), opt for a single notification. You will need fewer notifications when two violations are related to each other or when the team has already been notified and is fixing an existing problem.

Decide how you get incident notifications by using our best [notification channel practices](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#channel-practices).
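
If you prefer to set incident preferences through the API rather than the UI, here is a minimal sketch. It assumes the `alertsPolicyUpdate` NerdGraph mutation and the standard preference values (`PER_POLICY`, `PER_CONDITION`, `PER_CONDITION_AND_TARGET`); the policy ID is illustrative.

```python
# Minimal sketch: change a policy's incident preference to match its scope.
# PER_POLICY rolls everything into one incident; PER_CONDITION and
# PER_CONDITION_AND_TARGET create more, finer-grained incidents.
# Assumes the alertsPolicyUpdate mutation; the policy ID is illustrative.
import os

import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
API_KEY = os.environ["NEW_RELIC_API_KEY"]  # hypothetical environment variable name
ACCOUNT_ID = 1234567
POLICY_ID = "987654"

# A broad, multi-entity policy usually warrants more granular incidents.
UPDATE_PREFERENCE = """
mutation($accountId: Int!, $policyId: ID!) {
  alertsPolicyUpdate(accountId: $accountId, id: $policyId, policy: {
    incidentPreference: PER_CONDITION_AND_TARGET
  }) {
    id
    name
    incidentPreference
  }
}
"""

response = requests.post(
    NERDGRAPH_URL,
    headers={"API-Key": API_KEY, "Content-Type": "application/json"},
    json={"query": UPDATE_PREFERENCE,
          "variables": {"accountId": ACCOUNT_ID, "policyId": POLICY_ID}},
)
response.raise_for_status()
print(response.json())
```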


## Thresholds and violations [#threshold-practices]

Set meaningful [threshold](/docs/alerts/new-relic-alerts-beta/configuring-alert-policies/define-thresholds-trigger-alert) levels to optimize Alerts for your business. Here are some suggested guidelines:

<table>
<thead>
@@ -106,21 +156,22 @@ Set the [thresholds](/docs/alerts/new-relic-alerts-beta/configuring-alert-polici
</tbody>
</table>

In most of our products (except Infrastructure), the color-coded [health status indicator](https://docs.newrelic.com/docs/accounts-partnerships/education/getting-started-new-relic/glossary#health-status) in the user interface changes as the alerting threshold escalates or returns to normal. This allows you to monitor a situation through our UI before a critical threshold passes, without needing to receive specific notifications about it.
In most of our products (except Infrastructure), the color-coded [health status indicator](/docs/using-new-relic/welcome-new-relic/get-started/glossary/#health-status) in the user interface changes as the alerting threshold escalates or returns to normal. This allows you to monitor a situation through our UI before a critical threshold passes, without needing to receive specific notifications about it.

There are two violation thresholds: critical (red) and warning (yellow). Define these thresholds with different criteria, keeping in mind the suggestions above.

For example, you can define a critical (red) threshold that notifies you when the error percentage for your app is above 10 percent at least once in any five-minute period. You can also define an optional warning (yellow) threshold with different criteria.
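
As a hedged sketch of that example in code, the condition below sets a critical threshold at 10 percent and a lower warning threshold at 5 percent on an error-rate NRQL query. The `alertsNrqlConditionStaticCreate` mutation arguments, the query, the app name, and the IDs are assumptions to adapt and verify against the NerdGraph schema.

```python
# Minimal sketch: a static NRQL condition with a critical threshold (error rate
# above 10 percent at least once in 5 minutes) and a lower warning threshold.
# The mutation, NRQL query, app name, and IDs are illustrative assumptions.
import os

import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
API_KEY = os.environ["NEW_RELIC_API_KEY"]  # hypothetical environment variable name
ACCOUNT_ID = 1234567
POLICY_ID = "987654"

CREATE_CONDITION = """
mutation($accountId: Int!, $policyId: ID!) {
  alertsNrqlConditionStaticCreate(accountId: $accountId, policyId: $policyId, condition: {
    name: "Error percentage too high",
    enabled: true,
    nrql: {
      query: "SELECT percentage(count(*), WHERE error IS true) FROM Transaction WHERE appName = 'My app'"
    },
    terms: [
      { threshold: 10, thresholdDuration: 300, thresholdOccurrences: AT_LEAST_ONCE,
        operator: ABOVE, priority: CRITICAL },
      { threshold: 5, thresholdDuration: 300, thresholdOccurrences: AT_LEAST_ONCE,
        operator: ABOVE, priority: WARNING }
    ],
    violationTimeLimitSeconds: 86400
  }) {
    id
    name
  }
}
"""

response = requests.post(
    NERDGRAPH_URL,
    headers={"API-Key": API_KEY, "Content-Type": "application/json"},
    json={"query": CREATE_CONDITION,
          "variables": {"accountId": ACCOUNT_ID, "policyId": POLICY_ID}},
)
response.raise_for_status()
print(response.json())
```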

## Select notification channels [#channels]
<Callout variant='important'>
Warning violations do **not** open incidents. A critical violation can open an incident, but whether it does is determined by your [incident preferences](/docs/new-relic-solutions/best-practices-guides/alerts-applied-intelligence/alerts-best-practices/#incident-practices).
</Callout>

You can create [notification channels](/docs/alerts/new-relic-alerts-beta/managing-notification-channels/notification-channels-controlling-where-send-alerts) first and then assign policies to them. You can also create policies first and then assign notification channels to them. This flexibility allows you to tailor who gets notified, using the method that is most useful to them.
## Muting rules [#mute-practices]

For example, you could:
Mute alerts during routine events such as maintenance or planned downtime.

* Identify your operations team's Slack channel as a general level of alerting, and use the on-call PagerDuty contact as an after-hours or escalated level of alerting.
* Create webhooks with customized messages for a variety of situations or personnel.
You can also silence other aspects of Alerts (like a policy, a specific entity, or a condition) when needed. Incidents can still be opened, but you will not be notified.

<Callout variant="tip">
By tailoring notifications to the most useful channel and policy, you can avoid alert fatigue and help the right personnel receive and respond to incidents they care about in a systematic way.
<Callout variant='tip'>
Learn how to [create and manage muting rules](/docs/alerts-applied-intelligence/new-relic-alerts/alert-notifications/muting-rules-suppress-notifications/) if you’re new to Alerts.
</Callout>
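
For planned maintenance, you can also create a muting rule ahead of time through the API. This sketch assumes the `alertsMutingRuleCreate` NerdGraph mutation and a schedule-based rule; the attribute name, times, and IDs are illustrative.

```python
# Minimal sketch: a muting rule that silences notifications for one application
# during a planned maintenance window. Violations and incidents can still open,
# but no notifications are sent while the rule is active.
# The mutation, attribute name, times, and IDs are illustrative assumptions.
import os

import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
API_KEY = os.environ["NEW_RELIC_API_KEY"]  # hypothetical environment variable name
ACCOUNT_ID = 1234567

CREATE_MUTING_RULE = """
mutation($accountId: Int!) {
  alertsMutingRuleCreate(accountId: $accountId, rule: {
    name: "Checkout API maintenance window",
    description: "Planned downtime for the checkout API",
    enabled: true,
    condition: {
      operator: AND,
      conditions: [
        { attribute: "targetName", operator: EQUALS, values: ["checkout-api"] }
      ]
    },
    schedule: {
      startTime: "2021-07-20T01:00:00",
      endTime: "2021-07-20T03:00:00",
      timeZone: "America/Los_Angeles"
    }
  }) {
    id
  }
}
"""

response = requests.post(
    NERDGRAPH_URL,
    headers={"API-Key": API_KEY, "Content-Type": "application/json"},
    json={"query": CREATE_MUTING_RULE, "variables": {"accountId": ACCOUNT_ID}},
)
response.raise_for_status()
print(response.json())
```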

## What's next?
