Skip to content

Latest commit

 

History

History
93 lines (71 loc) · 6.14 KB

File metadata and controls

93 lines (71 loc) · 6.14 KB
title tags metaDescription redirects
Alert on infrastructure processes
Infrastructure
Infrastructure alerts
Infrastructure alert conditions
How to create an alerting condition to notify you when processes have stopped reporting or are not running as expected.
/docs/infrastructure/new-relic-infrastructure/infrastructure-alert-conditions/alert-when-infrastructure-processes-stop-reporting
/docs/infrastructure/new-relic-infrastructure/infrastructure-alert-conditions/alert-when-infrastructure-processes-not-running
/docs/infrastructure/new-relic-infrastructure/infrastructure-alert-conditions/alert-infrastructure-process-running

Use New Relic infrastructure's Process running alert condition to be notified when a set of processes on your filtered hosts stop running for a configurable number of minutes. This is useful, for example, when:

  • Any of the processes on the hosts stop reporting
  • A process is running too many instances on one host

This feature's flexibility allows you to easily filter what hosts and processes to monitor and when to notify selected individuals or teams. In addition, the email notification includes links to help you quickly troubleshoot the situation.

By default, the infrastructure agent doesn't send [data about the operating system's processes](https://docs.newrelic.com/attribute-dictionary?attribute_name=&events_tids%5B%5D=8412). To enable the sending of process data set [`enable_process_metrics`](https://docs.newrelic.com/docs/infrastructure/install-configure-manage-infrastructure/configuration/infrastructure-configuration-settings#enable-process-metrics) to `true`. To fine-tune which processes you want to monitor, configure [`include_matching_metrics`](https://docs.newrelic.com/docs/infrastructure/install-configure-manage-infrastructure/configuration/infrastructure-configuration-settings#include-matching-metrics).

Examples [#features]

By applying filters to the hosts and processes that are important to your business, you can define alerting thresholds to decide when violations open and New Relic sends an email notification to you depending on the policy's incident preferences. These examples illustrate how to use infrastructure monitoring's Process running condition to monitor your processes.

**Problem:** Some load balancers and application servers work by running many worker processes in parallel. Here, for example, you may want an alert violation when fewer than eight processes are running for a service like gunicorn.
**Solution:** Depending on the situation, use any of these **Process running** thresholds options as needed:

* **More than** the defined number of processes are running
* **Exactly** the defined number of processes are running
* **Fewer than** the defined number of processes are running

<Collapser id="critical-services" title="Ensure that critical services run constantly"

**Problem:** A service, such as a database or application server, is expected to be running constantly on certain hosts, and you need to know when it has stopped.

**Solution:** Use the **No processes are running** (default) threshold.

<Collapser id="one-critical" title="Monitor startup for critical processes that require special attention"

**Problem:** You have processes requiring special attention due to security or potential performance impact.

**Solution:** Use the **At least one process is running** threshold with condition filters set to a username and specific executable so that New Relic can open a violation when the process is running.

<Collapser id="one-job-length" title="Make sure a job doesn't take too long"

**Problem:** You have a job that runs periodically, and you want to open a violation when it has been running longer than an expected number of minutes.

**Solution:** Use the **At least one process is running** threshold.

<Collapser id="multiple-runaway" title="Watch for runaway processes or configuration problems"

**Problem:** Sometimes problems with processes can be solved with changes to your configuration. For example, you have more than one Chef process running, and you may need to address an issue with how that service is configured.

**Solution:** Depending on the situation, use any of these **Process running** thresholds options as needed:

* **More than** the defined number of processes are running
* **Exactly** the defined number of processes are running
* **Fewer than** the defined number of processes are running

Create an infrastructure process running condition [#create-condition]

To define the Process running alert criteria:

  1. Follow standard procedures to create an infrastructure alert condition.
  2. Select Process running as the Alert type.
  3. Filter what hosts and processes you want the alert condition to apply to.
  4. Define the Critical threshold for triggering the alert notification: minimum 1 minute, default 5 minutes, maximum 60 minutes.

If you create the alert condition directly with infrastructure monitoring, New Relic will send an email notification when the defined threshold for the alert condition passes depending on the policy's incident preferences. Your alert policy defines which personnel or teams and which notification channels we use.