68 changes: 68 additions & 0 deletions host/systemd/README.md
@@ -0,0 +1,68 @@
<!-- BEGIN_TF_DOCS -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.5 |
| <a name="requirement_datadog"></a> [datadog](#requirement\_datadog) | >= 3.37 |
| <a name="requirement_null"></a> [null](#requirement\_null) | >= 3.1.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_datadog"></a> [datadog](#provider\_datadog) | >= 3.37 |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| [datadog_monitor.systemd_unit](https://registry.terraform.io/providers/datadog/datadog/latest/docs/resources/monitor) | resource |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_additional_tags"></a> [additional\_tags](#input\_additional\_tags) | Additional tags to apply to all monitors | `list(string)` | `[]` | no |
| <a name="input_alert_critical_priority"></a> [alert\_critical\_priority](#input\_alert\_critical\_priority) | Priority for alerts within critical threshold (P1-P5, uses monitor defaults if not specified) | `string` | `null` | no |
| <a name="input_alert_message"></a> [alert\_message](#input\_alert\_message) | Message to prepend to alert notifications | `string` | `"Alert"` | no |
| <a name="input_alert_nodata_priority"></a> [alert\_nodata\_priority](#input\_alert\_nodata\_priority) | Priority for alerts with no data (P1-P5, uses monitor defaults if not specified) | `string` | `null` | no |
| <a name="input_base_tags"></a> [base\_tags](#input\_base\_tags) | Base tags to apply to all monitors | `list(string)` | `[]` | no |
| <a name="input_cost_center"></a> [cost\_center](#input\_cost\_center) | Cost Center of the monitored resource (leave blank to omit tag) | `string` | `null` | no |
| <a name="input_dashboard_link"></a> [dashboard\_link](#input\_dashboard\_link) | Dashboard link to include in message | `string` | `null` | no |
| <a name="input_env"></a> [env](#input\_env) | Environment the monitored resource is in (leave blank to omit tag) | `string` | `null` | no |
| <a name="input_evaluation_delay"></a> [evaluation\_delay](#input\_evaluation\_delay) | Monitor evaluation delay (see [Datadog Docs](https://docs.datadoghq.com/monitors/configuration/?tab=thresholdalert#set-alert-conditions)) | `number` | `900` | no |
| <a name="input_monitor_exclude_tags"></a> [monitor\_exclude\_tags](#input\_monitor\_exclude\_tags) | Tags to be excluded in the monitoring query. Specify in key:value format | `list(string)` | `[]` | no |
| <a name="input_monitor_include_tags"></a> [monitor\_include\_tags](#input\_monitor\_include\_tags) | Tags to be included in the monitoring query. Specify in key:value format | `list(string)` | `[]` | no |
| <a name="input_new_group_delay"></a> [new\_group\_delay](#input\_new\_group\_delay) | Delay in seconds before generating alerts for a new resource | `number` | `300` | no |
| <a name="input_notify_alert_override"></a> [notify\_alert\_override](#input\_notify\_alert\_override) | List of notifications for alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_notify_crit_override"></a> [notify\_crit\_override](#input\_notify\_crit\_override) | List of notifications for 24x7 alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_notify_default"></a> [notify\_default](#input\_notify\_default) | List of alert notifications (can be overridden based on alert type) | `list(string)` | n/a | yes |
| <a name="input_notify_no_data"></a> [notify\_no\_data](#input\_notify\_no\_data) | Alert if no matching data is found | `bool` | `false` | no |
| <a name="input_notify_nodata_override"></a> [notify\_nodata\_override](#input\_notify\_nodata\_override) | List of notifications for no data (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_notify_nonprod_override"></a> [notify\_nonprod\_override](#input\_notify\_nonprod\_override) | List of notifications for non-prod alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_notify_prod_override"></a> [notify\_prod\_override](#input\_notify\_prod\_override) | List of notifications for 12x5 prod alerts in critical threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_notify_recovery_override"></a> [notify\_recovery\_override](#input\_notify\_recovery\_override) | List of notifications for alert recovery (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_notify_warn_override"></a> [notify\_warn\_override](#input\_notify\_warn\_override) | List of notifications for alerts in warning threshold (uses `notify_default` otherwise) | `list(string)` | `[]` | no |
| <a name="input_renotify_interval"></a> [renotify\_interval](#input\_renotify\_interval) | Interval in minutes to re-send notifications about an alert | `number` | `60` | no |
| <a name="input_runbook_link"></a> [runbook\_link](#input\_runbook\_link) | Runbook link to include in message | `string` | `null` | no |
| <a name="input_service"></a> [service](#input\_service) | Service associated with the monitored resource (leave blank to omit tag) | `string` | `null` | no |
| <a name="input_systemd_unit_alert_enabled"></a> [systemd\_unit\_alert\_enabled](#input\_systemd\_unit\_alert\_enabled) | Enable or disable the Systemd service alert monitor | `bool` | `true` | no |
| <a name="input_systemd_unit_alert_threshold_critical"></a> [systemd\_unit\_alert\_threshold\_critical](#input\_systemd\_unit\_alert\_threshold\_critical) | Critical threshold for the Systemd service alert (count of services not running/failed) | `number` | `2` | no |
| <a name="input_systemd_unit_alert_threshold_warning"></a> [systemd\_unit\_alert\_threshold\_warning](#input\_systemd\_unit\_alert\_threshold\_warning) | Warning threshold for the Systemd service alert (count of services not running/failed) | `number` | `1` | no |
| <a name="input_systemd_unit_alert_use_message"></a> [systemd\_unit\_alert\_use\_message](#input\_systemd\_unit\_alert\_use\_message) | Whether to use the base message for the Systemd service alert | `bool` | `true` | no |
| <a name="input_systemd_units_filter"></a> [systemd\_units\_filter](#input\_systemd\_units\_filter) | List of specific systemd units (services) to monitor. If empty, monitors all. | `list(string)` | `[]` | no |
| <a name="input_team"></a> [team](#input\_team) | Team supporting the monitored resource (leave blank to omit tag) | `string` | `null` | no |
| <a name="input_timeout_h"></a> [timeout\_h](#input\_timeout\_h) | Auto-resolve alert in specified hours if condition no longer matches | `number` | `0` | no |
| <a name="input_title_prefix"></a> [title\_prefix](#input\_title\_prefix) | Prefix all alerts with specified value in brackets | `string` | `null` | no |
| <a name="input_title_suffix"></a> [title\_suffix](#input\_title\_suffix) | Suffix all alerts with specified value in parentheses | `string` | `null` | no |
| <a name="input_warn_priority"></a> [warn\_priority](#input\_warn\_priority) | Priority for alerts within warning threshold (P1-P5, uses monitor defaults if not specified) | `string` | `null` | no |

## Outputs

No outputs.
<!-- END_TF_DOCS -->
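
The inputs above might be wired together roughly as follows. This is a minimal sketch, not the maintainers' documented usage: the `source` path, the notification handle, the unit names, and the tag values are all placeholders, and it assumes the caller has already configured the `datadog` provider.

```hcl
# Hypothetical caller configuration. The source path, notification handle,
# unit names, and tag values are placeholders chosen for illustration.
module "systemd_monitors" {
  source = "./host/systemd" # assumed relative path to this module

  # Required input
  notify_default = ["@ops-oncall"] # hypothetical Datadog notification handle

  # Optional inputs; the thresholds shown are the documented defaults
  systemd_units_filter                  = ["nginx.service", "sshd.service"]
  systemd_unit_alert_threshold_warning  = 1
  systemd_unit_alert_threshold_critical = 2

  title_prefix = "PROD"
  title_suffix = "web"
  env          = "prod"
  team         = "platform"
}
```

With `title_prefix = "PROD"` and `title_suffix = "web"`, the `title_prefix` and `title_suffix` locals in `main.tf` render the monitor name as `[PROD]Systemd Unit Status - {{host.name}} (web)`. The prefix is wrapped in brackets with no trailing space, and the suffix is wrapped in parentheses with a leading space.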
1 change: 1 addition & 0 deletions host/systemd/common.tf
34 changes: 34 additions & 0 deletions host/systemd/main.tf
@@ -0,0 +1,34 @@
locals {
  monitor_alert_default_priority  = null
  monitor_warn_default_priority   = null
  monitor_nodata_default_priority = null

  title_prefix = var.title_prefix == null ? "" : "[${var.title_prefix}]"
  title_suffix = var.title_suffix == null ? "" : " (${var.title_suffix})"
}

resource "datadog_monitor" "systemd_unit" {
  count = var.systemd_unit_alert_enabled ? 1 : 0

  name    = join("", [local.title_prefix, "Systemd Unit Status - {{host.name}}", local.title_suffix])
  type    = "service check"
  message = var.systemd_unit_alert_use_message ? local.query_alert_base_message : ""
  tags    = concat(local.common_tags, var.base_tags, var.additional_tags)

  evaluation_delay    = var.evaluation_delay
  notify_no_data      = false
  notify_audit        = false
  renotify_interval   = 60
  timeout_h           = var.timeout_h
  include_tags        = false
  require_full_window = false

  query = <<EOT
"systemd.unit.state"${local.service_filter}.by("host","unit").last(3).count_by_status()
EOT

  monitor_thresholds {
    critical = var.systemd_unit_alert_threshold_critical
    warning  = var.systemd_unit_alert_threshold_warning
  }
}
41 changes: 41 additions & 0 deletions host/systemd/variables.tf
@@ -0,0 +1,41 @@
variable "systemd_unit_alert_enabled" {
  description = "Enable or disable the Systemd service alert monitor"
  type        = bool
  default     = true
}

variable "systemd_unit_alert_use_message" {
  description = "Whether to use the base message for the Systemd service alert"
  type        = bool
  default     = true
}

variable "systemd_unit_alert_threshold_critical" {
  description = "Critical threshold for the Systemd service alert (count of services not running/failed)"
  type        = number
  default     = 2
}

variable "systemd_unit_alert_threshold_warning" {
  description = "Warning threshold for the Systemd service alert (count of services not running/failed)"
  type        = number
  default     = 1
}

variable "systemd_units_filter" {
  description = "List of specific systemd units (services) to monitor. If empty, monitors all."
  type        = list(string)
  default     = []
}

variable "base_tags" {
  description = "Base tags to apply to all monitors"
  type        = list(string)
  default     = []
}

variable "additional_tags" {
  description = "Additional tags to apply to all monitors"
  type        = list(string)
  default     = []
}
1 change: 1 addition & 0 deletions host/systemd/versions.tf
2 changes: 1 addition & 1 deletion host/windows/main.tf
@@ -18,7 +18,7 @@ resource "datadog_monitor" "windows_service" {

   evaluation_delay = var.evaluation_delay
   notify_no_data = false
-  renotify_interval = 0
+  renotify_interval = 60
   notify_audit = false
   timeout_h = var.timeout_h
   include_tags = false